What is the largest file transfer you have ever done?

data1701d (He/Him) · 1 year ago

What is the largest file transfer you have ever done?

@[email protected] · 1 year ago

Upgraded a NAS for the office. It was reaching capacity, so we replaced it. Transfer was maybe 30 TB. Just used rsync. That local transfer was relatively fast. What took longer was for the NAS to replicate itself with its mirror located in a DC on the other side of the country.

Random Dent · 1 year ago

Yeah it’s kind of wild how fast (and stable) rsync is, especially when you grew up with the extremely temperamental Windows copying thing, which I’ve seen fuck up a 50 MB transfer before.

The biggest one I’ve done in one shot with rsync was only about 1 TB, but I was braced for it to take half a day and cause all sorts of trouble. But no, it just sent it across perfectly first time, way faster than I was expecting.

@[email protected] · 1 year ago

Never dealt with Windows. rsync just makes sense. I especially like that it’s idempotent, so I can just run it twice or three times and it’ll be near instant on the subsequent runs.
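For anyone wondering why the second run is near instant: by default rsync skips files whose size and modification time already match on the other side. Here is a rough Python sketch of that idea. It is not rsync's actual code, and `sync_tree` and `_looks_same` are made-up names for illustration.

```python
import os
import shutil


def sync_tree(src: str, dst: str) -> int:
    """Copy files from src to dst, skipping ones that already look identical.

    Returns the number of files actually copied.
    """
    copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            if _looks_same(s, d):
                continue  # nothing to do, which is what makes reruns cheap
            shutil.copy2(s, d)  # copy2 also preserves the modification time
            copied += 1
    return copied


def _looks_same(s: str, d: str) -> bool:
    """Cheap comparison: same size and same modification time (whole seconds)."""
    if not os.path.exists(d):
        return False
    a, b = os.stat(s), os.stat(d)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)
```

Running `sync_tree` twice on an unchanged tree copies everything the first time and nothing the second.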

@[email protected] · 1 year ago

Yeah, shout out for rsync also. It’s awesome. Combine it with ssh & it feels pretty secure too.

@[email protected] · 1 year ago

~340GB, more than a million small files (~10KB or less each one). It took like one week to move because the files were stored in a hard drive and it was struggling to read that many files.
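Some back-of-the-envelope arithmetic on those numbers, assuming decimal units, exactly seven days, and every file at the 10 KB upper bound (smaller files would mean even more of them):

```python
TOTAL_BYTES = 340e9        # ~340 GB, decimal units assumed
MAX_FILE_BYTES = 10e3      # ~10 KB upper bound per file
SECONDS = 7 * 24 * 3600    # "like one week", taken as exactly 7 days

min_files = TOTAL_BYTES / MAX_FILE_BYTES      # lower bound on the file count
files_per_second = min_files / SECONDS
bytes_per_second = TOTAL_BYTES / SECONDS

print(f"at least {min_files:,.0f} files")
print(f"about {files_per_second:.0f} files/s")
print(f"about {bytes_per_second / 1e6:.2f} MB/s")
```

That works out to well under 1 MB/s on average, which fits a hard drive spending its time seeking between tiny files rather than streaming data.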

@[email protected] · 1 year ago

I’m currently backing up my /dev folder to my unlimited cloud storage. The backup of the file /dev/random has been running for two weeks.

@[email protected] · 1 year ago

No wonder. That file is super slow to transfer for some reason. But wait till you get to /dev/urandom. That file has TBs to transfer at whatever pipe you can throw at it…

Eager Eagle · 1 year ago

That’s silly. You should compress it before uploading.

Norah (pup/it/she) · 1 year ago

Cool, so I learned something new today. Don’t run cat /dev/random

@[email protected] · 1 year ago

Why not try /dev/urandom?

😹

Norah (pup/it/she) · 1 year ago

Ya know, if not for the other person’s comment, I might have been gullible enough to try this…

data1701d (He/Him) · 1 year ago

I’m guessing this is a joke, right?

@[email protected] · 1 year ago

/dev/random and other “files” in /dev are not really files; they are interfaces which can be used to interact with virtual or hardware devices. /dev/random spits out cryptographically secure random data. Another example is /dev/zero, which spits out only zero bytes.

Both are infinite.

Not all “files” in /dev are infinite, for example hard drives can (depending on which technology they use) be accessed under /dev/sda /dev/sdb and so on.
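A small sketch of what "infinite" means in practice, assuming a Linux system where /dev/zero and /dev/urandom exist. The device never reaches end of file, so the reader decides how many bytes to take; `read_device` is just a helper name made up for this example.

```python
def read_device(path: str, count: int) -> bytes:
    """Read `count` bytes from a character device such as /dev/zero."""
    with open(path, "rb") as dev:
        return dev.read(count)


zeros = read_device("/dev/zero", 16)     # sixteen zero bytes
noise = read_device("/dev/urandom", 16)  # sixteen random bytes, different each run

print(zeros.hex())
print(noise.hex())
```

Asking for a bounded number of bytes returns immediately. A backup tool that reads until end of file would never finish on either device.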

data1701d (He/Him) · 1 year ago

I’m aware of that. I was quite sure the author was joking, with the slightest bit of concern of them actually making the mistake.

@[email protected] · 1 year ago

It was something around 40 TB X2 . We were doing a terrain analysis of the entire Earth. Every morning for 25 days I would install two fresh drives in the cluster doing the data crunching and migrate the filled drives to our file server rack.

The drives were about 80% full and our primary server was mirrored to two other 50 drive servers. At the end of the month the two servers were then shipped to customer locations.

@[email protected] · 1 year ago

A few years back I worked at a home. They organised the whole data structure but needed to move to another provider. My colleagues and I moved roughly 15.4 TB. I don’t know how long it took because honestly we didn’t have much to do while the data was moving, so we just used the downtime for some nerd time. Nerd time in the sense that we just started gaming and doing a mini LAN party with our Raspberry and Banana Pis.

Surprisingly, the data contained information on lots of long-dead people, which is quite scary because it wasn’t being deleted.

🐍🩶🐢 · 1 year ago

No idea about which specific type of business it is, but keeping that history long term can have some benefits, especially to outside people. Some government agencies require companies to keep records for a certain number of years. It could also help out in legal investigations many years in the future and show any auditors you keep good records. From a historical perspective, it can be matched to census, birth, and death certificates. A lot of generational history gets lost.

Companies also just hoard data. Never know what will be useful later. shrug

@[email protected] · 1 year ago

80GB, it was 8 hours of (supposedly) 4k content in the MP4 format. https://www.youtube.com/watch?v=VF5JWdaJlvc Here’s the link (hoping for the piped bot to appear).

@[email protected] · 1 year ago

I work in cinema content so hysterical laughter

@[email protected] · 1 year ago

Interesting! Could you give some numbers? And what do you use to move the files? If you can disclose obvs

@[email protected] · edit-2 1 year ago

A small DCP is around 500 GB. But that’s like basic film shizz, 2D, 5.1 audio. For comparison, the 3D Deadpool 2 teaser was 10 GB.

Aspera’s commonly used for transmission due to the way it multiplexes. It’s the same protocolling behind Netflix and other streamers, although we don’t have to worry about preloading chunks.

My laughter is mostly because we’re transmitting to a couple thousand clients at once, so even with a small DCP that’s around a PB dropped without blinking

@[email protected] · 1 year ago

Ahhh thanks for the reply! Makes sense! We also use Aspera here at work (videogames) but don’t move that amount, not even close.

@[email protected] · 1 year ago

I used to work in the same industry. We transferred several PBs from West US to Australia using Aspera via thick AWS pipes. Awesome software.

@[email protected] · edit-2 1 year ago

Hahahah did you enjoy Australian Internet? It’s wonderfully archaic

(MPS, Delux, Gofilex or Qubewire?)

@[email protected] · 1 year ago

In the early 2000s I worked on an animated film. The studio was in the southern part of Orange County CA, and the final color grading / print (still not totally digital then) was done in LA. It was faster to courier a box of hard drives than to transfer electronically. We had to do it a bunch of times because of various notes/changes/fuck ups. Then the results got courier’d back because the director couldn’t be bothered to travel for the fucking million dollars he was making.

@[email protected] · 1 year ago

Fucking hell the raws woulda been gigantic

@[email protected] · 1 year ago

You legally have to tell us if that movie was Shrek.

@[email protected] · 1 year ago

Hah, nope. Shrek was made in Glendale, so they probably had everything on site or right next door.

Random Dent · 1 year ago

Oh yeah I worked in animation for a bit too. Those 4K master files are no joke lol

@[email protected] · 1 year ago

Eh, what’s a dcp?

@[email protected] · 1 year ago

Digital Cinema Package. Films come out in a buncha files that rather resemble a dvd rip. You got your video files (still called reels!) and your audio files, maybe some subtitle files and other bits and pieces and your assetmap (list of files) all in a big fat folder collectively called a DCP

@[email protected] · 1 year ago

Digital Cinema Package; basically the movie file you’re watching when you’re in a movie theater.

Random Dent · 1 year ago

Here ya go!

@[email protected] · 1 year ago

That article was a weird mix of insider info and wild inaccuracies

Random Dent · 1 year ago

Oh sorry! Here ya go!

Joelk111 · edit-2 1 year ago

When I was moving from a Windows NAS (God, fuck windows and its permissions management) on an old laptop to a Linux NAS I had to copy about 10TB from some drives to some other drives so I could re-format the drives as a Linux friendly format, then copy the data back to the original drives.

I was also doing all of this via terminal, so I had to learn how to copy in the background, then write a script to check and display the progress every few seconds. I’m shocked I didn’t lose any data to be completely honest. Doing shit like that makes me marvel at modern GUIs.
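For the curious, the copy-in-the-background-and-poll approach can be sketched in a few lines of Python. This is not the script I used, just a minimal illustration for a single file; `copy_with_progress` and `_report` are made-up names, and errors inside the background thread are not handled here.

```python
import os
import shutil
import threading


def copy_with_progress(src: str, dst: str, interval: float = 2.0) -> None:
    """Copy src to dst in a background thread and print progress periodically."""
    total = os.path.getsize(src)
    worker = threading.Thread(target=shutil.copyfile, args=(src, dst))
    worker.start()
    while worker.is_alive():
        _report(dst, total)
        worker.join(timeout=interval)  # wait, but wake up to report again
    _report(dst, total)  # final line once the copy has finished


def _report(dst: str, total: int) -> None:
    """Print how much of the destination file has been written so far."""
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    pct = 100.0 * done / total if total else 100.0
    print(f"{done}/{total} bytes ({pct:.1f}%)")
```

Progress is estimated from the size of the destination file, which is the same trick as checking the target with `ls` or `du` from another shell.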

Took about 3 days in copying files alone. When combined with all the other NAS setup stuff, ended up taking me about a week just in waiting for stuff to happen.

I cannot reiterate enough how fucking difficult it was to set up the Windows NAS vs the Ubuntu Server NAS. I had constant issues with permissions on the Windows NAS. I’ve had about 1 issue in 4 months on the Linux NAS, and it was much more easily solved.

The reason the laptop wasn’t a Linux NAS is due to my existing Plex server instance. It’s always been on Windows and I haven’t yet had a chance to try to migrate it to Linux. Some day I’ll get around to it, but if it ain’t broke… Now the laptop is just a dedicated Plex server and serves files from the NAS instead of local. It has much better hardware than my NAS, otherwise the NAS would be the Plex server.

calm.like.a.bomb · 1 year ago

so I had to learn how to copy in the background, then write a script to check and display the progress every few seconds

I hope you learned about terminal multiplexers in the meantime… They make your life much easier in cases like this.

@[email protected] · 1 year ago

30 years with Linux and I know I still haven’t. Maybe this year? :-D

@[email protected] · edit-2 1 year ago

I don’t remember how many files, but typically these geophysical recordings clock in at 10-30 GB. What I do remember, though, was the total transfer size: 4 TB. It was kind of like a bunch of .segd, and they were stored in this server cluster that was mounted in a shipping container for easy transport and lifting onboard survey ships. Some geophysics processors needed it on the other side of the world. There was nobody physically heading in the same direction as the transfer, so we figured it would just be easier to rsync it over 4G. It took a little over a week to transfer.

Normally when we have transfers of a substantial size going far, we ship it on LTO. For short distance transfers we usually run a fiber, and I have no idea how big the largest transfer job has been that way. Must be in the hundreds of TB. The entire cluster is 1.2 PB, but I can’t recall ever having to transfer everything in one go, as the receiving end usually has a lot less space.

data1701d (He/Him) · 1 year ago

4G?! That strikes fear into my heart!

@[email protected] · 1 year ago

The alternative was 5mbit/s VSAT. 4G was a luxury at that time.
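To put numbers on that: a quick calculation using the 4 TB and "a little over a week" from above, assuming decimal units, exactly seven days, and a VSAT link that actually sustains its full 5 Mbit/s.

```python
DATA_BITS = 4e12 * 8           # 4 TB, decimal units assumed
WEEK_SECONDS = 7 * 24 * 3600   # "a little over a week", rounded down to 7 days

avg_4g_mbit = DATA_BITS / WEEK_SECONDS / 1e6   # average rate the 4G link delivered
vsat_days = DATA_BITS / 5e6 / 86400            # same data at a sustained 5 Mbit/s

print(f"4G average: about {avg_4g_mbit:.0f} Mbit/s")
print(f"VSAT at 5 Mbit/s: about {vsat_days:.0f} days")
```

So the 4G link averaged somewhere around 50 Mbit/s, and the same job over VSAT would have taken more than two months.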

@[email protected] · 1 year ago

At the rates I’m paying for 4G data, there are very few places in the world where it wouldn’t be cheaper for me to get on a plane and sneakernet that much data

@[email protected] · 1 year ago

Why would dd have a limit on the amount of data it can copy? Afaik dd doesn’t check, nor does it do anything fancy; if it can copy one bit it can copy infinite.

Even if it did any sort of validation, if it can do anything larger than RAM it needs to be able to do it in chunks.
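The chunking idea looks roughly like this in Python. It is a sketch of the general technique, not dd's source; `chunked_copy` is a made-up name and `block_size` plays the role of dd's `bs`.

```python
def chunked_copy(src: str, dst: str, block_size: int = 1024 * 1024) -> int:
    """Copy src to dst one block at a time; returns the number of bytes copied."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)  # at most block_size bytes in memory
            if not block:                 # empty read means end of input
                break
            fout.write(block)
            copied += len(block)
    return copied
```

Memory use stays at one block no matter how large the input is; the only thing that grows is the byte counter.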

data1701d (He/Him) · 1 year ago

It’s less about dd’s limits and more about laughing at the fact that it supports units so large that it might be decades or more before we can read that much data.

@[email protected] · 1 year ago

Not looking at the man page, but I expect you can limit it if you want and the parser for the parameter knows about these names. If it were me it’d be one parser for byte size values and it’d work for chunk size and limit and sync interval and whatever else dd does.

Also probably limited by the size of the number tracking. I think dd reports the number of bytes copied at the end even in unlimited mode.
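Something like this is what I mean by one parser for all byte size values. It is a simplified sketch, not dd's real parser: `parse_size` is a made-up name, and it only handles the letter multipliers (a bare letter or an `iB` ending means powers of 1024, a `B` ending means powers of 1000), skipping the other suffixes dd accepts.

```python
import re

_LETTERS = "KMGTPEZYRQ"  # kilo, mega, giga, ... up to quetta


def parse_size(text: str) -> int:
    """Turn strings like '4K', '1MB' or '2GiB' into a byte count."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if not match:
        raise ValueError(f"not a size: {text!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if suffix == "":
        return number
    letter, rest = suffix[0].upper(), suffix[1:]
    if letter not in _LETTERS:
        raise ValueError(f"unknown unit: {suffix!r}")
    exponent = _LETTERS.index(letter) + 1
    if rest in ("", "iB"):
        base = 1024   # K, M, G ... and KiB, MiB, GiB ...
    elif rest == "B":
        base = 1000   # kB, MB, GB ...
    else:
        raise ValueError(f"unknown unit: {suffix!r}")
    return number * base ** exponent
```

The same function could then serve the block size, the count limit and anything else that takes a byte value. For example, `parse_size("4K")` gives 4096 and `parse_size("1MB")` gives 1000000.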

Random Dent · 1 year ago

Well they do nickname it disk destroyer, so if it was unlimited and someone messed it up, it could delete the entire simulation that we live in. So it’s for our own good really.

@[email protected] · 1 year ago

No, it can’t copy infinite bits, because it has to store the current address somewhere. If they implement unbounded integers for this, they are still limited by your RAM, as that number can’t infinitely grow without infinite memory.
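To get a feel for how far away that limit is, here is a quick estimate. Both numbers are assumptions picked for illustration: an offset stored in an unsigned 64-bit integer, and a sustained copy rate of 10 GB/s.

```python
COUNTER_BITS = 64                      # assumption: unsigned 64-bit byte offset
RATE_BYTES_PER_S = 10e9                # assumption: a sustained 10 GB/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600

max_bytes = 2 ** COUNTER_BITS          # number of distinct byte offsets (16 EiB)
years = max_bytes / RATE_BYTES_PER_S / SECONDS_PER_YEAR

print(f"{max_bytes} bytes, about {years:.0f} years at 10 GB/s")
```

Under those assumptions the counter would take decades to wrap, so the limit is real but not one anybody is likely to hit by accident.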

@[email protected] · 1 year ago

Multiple TB when setting up a new server to mirror an existing one. (Did an initial copy with both together in the same room, before moving the clone to a physically separate location. Doing that initial copy would saturate the network connection for a week or more otherwise)

@[email protected] · 1 year ago

I once moved ~5TB of research data over the internet. It took days and unfortunately it also turned out that the data was junk :/

Possibly linux · 1 year ago

While I haven’t personally had to move a data center I imagine that would be a pretty big transfer. Probably not dd though.

Random Dent · 1 year ago

I can’t imagine how nerve-wracking it would be to run dd on something like that lol. I still don’t trust myself to copy a USB stick with my unimportant bullshit on it with dd, let alone a server with anything important on it!

Presi300 · 1 year ago

I’ve imaged an entire 128GB SSD to my NAS…

@[email protected] · 1 year ago

In the middle of something like 200 TB for my Plex server, going from a 12 bay system to a 36 LFF system. But I’ve also literally driven servers across the desert because it was faster than trying to move data from one datacenter to another.

@[email protected] · 1 year ago

That’s some RFC 2549 logic, right there.

Norah (pup/it/she) · 1 year ago

Just thinking about how much data you could transfer using this. MicroSD cards make it a decent amount. Latency would be horrible, but throughput could be pretty good I think.
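A rough way to estimate it, with completely hypothetical numbers (one 1 TB card and a one-hour flight); `carrier_throughput_mbit` is a made-up helper name.

```python
def carrier_throughput_mbit(payload_bytes: float, trip_seconds: float) -> float:
    """Average throughput, in Mbit/s, of physically carrying data one way."""
    return payload_bytes * 8 / trip_seconds / 1e6


# Hypothetical example: a single 1 TB microSD card on a one-hour trip.
print(f"{carrier_throughput_mbit(1e12, 3600):.0f} Mbit/s")
```

With those made-up inputs it comes out above 2 Gbit/s, and the latency is the full hour before the first byte arrives.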

@[email protected] · 1 year ago

Amazon Snowball will send you a semi truck.

@[email protected] · 1 year ago

Packet loss would be quite costly though

data1701d (He/Him) · 1 year ago

Which desert? I’ve lived in the desert my entire life.

@[email protected] · 1 year ago

From LA to Vegas. Took the servers down end of business one night, drove it all night, installed it and got it back online before start of business the next day.

data1701d (He/Him) · 1 year ago

As an ex-Vegas resident, I have to ask: why were you moving stuff to Vegas?

@[email protected] · 1 year ago

It’s got a hell of a datacenter.

https://www.switch.com/las-vegas/