Thursday, January 6, 2011

What is your data range? (Survey)

I quite frequently read news stories about information stored on computers (usually the information is being stolen), and the story always includes a line that goes something like this: "The file was X MB in size," where X usually ranges from 20 to 200. The thing is, the news story always makes it sound like that is an immense amount of data, and I guess on some scale it is. 200 MB could represent the total personal student data for the entire UNC system. But in a different setting, 200 MB is almost not worth picking up off the floor (kind of like pennies). Some people may see the number 200 MB and say, "Wow, that is a lot," but it is a little hard for me to feel impressed by that considering the amount of data that I deal with every day.

Just today I was talking to my adviser and I mentioned how I have 10 GB of storage on my AFS space and his response was, "I don't know why they give you so little space, I mean who uses gigabytes any more?" This sentiment rings particularly true considering that this morning I ran a test code (just to see if I had made any errors) and in the 3 1/2 minutes that it ran it made 3.2 GB of data (in a total of 8 output files, so that's 6 files of 320 MB and two of 640 MB). That means that if I kept that up I would be out of space in less than 11 minutes (it would actually take me about 20 minutes, but that's just a technicality). Of this data I only took one file to use and the rest I just trashed. On a serious simulation I am looking at having 1-2 TB of data, so my threshold for what makes something "a lot of data" is quite high.
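
For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch in Python. It only uses the figures quoted in this post; the variable names are made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and a constant write rate, which a real run may not have.

```python
# Figures quoted in the post. Decimal units (1 GB = 1000 MB) are an assumption.
file_sizes_mb = [320] * 6 + [640] * 2  # 8 output files from the test run
run_minutes = 3.5                      # how long the test code ran
quota_gb = 10                          # size of the AFS space

total_gb = sum(file_sizes_mb) / 1000
rate_gb_per_min = total_gb / run_minutes
# Assumes the space starts empty and the write rate stays constant.
minutes_to_fill = quota_gb / rate_gb_per_min

print(f"total output: {total_gb:.1f} GB")
print(f"write rate: {rate_gb_per_min:.2f} GB/min")
print(f"time to fill quota: {minutes_to_fill:.1f} min")
```

Under those assumptions the 8 files add up to 3.2 GB, and an empty 10 GB quota fills in just under 11 minutes.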

So to come back to my original topic, my data range runs from around 100 MB all the way up to several TB, with the MB range being the very low end. So my questions are: what is your data range? What size of files do you typically use? And when do you start to say, "Wow, that's a lot of memory"? I just want to get a sense of the size of files, or just the amount of memory, that people consider to be "sufficient".

For me, I start to say "Wow, that's a lot of memory" somewhere around 5 TB. A normal file size for me is several MB if not more, and a typical amount of data that I use and move about is usually several GB. How about you?


  1. My data range from a few kB to a couple hundred GB, depending on the data. Typically, I deal with 100 MB - 1 GB just because of RAM limitations in manipulating the data. The particular data I am referring to cannot be used on a supercomputer and I don't have easy access to a workstation with >100 GB of RAM, so I usually reduce the complexity of my data.

  2. I use data sets ranging from several MB to several GB, and often I need to use many at the same time, so I can't get my work done unless I have a machine with a minimum of 32 GB of RAM.

  3. Part of my job is to do maintenance on data in three large data centers. These data centers receive a few kB of data every time someone clicks on our customer web sites. We have a lot of customers. We handle about 1 trillion hits per quarter and we keep two years or more of customer data. In total there are several petabytes of data spread across about 20,000 servers in the 3 data centers. File sizes are on the gigabyte scale and I do maintenance on data sets over a terabyte in size at a time. Some jobs I launch may take a week or more to finish.

  4. Even my data is in the terabyte range, and I am dealing in names and images, not the universe.

  5. I admit, most of my experiments don't require a ton of data-taking. The most data-intensive measurements I make are things like characterizing a gain medium crystal or trying to predict and measure the output profile of a new laser. My computer has a total of about 100 GB of storage space, of which I've only used about 50 GB. I can currently back up all my pertinent data with the 2 GB free from Mozy.

  6. When I worked at LLNL, the NIF generated around 10 TB of diagnostic data with every shot. That info was transferred to the BlueGene supercomputer in the next building over for processing.

    During my MS, I worked a lot with computer vision, so I had many large videos. Processing that data would often result in several GB of data for review, often in the form of other movies and images.

    Now, however, as a control engineer, my data sets are pretty dang small (on the KB level) so most of the space on my current HD is filled up by my iTunes library :-) .

  7. Right now, I've got dark-matter-only simulations using about 800 GB, and 3 simulations with baryons using about 500 GB each.

    For my postdoc (assuming I get one...), I plan on creating about 10 TB of data, maybe a bit more. What's a MB again? Oh yeah, that tiny little speck of a tiny portion of my data.

