A collection of things we do, and an educational section for models.


Data strategy for photographers

Posted in For photographers

     

    A popular subject amongst photographers is how to store and secure all the images you've taken. Until not too long ago, we stored data on a rack server and backed it up in very messy ways. Not anymore.

    To the left you can see our network rack with the gaping hole created by pulling out two rows of 20 - 24 drives. What have we done with all the data?

    The cost of storage has come down, allowing almost anyone to manage data inexpensively and securely. We're going to explain how we do it, and also offer a less expensive alternative.

     

     

    Our data is broken into three categories

    • Current data: Recent shoots, and those still being worked on.

    • Archived working images: Published and retouched images.

    • Archives: All the raw files that came out of the camera.

    Current data

    Two factors drive the way we maintain our current files: speed and security. We always want two copies of everything, even as we shoot. Backups aren't live, so our primary storage is a RAID. We back the RAID up regularly.

    About RAID:

    RAID comes in several flavors including these:

    • RAID 1: Mirrored drives. Everything writes immediately to pairs of drives.
    • RAID 0: Striped drives. By spreading the data over more than one drive, striped arrays read and write faster. In theory, 2 drives are nearly twice as fast, 3 drives three times as fast, and so on until you reach maximum bandwidth. The overall storage size grows with the number of drives too, but so does the chance of failure, because losing one drive causes the loss of all the data on the entire array.
    • RAID 10: A combination of stripes that are mirrored. These will always be in even sets. A 2-stripe set takes four drives, a 3-stripe set takes six, and so on. This is very secure and not much slower than a striped RAID with half as many physical disks (the same speed, less overhead). But setting up a large RAID this way can be expensive.
    • RAID 5: This requires three or more drives and is the most cost-effective way to have redundant data. When writing to the drives, parity data is created and stored in such a way that you can lose any single drive and still have all your data. RAID 5 is slow at writing but fast at reading, and reads get faster the more drives there are. Until you get to very large RAIDs, the capacity of a RAID 5 array is basically the same as a stripe, less one drive's storage capacity (assuming all the drives are the same size).
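
    To put rough numbers on the flavors above, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, it assumes identical drives, it treats RAID 1 as a single mirrored pair, and the failure estimate assumes drives fail independently of each other (which real drives don't always do).

```python
def usable_capacity_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity of an array built from identical drives (hypothetical helper)."""
    if level == "RAID 0":
        if drives < 2:
            raise ValueError("a stripe needs at least 2 drives")
        data_drives = drives          # every drive holds data
    elif level == "RAID 1":
        if drives != 2:
            raise ValueError("this sketch models a single mirrored pair")
        data_drives = 1               # the second drive is a copy
    elif level == "RAID 10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        data_drives = drives // 2     # half the drives are mirrors
    elif level == "RAID 5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        data_drives = drives - 1      # one drive's worth goes to parity
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return data_drives * drive_tb


def stripe_loss_probability(per_drive_failure: float, drives: int) -> float:
    """Chance that at least one drive in a stripe fails (assumes independent failures)."""
    return 1 - (1 - per_drive_failure) ** drives


if __name__ == "__main__":
    # Four 2 TB drives, arranged three different ways.
    for level in ("RAID 0", "RAID 10", "RAID 5"):
        print(level, usable_capacity_tb(level, 4, 2.0), "TB usable")
```

    With four 2 TB drives, the stripe gives 8 TB with no redundancy, RAID 10 gives 4 TB, and RAID 5 gives 6 TB. That difference is the reason RAID 5 is the cost-effective choice for redundant data.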

     

    Arrays: When you create a RAID array, it can use part, or all the drive space. On 1 set of drives we use just 1 array, but on another set, we use 2 and we'll explain why later. Each array can have multiple partitions and volumes, but we use just one of each per array.

    About RAID cards.

    Not all RAIDs are alike. Many motherboards come with RAID on board. These are generally what is called "host" RAID, where the thinking is done by the computer's CPU. If all you are doing is striping drives, this can work well, but in a RAID that requires more thinking (something RAID 5 does a lot of), a dedicated RAID controller with its own processing can really speed things up. We use an Adaptec 51205, which is fast and can handle up to 12 separate physical drives. These cards run about $1000 and are plenty fast for a small shop like ours. You can fit 12 drives into a number of cases that cool well and cost little.
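
    The "thinking" RAID 5 does is mostly parity math. Here is a simplified Python sketch of the idea, not how any particular controller implements it: the parity block is the XOR of the data blocks, so any one missing block can be rebuilt from the rest. Real RAID 5 also rotates parity across the drives, which this sketch ignores, and the block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if each lived on a different drive (made-up contents).
data = [b"RAW1", b"RAW2", b"RAW3"]

# The parity block would be written to a fourth drive.
parity = xor_blocks(data)

# Pretend the drive holding data[1] died: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

    Every write means recomputing parity, which is why RAID 5 benefits from a controller with its own processor while a plain stripe doesn't need one.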

    Our main system is set up into three arrays that span two sets of drives

    The system array.

    We don't back up our system as often as our data. For the operating system and software we use four drives in a RAID 10. It's about twice as fast as using a single hard drive, and secure. Frankly, we usually use hand-me-down drives from our data arrays for this. Next time we rebuild it, we'll probably go to mirrored SSDs for the system.

    The data array and the faster stripe array.

    This is where we start having fun. We have 8 drives that share two arrays. The first and main array is a RAID 5 that spans all 8 drives and nets us about 9 TB; the second is a smaller stripe array of around 300 MB (we presently have 1.5 TB drives loaded). Update: All drives were upgraded to 2TB Seagate .14s in early 2013 with a very noticeable speed improvement.

    The RAID 5 data array is redundant, but a bit slow. However, it does hold a lot of data.

    The faster stripe array also spans all 8 physical drives. It was created last so it uses the outside of the hard drive platters, where the surface moves fastest under the heads. We use it primarily for things like swap drives and scratch files. At times, when working on a lot of large image files that get opened and closed a lot, we'll put working copies on there as well. An 8 drive stripe runs really fast reading and writing, as fast as or faster than SSDs.

    This gives us the best of both worlds. Speed where we really need it and redundancy for our data. And, it's cost efficient.

    Backing up the RAID.

    First, I'm going to mention that this whole method is fairly new to us. We used to use a rack server with countless small drives that were backed up on tape, and later a variety of external devices. Then we migrated to the RAID system described above but backed it up to an external eSATA RAID. Again, not anymore.

    What's changed is the cost and speed of external drives. Now with USB 3.0 and 3TB external drives costing so little, we just do incremental backups to two 3TB external drives that we bought at Costco. When the data RAID approaches 6TB, we archive more data. More on that soon.
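
    For anyone curious what an incremental backup boils down to, here is a minimal Python sketch. It is not the software we use, just an illustration of the idea: copy only the files that are new or have changed since the last run. The function names and the 2-second timestamp tolerance (external drives often store coarser timestamps) are assumptions made for this example.

```python
import shutil
from pathlib import Path

# Assumed allowance for file systems that store timestamps coarsely.
MTIME_TOLERANCE_SECONDS = 2.0


def needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing, differs in size, or is older than src."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    if s.st_size != d.st_size:
        return True
    return s.st_mtime - d.st_mtime > MTIME_TOLERANCE_SECONDS


def incremental_backup(source_root: Path, backup_root: Path) -> list:
    """Copy new or changed files; return their paths relative to source_root."""
    copied = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied
```

    Run it once per external drive, for example `incremental_backup(Path("D:/data"), Path("F:/backup"))` with whatever paths apply to you (those two are placeholders). The first run copies everything; later runs only touch what changed. Note that this sketch never deletes anything from the backup.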

    Archived working data

    Archived working data is different than the archives. This is what we call all the images that have been worked on. This data space is where we keep all the PSD files that get published or otherwise distributed. It's still of a size where we can keep it in a folder on the data RAID so that when someone needs an image that's already been created, it's easily found and accessible. We may soon discontinue that practice; you'll understand why when we describe the newer archives.

    The archives

    We started shooting digital en masse around 2005 when the Canon 1Ds Mark II and the 24MP digital backs came out. Shooting film became much scarcer a couple of years later, around the time Hasselblad came out with the H3D. Until then we still had quite a few creative and art directors that demanded film. Today we have none, so the amount of data has increased, as has the size of the actual files.

    As mentioned, we used to use a rack server with a tape backup. Before then we had images stored on DVDs, literally thousands of them, which have since been migrated onto hard drives.

    At some point we started using 1394 (FireWire) drives, but now we use external USB 3.0 drives exclusively. In fact we just recently finished migrating all our old 1394 drives over and have a nice junk pile of them waiting to be washed (secure and permanent deletion of everything on the drive).

    We're about to add 4 more drives to clear up the RAID, but our present archive sits on 10 separate drives, and the best part is that they are all hooked up through hubs and all indexed in Lightroom. Everything is reasonably accessible, and the drives are in sleep mode except when needed. This is why we may be rethinking our archived working file location.

    One important note: We strongly recommend buying external drives in pairs. They do fail, but they're inexpensive. We keep one set in house, and another offsite.
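
    A pair of drives only helps if the two copies really match. Here is a minimal Python sketch for checking that before one drive goes offsite. It is an illustration rather than a tool we ship: the function names are hypothetical, and it simply compares SHA-256 checksums of every file on both drives.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 of a file, read in chunks so large raw files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(primary: Path, twin: Path) -> dict:
    """Report files that are missing from either copy or whose contents differ."""
    a, b = tree_digests(primary), tree_digests(twin)
    return {
        "missing_on_twin": sorted(a.keys() - b.keys()),
        "missing_on_primary": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

    If all three lists come back empty, the two drives hold the same files. Reading every byte of a full archive drive takes a while, so this is a check to run when a pair is first filled, not every day.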

    A cheaper solution

    We use a lot of storage for mostly being a stills shop. Not everyone has the same storage needs as we do. Based on reading about what others do online, I believe our method will work well for most still photographers, including those that do minor amounts of video. So here's the same method at a fraction of the cost for those that shoot less, or have smaller files.

    The computer.

    I'm a firm believer in RAID. Backing up is still required, but when things fail, they always do at the wrong time. Unless you really need the speed, you may be fine using host RAID for both your system and your data. You can pick up two good-sized internal drives for just $200 and create a RAID 1 array with them. If your motherboard doesn't have a SATA RAID built in, you can get a host controller for about $100. Get one that supports four drives so that the next time you upgrade drives you can keep all four working.

    Backup.

    Buy an external USB 3.0 drive the same size as just one of the drives in your mirrored RAID. If you want to be extra safe, get two so you can have an offsite backup, or at least a backup of your backup in case you make a grave mistake.

    If you don't have USB 3.0 on your computer, get an add-on card. It's MUCH faster.

    Archives.

    If you already need an archive, buy the external drives in pairs. Archives usually aren't on your computer, so if you only have one copy and something goes wrong, you're SOL.

    Your total cost: About $400 with two backup drives. Archive drives will set you back about $200 a pair.

    If you only have a laptop.

    Laptops are not a good way for photographers to manage their images. If you don't have a choice, get a small external RAID for it and only keep data on the laptop's drive when you absolutely have to. Laptop drives fail much more often than desktop drives. If your laptop's fastest external port is USB 2.0, consider an eSATA, USB 3.0 or other adapter to access the external RAID. External eSATA drives will likely work faster than the one inside your laptop.

    How we make our purchases

    Everyone has favorite brands, and we're no different. Ours are based on customer support and our own reliability experiences.

    RAID cards.

    We only buy Adaptec RAID products. We've strayed before, but always come back. Adaptec is the leader in RAID but more importantly, their technical support is staffed with experts that can help you when you really need it. It can get complicated, and it's not a place to make mistakes. They're also quick and easy to work with if you have any warranty needs.

    Adaptec makes 4-port host controllers that you can find for under $100. If you want to get some performance, you can get faster 4-port cards in the $300 range, and prices continue to go up based on speed and number of drives.

    Drives.

    We use Seagate for both the internal and external drives for two different reasons. In the last ten years we've gone through at least 100 drives in our shop. No matter what brand of drives you buy, they do fail more often than anyone would like, but Seagate's tools, tech support and quick ship program seem to be the best. We exclusively buy Seagate drives now.

    We also use Seagate external drives, but that's because the power adaptors are the same from generation to generation and it makes things easier. Their GoFlex bases make things easier as well if you wind up changing interfaces. But frankly, when we made the last decision to upgrade, Costco had them on sale.

    The types of drives.

    We use 7200 RPM consumer SATA drives. The main reason is price. SCSI drives cost quite a bit more, and the interface only really becomes beneficial in larger, more complex RAIDs. Drives with faster spindle speeds cost more, are smaller, and also run hotter. We've noticed a direct correlation between heat and failure rates and are unwilling to make the required environmental adjustments.

    Enterprise-level SATA drives don't really make sense for us either. While they are rated to last a bit longer, their real advantage is in reduced errors when writing many files very quickly. Such errors can cause havoc in mega RAIDs, but we've only experienced that when a drive starts to get old and needs to be replaced.

    We replace our drives about every 24 – 30 months. Our next upgrade will likely be when 3TB drives hit the magic price points. Each time we upgrade we seem to pick up a little performance. (Update: We upgraded to all 3TB external drives in fall 2012.)

    Cost of drives:

    Consumer drives, both internal and external, seem to run close to the same. The newest and biggest drives usually cost in the high $100s, sometimes breaking $200. That seems to push the next size down to about $130 – $140. Today that's 3TB and 2TB. Update: in early 2013 3TB drives dropped to around $100.

    However, the drives that run $130 to $140 go on sale (in our area at Fry's and Costco) pretty regularly, sometimes for as low as $99. We usually buy in the $99 - $119 range, although we did recently splurge on 3TB external drives because the cost per TB wasn't that much higher and we were able to consolidate a lot of old drives into our new archival and backup methods.
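
    Comparing drives by cost per TB is simple arithmetic. Here is a tiny Python sketch using prices mentioned in this post; the function name is made up for the example, and the prices are only snapshots.

```python
def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Price divided by capacity, in dollars per terabyte."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    return price_usd / capacity_tb


if __name__ == "__main__":
    # A 2TB drive at a $99 sale price versus a 3TB drive at around $100.
    print(f"2TB at $99:  ${cost_per_tb(99, 2):.2f} per TB")
    print(f"3TB at $100: ${cost_per_tb(100, 3):.2f} per TB")
```

    The bigger drive also means fewer drives to power, index and store offsite, which the per-TB figure doesn't capture.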

    We're seemingly always upgrading and changing our methods. We'd love to hear how others do it.

     LaurensAntoine.com
