Sunday, April 27, 2008

Intel NAS RAID Malfunction = Kill Me Now

An event that I wouldn't wish on my most hated enemy happened to me about two weeks ago. Since the initial shock and the sinking-into-the-black-hole-of-despair feeling are starting to wear off, I can start to figure out where to go from here. Before I resorted to therapeutic counseling and mind-altering prescription drugs, I wanted to give the blog a try first and see if it made me feel better. So, let me tell you about what happens when a seemingly impenetrable RAID 5 NAS goes bonkers, and hopefully we can all learn a few lessons and I can document my torment in the process.

I have a lot of data: My wife and I both edit video -- stuff for the family, some weddings, other projects, etc. All of our pictures, music, and various other documents are all digital-only. I handle our personal finances paperless; everything gets scanned and shredded. Most of my software gets imaged and the license keys get put into text files (which is why I've taken to digital distribution and Steam so well). In short, a good part of my life exists as 0's and 1's.

This is not strange to me, though. Many people might be uncomfortable with such an arrangement, but I find (found?) it to be liberating. It's easily accessible, takes up far less physical space, and is overall very convenient. But, of course there always looms that dark cloud of ...wait for it... DATA LOSS. FAILURE. THE END OF LIFE AS WE KNOW IT. And yes, I felt that I was very aware of the risk, but being the kind of person that is comfortable with technology and is willing to take the time to understand and implement (and pay for) an advanced solution, I turned to what I thought was an obvious product that could address my storage needs...

...The Intel SS4000-E NAS. A few years ago when this product came out, I was working in the Channel dealing very heavily in Intel products. I worked with it extensively around its launch period, was very excited about the technology, attended quite a few events around the product through work which gave me the opportunity to interact directly with the team that was bringing it to market, and did very well subsequently selling the NAS into the Channel. As anyone in sales knows, it's very easy to sell a product you are personally excited about, and that was true here. It wasn't, and isn't, the only product like this out there, but there's a certain quality and level of service that is associated with dealing with an Intel-branded product line. That's essentially how they market themselves, and it's certainly a powerful message. How great is it to have that value-add in such a sensitive area as data storage!

Naturally, I thought this product would be the perfect solution for my own situation. I thought that a dedicated storage unit, with its own power supply, cooling, and controller, hot-swappable and easy to use, and protected by a hardware RAID 5, would be the answer to my problems. I dropped over a grand at the time for the unit and the drives, and ended up with a solid 1.5TB array that I could access from any computer on my network and stream content (even edit video) right off of it. It worked without issue for 2 years and I thought my investment was well spent.

Fast forward to two weeks ago, and disaster struck, silently. It started when my wife told me she couldn't get to her data, and a quick check showed me that I couldn't access the shared storage at all from my computer either (we have separate computers -- essential for harmony in our house, and makes a central storage unit that much more necessary). Using the storage console software, I found that I could see the storage unit on the network, but the mapped drives just weren't showing up. After a few standard reboots all around, just to make sure everything was kosher, I went into the admin tool to see what I could find. I have had drives fail before on this, and know what it looks like when the array is degraded. But, there was nothing like that. I was presented instead with a screen that was indicating there was NO ARRAY present at all. Completely confused and starting to catch on real fast that there was a problem, I pressed the only option on the screen: "Start".

The rest is a blur. "Start" in this case was a resetting of the BIOS and firmware, and within seconds, the system was re-initialized and the array completely nuked. I've since learned that I ran into a known issue: "Loss of System Access Issue". I was able to get into contact with Intel a few days after that weekend, and I was told that if I hadn't touched anything, they would have been able to fix it. At this point, there was nothing they could do. The workaround to prevent the problem would have been either to never turn off the unit, or to flash the firmware (which destroys your data).

So, screw-ups all around can defeat even an external RAID 5 NAS. I have to wonder what the answer is, then? I think it's a legitimate question: if this product couldn't keep my data safe, then what can? The Intel rep I was in contact with asked me, "You mean you don't have copies?" Well, crap. It's 1.5TB of data... Should I have 2 NAS units? Burn a few hundred DVDs? I'm out of answers on this one. Maybe if some shmuck didn't press the "Start" button, I'd be fine? I'll give you that, but then if some other shmuck hadn't given me an error that leads me to a one-button option into oblivion, I wouldn't be here either. I personally feel that I'm a pretty savvy user in general (I am in the industry, professionally, after all), so I have to think there are more than a few other people who would run into this same scenario, be catastrophically affected, then simply be told that there is nothing that can be done. It makes one ever so slightly angry to think that the $ was spent for nothing but a ticking time bomb, just waiting to self-destruct. I found Intel's overall inability to feel responsibility or give further help rather disconcerting. The experience will stick with me.

I have to say I haven't given up yet, actually. I ran some web searches and was met by the apparently very healthy and alive data recovery industry. Part of their services seems to include reconstruction of re-initialized RAID arrays. I need to take some time to learn about the industry so I can make some educated decisions, but after some initial conversations, I've gotten quotes from $2500 to $17K by people who think the data is completely recoverable and a run-of-the-mill situation. That's comforting, because since it's a RAID 5 and I was protected from hardware failure, I take solace in knowing that somewhere in there, my life still exists. Even if it's hopelessly jumbled into a mass of digital nonsense, I know it's still there! So, I need to figure out now if the risk of sending these off is worth it, how much I can actually afford to pay to get it all back, and if there is some chance I can figure out how to do it myself without making the whole thing worse.
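
For anyone wondering why a RAID 5 is supposed to survive a dead drive, the idea can be sketched in a few lines of Python. This is a toy illustration, not how the SS4000-E's controller actually works: the function names are made up for the example, the blocks are tiny, and the parity always sits in the last position, whereas a real RAID 5 rotates it across the drives. Each stripe stores the XOR of its data blocks as parity, so any one missing block can be rebuilt from the others.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (hypothetical fixed layout).

    Real RAID 5 rotates the parity position from stripe to stripe;
    here it is always last to keep the sketch short.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe, missing_index):
    """Rebuild the block at missing_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Three "drives" of data plus one of parity.
    stripe = make_stripe([b"fami", b"ly v", b"ideo"])
    # Pretend drive 1 died and rebuild its block from the other three.
    print(rebuild_missing(stripe, 1))
```

Note what this does and doesn't cover: parity can rebuild a block lost to a failed drive, but it does nothing to help if the unit forgets how the blocks are laid out across the drives. That, as far as I can tell, is why the data can still be sitting on the disks and yet be unreadable without someone reconstructing the array.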

The moral of the story? The world is a dark place. None of you are safe. It could happen to you, mwahahahahaha! (This is where I set to making my dark villain costume and turn to a life of crime, anguish, and torment, bent on repaying the world for wronging me, yet harming only the innocent masses, become a victim to hubris, then am eventually defeated by a caped mutant do-gooder to finally find myself in a dark asylum on an island, longing for the pleasant days of yore.)

I'll post an update later on after I figure out whether I should sell myself and become an indentured servant to pay for this, or if I gamble my life and marriage on doing it myself.

---
11/25/08 - Six months later, I've posted my conclusion to this twisted drama: "Data Recovery Hacks, Kiss My NAS"


11 Comments:

At 4/27/08 6:53 PM , Blogger Perrin said...

I was considering buying one of the new units (42000-e).

Would you make the purchase again? I may need to think about buying two and just investing $2500 in two systems.

Could you have mirrored the systems as is?

 
At 4/27/08 7:03 PM , Blogger Ed Borden said...

I know the new units are a lot faster, and the feature set looks a lot nicer.

Would I make the purchase again? I don't even know - I guess what's my other option? Part of my dilemma here is that I don't know how the heck to be totally safe. I mean, at the consumer level, if you're talking $2500, that is a heck of a purchase. But yeah, if you're willing to do it, that certainly helps - then you've made the controllers/software redundant. Maybe that's the answer.

I was just unprepared for an issue LIKE THIS that would turn into such a fiasco, then a lack of real help from Intel.

I'm sure you can rig these up to mirror the data. If not through the software that comes with it, then with third party software.

 
At 4/27/08 7:14 PM , Anonymous Perrin said...

I will have to say that I do work for the folks that the -E signifies and they seem to be pricing this box pretty competitively.

The addition of the eSATA port on the new model has huge potential if you could sync them over that rather than saturating your Ethernet.

I may end up with one in the next two weeks and if I do I will make sure to post about it on my small slice of cyberspace. Thanks for the quick reply.

 
At 4/27/08 7:38 PM , Blogger Ed Borden said...

I'm not sure that type of functionality is actually possible over a SATA bus. I guess they could rig it up to mirror the array to whatever is on the other end of the SATA cable, but in that case I don't think you'd be able to connect to a controller on the other end, and I think that'd be the end of the whole idea. Interesting, though... These DO have two Ethernet ports on them, so maybe you could rig it up to keep the mirror function on a different network entirely.

If you work for EMC, you should be a lot closer to getting answers to this stuff than me!

 
At 4/27/08 7:54 PM , Blogger Perrin said...

It is a very BIG company. I asked some questions but I don't personally know anyone working for the LifeLine product group.

The indication was that mirroring was not included with this feature set although I think you could do it with Retrospect.

 
At 4/28/08 12:33 PM , Anonymous Jon Bach said...

I think the key is to have duplicate data, which yep...is just more cost. But more than that, I think you need to have the second set offsite. I'm not following my own advice -- my enormous set of baby pictures and videos (and he's only a year old!) is not redundantly stored, but everything else is mirrored to a hard drive on my work computer. I don't feel the offsite storage needs to be another RAID5 NAS -- it can just be a 1TB hard drive or two. I use a program called Foldershare (foldershare.com) to do the mirroring in realtime, and it has worked well for me. Of course, I haven't encountered even a hard drive failure yet, so my setup isn't exactly battle-tested.

 
At 4/28/08 1:03 PM , Blogger Ed Borden said...

I suppose that's the answer: more $$$. If it was one hard drive, it'd be simpler. It's kind of annoying, though, because the point of a RAID 5 is to protect from physical failure -- so now I've got to have even more hard drives floating around, which is more time, $, and hassle to keep the mirror updated and think about putting it somewhere off-site. What a mess.

 
At 4/28/08 1:30 PM , Blogger Perrin said...

I think tiered storage is the way to go. We all have too much data to send it offsite.

Prioritizing it and sorting it is key. Office documents/PDFs are the most critical for me followed by pictures from vacations and family.

Music and Video is a close third but they are just too large to manage nicely.

 
At 4/28/08 2:14 PM , Blogger Melissa said...

Aw, sorry to hear about your loss. :-(

As someone who can use a 2G thumb drive for a complete backup and has no critical data to speak of, I'm afraid I'm having a hard time feeling your pain. But I'm sure I'll end up there eventually, and will owe you and people like you a debt of gratitude for pioneering a workable solution. Good luck my friend.

 
At 4/29/08 7:04 PM , Anonymous Anonymous said...

Why not use one of those online storage services? I'm sure the cost will easily be less than having a NAS at your house.

IMO this makes total sense for pictures. Do your video editing on your machine and then upload it to the online storage service.

Let someone have the burden.

 
At 5/1/08 12:16 AM , Anonymous Charlie said...

I'm sorry for your loss. *bows head and makes a cross motion*

I'm no expert on NAS so I'm mostly speculating from idealistic theory here, but when you break it down, in RAID configurations (save for RAID 0) the safest part is the actual hard drives and the data. With this in mind, wouldn't it follow to make sure that the RAID configuration is stored the same way as the data? I'm not completely sure if yours is set up like this, but I *believe* that 3Ware cards for desktops work this way, allowing you to simply swap them if they have a problem. Backing up ALL the data to a separate location seems like unneeded redundancy to me when all you lost in this case was the configuration, not your actual data. It seems to me (speculation, remember) there are two small things that NAS (and any other RAID solution) manufacturers can do to prevent this sort of thing.

They are:

a) Store a backup of configuration data at various checkpoints available for restore, much like Windows' rollback or Apple's Time Machine. This configuration data won't take up much space at all but would be invaluable in situations like this.

b) Fix the interface. If someone as savvy as you was confused and led to click a button that was labeled "Start" but which actually meant "Effectively end my life as I know it", that is miscommunication. Bad NAS. Bad.

That was much longer than I intended. Freaking fingers just won't stop! :P

 

Ed Borden is an entrepreneur at the crossroads of tech and gaming.



