The New Deal, er… Experience

Published by:

Ok, so this is more of a recreational post, but we can have one of these once in a while, right? I, like many others, downloaded the much-awaited fall XBOX update, dubbed the ‘NXE’, or New XBOX Experience. I don’t care much for the new avatar system; I could take it or leave it. It’s not too bad, though, and not really overbearing or ‘in your face’, aside from actually forcing you to create one. If I had to like one thing about it, it’s the fact that my gamer pic is now of the avatar looking heroically upward and to the right, rather than the blue snail that I so enthusiastically chose from the anemic default options of the older system.

I haven’t had a lot of time to play with it, but I like the new interface much better than the blades. It opens the functionality up. For example, the ‘Iron Man’ rental from the video marketplace shows up and is visible without being out of place or looking like it’s in a designated ad spot that we’ve all been trained to ignore, whereas before I don’t think most people even realized they could rent movies.

I look forward to trying out the Netflix streaming, too.  I may update this post with my experiences on that, since it seems that most people have been looking forward to that feature the most.

I did run into a bit of a bug. I had previously connected the XBOX to my Linux server for streaming my movie collection. It still works, but it had me download the codecs package again, and there was some weirdness with that. It acted as though I needed to download it and prompted me to install it, then showed it as installed, but it did not work. Then I went into the prompt to install again, where it was checked as downloaded and installed already. I selected it, and got ‘to use this feature, please launch the game that it was intended for’, or something to that effect. Instead, I opted to reinstall, and that time it actually showed it downloading and installing, and then it worked.

In all, though, I’d say this refresh was a good move.


Invitation from NetApp

Published by:

A few of us from work were invited to meet with Vice Chairman Tom Mendoza from NetApp downtown this morning.  He seems to have sort of a side job being a fairly popular inspirational speaker in the business world. Apparently he stopped by between flights from the east to west coast and the local NetApp folks talked him into holding an informal seminar over breakfast with some of us customers.  There were only 12 or 15 of us, and it was in a lofty conference room with a wonderful city view, which gave us the feeling that it was a special event.

He spoke a lot about NetApp, but I didn’t get the feeling that he was doing it as a salesman so much as he was doing it because they were valid experiences to illustrate his point. He spoke a lot about how they’ve run their business, their culture, and the problems he’s seen in other companies.

He pointed out a few things he’s seen, such as companies saying that people are their greatest resource, but not spending any time in board meetings discussing how they show that as a company.  He talked about how NetApp has a program to let the executives know if you feel that someone has done a good job and given extra effort, and various ways in which they recognize those extraordinary people.  He even talked about the one layoff they had, where 50 of the 70 affected individuals wrote thank-you letters to the company for how they handled it, which was first to let the people know that it wasn’t their fault, that the company had to do it, second to compensate them fairly in their severance, and last to be involved in helping them find jobs elsewhere.  He said that later on, many of the individuals came back.

He also spoke of their business culture, and how they’re more than happy to save their customers money by coming up with new technologies. For example, when Oracle asked them for read/write snapshots, they created what they now call FlexClone. It allowed Oracle to buy fewer NetApp products, but it made NetApp products better and made them money in the long run. He spoke about the economic downturn, and how many companies throw their arms around what they’ve got and try to protect it, looking at what they need to cut to stay the same, when they should be meeting, changing, and figuring out new strategies that will allow them to adapt and grow. ‘Either you’re moving forward or you’re moving backward. If you’re standing still then you’re moving backward.’ On this same topic, he spoke about candor: how crucial it is to the company that people aren’t afraid to say what they think, and the productivity that comes along with that.

Last, he spoke about personal goals, and how the majority of people who become successful have them. He detailed a bit about how he thought one should go about managing their goals, and offered to e-mail any of us a more detailed outline that he’s come up with.

In all, it was a pretty good speech and I’m glad that I went. Part of me did wonder whether it was a roundabout recruiting mechanism, since I’m pretty sure everyone there went away wishing they worked for NetApp, but at the same time I think some of the information he shared really is valuable if we implement what we can in our current environments.


Parity in a RAID system

Published by:

In the future I’d like to discuss some tests I’ve done with Linux md-raid and hardware RAID, as well as write up a few instructional documents on md-raid, but I thought it might be interesting to talk a bit about parity before digging into some details on RAID 5 performance, caching, and stripe size. This is a pretty basic explanation, but hopefully it will give some insight to those beginner/intermediate folks like me regarding exactly how we’re able to provide fault tolerance and redundancy simply by adding one extra drive to the array.

You may already be familiar with parity, or have at least heard of it, perhaps in the context of a parity bit, which works in a similar manner but is used for error detection. More on that some other time, perhaps. Most system admins have at least a general idea of what the parity data in a RAID array is, i.e. ‘extra data that can be used to rebuild an array’, but I find it interesting to go through the exercise of exactly how it works.

If you’re only somewhat familiar with how RAID 5 works, you may have at least heard something about XOR calculations or XOR hardware on your controller. XOR is a logic operation; the name stands for “exclusive or”, meaning ‘I’ll take this or that but not both’. Without getting too much into what that means, it’s basically just binary addition with the carry thrown away. In other words, if we were to XOR bits ‘0’ and ‘0’, we’d get a ‘0’ (0 + 0 = 0). If we XOR bits ‘1’ and ‘0’, we get 1. It doesn’t get tricky until we XOR bits ‘1’ and ‘1’, which rolls the digit over and gives us a binary 2, or ‘10’. We’re only interested in the least significant bit, however, so in this case 1 + 1 = 0. Using the following four equations, we should be able to XOR anything we need to:

  • 0 + 0 = 0 (nothing)
  • 1 + 0 = 1 (I’ll take this)
  • 0 + 1 = 1 (I’ll take that)
  • 1 + 1 = 0 (but not both)
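
As a quick sanity check, the four rules above can be sketched in a few lines of Python. The helper name xor_bit is made up for this illustration; the only real language feature relied on is Python's built-in ^ operator, which is bitwise XOR.

```python
# XOR as "binary addition, keeping only the least significant bit".
# xor_bit is a hypothetical helper name used only for this sketch.
def xor_bit(a, b):
    return (a + b) % 2

for a in (0, 1):
    for b in (0, 1):
        # Python's built-in ^ operator is bitwise XOR; both should agree.
        assert xor_bit(a, b) == (a ^ b)
        print(f"{a} + {b} = {xor_bit(a, b)}")
```

From here on, ^ can stand in for the ‘+’ used in the equations.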

Now, let’s apply this to our parity-striped RAID set. In the following example, we’ll look at a single stripe set across three disks. Two hold data and the third holds parity for that data. Say Stripe 1 holds a 1 and Stripe 2 holds a 0, so the Parity stripe is 1 + 0 = 1.

If we lose Stripe 1, we can determine what it held by reversing the equation, or solving ? + 0 = 1. Likewise, if we lose Stripe 2 we can do the same (1 + ? = 1). If we lose the Parity stripe we haven’t really lost any data and can just recalculate parity. Now, the interesting thing about the math is that rather than looking at it as, for example, 1 + ? = 1 when we lose Stripe 2, we can restore Stripe 2 by doing an XOR with Stripe 1 and the Parity stripe: 1 (Stripe 1) + 1 (Parity) = 0 (Stripe 2).
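
Here is a minimal sketch of that recovery in Python, using the same single-bit values as the example (Stripe 1 = 1, Stripe 2 = 0). The function name recover is hypothetical, chosen just for this illustration.

```python
# One stripe set across three disks: two data bits and one parity bit.
stripe1, stripe2 = 1, 0
parity = stripe1 ^ stripe2          # 1 + 0 = 1

# recover is a made-up helper: XOR whatever survived to get what was lost.
def recover(surviving, other_surviving):
    return surviving ^ other_surviving

# Lose Stripe 2: XOR Stripe 1 with Parity.
assert recover(stripe1, parity) == stripe2
# Lose Stripe 1: XOR Stripe 2 with Parity.
assert recover(stripe2, parity) == stripe1
# Lose Parity: just recalculate it from the data.
assert recover(stripe1, stripe2) == parity
```

The same operation covers all three cases, which is why there is no separate ‘reverse the equation’ step in practice.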

Now, in reality, the stripes are much larger than the single bit described above. You’ll likely have a stripe of 4 kilobytes up to 256 kilobytes or maybe more, but the mechanics are the same. Let’s upgrade the previous example to a three-bit stripe just to show how it works with multiple bits in a stripe. Stripe 1 will contain ‘101’ and stripe 2 will contain ‘011’. So, to calculate the parity, we match up the bit placements of stripe 1 and 2 and then XOR each set of bits individually. It helps to stack them vertically and work on each column one by one:

    Stripe 1:  1 0 1
    Stripe 2:  0 1 1
    Parity:    1 1 0

And again, an example of ‘losing’ stripe 2 and recalculating it from stripe 1 and parity:

    Stripe 1:  1 0 1
    Parity:    1 1 0
    Stripe 2:  0 1 1
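
The three-bit example can also be checked in Python, since the ^ operator works column by column on every bit of an integer at once. This is only a sketch; the variable names are made up, and the values are the ‘101’ and ‘011’ stripes from the text.

```python
# Three-bit stripes from the example, written as binary literals.
stripe1 = 0b101
stripe2 = 0b011

# ^ XORs each bit position (each "column") independently.
parity = stripe1 ^ stripe2
print(format(parity, "03b"))            # 110

# 'Lose' stripe 2 and rebuild it from stripe 1 and parity.
rebuilt_stripe2 = stripe1 ^ parity
print(format(rebuilt_stripe2, "03b"))   # 011
assert rebuilt_stripe2 == stripe2
```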

Now, that’s great, but what about four, five, six disk arrays? Well, once you get the idea of how XOR works, it’s pretty simple. If you understand binary math you can continue to use that to add four, five, six bits together, but another rule of thumb is to think of ‘0’ as even and ‘1’ as odd. For example, 1+1+1: an odd plus an odd is even, plus another odd is odd, so parity equals 1. Another example, 0+1+0+1: an even plus an odd will be odd, plus an even will again be odd, plus an odd will be even, so parity equals 0. If that gets too confusing, you can simply count how many ‘1’s you have. If you’ve got an odd number of ‘1’s, then parity is 1; if you have an even number of ‘1’s, parity is 0.
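
To show that chaining XORs and counting ‘1’s really are the same thing, here is a short Python sketch. The two function names are invented for this example; reduce and xor come from the standard library.

```python
from functools import reduce
from operator import xor

# Hypothetical helper: XOR the bits together one after another.
def parity_by_xor(bits):
    return reduce(xor, bits, 0)

# Hypothetical helper: count the 1s; an odd count means parity 1.
def parity_by_counting(bits):
    ones = sum(1 for bit in bits if bit == 1)
    return ones % 2

for bits in ([1, 1, 1], [0, 1, 0, 1], [1, 0, 0, 1, 1, 1]):
    assert parity_by_xor(bits) == parity_by_counting(bits)
    print(bits, "-> parity", parity_by_xor(bits))
```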

These parity calculations are fairly simple, yet interesting for the fact that we can provide protection for the data on any number of disks by simply adding a single disk. No need to have a full copy of the data. The caveat, of course, is that you can only lose 1 disk per parity stripe. Yes, that’s right, you can have dual parity as well, also known as RAID 6.

Another drawback is that while you’re missing one of your disks, any stripe sets that had data on that disk will take a performance hit, because they’ll be doing XOR calculations to reconstruct the lost data rather than just reading it from disk. Stripes that had parity on the lost disk won’t take a performance hit, but it will take some processing time to rebuild the parity once the disk is replaced. These performance hits are going to be an important part of later discussions regarding RAID 5 performance.

Probably worst of all, in order to modify data on disk, the system first has to read existing data back from the *stripe set* (either the other data stripes, or the old data and old parity) in order to recalculate parity with the changed data, effectively causing reads along with every write. This is where on-card caches can be helpful.
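
As a rough sketch of that write penalty, the Python below compares two ways of getting the new parity after changing one data stripe: recalculating from every data stripe, and a read-modify-write shortcut that only needs the old data and the old parity. This follows purely from the XOR math; it is not a description of how any particular controller or md-raid version behaves, and the function names and stripe values are made up.

```python
from functools import reduce
from operator import xor

# Hypothetical helper: read every data stripe and XOR them all.
def full_recalc(data_stripes):
    return reduce(xor, data_stripes, 0)

# Hypothetical helper: XOR out the old data, XOR in the new data.
def read_modify_write(old_parity, old_data, new_data):
    return old_parity ^ old_data ^ new_data

stripes = [0b101, 0b011, 0b110]      # made-up data stripes
old_parity = full_recalc(stripes)

new_value = 0b000                    # overwrite the second stripe
shortcut_parity = read_modify_write(old_parity, stripes[1], new_value)
stripes[1] = new_value

# Both approaches agree, but either way a write costs extra reads.
assert shortcut_parity == full_recalc(stripes)
```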

To finish off the exercise, I’ve put together this simple RAID 5 array. Each disk holds four stripes, one from each of four stripe sets, and each stripe set (in matching color) has three data stripes and one parity stripe. You can see that the parity stripe moves to a different disk for each stripe set, which is the primary difference between RAID 5 and RAID 4, which places all parity stripes on the same disk. If you’d like to test yourself and see just how your data is protected, first calculate the parity stripe for each set, then cover up a whole column (disk) at random and see if you can reconstruct its contents by doing the math.
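
If you’d rather let a script do the covering up, here is a small Python sketch of the same exercise. Everything in it is an assumption for illustration: the data values are made up, the parity rotation order is just one possible choice and may not match the figure, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor

NUM_DISKS = 4   # three data stripes plus one parity stripe per set

def build_array(stripe_sets):
    """Lay out each stripe set across the disks with rotating parity.
    Returns disks[d][s]: the stripe on disk d for stripe set s."""
    disks = [[None] * len(stripe_sets) for _ in range(NUM_DISKS)]
    for s, data in enumerate(stripe_sets):
        # Illustrative rotation: parity starts on the last disk and
        # moves one disk to the left for each stripe set.
        parity_disk = (NUM_DISKS - 1 - s) % NUM_DISKS
        values = list(data)
        values.insert(parity_disk, reduce(xor, data, 0))
        for d in range(NUM_DISKS):
            disks[d][s] = values[d]
    return disks

def rebuild_disk(disks, lost):
    """Rebuild one whole disk by XORing the survivors, set by set."""
    survivors = [disk for d, disk in enumerate(disks) if d != lost]
    return [reduce(xor, column, 0) for column in zip(*survivors)]

# Made-up three-bit data stripes, four stripe sets.
data = [
    [0b101, 0b011, 0b110],
    [0b111, 0b000, 0b010],
    [0b001, 0b100, 0b011],
    [0b010, 0b010, 0b101],
]
disks = build_array(data)

# 'Cover up' each disk in turn and check it can be reconstructed.
for lost in range(NUM_DISKS):
    assert rebuild_disk(disks, lost) == disks[lost]
```

Note that the rebuild doesn’t care whether the lost stripe held data or parity; XORing the three survivors in a set always yields the fourth.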

As always, if anyone is reading this ;-), feel free to leave your feedback, correct me, whatever.

Sun Troubleshooting

Installed new M4000

Published by:

When installing a new Sun M4000 recently, we got a bit of a surprise when we went to perform a wanboot on it. The wanboot binary downloaded, then the miniroot, and as soon as the miniroot completed, we got the following error:

krtld: load_exec: fail to expand cpu/$CPU
krtld: error during initial load/link phase
panic - boot: exitto64 returned from client program
Program terminated

Some searching led me to this document, which is a bug report for OpenSolaris. It seems that at least one cause for this error is an out-of-date wanboot executable. I downloaded the latest Solaris DVD for SPARC and followed this Sun document on creating an updated wanboot executable and miniroot. I then moved/renamed the old files on the web server that hosts the wanboot data and put the new ones in their place. After that, we were in business.