SeekMark Update – 0.9

Version 0.9 gains functionality that lets SeekMark serve as a quick-and-dirty random I/O generator. SeekMark does a good job of pounding your disk as hard as possible in all-or-nothing fashion, but now you can specify a delay to insert between seeks, reducing the load in order to simulate a particular workload. For example, you may want to test the performance of some other application while the system is semi-busy doing random I/O on a database file, or you may want to test shared storage between multiple hosts, where one host has 4 processes doing 64k random reads every 20ms while another host has 2 processes, one doing busy 4k random writes as fast as possible and the other doing 128k reads every 50ms.

Along with this comes a new -e option that runs SeekMark in endless mode. That is, it will simply run until it's killed.

As usual, see the SeekMark page, linked at the top of the blog.


Ceph and RBD benchmarks

Ceph, an up-and-coming distributed file system, has a lot of great design goals. In short, it aims to distribute both data and metadata among multiple servers, providing fault-tolerant and scalable network storage. Needless to say, this has me excited, and while it's still under heavy development, I've been experimenting with it and thought I'd share a few simple benchmarks.

I’ve tested two different ‘flavors’ of Ceph. The first, I believe, is referred to as the “Ceph filesystem”, which is similar in function to NFS: file metadata (in addition to file data) is handled by remote network services, and the filesystem is mountable by multiple clients. The second is a “RADOS block device”, or RBD, a virtual block device created from Ceph storage. This is similar in function to iSCSI, where remote storage is mapped so that it appears as a local SCSI device. This means that it’s formatted and mounted locally, and other clients can’t use it without corruption (unless you format it with a cluster filesystem like GFS or OCFS).

If you’re wondering what RADOS is, it’s Ceph’s RAID-like acronym; it stands for “Reliable Autonomic Distributed Object Store”. Technically, the Ceph filesystem is implemented on top of RADOS, and other things are capable of using it directly as well, such as the RADOS gateway, a proxy server that provides object store services like those of Amazon’s S3. A librados library is also available, providing an API for building your own solutions.

I’ve taken the approach of comparing the Ceph filesystem to NFS, and RBD both to a single iSCSI device and to multiple iSCSI devices striped across different servers. Mind you, Ceph provides many more features, such as snapshots and thin provisioning, not to mention fault tolerance, but if we were to replace the function of NFS we’d put the Ceph filesystem in its place; likewise, if we replaced iSCSI, we’d use RBD. It’s good to keep this in mind because of the penalties involved in having metadata at the server; we don’t expect the Ceph filesystem or NFS to match the metadata performance of a local filesystem.

  • The Ceph (version 0.32) system consisted of 3 servers running mds+mon services (quad-core, 16G RAM each). Storage was provided by 3 osd servers (24-core AMD boxes, 32GB RAM, 28 available 2T disks, LSI 9285-8e); each server used 10 disks, with one osd daemon per 2T disk, and an enterprise SSD partitioned into 10 x 1GB journal devices. I tried both btrfs and xfs on the osd devices; for these tests there was no difference. CRUSH placement specified that no replica should be on the same host, with 2 copies of data and 3 copies of metadata. All servers had gigabit NICs.
  • The second Ceph system had monitors, mds, and osd all on one box. This was intended as a more direct comparison to the NFS server below, and used the same storage device, served up by a single osd daemon.
  • The NFS server was one of the above osd servers with a group of 12 2T drives in RAID50, formatted with xfs and exported.
  • RBD benchmarks were run against the same two Ceph systems above, from which a 20T RBD device was created.
  • The iSCSI server was one of the above osd servers exporting a 12-disk RAID50 as a target.
  • iSCSI-md was achieved by having all three osd servers export a 12-disk RAID50 each, with the client striping across them.
  • All filesystems were mounted noatime,nodiratime, whether the options were supported or not. All servers ran kernel 3.1.0-rc1 on CentOS 6. Benchmarks were performed using bonnie++, as well as a few simple real-world tests such as copying data back and forth.


The sequential character writes were CPU-bound on the client in all instances; the sequential block writes (and most sequential reads) were limited by the gigabit network. The Ceph filesystem does well on seeks, but this did not translate directly into better performance in the create/read/delete tests. It seems that RBD is roughly at a point where it can replace iSCSI, but Ceph filesystem performance needs some work (or at least some heavy tuning on my part) to get it up to speed.

It will take some digging to determine where the bottlenecks lie, but in my quick assessment most of the server resources were only moderately used, whether on the monitors, the mds, or the osd devices. Even the fast SSD journal disk only ever hit 30% utilization, and didn’t boost performance significantly over the competitors, which don’t rely on one.

Still, there’s something to be said for this: Ceph allows storage to fail, be added dynamically, be thin provisioned, rebalanced, and snapshotted, and much more, with passable performance, all in pre-1.0 code. I think Ceph has a big future in open source storage deployments, and I look forward to it becoming a mature product that we can leverage to provide dynamic, fault-tolerant network storage.

SeekMark Update

I just updated SeekMark to include a write seek test. I was initially reluctant to do this, because nobody would ever want to screw up their filesystem by performing a random write test to the disk it resides on, right? Of course not, but occasionally you need to benchmark a disk for the sake of benchmarking, and aren’t worried about the data. And of course, I didn’t care about that functionality until I needed it myself!

So here we have version 0.8 of SeekMark, which adds the following features:

  • a write test via the “-w” flag, with a required argument of “destroy-data”
  • specification of I/O size via the “-i” flag, from 1 byte to 1048576 bytes (1 megabyte). The intended purpose of the benchmark (testing max IOPS and latency) is still best served by the default I/O size of 512 bytes, but changing the I/O size can be useful in certain situations.
  • a “-q” flag, added per suggestions, which skips per-thread reporting and limits output to the result totals and any errors that arise

Now head on over to the SeekMark page and get it!

FractMark Performance


High-quality, dual-screen 3840×1080 desktop backgrounds, fresh from my fractmark utility. These are large PNG files, because I’m a sucker for detail.

-y -.6003 -x-.367 -X -.3658 -l 60000



SeekMark

I just added a new page for SeekMark, a little program I put together recently to test the number of random accesses per second to a disk. It’s threaded, and will handle RAID arrays well, depending on the number of threads you select. I’m fairly excited about how this turned out; it helped me prove someone wrong about whether a particular RAID card did split seeks on RAID1 arrays. The page is here, or linked at the top of my blog, for future reference. I’d appreciate hearing results/feedback if anyone out there gives it a try.

Here are some of my own results, comparing a 5-disk Linux md RAID10 array against the underlying disks. I’ll also show the difference that threading the app makes:

single disk, one thread:

  [root@server mlsorensen]# ./seekmark -t1 -f/dev/sda4 -s1000
  Spawning worker 0
  thread 0 completed, time: 13.46, 74.27 seeks/sec

  total time: 13.46, time per request(ms): 13.465
  74.27 total seeks per sec, 74.27 seeks per sec per thread

single disk, two threads:

  [root@server mlsorensen]# ./seekmark -t2 -f/dev/sda4 -s1000
  Spawning worker 0
  Spawning worker 1
  thread 0 completed, time: 27.29, 36.64 seeks/sec
  thread 1 completed, time: 27.30, 36.63 seeks/sec

  total time: 27.30, time per request(ms): 13.650
  73.26 total seeks per sec, 36.63 seeks per sec per thread

Notice we get pretty much the same result, about 74 seeks/sec total.

5-disk md RAID10 on top of disks including the one above, one thread:

  [root@server mlsorensen]# ./seekmark -t1 -f/dev/md3 -s1000
  Spawning worker 0
  thread 0 completed, time: 13.09, 76.41 seeks/sec

  total time: 13.09, time per request(ms): 13.087
  76.41 total seeks per sec, 76.41 seeks per sec per thread

Still pretty much the same thing. That’s because we’re reading one small thing and waiting for the data before continuing. Our test is blocked on a single spindle!

four threads:

  [root@server mlsorensen]# ./seekmark -t4 -f/dev/md3 -s1000
  Spawning worker 0
  Spawning worker 1
  Spawning worker 2
  Spawning worker 3
  thread 1 completed, time: 15.02, 66.57 seeks/sec
  thread 2 completed, time: 15.46, 64.69 seeks/sec
  thread 3 completed, time: 15.57, 64.24 seeks/sec
  thread 0 completed, time: 15.69, 63.74 seeks/sec

  total time: 15.69, time per request(ms): 3.922
  254.96 total seeks per sec, 63.74 seeks per sec per thread

Ah, there we go. 254 seeks per second. Now we’re putting our spindles to work!