Sleepy’s Is Amazing, aka The Power of Twitter

About a year and a half ago we went shopping for a bed. We went to the Sleepy’s in Deer Park and tried out a bunch of the beds. They had this computer-equipped bed that you lie on and it recommends a firmness, and we ended up getting a Kingsdown based partly on that computer’s recommendation (it was also a really comfy mattress). I forget the model of the bed, but it had different firmness on my side and my wife’s side, which seemed like a cool feature.

Fast-forward to now. The bed is no longer comfortable; it actually feels like sleeping on concrete. I’ve been sleeping on the couch (an Ikea Ektorp) because it’s less painful than the bed. And between the two sides of the bed there’s a solid piece of styrofoam, which, with each side sagging, has become a sort of mountain running down the center of the bed.

So there’s some background info. In a moment of mild despair I asked this on Twitter:

https://twitter.com/EvanHoffman/status/295291753156710400

My Internet buddy Jon Hayden responded, and we had a brief Twitter conversation about beds, both of us having purchased ours at Sleepy’s. (Twitter’s embedded tweet thing appears to include the original tweet in an @reply, so this looks kind of strange.)

https://twitter.com/Jon_Hayden/status/295349033747243008

https://twitter.com/EvanHoffman/status/295351522580119552

https://twitter.com/EvanHoffman/status/295521937290051585

To my surprise, the next day (Jan 27th) I got this reply from Deana Ramos, @SleepysCare:

Being a cynical guy, here was my response:

https://twitter.com/EvanHoffman/status/295746715942797312

Today is 1/29, and I was planning to write an email to Sleepy’s to see what (if anything) they could do, but then my phone rang and the caller ID came up “Sleepys LLC.” Uh oh…

Well, I spoke to a rep from Sleepy’s, and sure enough, she was calling in reference to my Tweet. She assured me that Sleepy’s is very concerned about customer satisfaction, apologized for the trouble, and said she would credit my full purchase price toward another mattress; once I’ve picked one out at a showroom and am ready to purchase it, I should call her directly for discounted pricing. As I told her on the phone, I was STUNNED. On one level I was impressed they managed to track me down based only on my Twitter name (granted, it IS my actual name, but I still think that’s pretty resourceful of them). But on another level, WOW. That computer at Sleepy’s was part of the reason we bought the mattress, and of course the salesman wants to sell, but I’m a firm believer in caveat emptor: if you buy something and don’t like it, too bad. Plus it’s been almost two years since I bought it. This is really amazing. Assuming it all works out, I’ll be a Sleepy’s customer for life. Amazing!

XFS write speeds: software RAID 0/5/6 across 45 spindles

We’re currently building a new storage server to hold low-priority data (tertiary backups, etc.). One of the requirements for the project is that it live on cheap storage (as opposed to expensive enterprise SAN/NAS), so after some research we decided to build a Backblaze pod. Backblaze used 3TB Hitachi drives in their system, but the ones they listed in their blog post are discontinued and the reviews for all the other 3TB+ drives were terrible, so we went with Samsung ST2000DL004 2TB 7200 RPM drives. Like Backblaze, we’re going with software RAID, but I figured a good first step would be to figure out which RAID level we want to use, and whether we want the mdadm/LVM mish-mosh Backblaze uses or something simpler. For my testing I created a RAID6 of all 45 drives and put a single XFS volume on it (XFS’s size limit is ~8 exabytes vs. ext4’s 16TB). Ext4 might offer some performance advantages, but carving ~80 TB into multiple 16TB filesystems is management overhead that probably isn’t worth it in our case.

So, this is just a simple benchmark comparing RAID0 (stripe with no parity) as a baseline, RAID5 (stripe with 1 parity disk), and RAID6 (stripe with 2 parity disks) across 45 total spindles. For all tests I used Linux software RAID (mdadm).

To test, I have 3 scripts: makeraid0.sh, makeraid5.sh, and makeraid6.sh. Each one does what its name implies. The RAID0 uses 43 disks, the RAID5 uses 44, and the RAID6 uses 45, so there are 43 “data” disks in each test. The system is a Protocase “Backblaze-inspired” box with a Core i3 540 CPU, 8 GB of memory, CentOS 6.3 x64, and 45 of the 2TB drives mentioned above. We’re just using this box for backups, and it gives us about 79 TB usable, which is still plenty, so going with 2TB drives instead of 3TB isn’t a big problem.
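
As a sanity check on that 79 TB figure: the drives are 2 decimal terabytes each while df -h reports binary units, so 43 data disks work out to roughly 78 TiB (bc here is just doing the arithmetic):

echo 'scale=1; 43 * 2 * 10^12 / 2^40' | bc
# => 78.2, which shows up as the 79T in the df output below; the same math
#    with 41 data disks (the final 43-drive RAID6) gives the 75T seen later.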

makeraid6.sh, which creates the array:

#!/bin/bash

mdadm --create /dev/md0 --level=raid6 -c 256K --raid-devices=45 \
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
/dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
/dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo \
/dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt \
/dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy \
/dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad \
/dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai \
/dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan \
/dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
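
makeraid0.sh and makeraid5.sh are the same script with the RAID level and device count changed and the device list trimmed to match (I’m assuming here that they keep the same 256K chunk). Written with brace expansion for brevity, their create lines look like this:

# makeraid0.sh -- 43 disks, striping only; the brace expansions are just
# shorthand for /dev/sda through /dev/sdaq
mdadm --create /dev/md0 --level=raid0 -c 256K --raid-devices=43 \
    /dev/sd{a..z} /dev/sda{a..q}

# makeraid5.sh -- 44 disks, one disk's worth of parity (/dev/sda through /dev/sdar)
mdadm --create /dev/md0 --level=raid5 -c 256K --raid-devices=44 \
    /dev/sd{a..z} /dev/sda{a..r}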

Filesystem:

[root@Protocase ~]# mkfs.xfs -f /dev/md0
meta-data=/dev/md0               isize=256    agcount=79, agsize=268435392 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=21000267072, imaxpct=1
         =                       sunit=64     swidth=2752 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@Protocase ~]# mount /dev/md0 /raid0/
[root@Protocase ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdat2            289G  3.2G  271G   2% /
tmpfs                 3.9G  260K  3.9G   1% /dev/shm
/dev/sdat1            485M   62M  398M  14% /boot
/dev/md0               79T   35M   79T   1% /raid0
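
One thing worth noting in the mkfs output above: XFS picked up the stripe geometry from md automatically (sunit=64 blocks is the 256K chunk, and swidth=2752 is 64 x 43 data disks). To double-check that later on the mounted filesystem, xfs_info reports the same values:

# should echo back the same "sunit=64 swidth=2752 blks" line that mkfs.xfs printed
xfs_info /raid0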

RAID0:

[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.1944 s, 416 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.1922 s, 416 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 24.7665 s, 423 MB/s

RAID5:

[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.2239 s, 416 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 24.7427 s, 424 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 24.2434 s, 433 MB/s

RAID6:

[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.9032 s, 390 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.5255 s, 395 MB/s
[root@Protocase ~]# rm -f /raid0/zeros.dat 
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.4338 s, 397 MB/s
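
To make sure the 1M dd block size wasn’t skewing anything, a quick sweep along these lines covers a range of sizes (just a sketch, not the exact script I used; the file gets removed between runs as above):

#!/bin/bash
# Write ~10 GB at several block sizes and let dd report throughput for each.
for bs_kb in 64 256 1024 4096; do
    count=$(( 10 * 1024 * 1024 / bs_kb ))   # keep the total write at ~10 GB
    echo "block size: ${bs_kb}K"
    dd if=/dev/zero of=/raid0/zeros.dat bs=${bs_kb}K count=$count
    rm -f /raid0/zeros.dat
done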

I found it pretty strange that RAID5 seemed to outperform RAID0, but I tested it several times and RAID5 averaged 10-15 MB/s faster than RAID0. Maybe a bug in the kernel? I also tried other block sizes for dd, ranging from 60KB to 4MB, but the results were pretty consistent. In the end it looks like I’m going to go with a RAID6 of 43 drives plus 2 hot spares, which still yields ~400 MB/s throughput and 75 TB usable:

#!/bin/bash

mdadm --create /dev/md0 --level=raid6 -c 256K -n 43 -x 2 \
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
/dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
/dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo \
/dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt \
/dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy \
/dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad \
/dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai \
/dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan \
/dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
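
With -n 43 -x 2 the last two drives in the list become hot spares rather than array members; a quick way to confirm they registered (they also show up with an (S) flag in the /proc/mdstat output further down):

# expect "Spare Devices : 2" plus the two drives listed in the spare state
mdadm --detail /dev/md0 | grep -i spare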

Update: A coworker suggested looking into a write-intent bitmap to improve rebuild speeds. After adding an internal bitmap with a 256 MB chunk size, write performance didn’t degrade much, so this looks like a good addition to the configuration:

[root@Protocase ~]# mdadm -G --bitmap-chunk=256M --bitmap=internal /dev/md0
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.8157 s, 406 MB/s
[root@Protocase ~]# rm -fv /raid0/zeros.dat
removed `/raid0/zeros.dat'
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.4233 s, 397 MB/s
[root@Protocase ~]# rm -fv /raid0/zeros.dat
removed `/raid0/zeros.dat'
[root@Protocase ~]# dd if=/dev/zero of=/raid0/zeros.dat bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 26.2593 s, 399 MB/s
[root@Protocase ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdat2            289G  3.2G  271G   2% /
tmpfs                 3.9G   88K  3.9G   1% /dev/shm
/dev/sdat1            485M   62M  398M  14% /boot
/dev/md0               75T  9.8G   75T   1% /raid0
[root@Protocase ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdas[44](S) sdar[43](S) sdaq[42] sdap[41] sdao[40] sdan[39] sdam[38] sdal[37] sdak[36] sdaj[35] sdai[34] sdah[33] sdag[32] sdaf[31] sdae[30] sdad[29] sdac[28] sdab[27] sdaa[26] sdz[25] sdy[24] sdx[23] sdw[22] sdv[21] sdu[20] sdt[19] sds[18] sdr[17] sdq[16] sdp[15] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
      80094041856 blocks super 1.2 level 6, 256k chunk, algorithm 2 [43/43] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
      bitmap: 2/4 pages [8KB], 262144KB chunk

unused devices: <none>
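
For reference, if the bitmap ever did turn out to hurt write performance under a different workload, it can be removed again with the inverse of the grow command above:

mdadm --grow --bitmap=none /dev/md0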