Do You Need a SANdwich?? Part II
So, drumroll…here are the results of testing:
For the quantitative testing I ran throughput tests against an internal hard drive (formatted with ext3 and then with reiserfs), a Coraid partition formatted with ext3, and a Coraid partition formatted with reiserfs. I also tested with a failed hard drive in the Coraid (while the array was rebuilding).
/dev/sda reiserfs (mirrored)
| /testing (sdb) reiserfs write (Kbytes/s) | /testing (sdb) reiserfs read (Kbytes/s) |
| 17408 | 64646 |
| 17118 | 64368 |
| 17140 | 64339 |
| 16347 | 64451 |
| 17080 | 64510 |
| 17055 | 64464 |
| 17024.67 (AVG) | 64463 (AVG) |
/dev/sdb1 ext3
| /testing (sdb) ext3 write (Kbytes/s) | /testing (sdb) ext3 read (Kbytes/s) |
| 41196 | 54154 |
| 40455 | 54078 |
| 40612 | 54246 |
| 40620 | 54229 |
| 41053 | 54160 |
| 40198 | 54217 |
| 40689 (AVG) | 54180.67 (AVG) |
/dev/sdb1 reiserfs
| /testing (sdb) reiserfs write (Kbytes/s) | /testing (sdb) reiserfs read (Kbytes/s) |
| 37041 | 34897 |
| 36559 | 34650 |
| 37201 | 34823 |
| 36867 | 34487 |
| 36420 | 34835 |
| 35064 | 34788 |
| 36525.33 (AVG) | 34746.67 (AVG) |
/dev/sdb1 reiserfs (noatime,notail)
| /testing (sdb) reiserfs write (Kbytes/s) | /testing (sdb) reiserfs read (Kbytes/s) |
| 37116 | 35243 |
| 36916 | 35246 |
| 36587 | 35051 |
| 36467 | 35058 |
| 37102 | 34999 |
| 36690 | 35221 |
| 36813 (AVG) | 35136.33 (AVG) |
Coraid w/ext3 filesystem, switch and no jumbo frames
| /data – ext3 write (Kbytes/s) | /data – ext3 read (Kbytes/s) |
| 50266 | 59311 |
| 46901 | 77020 |
| 46478 | 76883 |
| 48248 | 76896 |
| 49829 | 76969 |
| 49381 | 74925 |
| 48517.17 (AVG) | 73667.33 (AVG) |
Coraid w/reiserfs filesystem, switch and no jumbo frames
| /data2 – reiserfs write (Kbytes/s) | /data2 – reiserfs read (Kbytes/s) |
| 59311 | 73519 |
| 58776 | 74742 |
| 52935 | 73526 |
| 59458 | 75665 |
| 56600 | 75375 |
| 60726 | 75814 |
| 57967.67 (AVG) | 74773.5 (AVG) |
Coraid w/reiserfs filesystem, no switch and jumbo frames
| Jumbo Frames | Direct Connect |
| /data2 – reiserfs write (Kbytes/s) | /data2 – reiserfs read (Kbytes/s) |
| 63894 | 99924 |
| 62547 | 96795 |
Coraid w/reiserfs filesystem, switch and jumbo frames
| Jumbo Frames | Switch Connect |
| /data2 – reiserfs write (Kbytes/s) | /data2 – reiserfs read (Kbytes/s) |
| 62328 | 95355 |
| 64189 | 98888 |
| 60961 | 96272 |
| 63776 | 95773 |
| 64258 | 97866 |
| 62483 | 98469 |
| 62999.17 (AVG) | 97103.83 (AVG) |
Coraid w/ext3 filesystem, switch and jumbo frames
| Jumbo Frames | Switch Connect |
| /data - ext3 write (Kbytes/s) | /data - ext3 read (Kbytes/s) |
| 58595 | 101549 |
| 58145 | 98725 |
| 58344 | 100478 |
| 59689 | 102201 |
| 59244 | 101434 |
| 63951 | 99432 |
| 59661.33 (AVG) | 100636.67 (AVG) |
Coraid w/ext3 filesystem, switch, jumbo frames and degraded raid
| Jumbo Frames | Switch Connect |
| /data - ext3 write (Kbytes/s) | /data - ext3 read (Kbytes/s) |
| 12102 | 23846 |
| 27402 | 24224 |
| 3916 | 28248 |
| 15560 | 45066 |
| 38592 | 53528 |
| 30365 | 30616 |
| 21322.83 (AVG) | 34254.67 (AVG) |
Coraid w/reiserfs filesystem, switch, jumbo frames and degraded raid
| Jumbo Frames | Switch Connect |
| /data2 - reiserfs write (Kbytes/s) | /data2 - reiserfs read (Kbytes/s) |
| 38317 | 42851 |
| 42463 | 43326 |
| 31546 | 51028 |
| 32177 | 24707 |
| 33981 | 26320 |
| 34643 | 27002 |
| 35521.17 (AVG) | 35872.33 (AVG) |
Coraid w/ocfs2 filesystem, switch and jumbo frames
| Jumbo Frames | Switch Connect |
| /data3 - ocfs2 write (Kbytes/s) | /data3 - ocfs2 read (Kbytes/s) |
| 36849 | 83958 |
| 40973 | 84855 |
| 38983 | 83853 |
| 40748 | 85188 |
| 39143 | 83710 |
| 40535 | 83146 |
| 39538.5 (AVG) | 84118.33 (AVG) |
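The effect of jumbo frames can be read off the (AVG) rows above by subtracting the "switch and no jumbo frames" averages from the "switch and jumbo frames" averages. Here is a minimal Python sketch of that arithmetic. The names `AVERAGES` and `jumbo_gain` are made up for illustration, and dividing by 1024 to get Mbytes is my assumption about what "Kbytes" means in these tables.

```python
# Average throughput in Kbytes/s, copied from the (AVG) rows above.
# Keys are (filesystem, jumbo_frames_enabled); values are (write, read).
# Only the "switch" configurations are compared.
AVERAGES = {
    ("ext3", False): (48517.17, 73667.33),
    ("ext3", True): (59661.33, 100636.67),
    ("reiserfs", False): (57967.67, 74773.5),
    ("reiserfs", True): (62999.17, 97103.83),
}

# Assumption: 1 Mbyte = 1024 Kbytes. Use 1000 if the tool reports decimal units.
KBYTES_PER_MBYTE = 1024


def jumbo_gain(fs):
    """Return (write_gain, read_gain) in Kbytes/s from enabling jumbo frames."""
    base_write, base_read = AVERAGES[(fs, False)]
    jumbo_write, jumbo_read = AVERAGES[(fs, True)]
    return jumbo_write - base_write, jumbo_read - base_read


for fs in ("ext3", "reiserfs"):
    write_gain, read_gain = jumbo_gain(fs)
    print(
        f"{fs}: write +{write_gain / KBYTES_PER_MBYTE:.1f} Mbytes/s, "
        f"read +{read_gain / KBYTES_PER_MBYTE:.1f} Mbytes/s"
    )
```

On these averages the read gain is a bit over 20 Mbytes/s for both filesystems, while the write gain is noticeably smaller.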
Qualitative Testing Results
For the qualitative testing I loaded up a stock binary install of MySQL 5.0. The database (icengine3_2) was loaded from a dump of a production server. I used mybench to execute a query against the database, pulling random records from one of the tables. The number of connections used is listed with the summary data for each test run.
To establish a baseline I ran the test against the MySQL server pulling data off the internal drives (mirrored reiserfs) and off a single drive dedicated only to MySQL data. Then I ran the tests against data loaded onto the ext3 and reiserfs Coraid partitions while nothing else was happening on the Coraid. This is similar to a single server attached to the Coraid.
Next I ran the test against both the ext3 and reiserfs partitions on the Coraid while a throughput test ran on the partition not holding the MySQL data. For example, if the ext3 partition held the data for the test, then the reiserfs partition of the Coraid had a throughput test (both read and write) running against it at the same time. This more closely simulates multiple servers attached to different partitions of the Coraid. Finally, these tests were repeated while one drive of the Coraid was being rebuilt.
/dev/sda drive w/reiserfs (mirrored)
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 2429 | 2862 | 2736 | 2636 | 2417 |
| 2997 | 2874 | 2747 | 2631 | 2416 |
| 3081 | 2851 | 2758 | 2634 | 2414 |
| 3069 | 2841 | 2759 | 2646 | 2413 |
| 3104 | 2869 | 2750 | 2640 | 2417 |
| 3085 | 2862 | 2773 | 2631 | 2415 |
| 3097 | 2853 | 2771 | 2632 | 2410 |
| 3068 | 2855 | 2755 | 2629 | 2414 |
| 3078 | 2858 | 2767 | 2621 | 2412 |
| 3113 | 2842 | 2753 | 2641 | 2420 |
| 3012 (AVG QPS) | 2856 (AVG QPS) | 2756 (AVG QPS) | 2634 (AVG QPS) | 2414 (AVG QPS) |
/dev/sdb1 drive w/reiserfs (noatime, notail)
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 2236 | 2876 | 2773 | 2641 | 2416 |
| 2944 | 2886 | 2776 | 2640 | 2417 |
| 3097 | 2872 | 2764 | 2644 | 2428 |
| 3085 | 2874 | 2772 | 2639 | 2420 |
| 3110 | 2862 | 2755 | 2636 | 2422 |
| 3117 | 2883 | 2780 | 2647 | 2425 |
| 3092 | 2859 | 2773 | 2639 | 2422 |
| 3101 | 2880 | 2783 | 2644 | 2418 |
| 3143 | 2837 | 2755 | 2641 | 2427 |
| 3147 | 2861 | 2762 | 2640 | 2418 |
| 3007 (AVG QPS) | 2869 (AVG QPS) | 2769 (AVG QPS) | 2641 (AVG QPS) | 2421 (AVG QPS) |
/dev/sdb1 drive w/ext3
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 2294 | 2840 | 2758 | 2636 | 2422 |
| 2986 | 2853 | 2750 | 2633 | 2413 |
| 3059 | 2857 | 2742 | 2634 | 2415 |
| 3089 | 2867 | 2769 | 2629 | 2418 |
| 3064 | 2852 | 2747 | 2629 | 2416 |
| 3080 | 2854 | 2765 | 2633 | 2416 |
| 3083 | 2848 | 2751 | 2626 | 2417 |
| 3058 | 2852 | 2740 | 2617 | 2409 |
| 3043 | 2841 | 2754 | 2640 | 2416 |
| 3096 | 2894 | 2747 | 2633 | 2418 |
| 2985 (AVG QPS) | 2855 (AVG QPS) | 2752 (AVG QPS) | 2631 (AVG QPS) | 2416 (AVG QPS) |
Coraid w/reiserfs filesystem, switch, jumbo frames
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 3170 | 2873 | 2754 | 2625 | 2410 |
| 3183 | 2862 | 2769 | 2630 | 2413 |
| 3169 | 2870 | 2759 | 2608 | 2410 |
| 3143 | 2867 | 2751 | 2621 | 2410 |
| 3165 | 2881 | 2739 | 2622 | 2411 |
| 3172 | 2900 | 2746 | 2621 | 2420 |
| 3195 | 2908 | 2758 | 2628 | 2436 |
| 3176 | 2923 | 2760 | 2631 | 2438 |
| 3223 | 2899 | 2761 | 2630 | 2439 |
| 3189 | 2920 | 2754 | 2625 | 2436 |
| 3178 (AVG QPS) | 2890 (AVG QPS) | 2755 (AVG QPS) | 2624 (AVG QPS) | 2422 (AVG QPS) |
Coraid w/ext3 filesystem, switch, jumbo frames
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 3061 | 2849 | 2714 | 2572 | 2361 |
| 3070 | 2812 | 2715 | 2573 | 2364 |
| 3051 | 2824 | 2711 | 2577 | 2363 |
| 3060 | 2831 | 2695 | 2576 | 2367 |
| 3026 | 2819 | 2713 | 2561 | 2364 |
| 3072 | 2822 | 2707 | 2567 | 2362 |
| 3088 | 2813 | 2706 | 2564 | 2363 |
| 3031 | 2815 | 2702 | 2566 | 2363 |
| 2966 | 2845 | 2704 | 2581 | 2365 |
| 3092 | 2832 | 2699 | 2568 | 2355 |
| 3051 (AVG QPS) | 2826 (AVG QPS) | 2706 (AVG QPS) | 2570 (AVG QPS) | 2362 (AVG QPS) |
Coraid w/ext3 filesystem, switch, jumbo frames
These tests were run while iozone was running against the reiserfs partition on the Coraid.
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 2989 | 2837 | 2749 | 2637 | 2418 |
| 3077 | 2850 | 2747 | 2633 | 2421 |
| 3093 | 2857 | 2750 | 2629 | 2420 |
| 3137 | 2870 | 2746 | 2635 | 2417 |
| 3083 | 2880 | 2743 | 2631 | 2422 |
| 3015 | 2853 | 2745 | 2640 | 2420 |
| 2938 | 2857 | 2749 | 2640 | 2424 |
| 3087 | 2849 | 2747 | 2635 | 2418 |
| 3063 | 2890 | 2749 | 2646 | 2421 |
| 2748 | 2841 | 2764 | 2639 | 2423 |
| 3023 (AVG QPS) | 2858 (AVG QPS) | 2748 (AVG QPS) | 2636 (AVG QPS) | 2420 (AVG QPS) |
Coraid w/ext3 filesystem, switch, jumbo frames
These tests were run while iozone was running against the reiserfs partition on the Coraid. In addition the Coraid had a degraded array.
| 10 connections | 20 connections | 40 connections | 100 connections | 400 connections |
| 3053 | 2844 | 2726 | 2621 | 2446 |
| 3060 | 2845 | 2728 | 2624 | 2421 |
| 3061 | 2837 | 2746 | 2636 | 2420 |
| 3117 | 2828 | 2734 | 2640 | 2419 |
| 3018 | 2855 | 2740 | 2625 | 2421 |
| 3001 | 2934 | 2731 | 2623 | 2420 |
| 3177 | 2792 | 2753 | 2626 | 2425 |
| 3006 | 2889 | 2744 | 2619 | 2428 |
| 3001 | 2844 | 2752 | 2622 | 2418 |
| 3052 | 2861 | 2733 | 2614 | 2412 |
| 3054 (AVG QPS) | 2852 (AVG QPS) | 2738 (AVG QPS) | 2625 (AVG QPS) | 2423 (AVG QPS) |
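To put the QPS tables side by side, the (AVG QPS) rows can be expressed as a percent difference between each Coraid partition and the matching local-disk baseline. This is a small Python sketch of that comparison. The variable names and `percent_diff` are hypothetical, and only the unloaded, non-degraded Coraid runs are included.

```python
# Average QPS copied from the (AVG QPS) rows above, ordered by
# connection count: 10, 20, 40, 100, 400.
CONNECTIONS = (10, 20, 40, 100, 400)
LOCAL_REISERFS = (3007, 2869, 2769, 2641, 2421)   # /dev/sdb1, noatime,notail
LOCAL_EXT3 = (2985, 2855, 2752, 2631, 2416)       # /dev/sdb1
CORAID_REISERFS = (3178, 2890, 2755, 2624, 2422)  # switch, jumbo frames
CORAID_EXT3 = (3051, 2826, 2706, 2570, 2362)      # switch, jumbo frames


def percent_diff(candidate, baseline):
    """Percent difference of candidate vs baseline, one value per column."""
    return [100.0 * (c - b) / b for c, b in zip(candidate, baseline)]


for label, coraid, local in (
    ("reiserfs", CORAID_REISERFS, LOCAL_REISERFS),
    ("ext3", CORAID_EXT3, LOCAL_EXT3),
):
    diffs = percent_diff(coraid, local)
    cells = ", ".join(
        f"{n} conn: {d:+.1f}%" for n, d in zip(CONNECTIONS, diffs)
    )
    print(f"Coraid {label} vs local {label}: {cells}")
```

On these averages every column lands within a few percent of the local disk, which is the basis for saying the read workload does not suffer on the Coraid.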
Conclusions
I think this shows that the Coraid is capable of handling multiple servers while maintaining reasonable throughput. Even with a degraded array the actual (read) performance of the MySQL server does not suffer. Heavy write performance to a database would suffer, as the write throughput during a rebuild of an array is roughly 36% (ext3) to 56% (reiserfs) of the normal throughput. As for the question of reiserfs versus ext3, the performance numbers are close enough that the deciding factor should be that reiserfs has better functionality under the LVM system that we use on our hard drives.
Currently there are six drives (five active) in the Coraid. Increasing the drive count in the Coraid should also improve throughput. According to the numbers released by Coraid the increase should be fairly dramatic (on the order of 100% faster write performance with a full complement of 14 drives versus the current six). That is to be expected, since more drives means more platters and spindles. I also tried bonding two Ethernet ports. This did not increase throughput.
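The rebuild penalty mentioned above can be checked against the averaged rows of the "switch and jumbo frames" tables and their "degraded raid" counterparts. This is a rough Python sketch of the ratio. `HEALTHY`, `DEGRADED` and `retained_fraction` are names invented for illustration.

```python
# Average throughput in Kbytes/s as (write, read), copied from the (AVG)
# rows of the "switch and jumbo frames" tables and their degraded versions.
HEALTHY = {
    "ext3": (59661.33, 100636.67),
    "reiserfs": (62999.17, 97103.83),
}
DEGRADED = {
    "ext3": (21322.83, 34254.67),
    "reiserfs": (35521.17, 35872.33),
}


def retained_fraction(fs):
    """Return (write, read) throughput during a rebuild as a fraction of normal."""
    healthy_write, healthy_read = HEALTHY[fs]
    degraded_write, degraded_read = DEGRADED[fs]
    return degraded_write / healthy_write, degraded_read / healthy_read


for fs in ("ext3", "reiserfs"):
    write_frac, read_frac = retained_fraction(fs)
    print(f"{fs}: write {write_frac:.0%} of normal, read {read_frac:.0%} of normal")
```

The individual degraded runs vary a lot more than the healthy ones, so these ratios are only a coarse summary of six runs each.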
With Ian’s help I did test the Oracle clustering filesystem. At this point it is really too fragile to consider using. In addition, the throughput testing that I did perform indicated that it was going to be significantly slower than both ext3 and reiserfs. While we have to partition off the Coraid and dedicate each specific partition to a server, I think that this is certainly justified for a more reliable filesystem that gives better performance.
Jumbo frames make a difference. Early in the testing I found that just configuring the switch and the Ethernet card for jumbo frames increases raw throughput by around 20 Mbytes a second (in the tables above, reads gain a bit over 20 Mbytes a second and writes roughly 5 to 11). NOT ALL ETHERNET CARDS SUPPORT MTUS ABOVE 1500!!! Check with the vendor before purchasing to see if this is supported.
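As a quick sanity check on a Linux box, the MTU currently configured on an interface can be read from sysfs. This is a minimal Python sketch, assuming the standard `/sys/class/net/<interface>/mtu` file. The function names are hypothetical, and the interface name is whatever your system uses. Note that this only reports the current setting. It does not tell you whether the card would accept a larger MTU, so the vendor check still applies.

```python
from pathlib import Path

STANDARD_MTU = 1500  # the threshold named in the text above


def read_mtu(interface, sysfs_root="/sys/class/net"):
    """Return the MTU currently configured on a Linux network interface."""
    mtu_file = Path(sysfs_root) / interface / "mtu"
    return int(mtu_file.read_text().strip())


def uses_jumbo_frames(interface, sysfs_root="/sys/class/net"):
    """True if the interface is currently configured with an MTU above 1500."""
    return read_mtu(interface, sysfs_root) > STANDARD_MTU
```

For example, `uses_jumbo_frames("eth0")` would return `True` on a machine where `eth0` (an assumed interface name) has been configured with an MTU of 9000. The `sysfs_root` parameter exists only so the function can be pointed at a different directory for testing.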
Thanks to both Ian and Justin for their help with LVM, ocfs and general system crap.
2 Comments so far
Would be interesting to see XFS results…
I did think about testing XFS. I have heard some horror stories about it corrupting data files though (if I recall, essentially zeroing things out). Now that SGI is no longer around I am really not sure what level of support there is either.
I would love to hear from other people about experiences with XFS though. If people are generally positive about it I can run the tests and see what happens. The equipment is still set up.