Tuesday, April 7, 2015

Storage Performance Evaluation - General Concept

Since my research leans toward storage, storage performance has always been the most important topic for me. Most storage research either invents new storage media or hardware, or redesigns algorithms to refine the way data is stored in order to fulfill a specific purpose.


No matter what kind of storage or memory research it is, the ultimate goal in general is to improve IOPS and throughput and to reduce response time (latency). Of course there are other measurements from other angles, such as power consumption, CPU cycles, etc., but as long as we are talking about storage, we cannot get around the three major measurement indicators below.

Before we jump in to explain these indicators in more detail, I would like to give a general concept of disk drive architecture. There are three major ones, as below.


Disk Drive Architecture
1. JBOD ( just a bunch of disks )
2. RAID ( Redundant Array of Inexpensive/Independent Disks )
3. SPAN ( spanning drive: combines multiple drives into one single volume )

All of the explanation below is pretty much based on JBOD.

Section 1: Definition

First of all, let's give a general definition of these three indicators.

1. Throughput = data volume / second

Bandwidth is usually used to describe the maximum theoretical limit of data transfer, while throughput is used to describe a real-world measurement. For me, throughput is the easiest indicator for most people to understand, since it is given in units of size over units of time.

2. IOPS = I/O operations per second ( input/output operations per second )

IOPS can be used, for example, to quantify the amount of I/O created by a database, or to define the maximum performance of a storage system in an A/B test.

3. Latency ( response time ) = seconds per I/O ( seconds per input/output operation )

On a conventional magnetic disk, latency comes from the disk head seek time and the rotational latency. The basics: every time you need to access a block on a disk drive, the disk actuator arm has to move the head to the correct track (the seek time), then the disk platter has to rotate until the correct sector is under the head (the rotational latency).

Section 2: Relationship

Second, let's talk about the relationship between these indicators. 


a. Throughput = IOPS * I/O size ( average block size )

This is my personal interpretation, and feel free to share your comments with me. As described above, throughput = data volume / second and IOPS = I/O operations / second; if we also know the average block size, we get throughput = IOPS * average block size. e.g.: IOPS = 1000 and average block size = 4 KB, then throughput = 4000 KB/s (about 4 MB/s).
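
To make the arithmetic concrete, here is a tiny Python sketch of the same calculation (the numbers are just the example values above):

    # Throughput = IOPS * average I/O (block) size
    iops = 1000                      # I/O operations per second
    block_size_kb = 4                # average block size in KB

    throughput_kb_s = iops * block_size_kb
    print(f"{throughput_kb_s} KB/s (~{throughput_kb_s / 1024:.1f} MB/s)")   # 4000 KB/s (~3.9 MB/s)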

b. IOPS = 1 second / latency ( head seek time + rotational latency )

On a conventional disk, latency is a common indicator for performance measurement, and the achievable I/O per second is bounded by the seek time plus the rotational latency. e.g.: one full rotation of a 15k RPM disk takes 4 ms (15,000 rotations per minute = 250 rotations per second, so one rotation is 1/250th of a second, or 4 ms). If we assume another 1 ms for the disk head seek time, the total is 4 + 1 = 5 ms. The physical limit of this disk, per spindle, is therefore 1 / (5/1000) = 200 IOPS.
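
The same estimate in Python, using the 15k RPM numbers above (the 1 ms seek time is the same assumption the example makes, not a measured value):

    # IOPS = 1 second / per-I/O latency (seek time + rotational latency)
    rpm = 15000
    rotation_s = 60 / rpm            # one full rotation: 1/250 s = 4 ms
    seek_s = 0.001                   # assumed average head seek time: 1 ms

    latency_s = rotation_s + seek_s
    iops = 1 / latency_s
    print(f"latency = {latency_s * 1000:.0f} ms, ~{iops:.0f} IOPS per spindle")   # 5 ms, ~200 IOPS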

Section 3: Penalty

Following up on latency, I would say it acts like a penalty on conventional disks, especially when random transactions (reads/writes) happen instead of sequential ones.


Let me use the most common question people ask: why are IOPS so costly on a conventional HDD? It comes down to sequential versus random access. In the real world, a magnetic disk (conventional HDD) pays a cost on every random read/write because of head seeks and rotational latency, but on sequential reads/writes the disk head moves basically for free: there is almost no seek time and very little rotation to wait for, because the next data sector sits right behind the previous I/O.

For example, on a 7200 RPM HDD, random reads/writes slow performance down because the disk head and the platter have to move a lot between I/Os; this is why IOPS are costly on a conventional HDD. Another example: reading/writing a large file costs less latency because its blocks sit one after another sequentially, while many small, scattered files generate more I/Os and more latency because the disk head has to move constantly.

In sum, if that next block is somewhere else on the disk, you will need to incur the same penalties of seek time and rotational latency. We call this type of operation a random I/O. But if the next block happened to be located directly after the previous one on the same track, the disk head would encounter it immediately afterwards, incurring no wait time (i.e. no latency). This, of course, is a sequential I/O.


Each such I/O results in a certain amount of latency, as described earlier (the seek time and rotational latency).
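
A small Python sketch of how big this penalty is, reusing the 15k RPM disk from Section 2 (the 200 MB/s media transfer rate is purely my illustrative assumption):

    # Estimated time to read 1000 x 4 KB blocks, random vs sequential,
    # on the 15k RPM disk from Section 2 (4 ms rotation + 1 ms seek).
    blocks = 1000
    per_io_penalty_s = 0.005                     # seek + rotational latency per random I/O
    transfer_s_per_block = 4 / (200 * 1024)      # 4 KB at an assumed 200 MB/s transfer rate

    random_s = blocks * (per_io_penalty_s + transfer_s_per_block)
    sequential_s = per_io_penalty_s + blocks * transfer_s_per_block   # pay the penalty only once
    print(f"random: {random_s:.2f} s, sequential: {sequential_s:.3f} s")   # ~5.02 s vs ~0.025 s

The same amount of data takes orders of magnitude longer when every block requires its own seek and rotation.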



Section 4: Flash Offers Another Way


The idea of sequential I/O doesn't exist with flash memory, because there is no physical concept of blocks being adjacent or contiguous. Logically, two blocks may have consecutive block addresses, but this has no bearing on where the actual information is electronically stored. "You might therefore say that all flash I/O is random, but in truth the principles of random I/O versus sequential I/O are disk concepts so don't really apply." And since the latency of flash is sub-millisecond, it should be clear that, even for a single-threaded process, a much larger number of IOPS is possible (at 0.1 ms per I/O, a single thread can already issue about 10,000 IOPS, versus the 200 IOPS we calculated for the 15k RPM disk above). When we start considering concurrent operations, or features like in-line deduplication, things get even more interesting… but that topic is for another day.


This section (Flash Offers Another Way) mostly references http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/.

Section 5: Tools

There are lots of storage performance tools on the market, either open source or requiring a license. Here I would like to introduce three major ones that I am familiar with. Hopefully this helps you jump into the storage domain quickly and work through the indicators above more easily.

Linux DD

Linux dd is a default and common tool used for storage performance testing. dd can be used for simplified copying of data at a low level, and in doing so device files are often accessed directly. Since dd asks for no confirmation when it writes to a device, erroneous usage can quickly lead to data loss. I absolutely recommend trying it only on test systems: if dd is used incorrectly, data loss will be the result.
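
As a minimal sketch (the scratch file /tmp/ddtest and the sizes are just my placeholders; run this only on a test system), a simple sequential write and read test might look like this:

    dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct    # sequential write, ~1 GiB, bypassing the page cache
    dd if=/tmp/ddtest of=/dev/null bs=1M iflag=direct               # sequential read of the same file

dd reports throughput (e.g. MB/s) at the end of each run; because it issues one I/O at a time, it tells you little about IOPS or latency.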


fio 

fio is an I/O tool meant to be used both for benchmark and stress/hardware verification. It has support for 19 different types of I/O engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more), I/O priorities (for newer Linux kernels), rate I/O, forked or threaded jobs, and much more. It can work on block devices as well as files. fio accepts job descriptions in a simple-to-understand text format. Several example job files are included. fio displays all sorts of I/O performance information, including complete IO latencies and percentiles. fio is in wide use in many places, for both benchmarking, QA, and verification purposes. It supports Linux, FreeBSD, NetBSD, OpenBSD, OS X, OpenSolaris, AIX, HP-UX, Android, and Windows.
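
As a rough sketch of what a job description looks like (the file names and values below are my own placeholders, not from this post), a small 4 KB random-read job file might be:

    [global]
    # asynchronous I/O on Linux, bypassing the page cache, 60-second timed run
    ioengine=libaio
    direct=1
    runtime=60
    time_based

    [randread-4k]
    # 4 KB random reads against a 1 GB test file, 32 outstanding I/Os
    filename=/tmp/fiotest
    size=1g
    rw=randread
    bs=4k
    iodepth=32

Running it with something like "fio randread-4k.fio" reports IOPS, throughput, and a full latency distribution, which maps directly onto the three indicators above.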


vdbench

Vdbench is a command-line utility specifically created to help engineers and customers generate disk I/O workloads for validating storage performance and storage data integrity. Vdbench execution parameters may also be specified via an input text file. It is an interesting tool that removes a lot of setup work, but it does require reading through the guide, since all of the configuration lives in the parmfile. It can be combined with Oracle SWAT; however, as I understand it, SWAT falls under Oracle licensing and is not open.
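
As a rough sketch of what a parmfile can look like (the device path and every value below are my own placeholders; check the vdbench guide before running anything against a real device):

    * storage definition: the device under test
    sd=sd1,lun=/dev/sdb,openflags=o_direct
    * workload definition: 4 KB transfers, 100% reads, 100% random
    wd=wd1,sd=sd1,xfersize=4k,rdpct=100,seekpct=100
    * run definition: maximum I/O rate for 60 seconds, reporting every second
    rd=run1,wd=wd1,iorate=max,elapsed=60,interval=1

A run such as "./vdbench -f parmfile" then prints interval-by-interval IOPS, throughput, and response time.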


In sum, I would like to demo how to install and use these tools, but I think that needs another post. If you are interested, stay tuned; I will put it on my blog later.

Reference:

http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/

Movie Quote:

A smile doesn't always show that you're happy; sometimes, it shows that you are strong.
- (Jerry Maguire), 1996
