Monday, June 8, 2015

Storage Performance Evaluation using Tools

I mentioned the storage performance evaluation concept in a previous blog post and introduced three tools I am familiar with. Today, I would like to share more detail about these three tools. Hopefully this post can help people who would like to start with storage performance evaluation.


===dd===

dd can be used for simple, low-level copying of data and gives a quick overview of throughput. In doing this, device files are often accessed directly. Since dd does not ask for confirmation while the device is being accessed, erroneous usage of dd can quickly lead to data loss. I recommend performing the steps described below on test systems only.

A modern OS does not normally write files immediately to RAID systems or HDDs (JBOD); main memory is used instead to cache writes and reads. So that I/O performance measurements are not affected by these caches, the oflag parameter can be used. Here are a couple of oflag values:

  • direct - use direct I/O for data
  • dsync - use synchronized I/O for data
  • sync - likewise, but also for metadata (i-node)

Throughput (streaming I/O)

ceph@ceph-VirtualBox:~/dd$ dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 32.474 s, 33.1 MB/s

Clean the cache
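For example, a standard way to drop the Linux page cache between runs (run as root):

ceph@ceph-VirtualBox:~/dd$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"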

ceph@ceph-VirtualBox:~/dd$ dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=sync 
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 123.37 s, 8.7 MB/s


Latency

ceph@ceph-VirtualBox:~/dd$ dd if=/dev/zero of=/root/testfile bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.36084 s, 1.4 MB/s

Clean the cache


ceph@ceph-VirtualBox:~/dd$ dd if=/dev/zero of=/root/testfile bs=512 count=1000 oflag=sync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 11.1865 s, 45.8 kB/s



===fio===

fio is an I/O tool meant to be used both for benchmarking and for stress/hardware verification. It has support for 19 different types of I/O engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more), I/O priorities (for newer Linux kernels), rate-limited I/O, forked or threaded jobs, and much more.

It can work on block devices as well as files, and fio accepts job descriptions in a simple-to-understand text format. Several example job files are included. fio displays all sorts of I/O performance information, including complete I/O latencies and percentiles. fio is in wide use in many places, for benchmarking, QA, and verification purposes. It supports Linux, FreeBSD, NetBSD, OpenBSD, OS X, OpenSolaris, AIX, HP-UX, Android, and Windows.

On most current Ubuntu versions, fio can be installed directly from the standard package repositories. Even so, I have listed the packages required for setting up the fio environment.

On other distributions the setup can differ; on RHEL, for example, the requirements below might be needed to set up the test environment.


---RHEL Installation---

In a virtual machine, install and configure fio:

# yum install libaio
# yum install blktrace
# yum install fio
or, optionally, install from a local RPM:
# rpm -ivh fio-2.1.11-1.el7.x86_64-2.rpm

PS: If you need btrace for debugging performance, install the blktrace package to get the btrace utility.

---Ubuntu Installation---

1. Install the packages ksh, fio, and sysstat:
#sudo apt-get install ksh
#sudo apt-get install fio
#sudo apt-get install sysstat


or you can

2. Download and build fio from source, as shown below.
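If you go the source route, the usual steps against the upstream fio repository (see the references) look like this:

#git clone https://github.com/axboe/fio.git
#cd fio
#./configure
#make
#sudo make install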


---How fio works---


There are two ways to trigger a fio run: 1. the command line and 2. a job file.


1. command line, eg:

#fio --name=global --rw=randread --size=129M --runtime=120
or
#fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4

2. Job file

The first step in getting fio to simulate a desired I/O workload is writing a job file describing that specific setup. A job file may contain any number of threads and/or files - the typical contents of the job file are a global section defining shared parameters, and one or more job sections describing the jobs involved. When run, fio parses this file and sets everything up as described. Broken down from top to bottom, a job file follows this basic outline:

[global]
[job1]
[job2]

The same command as the previous CLI example:
fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4

is equal to

[random-writers]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m
numjobs=4
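
Save the section above in a file, e.g. random-writers.fio (the file name is arbitrary), and run it with:

#fio random-writers.fio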

3. Command line combined with a job file

Here is an example of the general outline of a fio configuration file. As explained above, the CLI can be combined with a job file, and CLI options can override the job file configuration if needed.

eg: sample
#fio --name=global --rw=randread --size=128m --name=job1 --name=job2
; -- start job file --
[global]
rw=randread
size=128m
[job1]
[job2]

eg: real example
#fio --name=global --rw=randread --size=128m --name=job1 --name=job2

job1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
job2: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.3
Starting 2 processes
job1: Laying out IO file(s) (1 file(s) / 128MB)
job2: Laying out IO file(s) (1 file(s) / 128MB)
Jobs: 2 (f=2): [rr] [11.9% done] [427KB/0KB/0KB /s] [106/0/0 iops] [eta 07m:33s]
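
To further illustrate the global/job split, here is a hypothetical job file (the job names and sizes are my own choices, not taken from the examples above) that runs a sequential read job and a random write job with shared settings:

[global]
ioengine=libaio
direct=1
size=1g
runtime=60
time_based

[seq-read]
rw=read
bs=1M

[rand-write]
rw=randwrite
bs=4k
iodepth=16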


===vdbench===

Vdbench is a command line utility specifically created to help engineers and customers generate disk I/O workloads to be used for validating storage performance and storage data integrity. Vdbench execution parameters may also be specified via an input text file.

It is open source and provided by Oracle. You can download the code from here: http://www.oracle.com/technetwork/server-storage/vdbench-downloads-1901681.html

Here are the general steps to get vdbench working on your box. The download includes all the required binaries for both Windows and Linux.

  1. download vdbench*.zip to your box
  2. unzip vdbench*.zip 
  3. setup environment
    a. Windows needs Java
    b. Linux needs csh and Java
  4. prepare a parmfile
  5. quick vdbench test - #./vdbench -t or #./vdbench -tf
  6. run vdbench with parameter file

---run with parmfile---

Run command:
-f : parmfile location
-o : output log directory
#./vdbench -f parmfile -o ./output


PS: You can do a very quick simple test without even having to create a parameter file: 
# ./vdbench -t (for a raw I/O workload) 
# ./vdbench -tf (for a file system workload)

---run with dynamic lun w/ cli---

Variable substitution allows you to code variables like $lun in your parameter file, which can then be overridden from the command line. For example, with sd=sd1,lun=$lun in the parameter file, $lun must be overridden from the command line: ./vdbench -f parmfile lun=/dev/x.

In case your parameter file is embedded in a shell script, you may also specify a '!' to prevent accidental substitution by the scripting language, e.g. sd=sd1,lun=!lun
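
A minimal sketch of how this fits together (the workload values here are placeholders of my own, not from the original parmfile):

sd=sd1,lun=$lun
wd=wd1,sd=sd1,xfersize=4k,rdpct=100
rd=run1,wd=wd1,iorate=max,elapsed=60,interval=1

#./vdbench -f parmfile lun=/dev/vdb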


---remote to slaves---

vdbench usually runs a controller (master) against multiple slaves; each slave can attach a single volume or multiple volumes, depending on the design.

---Sum---
  • How does the vdbench controller communicate with the slaves? It uses a shell, which can be rsh, ssh, or vdbench (for the vdbench shell, the slave has to run the daemon first via the command line ./vdbench rsh).
  • What kinds of workloads can vdbench provide? It can provide raw disk I/O and file system workloads.
  • Beyond the general workloads, vdbench can provide deduplication and compression workloads, which give you more information about your disk performance in specific circumstances. eg: dedupratio=2,dedupunit=4k,dedupsets=33 in the parmfile

Raw Disk IO

---hd: host definition---

This is how you design your master/slaves architecture.

eg: general settings in default, then one row per slave
hd=default,user=ubuntu,shell=vdbench,vdbench=/home/ubuntu/vdbench
hd=fio-driver-1,system=192.168.2.74
...
eg: each row (slave) has its own user name for the ssh connection.
hd=drive-1,system=192.168.2.74,user=ubuntu,shell=ssh

hd=drive-2,system=192.168.2.75,user=ubuntu,shell=ssh
...
eg: general settings in default and specific slaves in each row
host=default,user=ubuntu,shell=ssh

hd=one,system=192.168.1.30
host=(localhost,one)

PS: to make remote raw disk I/O work, add root privilege without a password for vdbench:
#sudo visudo
add ceph ALL=NOPASSWD: /home/ceph/vdbench/shutdown -f parmfile as the last line
Ctrl-X, then Enter, to exit


---sd: storage definition---


sd: storage definition (use any name: sd1, sd2, ... sdtest ...)
lun=/dev/vdb: I use a raw device (mounted from storage: create a LUN or volume on the storage system and attach it to the testing server). There are many kinds of targets you can stress: disk, raw device, file system, etc.


  • threads: maximum number of concurrent outstanding I/Os that we want in flight.


PS: 'seekpct=nn': Percentage of Random Seeks

---wd: workload definition (use any name)---


  • xfersize: data transfer size. xfersize=(1M,70,10M,30) generates a weighted distribution: 70% of the I/Os use a 1 MB transfer size and 30% use 10 MB.
  • rdpct: read percentage (rdpct=70: 70% reads and 30% writes).

---rd: run definition (use any name)---


  • iorate=max: run an uncontrolled workload (iorate=100: run a workload of 100 I/Os per second).
  • elapsed: time to run this test, in seconds.
  • interval: reporting interval to your screen, in seconds.


Sum as an example:
sd=sd1,lun=/dev/vdb,openflags=o_direct,threads=200
wd=wd1,sd=sd1,xfersize=(1M,70,10M,30),rdpct=70
rd=run1,wd=wd1,iorate=max,elapsed=600,interval=1

File System IO

---fsd: file system definition---

fsd=fsd1,anchor=/home/ubuntu/vdbench/TEST,shared=yes,depth=1,width=8,files=4,size=8k

or

fsd=fsd1,anchor=/media/20GB/TEST,depth=1,width=8,files=4,size=8k


---fwd: file workload definition---

fwd=fwd1,fsd=fsd1,operation=read,xfersize=4k,fileio=sequential,fileselect=random,threads=4
rd=rd1,fwd=fwd1,fwdrate=100,format=yes,elapsed=10,interval=1


  • anchor: the starting point (directory) where the files are generated
  • depth: number of directory levels
  • width: number of folders in each directory
  • files: number of files under each folder
  • size: the size of each file
  • operation: mkdir, rmdir, create, delete, open, close, read, write, getattr and setattr
  • xfersize: data transfer size
  • fileio: sequential or random
  • fileselect: how to select file names and directory names
  • threads: how many concurrent threads
  • fwdrate: how many file system operations per second, eg: fwdrate=100 runs a workload of 100 operations per second
  • format: whether to create the complete file structure before the run, if needed
  • interval: reporting interval, in seconds

PS: With format=yes, Vdbench will first delete the current file structure and then create the file structure again. It will then execute the run you requested in the current RD.

PS: There is a PDF manual that explains how to run vdbench. It is a good resource to guide you through the execution.

---compression and deduplication---

compression


compratio=n: ratio between the original data and the actually written data, e.g. compratio=2 for a 2:1 ratio. Default: compratio=1


deduplication

Data deduplication is built into Vdbench with the understanding that the dedup logic included in the target storage device looks at each n-byte data block to see if a block with identical content already exists. When there is a match, the block no longer needs to be written to storage and a pointer to the already existing block is stored instead. Since it is possible for dedup and data compression algorithms to be used at the same time, dedup by default generates data patterns that do not compress.


  • Ratio between the original data and the actually written data, eg: dedupratio=2 for a 2:1 ratio. Default: no dedup, or dedupratio=1
  • The size of a data block that dedup tries to match with already existing data. Default dedupunit=128k
  • How many different sets or groups of duplicate blocks to have. See below. Default: dedupsets=5% (You can also just code a numeric value, e.g. dedupsets=100)
  • For a File System Definition (FSD) dedup is controlled on an FSD level, so not on a file level.
  • These blocks are unique and will always be unique, even when they are rewritten. In other words, a unique block will be rewritten with a different content than all its previous versions.
  • All blocks within a set are duplicates of each other. How many sets are there? I have set the default to dedupsets=5%, or 5% of the estimated total amount of dedupunit=nn blocks.
  • A 128m SD file gives 1024 128k blocks. With dedupratio=2 and dedupsets=5%:
    • dedupratio=2 means 50% of the blocks remain after dedup: 1024 * 0.5 = 512
    • unique blocks = 512 - 51 = 461 (the 51 sets are explained below)
  • Dedupratio=2 ultimately will result in wanting 512 data blocks to be written to disk and 512 blocks that are duplicates of other blocks.
  • All read and write operations must be a multiple of that dedupunit= size. With dedupunit=8k, all data transfers will be 8k, 16k, 24k, ... etc.
  • Of course, if the elapsed time (e.g. 48 hours) is not enough, Vdbench will terminate BEFORE the last block has been written.


PS: the deduplication workload uses a fixed chunk size, but it provides two different patterns: unique blocks and duplicate blocks. A combined parmfile sketch follows the example below.

There are ‘nn’ sets or groups of duplicate blocks. 
Example: 

  • dedupset = 1024 * 0.05 = 51 sets
  • duplicate blocks = 512 + 51 = 563
  • total = duplicate + unique = 563 + 461 = 1024
  • dedupset=5%; There will be (5% of 1024) 51 sets of duplicate blocks.
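
Putting the dedup parameters together, a minimal parmfile sketch might look like this (the lun, thread count, and run length are my assumptions; general parameters such as dedupratio must come first in the parameter file, and xfersize is kept a multiple of dedupunit):

dedupratio=2
dedupunit=128k
dedupsets=5%
sd=sd1,lun=/dev/vdb,openflags=o_direct,threads=8
wd=wd1,sd=sd1,xfersize=128k,rdpct=0
rd=run1,wd=wd1,iorate=max,elapsed=600,interval=1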

===Sum of Performance Evaluation===


For measuring write performance, the data to be written should be read from /dev/zero and ideally written to an empty RAID array, hard disk, or partition (such as using of=/dev/sda for the first HDD, or of=/dev/sda2 for the second partition on the first HDD). If this is not possible, a normal file in the file system (such as of=/root/testfile) can be written. PS: you should only use empty RAID arrays, HDDs, or partitions, because writing directly to a device destroys the data on it.


When using if=/dev/zero and bs=1G, the OS will need 1 GB of free space in RAM.
In order to get results closer to real life, perform the tests described several times (3 to 10 times).
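
For example, a simple shell loop repeating the earlier throughput test five times:

for i in 1 2 3 4 5; do
  dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=direct 2>&1 | tail -n 1
done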

Last, the figure below conceptually shows the type of graph the performance evaluation should end up with if you plot IOPS on the X-axis and latency on the Y-axis. The saturation point is the point where latency increases exponentially with additional load. The inflection point in the right-hand portion of the figure is the performance saturation point.


Here is the general phasing a performance test plan should have. This is just an example, but it gives an overview of the key tasks, with timing, that such a plan should include.


Reference:

https://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd
https://github.com/axboe/fio/blob/master/HOWTO
https://community.oracle.com/community/server_%26_storage_systems/storage/vdbench
http://info.purestorage.com/rs/purestorage/images/IDC_Report_AFA_Performance_Testing_Framework.pdf
http://www.brendangregg.com/sysperfbook.html
