Thursday, April 9, 2015

Install and Configure Ceph RadosGateway - dumpling version on RHEL 7

These are the RADOS Gateway lab steps I tried. I used VirtualBox and set up RHEL 7 as the guest OS.

PS: before you start, please make sure you have all the required Ceph repos under /etc/yum.repos.d/.
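
For example, a quick hedged check that yum can actually see a Ceph repository (the exact repo file name will differ depending on how you set it up):

#ls /etc/yum.repos.d/ | grep -i ceph
#sudo yum repolist | grep -i ceph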


=================================
 Install Ceph Object Gateway
=================================
To run a Ceph Object Storage service, you must install Apache and the Ceph Object Gateway daemon on the host that is going to provide the gateway service, i.e., the gateway host.


install apache
=========================
#sudo yum install httpd

configure apache
=========================
1. Open the httpd.conf file:
#sudo vim /etc/httpd/conf/httpd.conf

2. Uncomment #ServerName in the file and add the name of your server. Provide the fully qualified domain name of the server machine (e.g., hostname -f):
ServerName {fqdn}

ServerName ceph-vm13

3. Edit the line Listen 80 in /etc/httpd/conf/httpd.conf with the public IP address of the host that you are configuring as a gateway server. Write Listen {IP ADDRESS}:80 in place of Listen 80.

Listen localhost:80 

4. Start the httpd service:
#sudo service httpd start
Or:
#sudo systemctl start httpd
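
A quick sanity check that Apache came up before moving on (nothing Ceph-specific is being served yet, so any plain Apache response is fine):

#sudo systemctl status httpd
#curl -I http://localhost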


install ceph 
=========================
#sudo yum install ceph-radosgw

PS:
Run ceph auth list and make sure the keyring name inside the brackets and the keyring path both match what you put in ceph.conf.

=================================
 Configuring Ceph Object Gateway
=================================

Configuring a Ceph Object Gateway requires a running Ceph Storage Cluster, and an Apache web server with the FastCGI module.

The Ceph Object Gateway is a client of the Ceph Storage Cluster. As a Ceph Storage Cluster client, it requires a user name and keyring with the right capabilities, pools to store its data, a data directory, and an instance entry in the Ceph configuration file; all of these are covered in the steps below.

Assume my hostname is ceph-vm13.

Create a User and Keyring
=========================

1. Create a keyring for the gateway. ::

#sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
#sudo chmod +r /etc/ceph/ceph.client.radosgw.keyring


2. Generate a Ceph Object Gateway user name and key for each instance. For exemplary purposes, we will use the name ``gateway`` after ``client.radosgw``:: 

#sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.gateway --gen-key


3. Add capabilities to the key. See `Configuration Reference - Pools`_ for details on the effect of write permissions for the monitor and creating pools. ::

#sudo ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring


4. Once you have created a keyring and key to enable the Ceph Object Gateway with access to the Ceph Storage Cluster, add the key to your Ceph Storage Cluster. 
For example::

#sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.gateway -i /etc/ceph/ceph.client.radosgw.keyring
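
To double-check that the key landed in the cluster with the expected caps, something like the following should show the entry (the key value below is only a placeholder):

#sudo ceph auth get client.radosgw.gateway
[client.radosgw.gateway]
        key = AQxxxx...placeholder...==
        caps mon = "allow rwx"
        caps osd = "allow rwx"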


5. Distribute the keyring to the node with the gateway instance. ::

#sudo scp /etc/ceph/ceph.client.radosgw.keyring  root@ceph-vm13:/home/cephdeploy
#ssh ceph-vm13
#sudo mv /home/cephdeploy/ceph.client.radosgw.keyring /etc/ceph/ceph.client.radosgw.keyring

Create Pools
============

Ceph Object Gateways require Ceph Storage Cluster pools to store specific
gateway data. If the user you created has permissions, the gateway will create the pools automatically. However, you should ensure that you have
set an appropriate default number of placement groups per pool in your Ceph configuration file.

.. note:: Ceph Object Gateways have multiple pools, so don't make the number of PGs too high considering all of the pools assigned to the same CRUSH hierarchy, or performance may suffer.

If you have not created the pools yet, you can use the command below. If you already created them earlier, when you tested the OSDs, you can skip this step.

#ceph osd pool create {poolname} {pg-num} {pgp-num} {replicated | erasure} [{erasure-code-profile}]  {ruleset-name} {ruleset-number}

Double check your pool
#sudo rados lspools
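
For example, in a small lab you might pre-create a few of the gateway pools with a modest PG count; 64 is just an assumption for a tiny test cluster, and the pool names below are typical dumpling-era defaults (the gateway will create any missing ones itself if the user has the caps above):

#sudo ceph osd pool create .rgw 64 64
#sudo ceph osd pool create .rgw.control 64 64
#sudo ceph osd pool create .rgw.buckets 64 64
#sudo rados lspools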


Add a Gateway Configuration to Ceph
===================================

Add the Ceph Object Gateway configuration to your Ceph Configuration file. The Ceph Object Gateway configuration requires you to identify the Ceph Object Gateway instance. Then, you must specify the host name where you installed the Ceph Object Gateway daemon, a keyring (for use with cephx), the socket path for  FastCGI and a log file. For example::  

[client.radosgw.{instance-name}]
host = {host-name}
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.{instance-name}.fastcgi.sock
log file = /var/log/radosgw/client.radosgw.{instance-name}.log

The ``[client.radosgw.*]`` portion of the gateway instance identifies this portion of the Ceph configuration file as configuring a Ceph Storage Cluster client where the client type is  a Ceph Object Gateway (i.e., ``radosgw``). The instance name follows. For example:: 

[client.radosgw.gateway]
host = ceph-vm13
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/radosgw/client.radosgw.gateway.log

.. note:: The ``host`` must be your machine hostname, not the FQDN. Make sure that the name you use for the FastCGI socket is not the same as the one used for the object gateway, which is ``ceph-client.radosgw.{instance-name}.asok`` by default. You must use the same name in your S3 FastCGI file too. See `Add a Ceph Object Gateway Script`_ for details.


Redeploy Ceph Configuration
---------------------------

To use ``ceph-deploy`` to push a new copy of the configuration file to the hosts in your cluster, execute the following::

ceph-deploy config push {host-name [host-name]...}

#sudo ceph-deploy --overwrite-conf config pull ceph-vm13

#sudo ceph-deploy --overwrite-conf config push ceph-vm13

Add a Ceph Object Gateway Script
================================

Add an ``s3gw.fcgi`` file (use the same name referenced in the first line of ``rgw.conf``). For Debian/Ubuntu distributions, save the file to the ``/var/www`` directory. For CentOS/RHEL distributions, save the file to the ``/var/www/html`` directory. Assuming a cluster named ``ceph`` (default), and the user created in previous steps, the contents of the file should include::

#sudo vim /var/www/html/s3gw.fcgi

#!/bin/sh
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway

Ensure that you apply execute permissions to ``s3gw.fcgi``. ::

#sudo chmod +x /var/www/html/s3gw.fcgi

On some distributions, you must also change the ownership to ``apache``. :: 

#sudo chown apache:apache /var/www/html/s3gw.fcgi

Create Data Directory
=====================
#sudo mkdir -p /var/lib/ceph/radosgw/ceph-radosgw.gateway


Create a Gateway Configuration
==============================

On the host where you installed the Ceph Object Gateway, create an ``rgw.conf`` file. 

For CentOS/RHEL systems, place the
file in the ``/etc/httpd/conf.d`` directory. 

------------------------------
sudo vim /etc/httpd/conf.d/rgw.conf

###add###
FastCgiWrapper off
<VirtualHost *:80>
ServerName ceph-vm13
DocumentRoot /var/www/html

ErrorLog /var/log/httpd/rgw_error.log
CustomLog /var/log/httpd/rgw_access.log combined

# LogLevel debug
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
SetEnv proxy-nokeepalive 1
ProxyPass / fcgi://localhost:9000/
</VirtualHost>
-------------------------------

Adjust Path Ownership/Permissions
=================================
#sudo getenforce
Enforcing
#sudo setenforce 0
#sudo getenforce
Permissive
#sudo chown apache:apache /var/log/httpd
#sudo chown apache:apache /var/run/ceph
#sudo chown apache:apache /etc/httpd/conf.d

Restart Services and Start the Gateway
======================================
On CentOS/RHEL systems, use ``httpd``. For example:: 

#sudo systemctl restart httpd

Start the Gateway
-----------------
On CentOS/RHEL systems, use ``ceph-radosgw``. For example::

#sudo /etc/init.d/ceph-radosgw start
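
As a rough check that the gateway is answering through Apache, an anonymous request should return an S3-style XML listing rather than an Apache error page (output trimmed; this assumes the virtual host above is serving on port 80):

#curl http://ceph-vm13
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult ...><Owner><ID>anonymous</ID>...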

Tuesday, April 7, 2015

Storage Performance Evaluation - General Concept

Since my research leans toward storage, storage performance has always been the most important topic for me. Most storage research either invents new storage media or hardware, or redesigns algorithms to refine the way data is stored for a specific purpose.


No matter what kind of storage or memory research it is, the ultimate goal in general is to increase IOPS, reduce response time (latency), and improve throughput. Of course there are other measurements from other angles, such as power consumption, CPU cycle time, etc., but as long as we are talking about storage, we cannot get away from the three major measurement indicators below.

Before we jump in and explain these indicators in more detail, I would like to give some general background on disk architecture. There are three major arrangements, listed below.


Disk Drive Architecture
1. JBOD ( just a bunch of disks )
2. RAID ( Redundant Array of Inexpensive/Independent Disks )
3. SPAN ( Spanning drive : combine multiple drives into 1 single volume )

All of the explanation below is written pretty much with JBOD (a single plain disk) in mind.

Section 1: Definition

First of all, let's give a general definition of these three indicators.

1. Throughput = data volume / second

Bandwidth is usually used to describe the maximum theoretical limit of data transfer, while throughput is used to describe a real-world measurement. For me, throughput is the easiest indicator for most people to understand, since it is given in units of size over units of time.

2. IOPs = I/O per second = Input / Output / second

IOPS can be used to express figures such as the amount of I/O generated by a database, or to define the maximum performance of a storage system in an A/B test.

3. Latency ( response time ) = Seconds per I/O = Seconds / ( Input / Output )

Latency shows up in conventional magnetic disks as disk head seek time plus rotational latency. The basics: every time you need to access a block on a disk drive, the disk actuator arm has to move the head to the correct track (the seek time), then the disk platter has to rotate until the correct sector is under the head (the rotational latency).

Section 2: Relationship

Second, let's talk about the relationship between these indicators. 


a. Throughput = IOPs * I/O size (average block size)

This is my personal interpretation, so feel free to share your comments with me. As described above, throughput = data volume / second and IOPS = (input/output operations) / second; if we also know the average block size, we get throughput = IOPS * average block size. E.g., if IOPS = 1000 and the average block size is 4 KB, then throughput = 4000 KB/s (roughly 4 MB/s).
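
As a quick sanity check of formula a, here is a tiny bash sketch (it assumes bc is installed; the numbers are the hypothetical ones from the example above):

#!/bin/bash
# hypothetical workload: 1000 IOPS with an average block size of 4 KB
iops=1000
block_size_kb=4
# throughput = IOPS * average block size
throughput_kb=$(echo "$iops * $block_size_kb" | bc)
echo "Throughput: ${throughput_kb} KB/s (~$(echo "scale=1; $throughput_kb/1024" | bc) MB/s)"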

b. IOPS = 1 second / Latency ( head seek time + rotational latency )

In a conventional disk, latency is the usual basis for this measurement: the time per I/O includes the seek time and the rotational latency. E.g., one full rotation of a 15k RPM disk takes 4 ms (15,000 rotations per minute = 250 rotations per second, so one rotation is 1/250th of a second, or 4 ms). If we assume roughly 1 ms of head seek time on top of that, the total is about 4 + 1 = 5 ms per I/O, so the physical limit of this disk is about 1/(5/1000) = 200 IOPS per spindle.
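
The same arithmetic as a small bash sketch (again assuming bc; the RPM value and the 1 ms seek time are just the assumptions from the example):

#!/bin/bash
# hypothetical 15k RPM disk with an assumed 1 ms average seek time
rpm=15000
seek_ms=1
# one full rotation in ms: 60,000 ms per minute divided by RPM
rotation_ms=$(echo "scale=2; 60000 / $rpm" | bc)
latency_ms=$(echo "scale=2; $rotation_ms + $seek_ms" | bc)
iops=$(echo "scale=0; 1000 / $latency_ms" | bc)
echo "Per-I/O latency: ${latency_ms} ms -> ~${iops} IOPS per spindle"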

Section 3: Penalty

Following up on latency, I would describe it as the penalty of a conventional disk nowadays, especially when random transactions (reads/writes) happen instead of sequential ones.


Let me use the most common question people ask: why are IOPS so costly on a conventional HDD? It comes down to sequential versus random access. A magnetic disk (conventional HDD) pays for random reads/writes with head seek time and rotational latency, but in sequential reads/writes the disk head barely has to move: there is essentially no seek time and little waiting on rotation, because the next data sector sits right behind the previous I/O.

Say the HDD spins at 7200 RPM: random reads/writes slow performance down because the head and platter have to move a lot between requests, so IOPS are expensive on a conventional HDD. Another example: reading or writing a large file costs less latency because its blocks follow one another sequentially, whereas many small, scattered files cost more latency, since servicing the higher number of IOPS forces the disk to move around a lot.

In sum, if that next block is somewhere else on the disk, you will need to incur the same penalties of seek time and rotational latency. We call this type of operation a random I/O. But if the next block happened to be located directly after the previous one on the same track, the disk head would encounter it immediately afterwards, incurring no wait time (i.e. no latency). This, of course, is a sequential I/O.


That I/O results in a certain amount of latency, as described earlier on (the seek time and rotational latency). 



Section 4: Flash Offers Another Way


The idea of sequential I/O doesn't exist with flash memory, because there is no physical concept of blocks being adjacent or contiguous. Logically, two blocks may have consecutive block addresses, but this has no bearing on where the actual information is electronically stored. "You might therefore say that all flash I/O is random, but in truth the principles of random I/O versus sequential I/O are disk concepts so don't really apply." And since the latency of flash is sub-millisecond, it should be possible to see that, even for a single-threaded process, a much larger number of IOPS is possible. When we start considering concurrent operations or in-line deduplication, things get even more interesting, but that topic is for another day.


This section (Flash Offers Another Way) is mostly referenced from http://www.violin-memory.com/blog/understanding-io-random-vs-sequential/.

Section 5: Tools

There are lots of storage performance tools on the market, both open source and licensed. Here I would like to introduce three major ones that I am familiar with. Hopefully they help you jump into the storage domain quickly and get through the indicators above more easily.

Linux DD

Linux dd is a default, commonly available tool that is often used for storage performance testing. dd can be used for simplified copying of data at a low level, and in doing so device files are often accessed directly. Because no safety checks happen while the device is being accessed, erroneous usage of dd can quickly lead to data loss. I absolutely recommend performing the steps described below on test systems only; if dd is used incorrectly, data loss will be the result.
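
A hedged example of a simple sequential test with dd (the 1 GB file under /tmp is just an assumption; point it at a scratch filesystem, never at a raw device that holds data):

# sequential write test: 1 GB, flushed to disk before dd reports the rate
#dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 conv=fdatasync

# drop the page cache so the read test is not served from memory
#sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

# sequential read test
#dd if=/tmp/ddtest of=/dev/null bs=1M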


fio 

fio is an I/O tool meant to be used both for benchmarking and for stress/hardware verification. It has support for 19 different types of I/O engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more), I/O priorities (for newer Linux kernels), rate I/O, forked or threaded jobs, and much more. It can work on block devices as well as files. fio accepts job descriptions in a simple-to-understand text format. Several example job files are included. fio displays all sorts of I/O performance information, including complete I/O latencies and percentiles. fio is in wide use in many places, for benchmarking, QA, and verification purposes. It supports Linux, FreeBSD, NetBSD, OpenBSD, OS X, OpenSolaris, AIX, HP-UX, Android, and Windows.
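
A minimal fio sketch for 4K random reads (the file name, size, queue depth, and runtime are assumptions; adjust them for your system):

#fio --name=randread-test --filename=/tmp/fio.test --size=1G \
     --ioengine=libaio --direct=1 --rw=randread --bs=4k \
     --iodepth=16 --runtime=60 --time_based --group_reporting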


vdbench

Vdbench is a command-line utility specifically created to help engineers and customers generate disk I/O workloads for validating storage performance and storage data integrity. Vdbench execution parameters may also be specified via an input text file. It is an interesting tool that removes a lot of setup work, but it does require reading through the guide, since all of the configuration lives in the parmfile. It can be combined with Oracle SWAT; however, as I understand it, SWAT is under Oracle licensing and is not open.
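
A small parmfile sketch to give a feel for the syntax (the target file, size, and 70/30 random read/write mix are assumptions; vdbench will overwrite whatever lun= points at, so use a scratch file or device):

# parmfile.txt (hypothetical)
sd=sd1,lun=/tmp/vdbench.file,size=1g
wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100
rd=run1,wd=wd1,iorate=max,elapsed=60,interval=5

# run it from the vdbench install directory
#./vdbench -f parmfile.txt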


In sum, I would like to demo how to install and use these tools, but I think that needs another post. If you are interested, stay tuned; I will put it on my blog later.

Reference:

Movie Quote:

A smile doesn't always show that you're happy; sometimes, it shows that you are strong.
- (Jerry Maguire), 1996