Saturday, January 30, 2016

JBox - Multi-Client In-line Deduplication Archival Storage Service over Swift

What's JBox

In my leisure time I played around with some ideas and developed an archival storage system with in-line deduplication over the internet, called JBox, intended to back up data into Object Storage (Swift) and to sync with multiple clients on the fly.

JBox adopts a 2-tier metadata structure to operate the file system effectively and to allow syncing with multiple clients. During file syncing, copy-on-write (CoW) makes sure metadata updates are mutually exclusive, and a reference counter supports object purge to save more storage space. JBox reduces upload bandwidth and storage consumption through chunk compression and variable-size chunk deduplication, which enables delta sync and a versioning (snapshot) feature. JBox has a dedup-map that makes the archive configurable to fit different kinds of backup streams: it not only controls the dedup anchor for the number of chunks per file but also provides different kinds of deduplication skins, to balance efficiency against performance.

JBox is my prototype to demonstrate the functional features above and to prove that variable-size chunk deduplication can deliver good performance. JBox uses the native OpenStack Swift and Keystone APIs, resulting in transaction time and size comparable to other non-archival cloud storage services on the market.
In this post, I would like to explain the techniques I used in JBox, especially for file sync: the 2-Tier Metadata Structure, File Syncing, the Reference Counter, and Copy on Write (CoW). With these techniques, JBox can watch file operations, handle file sync between different clients, and keep the most up-to-date version in OpenStack Object Storage, Swift.

Last, the code is all open source; feel free to check it out on my GitHub: https://github.com/chianingwang/JBox

Metadata with Multiple Clients

How does sync work with multiple clients? It relies pretty much on metadata. When a file is updated to V2 (version 2), the metadata is uploaded to the repository; the other clients then sync with the repository, download the metadata, become aware that chunk 4 is missing, download chunk 4, and recompose the file by replacing chunk 3 with chunk 4. The diagram below shows the basic logic.
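The chunk-diff step above can be sketched in a few lines of Java. This is illustrative only; the hash strings are placeholders, not JBox's real chunk fingerprints:

```java
import java.util.*;

// Sketch: given the chunk hash lists from the old local CLM version and the
// newly downloaded CLM version, find which chunks the client must fetch.
public class ChunkDiff {
    public static List<String> missingChunks(List<String> local, List<String> remote) {
        Set<String> have = new HashSet<>(local);
        List<String> toFetch = new ArrayList<>();
        for (String h : remote) {
            if (!have.contains(h)) toFetch.add(h); // chunk not present locally
        }
        return toFetch;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("c1", "c2", "c3");
        List<String> v2 = Arrays.asList("c1", "c2", "c4"); // chunk 3 replaced by chunk 4
        System.out.println(missingChunks(v1, v2)); // prints [c4]
    }
}
```

The client then downloads only `c4` and recomposes the file from local chunks plus the fetched one.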


2-Tier Metadata Structure

First of all, in Swift each user has a container, and the container name actually refers to your username. Under your container (username) there are 3 major file types in the system: File Level Metadata, Chunk Level Metadata, and Compressed Chunks. An example is below, with each of them labeled in ( ).

USERMETAFILE ( File Level Metadata: FLM )
fbd1ec0de6be540aa8e074fc239732376 ( Chunk Level Metadata: CLM )
c1108c6e5e55825382526ca5bc4b70d702 ( Compressed Chunk )

File Level Metadata(FLM) - 1st Tier

The FLM basically maintains all the directories/files under a specific user. It includes some basic info for each file; I list some of the fields roughly below.

Timestamp
FileName
FileGUID
FileContentHash
NoClient=2

===FLM Start===
2016-01-21 20:46:01+00:00 var 2.0 00000000000000000000000000000000 6fc418c0261e45afb65094d75c2ee17e 000200002016-01-21 20:33:39+00:00 2016-01-21 20:33:47+00:00 000000000000/home/johnny/JBox bc4bafbb18d94905bdee30afca728ac8 62e5b8b7f5834d43a0e59861b229cd52 000000002016-01-21 20:45:49+00:00 2016-01-21 20:45:49+00:00 a49ccd82312e779d86ef5f72c1097cb5 000000624640/home/johnny/JBox/TEST
===FLM End===

Chunk Level Metadata(CLM) - 2nd Tier

The CLM basically maintains all the chunks, with versions, for a specific file. It includes file-level info (the File Content Hash) and all the chunks (Start, Length, an indicator for compression, and the Chunk Content Hash). Versions are sorted by time and listed with a timestamp such as 2016-01-21 15:38:04+00:00. Here is an example, FYI.
===CLM Start===
2016-01-21 15:38:04+00:00
file 0 624640 15 a49ccd82312e779d86ef5f72c1097cb5
chunk 0 65535 1 d073f1a20678d56d0b35cee8544dfeb6
chunk 65536 131071 1 e06bd88ecbbdd055d72953d861650d03
chunk 131072 196607 1 c89a6de9d5e0980182bc90447d8d550c
chunk 196608 262143 1 39482d0736b5323fcc4b3b85b4e73452
chunk 262144 327679 1 8cef30ae43782d45cc3dd4a400221e6c
chunk 327680 393215 1 eaa489a017366c74034a7d030663b48b
chunk 393216 454165 1 108c6e5e55825382526ca5bc4b70d702
chunk 454166 519701 1 d5b2fc861eaef1fdce97b42532a84de8
chunk 519702 563711 1 ddcaeaa2f448253cebd5f034c05da4da
chunk 563712 608282 1 d377c03bbdc6aa4ea6dff3003240320b
chunk 608283 624639 1 f3cf91ea8820fc58140e909f7eaf196c
---------------------------------------------
2016-01-21 20:33:39+00:00
file 0 624645 15 e971327b210a5e058e4ea9395adf06f4
chunk 0 65535 1 d073f1a20678d56d0b35cee8544dfeb6
chunk 65536 131071 1 e06bd88ecbbdd055d72953d861650d03
chunk 131072 196607 1 c89a6de9d5e0980182bc90447d8d550c
chunk 196608 262143 1 39482d0736b5323fcc4b3b85b4e73452
chunk 262144 327679 1 8cef30ae43782d45cc3dd4a400221e6c
chunk 327680 393215 1 eaa489a017366c74034a7d030663b48b
chunk 393216 454165 1 108c6e5e55825382526ca5bc4b70d702
chunk 454166 519701 1 d5b2fc861eaef1fdce97b42532a84de8
chunk 519702 563711 1 ddcaeaa2f448253cebd5f034c05da4da
chunk 563712 608282 1 d377c03bbdc6aa4ea6dff3003240320b
chunk 608283 624644 1 6f15fcf94e40be34f70ad4b2b1631b04
---------------------------------------------
===CLM End===
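A CLM version block like the one above can be parsed line by line. The sketch below assumes the field layout shown in the example (`chunk <start> <end> <compressed-flag> <hash>`); it is my reading of the format, not code from the JBox source:

```java
import java.util.*;

// Sketch of parsing one CLM version block into chunk records.
public class ClmParser {
    public static class Chunk {
        public final long start, end;
        public final boolean compressed;
        public final String hash;
        Chunk(long s, long e, boolean c, String h) { start = s; end = e; compressed = c; hash = h; }
    }

    public static List<Chunk> parseVersion(String block) {
        List<Chunk> chunks = new ArrayList<>();
        for (String line : block.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length == 5 && f[0].equals("chunk")) { // skip "file ..." and separators
                chunks.add(new Chunk(Long.parseLong(f[1]), Long.parseLong(f[2]),
                                     f[3].equals("1"), f[4]));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String block = "file 0 624640 15 a49ccd82312e779d86ef5f72c1097cb5\n"
                     + "chunk 0 65535 1 d073f1a20678d56d0b35cee8544dfeb6\n"
                     + "chunk 65536 131071 1 e06bd88ecbbdd055d72953d861650d03\n";
        List<Chunk> cs = parseVersion(block);
        System.out.println(cs.size() + " chunks, first ends at " + cs.get(0).end);
    }
}
```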

Compressed Chunk

The compressed chunk is very straightforward: it is an object named by the Chunk Content Hash whose body is the compressed raw data.

e.g. c1108c6e5e55825382526ca5bc4b70d702
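The naming-plus-compression idea can be sketched as below. I assume an MD5-style 32-hex-digit digest based on the object names above, and plain deflate compression; JBox's actual digest and codec may differ:

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.util.zip.Deflater;

// Sketch: a chunk object's name is the hex digest of its raw content, and its
// body is the deflate-compressed raw bytes.
public class ChunkObject {
    // Hex digest of the chunk content; used as the object name (assumed MD5).
    public static String name(byte[] raw) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(raw)) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Deflate-compress the raw chunk before upload.
    public static byte[] compress(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = "hello chunk hello chunk hello chunk".getBytes("UTF-8");
        System.out.println(name(raw) + " -> " + compress(raw).length + " bytes");
    }
}
```

Identical chunk content always yields the same object name, which is exactly what makes deduplication a simple name lookup in Swift.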

File Syncing

Syncing files in JBox is kind of complicated, but I'll try my best to make it clearer, using diagrams and tables to make it easier to understand. Hopefully it helps.

For file operations in an OS file system, the most common operations are new, update, copy, move/rename, and delete. JBox keeps parameters in the FLM that allow it to handle each file operation properly.
The table below shows what action JBox takes when these operations happen. The top columns are the file operations (new, update, copy, rename/move, and delete), and the leftmost column lists the parameters in the FLM.



[Diagram: File Level Metadata and Chunk Level Metadata, with chunks 12345 of TestFile3 on Client 1 and chunk 5 on Client 2]

Legend: N: New, U: Update, X: None

                 New   Update   Copy   Rename/Move   Delete
Timestamp        N     U        N      U             U
FileName         N     X        N      N             X
FileGUID         N     X        N      X             X
FileContentHash  N     U        X      X             X
NoClient=2       0     0        0      0             Set NoClient=2;
                                                     if NoClient<>0 then NoClient=NoClient-1;
                                                     if NoClient=1 then purge the FLM record

With the above, JBox can catch most file system transactions on all kinds of OS.

The most important thing is that JBox doesn't need to adopt any extra library like Linux inotify; that is, JBox doesn't need to reference a specific file system monitor library such as FileSystemWatcher in Windows for C# or JNotify in Linux for Java.

I break down each file operation into the diagrams below. You can see the relationship between the FLM, the CLM, and the chunks, and the link keys that tie them together.

When a file is new in the JBox directory, JBox uses a random generator to create a GUID and assigns it to the file; the GUID can be found in both the FLM and the CLM. At the same time, JBox hashes the content as a file fingerprint to identify whether the file has changed. The CLM, named by the FileGUID (fxxxxx), includes the file content hash (file fingerprint) and breaks the file content down into multiple chunks, with each chunk's fingerprint listed in the CLM. These chunk fingerprints are the IDs of the chunks, the same as the chunk object names (c0xxxxx or c1xxxxx), and each object's content is the compressed raw data. Here is the diagram showing how JBox handles a new file.
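A minimal sketch of this new-file flow, using fixed 64 KiB chunks for brevity (JBox actually uses variable-size chunking) and MD5 as the fingerprint (an assumption from the 32-hex-digit names above):

```java
import java.security.MessageDigest;
import java.util.*;

// Sketch of the "new file" flow: assign a random GUID, fingerprint the whole
// file (for the FLM), then fingerprint each chunk (for the CLM).
public class NewFile {
    public static String md5hex(byte[] data, int off, int len) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(data, off, len);
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = new byte[200_000];
        new Random(42).nextBytes(content);

        String fileGuid = UUID.randomUUID().toString().replace("-", ""); // names the CLM
        String fileHash = md5hex(content, 0, content.length);            // FileContentHash in FLM

        int chunkSize = 65536; // fixed size only for this sketch
        for (int off = 0; off < content.length; off += chunkSize) {
            int len = Math.min(chunkSize, content.length - off);
            // One CLM line per chunk: start, end, compressed flag, fingerprint.
            System.out.println("chunk " + off + " " + (off + len - 1) + " 1 "
                               + md5hex(content, off, len));
        }
        System.out.println("file guid=" + fileGuid + " hash=" + fileHash);
    }
}
```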


In General: w/o Chunk Reference Counter




Then, when the file changes, the file content hash (file fingerprint) is different. JBox updates the file content hash in the FLM, adds a new version (separated by "-------") in the CLM, and uploads the new chunk with compressed raw data to the Object Storage, Swift.


When a copy happens, JBox generates another new FileGUID, which means adding a new record in the FLM, then uploads a new CLM named from the new FileGUID. However, the chunk list in the new CLM is the latest version from the source CLM. In this case, JBox only updates the FLM and uploads a new CLM, but won't upload any chunks, since no new chunk is created.


When a rename/move happens, JBox only updates the FLM, replacing the old name with the new name. Since JBox maintains the full path (directory + filename) in the FLM, a move is the same as a rename. In this case, JBox only updates a record in the FLM.


JBox has to be configured with how many clients are running JBox. When a file gets deleted, JBox changes the reference counter from the default 0 to the number of clients, e.g. 2 in the case below. Since client 1 deleted its local file and JBox found the reference counter = 0, meaning this client is the first to delete the file, it sets the counter from 0 to 2 (Number of Clients: NoClient). In this case, JBox updates the FLM only.



Here is an example where one client has deleted the file and JBox syncs with another client. JBox finds the reference counter is not 0, so it removes the local file and subtracts 1 from the reference counter. In the case below, 2 - 1 = 1, and when JBox finds the reference counter = 1, it means all the clients have purged the file already. JBox deletes that file record in the FLM, then updates the CLM metadata's "X-Delete-At" parameter with a deletion lead time (configured in JBox beforehand, e.g. the Unix time 10 minutes from now). After 10 minutes, Swift purges the object automatically. So on the 2nd client, JBox updates the reference counter in the FLM, updates "X-Delete-At" with a Unix time in the CLM metadata, and eventually deletes the record in the FLM.
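The two delete paths above can be condensed into one decision function. This is a sketch of the logic only, hard-coded to the 2-client configuration from the example:

```java
// Sketch of the NoClient reference counter used for multi-client delete.
public class DeleteSync {
    public static final int NUM_CLIENTS = 2; // configured number of JBox clients

    // noClient is the counter stored in the FLM record (0 = file still live).
    // Returns what the syncing client should do.
    public static String onDelete(int noClient) {
        if (noClient == 0) {
            // First client to delete the file: arm the counter in the FLM.
            return "set NoClient=" + NUM_CLIENTS;
        }
        int remaining = noClient - 1; // this client removes its local copy
        if (remaining == 1) {
            // Every client has purged its copy: drop the FLM record and
            // schedule the CLM for purge via X-Delete-At.
            return "purge FLM record, set X-Delete-At on CLM";
        }
        return "remove local file, NoClient=" + remaining;
    }

    public static void main(String[] args) {
        System.out.println(onDelete(0)); // client 1 deletes first
        System.out.println(onDelete(2)); // client 2 syncs: 2 - 1 = 1, so purge
    }
}
```

With 3 clients the middle branch would fire once (`3 -> 2`, remove local file only) before the final purge.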
 


For Chunk Purge: w/ Chunk Reference Counter

JBox provides an option to turn on a chunk (object) reference counter. With this feature, JBox can purge objects automatically and provide virtual storage tiering to reduce chunk deduplication time.

For a new file, having the object reference counter turned on is pretty much the same as not having it on, as in the previous section. However, as you can see below, JBox gives each chunk (object) a starting reference counter; the case below gives each one 01.
The update case is the same as new: for a new object we give the initial reference counter, and for an object that is referenced again we add 1 to its reference counter. In the case below, some objects get 2, since we have 2 versions and those objects are referenced in both; the new chunk added because the file content was updated gets the initial value 01, the same as a chunk referenced only in the 1st version.

Following this logic, whenever a chunk is referenced, JBox increases its counter. In the case below, since we copy the 2nd version of file 1 to file 2, chunks abcxxxxxx1, abcxxxxxx2 and abcxxxxxx4 are referenced again.


For rename/move, no chunk is re-referenced, so JBox doesn't do anything here.




When the deletion of TestFile3 happens, the 1st client, as mentioned above, only updates the FLM's reference counter, setting it to the number of clients.

When the 2nd client's local file gets deleted and JBox sees the reference counter in the FLM = 1, the FLM record is removed and the CLM's "X-Delete-At" is set with the lead time, ready for purge. In this situation the chunks are still referenced, since their reference counters are still >= the initial value, so JBox only decrements the reference counters of these chunks.


In the last case below, TestFile1 is deleted as well, which means JBox repeats the 2 deletion steps above:

  • a. Remove the record in the FLM.
  • b. Set "X-Delete-At" with the lead time in the CLM (object) metadata, and decrement the chunks' reference counters.
  • c. If JBox finds a chunk's reference counter < the initial value, meaning the chunk is no longer referenced by any file, set "X-Delete-At" with the lead time in the chunk (object) metadata.
  • d. OpenStack Swift purges these objects (CLMs and chunks) when the Unix time is up.
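The "X-Delete-At" value is just a Unix timestamp: now plus the configured lead time. A small sketch (the Swift request itself is shown only as a comment, with placeholder account, container and token):

```java
// Sketch: Swift purges an object automatically once the Unix time in its
// X-Delete-At metadata passes. The 10-minute lead time matches the example
// in the text.
public class DeleteAt {
    public static long deleteAt(long nowEpochSeconds, long leadSeconds) {
        return nowEpochSeconds + leadSeconds;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        long ts = deleteAt(now, 10 * 60);
        // A Swift client would then update the object's metadata, e.g.:
        //   POST /v1/AUTH_<account>/<container>/<object>
        //   X-Auth-Token: <token>
        //   X-Delete-At: <ts>
        System.out.println("X-Delete-At: " + ts);
    }
}
```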


Turning on the chunk reference counter achieves automatic chunk purge, saving more storage space, but it increases transaction time: in some file operations, whenever a chunk's reference count increases or decreases, JBox needs to update the Swift object metadata "X-Delete-At". This doesn't increase storage space, but it does increase JBox transaction time.

Pseudo Code in Table

As a programmer, I would like to explain the JBox file system monitor in more detail, in a pseudo-code way.

The metadata is the key to file sync. Basically we have 3 copies of metadata. When JBox is syncing, first of all it takes a snapshot of the local sync folder as the "snapshot metadata". The snapshot metadata is merged with the local metadata to understand which files are new, updated, copied, renamed/moved, or deleted, which yields the file system operation notifications. I call this process the "local merge".

The local merge pseudo-code concept is below, matching the transactions (new/update/copy/rename/move/delete) from top to bottom.

Frankly speaking, the snapshot and local metadata are imported into a "fileInfo" Java class, which includes some major variables such as Timestamp, FileName and FileContentHash, shown in the leftmost column of the table below.

Here is the fileInfo Java class as an example; the code can be found in the Git repo: https://github.com/chianingwang/JBox.

===example fileInfo java class===
public class fileInfo  implements Comparable<fileInfo> {
    /** the id in sqlite db */
    public int dbid;
    /** The filename. */
    public String filename;
    /** The object guid. */
    public String guid;
    /** The object parent guid. - obsoleted */
    public String parentguid;
    /** Status. <br>
     * <b>0</b>: Not shared   <br>
     * <b>1</b>: Shared
     * */
    public int status;
    /** Shared Status. <br>
     * <b>0</b>: file   <br>
     * <b>1</b>: folder <br>
     * <b>2</b>: base folder
     * */
    public int type;      
    /** The file time stamp. */
    public Date dt;
    /** The last action time stamp. */
    public Date lastaction;
    /** The file required operation.
     * @see FOP
     */
    public FOP fop;
    /** The file hash. */
    public String filehash;
    /** The file owner. */
    public String owner;
    /** The file byte length. */
    public long bytelength;
    /** The version flag.  <br>
     * <b>0</b>: prelocal   <br>
     * <b>1</b>: curlocal   <br>
     * <b>2</b>: mergedlocal <br>
     * <b>3</b>: remote <br>
     * <b>4</b>: merged
     * */
    public int versionflag;
    public int mod;
    public fileInfo(String f, String gid, String pgid, int ss, int t, Date d, Date la)
    {
        filename = f;
        guid = gid;
        parentguid = pgid;
        status = ss;
        type = t;
        dt = d;
        lastaction = la;
        fop = FOP.NONE;
        owner = "";
        filehash = "";
        bytelength = 0;
        dbid = -1;
        versionflag=1;
        //mod = m;
    }
...
===example fileInfo java class===

Now, let me quickly explain how to read this table. Here I use update as an example; it's the section I highlighted in light green in the table below.

  1. JBox finds the FileName is the same, then checks the FileContentHash.
  2. If the FileContentHash is different, it checks the Timestamp.
  3. If the Timestamp in the local metadata (the local (0) fileInfo object) is older than in the snapshot metadata (the FS snapshot (1) fileInfo object),
  4. JBox identifies the file as updated.

Local merge: Snapshot => Local Merge
Columns: Local (0) vs FS Snapshot (1)

Case: Update
  Timestamp        N (elder)       Y (newer)
  FileName         Y               Y
  FileContentHash  N (different)   Y

Case: New
  Timestamp        N   Y
  FileName         N   Y
  FileContentHash  N   Y
  Scan the existing records and CAN'T find the same FCH

Case: Move or Copy
  Timestamp        N   Y
  FileName         N   Y
  FileContentHash  N   Y
  Scan the existing records and FOUND the same FCH:
    Get OldFileName1 with the same FCH
      if scanning the snapshot CAN'T find the same FileName:
        Move (stop): Move from OldFileName1 => New FileName
      if scanning the snapshot FOUND the same FileName:
        possibly a Copy: Copy from OldFileName1 => New FileName
    Scan further and FOUND the same FCH again:
    Get OldFileName2 with the same FCH
      if scanning the snapshot CAN'T find the same FileName:
        Move (stop): Move from OldFileName2 => New FileName
      if scanning the snapshot FOUND the same FileName:
        possibly a Copy: Copy from OldFileName2 => New FileName
    End of scan, can't find any further:
      (stop): Copy from OldFileName1 => New FileName

Case: Delete
  Timestamp        Y   N
  FileName         Y   N
  FileContentHash  Y   N
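The local-merge table can be approximated by a small decision function. This sketch keys files by name and content hash only and ignores timestamps and the GUID bookkeeping, so it is a simplification of the table, not the JBox implementation:

```java
import java.util.*;

// Sketch of the "local merge": compare the previous local metadata with a
// fresh snapshot of the sync folder and classify each file. Maps go from
// filename -> content hash (FCH).
public class LocalMerge {
    public static Map<String, String> classify(Map<String, String> local,
                                               Map<String, String> snapshot) {
        Map<String, String> ops = new LinkedHashMap<>();
        Set<String> movedFrom = new HashSet<>();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            String name = e.getKey(), hash = e.getValue();
            if (local.containsKey(name)) {
                // Same FileName: an UPDATE if the content hash changed.
                if (!local.get(name).equals(hash)) ops.put(name, "UPDATE");
            } else {
                // New name: scan the old records for the same FCH.
                String oldName = null;
                for (Map.Entry<String, String> l : local.entrySet())
                    if (l.getValue().equals(hash)) oldName = l.getKey();
                if (oldName == null) {
                    ops.put(name, "NEW");
                } else if (!snapshot.containsKey(oldName)) {
                    ops.put(name, "MOVE from " + oldName); // old name vanished
                    movedFrom.add(oldName);
                } else {
                    ops.put(name, "COPY from " + oldName); // old name still there
                }
            }
        }
        // Anything that vanished without being a move source was deleted.
        for (String name : local.keySet())
            if (!snapshot.containsKey(name) && !movedFrom.contains(name))
                ops.put(name, "DELETE");
        return ops;
    }

    public static void main(String[] args) {
        Map<String, String> local = new LinkedHashMap<>();
        local.put("a.txt", "h1");
        local.put("b.txt", "h2");
        Map<String, String> snap = new LinkedHashMap<>();
        snap.put("a.txt", "h1b"); // content changed        -> UPDATE
        snap.put("c.txt", "h2");  // h2 under a new name    -> MOVE from b.txt
        System.out.println(classify(local, snap));
        // prints {a.txt=UPDATE, c.txt=MOVE from b.txt}
    }
}
```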

Local (+) Snapshot => Local Merge, then Local Merge (+) Server => Final Merge

After the local merge is ready, JBox downloads the remote metadata from Object Storage Swift, then merges from local to remote (server). The logic is in the table below.

PS: Because it's more difficult to tell whether a file was deleted by another client or is new locally, we have to leverage the reference counter ("Number of Clients") mentioned in the previous section.

Local merge: Server => Final Merge
Columns: Local Merge (2) vs Server (3)

Case: Update Local
  Timestamp        N (elder)       Y (newer)
  FileName         Y               Y
  FileContentHash  N (different)   Y

Case: Update Remote
  Timestamp        Y (newer)       N (elder)
  FileName         Y               Y
  FileContentHash  N (different)   Y

Case: New Local
  Timestamp        N   Y
  FileName         N   Y
  FileContentHash  N   Y

Case: New Remote
  Timestamp        Y   N
  FileName         Y   N
  FileContentHash  Y   N

Case: Update Local (remote overwrites local)
  Timestamp        N (elder)   Y (newer)
  FileName         Y           Y
  FileContentHash  Y           Y
  Status: Move => update the File Level Metadata only

Case: Update Remote
  Timestamp        Y (newer)   N (elder)
  FileName         Y           Y
  FileContentHash  Y           Y
  Status: Move => update the File Level Metadata only

Case: Update Local (remote overwrites local)
  Timestamp        N (elder)   Y (newer)
  FileName         Y           Y
  FileContentHash  Y           Y
  Status: Copy => update the File Level Metadata only

Case: Update Remote (new Chunk Level Metadata)
  Timestamp        Y (newer)   N (elder)
  FileName         Y           Y
  FileContentHash  Y           Y
  Status: Copy => update the File Level Metadata

Case: Delete Local
  Timestamp        Y   Y
  FileName         Y   Y
  FileContentHash  Y   Y
  NoClient: not 0 => after deleting the local file, NoClient - 1

Case: Delete Local and Remote
  Timestamp        N   Y
  FileName         N   Y
  FileContentHash  N   Y
  NoClient: set NoClient (local file was deleted, so set NoClient = the default, e.g. 2)

Copy On Write(CoW)

JBox leverages CoW to ensure the FLM is updated by only one client at a time, which avoids records being overwritten by another client while one client is still updating.

e.g.:
Client 1 --> updates metadata to rename file 1 to file 2
Client 2 --> updates metadata because file 1 changed
Client 2 --> update done
Client 1 --> update done
Without CoW, the metadata client 2 just wrote would be overwritten by client 1.
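A compare-and-swap style sketch of the CoW idea: each client edits a private copy of the metadata and swaps it in only if nobody else swapped first, so the loser retries instead of overwriting. The `version` field and `AtomicReference` here stand in for whatever locking token JBox actually uses:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of copy-on-write on shared metadata: never mutate the live copy;
// write a new copy and swap it in only if the base hasn't moved.
public class CowMeta {
    public static class Flm {
        public final int version;
        public final String records;
        public Flm(int v, String r) { version = v; records = r; }
    }

    public static final AtomicReference<Flm> current =
        new AtomicReference<>(new Flm(0, "file1"));

    // Returns true if our update won; false means re-read and retry.
    public static boolean tryUpdate(Flm base, String newRecords) {
        Flm next = new Flm(base.version + 1, newRecords); // copy, then write
        return current.compareAndSet(base, next);
    }

    public static void main(String[] args) {
        Flm base = current.get();                        // both clients read the same base
        tryUpdate(base, "file2 (renamed)");              // first writer wins
        boolean ok = tryUpdate(base, "file1 (changed)"); // second writer loses, must retry
        System.out.println("second writer won? " + ok + "; state=" + current.get().records);
    }
}
```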

Here is a diagram showing how CoW works in JBox, but I think it's fairly standard, so I won't explain too much here.


Proof of Concept Experiment

To compare with other non-archival cloud storage services on the market and verify whether JBox gains any file-sync advantage from the design above, I did a POC experiment: Upload/Update/Copy/Rename/Move/Delete of a 10946 KB random-write file against different cloud storage services. I used Wireshark to monitor network bandwidth and transaction time, summarized below.

In the POC, JBox uses the native OpenStack Swift and Keystone APIs, resulting in transaction time and size comparable to other non-archival cloud storage services on the market. The feasibility of JBox works out well in the POC.

Here is the POC result; I present it in two dimensions: Size (transaction bandwidth) and Time (transaction time).


Size ( transaction bandwidth )

Bytes (columns: New, Update, Copy, Move, Delete)
JBox         2176432877332433720485
DropBox      26236244531234131841500
GoogleDrive  22780802285496229236634442048
OneDrive     11846050118485691184182761061500


As you can see above, JBox costs the fewest transaction bytes, except when an update happens (in the POC, I appended "Test" at the end of the file). Since JBox is chunk-based, the chunk is the base element for deduplication, and chunk-based dedup has a chunk boundary limitation, between 222822.4 and 524288 bytes in my case; even though the update touched only the last chunk, so the 87733 bytes are smaller than 222822.4, JBox can't compete with Dropbox here, since Dropbox adopts a rolling checksum with fine-grained delta sync.

Other than the update case, JBox leads in the rest of the cases.

Something interesting I observed: GoogleDrive and OneDrive re-upload the whole file in the update and copy cases, and OneDrive doesn't compress the file before upload.


Time ( transaction time )

Time (columns: New, Update, Copy, Move, Delete)
JBox         1222317120882264256
DropBox      28434218851387313760256
GoogleDrive  22780836741311295221680256
OneDrive     2013922262459861247713256


Time is very tricky in this experiment, since too many factors affect transaction time, such as the bandwidth of the shared pipeline, the cache available to the current process/thread, and other running applications taking CPU load in the OS. Thus the times in the table and chart fluctuate more. Even so, we can still notice some interesting comparisons in transaction time.

As you can see in the table and chart above, JBox takes a shorter time than the other cloud storage services. I think this is because JBox leverages Keystone fully and doesn't encrypt the chunks, which avoids extra handshaking during the transactions. And compressing the chunks before upload does help to expedite the transactions.

Summing Up

To sum up, using the logic above, JBox can watch the file system by "merging" metadata and identify file system operations without referencing an extra library in code. With the 2-tier metadata structure, JBox can also execute file sync between the clients, and eventually back up the newest copy into Object Storage, Swift. I think it's a lightweight, easy-to-set-up, fully open source tool. I mentioned the GitHub link above, or you can find the source code in the JBox Git project in the first reference link below. Feel free to share your thoughts, and you're welcome to join the development with me.

Quote for Sharing:
Ever Tried, Ever Failed. No matter. 
Try Again. Fail Again. Fail better.
- Samuel Beckett ( Tattooed on Stan Wawrinka's left forearm )


Reference: