Opendedup first impression

14. August 2011 10:17 by Ron

Usually I show some findings about gadgets I bought, mostly from China, but this time it is a software review.

Some time ago I found the Opendedup project at http://opendedup.org/. The software package is called SDFS; it is Java-based and runs on Linux or Windows (both 64-bit).

Some notes from the Opendedup website:

Using deduplication has two big advantages over a normal file system.

  • Reduced Storage Allocation - Deduplication can reduce storage needs by up to 90%-95% for files such as VMDKs and backups. Basically, situations where you are storing a lot of redundant data can see huge benefits.
  • Efficient Volume Replication - Since only unique data is written to disk, only those blocks need to be replicated. This can reduce traffic for replicating data by 90%-95%, depending on the application.

Using SDFS adds a couple of additional advantages based on its architecture.

  • Scalability : SDFS can store huge amounts of data (over a petabyte) and can deduplicate at block sizes as small as 4k.
  • IO Performance : SDFS can be set up in a Redundant Array of Inexpensive Nodes (RAIN) configuration. In a RAIN configuration SDFS can stream reads and writes, in parallel, across multiple nodes, allowing for huge IO performance increases.
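To get a feeling for what block-level deduplication means, here is a minimal sketch that cuts a file into fixed-size blocks and counts how many of them are unique. It only uses standard tools (split, sha256sum, sort); the function name count_blocks is made up for this illustration, and it says nothing about how SDFS stores its chunks internally.

```shell
#!/bin/sh
# count_blocks FILE [BLOCK_SIZE]
# Prints "<total blocks> <unique blocks>" for fixed-size blocks of FILE.
# Illustration only: this is not how SDFS itself is implemented.
count_blocks() {
    file=$1
    bs=${2:-4096}                      # default to 4k blocks
    tmp=$(mktemp -d) || return 1

    # Cut the file into fixed-size pieces (the last one may be shorter).
    split -b "$bs" -a 6 "$file" "$tmp/blk."

    total=$(( $(find "$tmp" -type f | wc -l) ))

    # Hash every piece; identical pieces produce identical hashes,
    # so the number of distinct hashes is the number of unique blocks.
    unique=$(( $(find "$tmp" -type f -exec sha256sum {} + \
        | awk '{ print $1 }' | sort -u | wc -l) ))

    rm -rf "$tmp"
    echo "$total $unique"
}

# Hypothetical usage (the path is a placeholder):
# count_blocks /path/to/some-image.vmdk 4096
```

The fewer unique blocks compared to the total, the more a deduplicating file system could save on that file. The sketch writes a full copy of the file to a temporary directory, so it is only meant for small test files.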

I tested SDFS on CentOS 6.0 because it supports the required FUSE 2.8.

I used a virtual machine on VMware with a 20 GB disk for the OS and mounted an extra 135 GB disk on /opt, because SDFS stores the dedupe chunks in /opt/sdfs/volumes/<volumename>.

The required packages are the Opendedup binaries and the Java JDK 7.
Just download them to an empty directory and run “yum install jdk-7-linux-x64.rpm SDFS-1.0.7-2.x86_64.rpm”; all needed dependencies will be installed.

After installation you can create a volume with the following command: “mkfs.sdfs --volume-name=sdfs_vol1 --volume-capacity=100GB”. You can also change the block size. I tested the default 128k as well as 64k and 4k, but with the 4k block size Java crashes.

After creation you can mount the virtual volume on any path you want, but the directory has to be empty. I mounted it on the /home directory: “mount.sdfs -v sdfs_vol1 -m /home”

All files stored under /home/<user> are deduplicated inline.

To see the status of your volumes, use “sdfscli --volume-info”; it shows the deduplication ratio etc.

Something not mentioned on the Opendedup website is that when you copy a large number of files to the virtual volume, Java can run into a “too many open files” error and your chunks can get corrupted. To fix this problem I edited /etc/security/limits.conf and added the following 2 lines:

    * soft nofile 4096
    * hard nofile 10240

This changes the default limit on the maximum number of open files per user.
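To verify that the new limits are actually in effect, the shell's ulimit builtin can be queried after logging in again. The helper below is a small sketch; the name check_nofile is made up for this example, and it assumes the limits from limits.conf are applied to the session you run it in (they are normally set at login, so an already running shell will not pick them up).

```shell
#!/bin/sh
# check_nofile WANTED
# Prints "ok" if the current soft limit on open files is at least WANTED,
# otherwise "low". (Hypothetical helper, not part of SDFS.)
check_nofile() {
    want=$1
    cur=$(ulimit -Sn)                  # current soft limit for open files
    if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$want" ]; then
        echo ok
    else
        echo low
    fi
}

# Show both limits, then compare the soft limit with the value
# configured in /etc/security/limits.conf above.
echo "soft limit: $(ulimit -Sn), hard limit: $(ulimit -Hn)"
check_nofile 4096
```

If this prints “low”, the process that mounts the volume is probably still running with the old limit.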

I added the line “mount.sdfs -v sdfs_vol1 -m /home/” to the /etc/rc.local file to make sure the virtual volume is mounted at every boot.

If you follow the steps above you should be able to build a storage device with a dedupe filesystem. I also added a Samba server to use the dedupe filesystem as a Windows file server.

Some stats:

[root@backup-01 ~]#  sdfscli --dse-info
DSE Max Size : 136 GB
DSE Current Size : 15.2 GB
DSE Percent Full : 11.17%
DSE Page Size : 131072
DSE Blocks Available for Reuse : 0

[root@backup-01 ~]#  sdfscli --volume-info
Volume Capacity : 135 GB
Volume Current Size : 21.8 GB
Volume Max Percentage Full : Unlimited
Volume Duplicate Data Written : 25.7 GB
Volume Unique Data Written: 432.6 MB
Volume Data Read : 9.1 KB
Volume Virtual Dedup Rate (Dup/Total Bytes Written) : 98.38%
Volume Real Dedup Rate (DSE Size/Total Bytes Written) : 41.73%
Volume Actual Storage Savings (Unique Blocks Stored/Current Size) : 30.26%

As you can see, I have a 135 GB volume, but the stats are a bit confusing :)
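One possible way to read these numbers is to redo the arithmetic from the labels in the output. The sketch below does this with awk; the function names are made up, and the interpretation is an assumption based only on the labels, not on SDFS documentation. The virtual rate comes out at the reported 98.38% when the 432.6 MB is converted to GB with a factor of 1024. Reading the other two rates as "1 minus the ratio in the label" gives values close to, but not exactly, the reported 41.73% and 30.26%, presumably because the displayed sizes are rounded.

```shell
#!/bin/sh
# ratio_pct PART TOTAL   -> PART / TOTAL as a percentage
# savings_pct PART TOTAL -> 1 - PART / TOTAL as a percentage
# (Hypothetical helpers; the formulas are guesses based on the labels.)
ratio_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}
savings_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * (1 - a / b) }'
}

# Total bytes written = duplicate data + unique data (MB converted to GB).
total_gb=$(awk 'BEGIN { printf "%.4f", 25.7 + 432.6 / 1024 }')

ratio_pct 25.7 "$total_gb"      # virtual dedup rate, prints 98.38
savings_pct 15.2 "$total_gb"    # real dedup rate, prints 41.81 (reported: 41.73)
savings_pct 15.2 21.8           # storage savings, prints 30.28 (reported: 30.26)
```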