
OK. This is the follow-up to EMC Celerra Unified, CLARiiON storage with Vmware – Deduplication and Compression Part 1. In my last post, I explained how EMC block compression works and where it applies. In this post, I'm going to talk about another important part of EMC Celerra Unified storage technology: Celerra file-level deduplication and compression, also known as Celerra Data Deduplication.

Celerra Unified storage is a combination of Celerra (NAS) and CLARiiON (block storage, FC). Celerra itself focuses on the NAS part. It has not only a compression function (at file level) but also file-level deduplication. Let's talk about how it works.

Celerra Data Deduplication works in sequence: it looks for cold data, compresses it first, and then deduplicates it.

Please be aware that although NAS is basically a file system, Celerra does not operate exactly at file level. Let's see how it works.

Celerra Compression

If you check the latest version of the Celerra docs, you may notice that Celerra compression is no longer called file compression. It's now simply labeled "compression".

Plus, EMC used to advise against using deduplication on NFS for VMware, but recently EMC released the Celerra VMware Plug-in, which allows you to compress a single VM. First of all, let's talk about initial compression and deduplication.

Initial and scheduled compression & deduplication

In the Celerra, compression and deduplication are done together. Celerra periodically scans the file system, compresses "cold" files (files that have not been active recently) and hashes them into a meta database. If the same hash already exists in the database, the file is deduplicated.

So the process is:

Compress -> hash -> compare with meta database -> copy to hidden space of this LUN (or not) -> update meta database.
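To make the sequence a bit more concrete, here is a minimal Python sketch of one such pass. It is only an illustration, not how Celerra is implemented: the dictionaries stand in for the file system scan, the hidden space and the meta database, the names `dedup_pass` and `read_file` are made up, and the use of zlib and SHA-1 (and hashing the compressed bytes rather than the original data) is my assumption.

```python
import hashlib
import zlib


def dedup_pass(cold_files, store, refs):
    """Compress each cold file, hash it, and keep one stored copy per hash.

    cold_files: dict of file name -> bytes (stand-in for the file system scan)
    store:      dict of hash -> compressed bytes (stand-in for the hidden space)
    refs:       dict of file name -> hash (stand-in for the meta database)
    """
    for name, data in cold_files.items():
        compressed = zlib.compress(data)                 # compress
        digest = hashlib.sha1(compressed).hexdigest()    # hash (assumed algorithm)
        if digest not in store:                          # compare with meta database
            store[digest] = compressed                   # copy to hidden space (or not)
        refs[name] = digest                              # update meta database
    return store, refs


def read_file(name, store, refs):
    """Reads decompress in memory; nothing is written back to the store."""
    return zlib.decompress(store[refs[name]])
```

Two files with identical content end up pointing at the same stored copy, so only one compressed copy is kept.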

Reading a compressed file

A compressed read is not much different from an EMC CLARiiON read. Celerra loads the compressed data directly into memory and decompresses it there. Nothing is written back to disk. Therefore, in some cases, reading compressed data is even faster than reading uncompressed data, at the cost of some CPU cycles.

Writing to a compressed & deduplicated file

Writing to a compressed file is a longer procedure because it involves writing to disk. A write to, or a modification of, a deduplicated file causes a copy of the requested portion of the file data to be reduplicated for this particular instance, while the deduplicated data is preserved for the remaining references to this file.

The entire file is not decompressed and reduplicated on disk until the sum of the number of individually changed blocks and the number of blocks in the corresponding deduplicated file is larger than the logical file size.
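The rule above boils down to one comparison. A tiny Python sketch, assuming all three quantities are counted in blocks (the function and parameter names are made up for illustration):

```python
def should_reduplicate(changed_blocks, dedup_blocks, logical_blocks):
    """Return True once keeping changed blocks next to the deduplicated
    copy would take more space than the plain, uncompressed file."""
    return changed_blocks + dedup_blocks > logical_blocks


# Hypothetical file: 100 logical blocks, stored as 60 deduplicated blocks.
print(should_reduplicate(40, 60, 100))  # False: 40 + 60 is not larger than 100
print(should_reduplicate(41, 60, 100))  # True: 41 + 60 is larger than 100
```

In other words, the deduplicated copy is kept for as long as it still saves space.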

Compression in a VMware environment

The Celerra VMware Plug-in was released not long ago. It only works on NAS file systems. Its features include thin provisioning, fast/full clone, and compression/decompression. Just as with a CLARiiON system, you can use this plug-in to offload those operations from the host to the storage array. One of those operations is compression. It allows you to compress a VM regardless of whether it uses a thick or thin disk. VMs that are selected for storage optimization are compressed using the same compression algorithm that is used in Celerra Data Deduplication. The algorithm compresses the VMDK file (virtual disk) and leaves the other files that make up the VM intact. When a compressed VMDK is read, only the data blocks containing the actual data are decompressed. Likewise, when data is written to the VMDK, instead of being compressed "on the fly", it is written to a set-aside buffer, which then gets compressed as a background task. In most situations, the amount of data read or written is a small percentage of the total amount of data stored. VM performance will typically be impacted only marginally (less than 10%) when the corresponding VMDK file is compressed.
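The read and write behaviour described above can be pictured with a small Python sketch. This is only a toy model under my own assumptions (zlib as the compressor, a made-up `CompressedDisk` class, in-memory dictionaries instead of real storage), not the plug-in's actual design.

```python
import zlib


class CompressedDisk:
    """Toy model of a compressed virtual disk with a set-aside write buffer."""

    def __init__(self):
        self.compressed = {}  # block number -> compressed bytes
        self.pending = {}     # block number -> raw bytes waiting for compression

    def write(self, block_no, data):
        # Writes are not compressed on the fly; they land in the buffer.
        self.pending[block_no] = data

    def read(self, block_no):
        # Recently written data is served straight from the buffer.
        if block_no in self.pending:
            return self.pending[block_no]
        # Otherwise only the requested block is decompressed, in memory.
        return zlib.decompress(self.compressed[block_no])

    def background_compress(self):
        # Background task: compress everything in the buffer, then empty it.
        for block_no, data in self.pending.items():
            self.compressed[block_no] = zlib.compress(data)
        self.pending.clear()
```

A write returns as soon as the data is in the buffer, and a read touches only the block it asks for, which fits the claim that the performance impact stays small.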

I still think there are plenty of things EMC can do. The Celerra plug-in is just a beginning. I will keep an eye on it and post more later.


One Comment

  1. FYI, the EMC Celerra Plug-in for VMware was recently replaced with the EMC Unified Storage Plug-in for VMware. This new plug-in includes various added features, including VMFS/RDM storage provisioning on CLARiiON. See more at: http://blog.scottlowe.org/2010/10/16/emc-unified-storage-plugin-version-2-now-available/

