


OK, this is the second part of "EMC Celerra Unified, CLARiiON storage with Vmware – Deduplication and Compression Part 1". In my last post, I explained how EMC block compression works and where it applies. In this post, I'm going to talk about another important part of EMC Celerra Unified storage technology: Celerra file-level deduplication and compression, also known as Celerra Data Deduplication.

Keep in mind that Celerra Unified storage is a combination of Celerra (NAS) and CLARiiON (block storage, FC). The Celerra side focuses on the NAS part. It not only has a compression function (at the file level), it also has file-level deduplication. Let's talk about how it works.

The sequence of Celerra Data Deduplication is to look for cold data, compress it first, and then deduplicate it.

Please be aware that although NAS is basically a file system, Celerra is not exactly operating at the file level. Let's see how it works.

Celerra Compression

If you check out the latest version of the Celerra docs, you may notice that Celerra compression is no longer called file compression; it's now just labeled "compression".

Plus, EMC used to recommend not using deduplication on NFS for VMware, but recently EMC released the Celerra VMware Plug-in, which allows you to compress a single VM. First of all, let's talk about "Initial compression and deduplication".

Initial and scheduled compression & deduplication

In Celerra, compression and deduplication are done together. Celerra periodically scans the file system, compresses "cold" files (files that have not been active recently), and hashes them into a metadata database. If the same hash already exists in the database, the file is deduplicated.

So the process is:

Compress -> hash -> compare against the metadata database -> copy to the hidden space of this LUN (or not) -> update the metadata database.
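To make that sequence concrete, here is a minimal Python sketch of the compress-then-hash flow. The dictionaries, SHA-256 and zlib are stand-ins for illustration only, not Celerra's actual metadata database or algorithms.

```python
import hashlib
import zlib

metadata_db = set()   # hashes of data already stored (stand-in for the metadata database)
hidden_space = {}     # digest -> compressed copy (stand-in for the hidden space on the LUN)

def process_cold_file(data: bytes) -> str:
    """Return a reference (the digest) to the stored, compressed copy of this file."""
    compressed = zlib.compress(data)                   # 1. compress the cold file first
    digest = hashlib.sha256(compressed).hexdigest()    # 2. hash it
    if digest not in metadata_db:                      # 3. compare against the metadata database
        hidden_space[digest] = compressed              # 4. new data: copy into the hidden space
        metadata_db.add(digest)                        # 5. update the metadata database
    # otherwise the file is a duplicate and simply references the existing copy
    return digest
```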

Reading a compressed file

A compressed read is not much different from an EMC CLARiiON read. Celerra loads the compressed data directly into memory and decompresses it in memory; nothing is written back to disk. Therefore, in some cases, reading compressed data is even faster than reading uncompressed data, at the cost of some CPU cycles.
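As a rough illustration (not Celerra code), the read path amounts to an in-memory decompress followed by returning the requested range, with no write-back involved:

```python
import zlib

def read_compressed(compressed_blob: bytes, offset: int, length: int) -> bytes:
    # Expand the compressed data in memory only; the decompressed
    # result is never written back to disk.
    data = zlib.decompress(compressed_blob)
    return data[offset:offset + length]
```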

Writing to a compressed & deduplicated file

Writing to a compressed file is a longer procedure because it involves writing to disk. A write to, or modification of, a deduplicated file causes a copy of the requested portion of the file data to be reduplicated for this particular instance, while the deduplicated data is preserved for the remaining references to the file.

The entire file is not decompressed and reduplicated on disk until the sum of the number of individual changed blocks and the number of blocks in the corresponding deduplicated file is larger than the logical file size.
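In other words, the decision comes down to a simple block-count comparison. Here is a hedged sketch of that condition, with illustrative names of my own:

```python
def should_fully_reduplicate(changed_blocks: int,
                             deduplicated_file_blocks: int,
                             logical_file_size_blocks: int) -> bool:
    # The whole file is only expanded back onto disk when keeping both the
    # per-instance changed blocks and the shared deduplicated copy would cost
    # more blocks than simply storing the full logical file again.
    return changed_blocks + deduplicated_file_blocks > logical_file_size_blocks
```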

Compression in a VMware environment

The Celerra VMware Plug-in was released not long ago. It only works on NAS file systems and provides "Thin Provisioning", "Fast/Full Clone", and "Compression/Decompression" features. Just like with the CLARiiON system, you can use this plug-in to offload those operations from the host to the SAN. One of those operations is compression: it allows you to compress a VM regardless of whether its disks are thick or thin. VMs that are selected for storage optimization are compressed using the same compression algorithm that is used in Celerra Data Deduplication. The algorithm compresses the VMDK file (virtual disk) and leaves intact the other files that make up the VM.

When a compressed VMDK is read, only the data blocks containing the requested data are decompressed. Likewise, when data is written to the VMDK, instead of compressing it on the fly, the data is written to a set-aside buffer, which then gets compressed as a background task. In most situations, the amount of data read or written is a small percentage of the total amount of data stored. VM performance is typically impacted only marginally (less than 10%) when the corresponding VMDK file is compressed.
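The "write to a set-aside buffer, compress in the background" behaviour can be pictured with a small sketch like the one below. The buffer, the store, and zlib are stand-ins for illustration only, not the plug-in's actual implementation.

```python
import zlib

staging_buffer = []    # set-aside buffer for incoming VMDK writes
compressed_store = []  # stand-in for the compressed blocks on disk

def write_to_vmdk(data: bytes) -> None:
    # Foreground path: stage the write uncompressed and return immediately,
    # so the VM does not pay the compression cost "on the fly".
    staging_buffer.append(data)

def background_compress() -> None:
    # Background task: drain the buffer and compress the staged writes.
    while staging_buffer:
        compressed_store.append(zlib.compress(staging_buffer.pop(0)))
```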

I still think there are plenty of things EMC can do here. The Celerra plug-in is just a beginning. I will keep an eye on it and post more later.


First of all, I need to point out that I don't work for EMC or VMware. I won't do what Chad does over at Virtual Geek and say, "Yes, it's true. EMC really IS #1 for VMware." What I'm going to talk about is purely from a customer's point of view: a tech who doesn't favor either EMC or NetApp. I just want to draw a picture of EMC storage and VMware-related technologies for you, so we can put everything on the table and discuss it.

As usual, any comments and discussions are very welcome here.

EMC Storage – Celerra, CLARiiON and Celerra Unified Storage

Well, a few years ago there were three product lines in EMC storage: Celerra, CLARiiON and Symmetrix. After years of virtualization development, the line between Celerra and CLARiiON is getting really blurry. Celerra used to be dedicated to NAS; it provided NAS only. CLARiiON (like the CX3 series) mainly focused on block-level storage such as FC or iSCSI. Now, with the new Celerra Unified storage product line coming out, I don't really think anyone would buy the old systems any more, because the new Unified Storage provides both Celerra and CLARiiON in one box. EMC calls them block-enabled Celerra systems (NS-120, 480, 960, etc.). However, as you may know, the technology used in NAS is quite different from the technology used in block storage. If you are as confused as I was, please read the rest of this article and I hope it can help clear your mind. This article focuses on Celerra Unified storage only.

EMC Deduplication and Compression

As everyone knows, one of the key elements of storage is disk capacity. How to utilize disk space, tier down unused data and files, and compress them into a smaller space becomes a major factor when we select storage. Even after years and years of research, EMC still insists that deduplication should only happen at the file level instead of the block level. So what does that mean for customers who bought Celerra Unified Storage? It means you can only use block compression if you use block storage (FC or iSCSI, as part of the CLARiiON technology), and you can only use file-level compression and file-level deduplication when you use NAS (as part of Celerra, if you use NFS for VMs, or NFS/CIFS for file systems). In other words, how you divide your LUNs and what kind of block or file system you use will dramatically impact your system. Let's break down these technologies and see what they are.

EMC compression

As I mentioned above, depending on what kind of system (NAS or block) you use, Celerra will use different ways to compress the data. Let's talk about block-level compression first.

Block level compression

As the name indicates, this compression only works on FC or iSCSI LUNs. The block size compression works on is 64 KB, and each block is processed independently. The typical result is as much as 2x compression while using modest CPU resources.

Note:

By default, a CX4-480 can run 8 concurrent compression threads. When all threads are running at the same time, CPU consumption depends on the compression rate (speed) setting: low (15% CPU), medium (30~50% CPU) and high (60%~80% CPU).
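To picture the 64 KB independent-block idea, here is a tiny Python sketch; zlib is just a stand-in for whatever algorithm the array actually uses.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # block compression operates on 64 KB units

def compress_lun(lun_data: bytes) -> list:
    # Each 64 KB block is compressed independently of its neighbours, which is
    # what allows a single block to be rewritten later without touching the rest.
    return [zlib.compress(lun_data[i:i + CHUNK_SIZE])
            for i in range(0, len(lun_data), CHUNK_SIZE)]
```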

How block compression works

1. Initial compression – This occurs when you enable compression. It compresses the entire LUN and can't be paused in the middle, but it can be disabled during the procedure and no damage will be done.

2. Compression of new data – When new data is written, it is written uncompressed and then compressed asynchronously, and it keeps doing that until you disable compression. By default, compression starts when 10% of the LUN's user capacity or 10 GB of new data has been written and the total amount of new data is larger than 1 GB (I've sketched how I read this trigger in the code after this list). It does use some SP cache memory for swapping. When you compress a LUN, that LUN will automatically be migrated to a thin LUN in a different pool if it is a traditional RAID group LUN. If it is already a thin LUN, it remains in the same pool.

3. Decompression when compression is disabled – If the original LUN is a thin LUN, it remains a thin LUN. If the original LUN was a thick LUN or a RAID group LUN, the system writes zeros to the unallocated capacity until it is full, while the LUN remains a thin LUN. The system will pause decompression at 90% full and stop at 98% if the LUN has filled up too much (also covered in the sketch below).
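To make the defaults in steps 2 and 3 concrete, here is a minimal Python sketch of how I read those thresholds; the function names and the "whichever comes first" interpretation of "10% or 10 GB" are my own assumptions, not EMC's actual logic.

```python
def should_compress_new_data(new_data_gb: float, lun_user_capacity_gb: float) -> bool:
    # Step 2 defaults as I read them: background compression of new data starts
    # once uncompressed new data reaches 10% of the LUN's user capacity or
    # 10 GB, whichever comes first, and at least 1 GB has been written.
    threshold_gb = min(0.10 * lun_user_capacity_gb, 10.0)
    return new_data_gb >= threshold_gb and new_data_gb >= 1.0

def decompression_capacity_guard(used_percent: float) -> str:
    # Step 3 behaviour: decompression pauses when the LUN reaches 90% full
    # and stops entirely at 98% if it has filled up too much.
    if used_percent >= 98.0:
        return "stop"
    if used_percent >= 90.0:
        return "pause"
    return "continue"
```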

Limits of block compression

The following cannot be compressed:
  • Private LUNs (including write intent logs, clone private LUNs, reserved LUNs, metaLUNs, and component LUNs)
  • Snapshot LUNs
  • Celerra iSCSI or file system LUNs (personally, I don't think that's right; I'm confirming with EMC now)
  • A LUN that is already being migrated, expanded, or shrunk
  • A mirrored LUN replicating to a storage system running pre-29 FLARE code

Interaction of compression with other functionalities

Basically, a compressed LUN is transparent to other operations like replication or migration. That said, it's better not to migrate or copy while compressing at the same time; it's always easier for the SAN to enable compression after the migration.

How to set up compression?

All you need to do is connect to the Celerra Unified storage control station with your Internet browser. You can run Unisphere Manager directly from the SAN, or you can install Unified Manager on a Windows server and connect to your box. The compression function is a licensed feature, and you can access it directly from the console. Unlike the Celerra NAS part, there is no VMware plug-in available for compression, so you need to use Unisphere to do the job.

There is no VMware plug-in?

It is very interesting that the Celerra Unified NAS part got a VMware plug-in while the CLARiiON side still has nothing. I reckon vSphere may use the VAAI API to offload cloning from the host to the SAN, but why doesn't it work for the Celerra NAS part? If anyone can answer this question, it would be appreciated.

To be continued……
