Accelerate Your Backups with Erasure Coding

October 12, 2017

Erasure coding (EC) is an effective method for protecting data: data is broken into fragments, which are expanded and encoded with redundant pieces and stored across a set of nodes or servers. One of the main benefits of EC is that it saves a lot of physical space while still protecting your data against loss or disasters. However, many users have yet to adopt EC because they expect it to hurt performance. Compared to a mirroring scheme, EC may indeed show decreased performance, but the impact depends on the specific use case. Through a series of tests here at Virtuozzo, we found that in backup scenarios erasure coding actually outperforms mirroring and can deliver numerous advantages over traditional backup approaches.

Software-defined storage has built-in redundancy: when you write a file, the storage splits it into a set of chunks and replicates each chunk across the cluster. Depending on your settings, the system always maintains 2 or 3 replicas of your data, which is a common best practice for data protection. As a result, you’ll have at least two copies of your data in your local cluster. But do you need to use 2-3 times more physical space to provide the same level of redundancy? With erasure coding, the answer is no. You can get the same redundancy level with significantly less overhead. For example, consider a 10 + 2 encoding scheme. In this scenario, every ten chunks of user data are stored with only two additional “checksum” chunks. As a result, instead of the 200% overhead of three replicas (2 extra copies), you have only 20% overhead (2/10), a tenfold reduction in extra physical disk space.
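The overhead arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the 100 TB figure are hypothetical and are not part of any Virtuozzo tooling.

```python
def ec_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Extra raw space, as a fraction of user data, for a k + m erasure code."""
    return parity_chunks / data_chunks


def replication_overhead(replicas: int) -> float:
    """Extra raw space, as a fraction of user data, when keeping full copies."""
    return float(replicas - 1)


def raw_space_tb(user_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold user_tb of data at the given overhead."""
    return user_tb * (1.0 + overhead)


if __name__ == "__main__":
    user_tb = 100.0  # hypothetical amount of backup data
    schemes = [
        ("3 replicas", replication_overhead(3)),
        ("EC 10+2", ec_overhead(10, 2)),
    ]
    for label, overhead in schemes:
        raw = raw_space_tb(user_tb, overhead)
        print(f"{label}: {overhead:.0%} overhead, {raw:.0f} TB raw")
```

For the same 100 TB of user data, three replicas need about 300 TB of raw capacity, while a 10 + 2 scheme needs about 120 TB.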

On the other hand, erasure coding is a bit more CPU-intensive during data rebuild operations, because lost data must be recalculated from the surviving fragments rather than simply read from a replica. Virtuozzo Storage uses highly optimized erasure coding routines written in low-level assembly language that use SSE CPU instructions. One CPU core can process 2-4 Gb/s of traffic flowing into an erasure coding scheme. As a result, the additional CPU load per client node is minimal, typically just a few percentage points.
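To see why a rebuild costs CPU time, consider a deliberately simplified sketch: a single XOR parity fragment, which can recover one lost fragment. This is not the scheme Virtuozzo Storage uses (a 10 + 2 layout needs a stronger code that tolerates two losses), and all names below are hypothetical. It only shows that recovery means reading every survivor and computing, rather than copying a replica.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(fragments: list[bytes]) -> bytes:
    """Return one parity fragment: the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def rebuild(survivors: list[bytes], parity: bytes) -> bytes:
    """Recompute the single missing data fragment from survivors and parity."""
    return reduce(xor_bytes, survivors, parity)


# Four toy data fragments; real fragments would be far larger.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = encode(data)

# Simulate losing fragment 2 and rebuilding it from everything that is left.
lost_index = 2
survivors = [frag for i, frag in enumerate(data) if i != lost_index]
recovered = rebuild(survivors, parity)
assert recovered == data[lost_index]
```

With mirroring, the same recovery would be a plain read of the surviving copy; here every surviving fragment has to pass through the CPU.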

Backup Scenario

For sequential writes, erasure coding performs much better than mirroring. What matters here is the total amount of data written internally. EC writes 1.2x the size of the original file, whereas mirroring with three replicas writes 3x. Because network performance is critical in this case, EC shows a much better result. As you can see below, erasure coding offers a faster way to protect data against loss and requires less physical space. The following graphs represent the results of our testing of EC for common backup scenarios:

Cluster #1: 6 nodes
42 chunk servers (CS), each on a single 7200 RPM HDD
Journals on NVMe (32 GB per chunk server = 224 GB per node)
Client cache on NVMe (64 GB per mount point)
Replicas 3:2

For a bigger cluster, the results are even better.

Cluster #2: 12 nodes
132 chunk servers, each on a 15K RPM SAS drive
Journals on NVMe (16 GB per chunk server = 176 GB per node)
Client cache on NVMe (64 GB per mount point)
Replicas 3:2

For the use case above, erasure coding both increases cost efficiency and decreases the time required for backups.
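The write-volume argument in this section can be made concrete with back-of-the-envelope arithmetic. The sketch below is an idealized model, not a reproduction of our test results: it assumes the 1.2x figure corresponds to a 10 + 2 scheme, that throughput is the only bottleneck, and it uses a hypothetical 1 TB backup and a hypothetical 1 GB/s of sustained internal write throughput.

```python
EC_10_2_FACTOR = (10 + 2) / 10  # 1.2x: ten data chunks plus two parity chunks
REPLICA_3_FACTOR = 3.0          # 3x: three full copies


def internal_bytes_written(file_bytes: float, factor: float) -> float:
    """Total bytes the cluster writes internally for one file."""
    return file_bytes * factor


def ideal_write_seconds(file_bytes: float, factor: float,
                        throughput_bytes_per_s: float) -> float:
    """Lower bound on write time if internal throughput is the only limit."""
    return internal_bytes_written(file_bytes, factor) / throughput_bytes_per_s


if __name__ == "__main__":
    backup_bytes = 1e12  # hypothetical 1 TB backup
    throughput = 1e9     # hypothetical 1 GB/s sustained internal throughput
    for label, factor in [("3 replicas", REPLICA_3_FACTOR),
                          ("EC 10+2", EC_10_2_FACTOR)]:
        seconds = ideal_write_seconds(backup_bytes, factor, throughput)
        print(f"{label}: {factor:.1f}x written, about {seconds / 60:.0f} min")
```

Under these assumptions, mirroring moves 2.5 times as much data as erasure coding for the same backup, which is why the network becomes the deciding factor.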

Get Erasure Coding with Virtuozzo Storage – It’s Already Included!

Despite the obvious advantages, erasure coding is still a premium feature for many vendors. However, Virtuozzo Storage makes EC available to all customers at no extra charge. As a Virtuozzo Storage customer, you can use it today and optimize your backup processes immediately.

Here are some general recommendations:

  • Plan your backup storage tiers in advance, including sizing, drives, etc. You can find more detailed information on this process here in the Virtuozzo docs.
  • Use erasure coding to improve backup time and efficiency – it’s easy!
  • Plan to move your backup to a remote site or the cloud using S3 object storage to simplify the data transfer. You can even replicate buckets to a remote cluster in a different region.

To learn more about Virtuozzo Storage, get a demo, or sign up for a free trial, click here.

Get more information today!