Reduce Storage Costs with Erasure Coding!
April 11, 2017
In digital storage environments, there has always been a tradeoff between storage capacity, costs, and data reliability. The challenge has always been to make data highly available while keeping costs low and performance high. Various Redundant Array of Independent Disks (RAID) implementations were developed to offer users a choice of designs based on their individual storage requirements. Each design has pros and cons, however, and one size does not fit all.
RAID 1, for example, writes two copies of the data to two separate disks, a technique often referred to as mirroring. The advantages of RAID 1 include high availability (since the data exists on two disks) and good read performance (since applications can read the data from either disk at the same time). Data recovery is also very quick because even if the data is corrupted on one disk, it will most likely be accessible on the other disk without the need for a data repair computation. RAID 1 has two disadvantages. The first is storage capacity and the related cost, because two copies of data require twice as much storage capacity. The second is write performance, because data must be written twice to complete a write operation.
Other RAID designs spread data across multiple disks, commonly referred to as striping. Striped data also includes parity data written to one or more disks in the array (one parity block per stripe in RAID 5, two in RAID 6). Parity data is used for error detection and helps determine if data is missing or has been corrupted, and it can be used to help reconstruct corrupted data. RAID levels 5 and 6 are commonly used implementations of striping and parity. RAID 5 and RAID 6 are also forms of erasure coding implemented in hardware. This is less flexible than erasure coding implemented in software, which is offered by Virtuozzo.
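The way parity helps reconstruct lost data can be illustrated with a minimal sketch. It assumes the simplest possible scheme, a single XOR parity block over equally sized data blocks, as used conceptually in RAID 5. The function name and sample blocks are illustrative only, and this is not how Virtuozzo Storage computes its erasure codes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks striped across three disks, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk: XOR of the surviving data blocks
# and the parity block yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

A single XOR parity block can recover only one lost block per stripe. Surviving two or more losses requires additional, independently computed parity blocks.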
RAID 1+0 or RAID 10 combines disk mirroring and disk striping to protect data. It’s a good option for I/O intensive workloads since it provides both redundancy and performance.
The initial version of Virtuozzo Storage allowed users to define one or more replicas for data redundancy. Depending on the size of the data written, and if the data needs to be divided up into multiple chunks, this form of replication most closely matches RAID 1 or RAID 10. Assuming two replicas, if a client wrote data that is smaller than the maximum chunk size, then the data would be written to one node and then replicated to another node. This resembles a RAID 1 design. If the data was larger than the maximum chunk size, then the data would be broken up into multiple smaller chunks, and it would be written to multiple nodes. Replicas of those smaller chunks would then be written to other nodes. This closely resembles a RAID 10 design.
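The replica behavior described above can be sketched roughly as follows. This is a simplified illustration, not Virtuozzo Storage's actual placement logic: the function names, the chunk size, the node names, and the round-robin placement rule are all assumptions made for the example.

```python
def split_into_chunks(data: bytes, max_chunk_size: int) -> list:
    """Split data into chunks no larger than max_chunk_size."""
    return [data[i:i + max_chunk_size] for i in range(0, len(data), max_chunk_size)]


def place_replicas(num_chunks: int, nodes: list, replicas: int = 2) -> dict:
    """Assign each chunk to `replicas` distinct nodes (hypothetical round-robin rule)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        chunk: [nodes[(chunk + r) % len(nodes)] for r in range(replicas)]
        for chunk in range(num_chunks)
    }


nodes = ["node1", "node2", "node3", "node4"]

# Data smaller than the chunk size: one chunk on two nodes (RAID 1-like).
small = split_into_chunks(b"x" * 10, max_chunk_size=64)
small_placement = place_replicas(len(small), nodes)

# Larger data: several chunks spread over the nodes, each mirrored (RAID 10-like).
large = split_into_chunks(b"x" * 200, max_chunk_size=64)
large_placement = place_replicas(len(large), nodes)
```

In the small case the single chunk lands on two nodes. In the large case four chunks are spread across all four nodes, each with a copy on a neighboring node.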
Erasure Coding Comes with Virtuozzo Storage
Today, Virtuozzo Storage features support for erasure coding. This functionality was added in response to our customers’ requests for more choices in how they store their data. There is also a forward-thinking aspect to the feature, allowing further development of asynchronous data replication across data center locations. Still, the primary benefit of erasure coding data protection is that, compared to replicas, it reduces the overall storage capacity needed to protect a given amount of data, which helps you reduce costs.
Virtuozzo Storage customers like the fact that multiple copies of their data are evenly distributed across multiple nodes in the storage cluster, and that the data is highly protected and available. If a storage node or drive fails, data will still be available to the application via the second, third, or fourth copy, and so on. Users can define how many replicas or copies of data they want to exist for a particular segment of data. As in a RAID 1 mirroring implementation, however, there is a tradeoff between redundancy and availability on one side and cost and performance on the other. Having two replicas within a storage cluster means that twice as much data storage capacity must be purchased and configured, so the capacity overhead is 100%. If a particular workload requires 500TB of usable storage capacity, then 1,000TB must be purchased and configured across the nodes in the storage cluster. Maintaining three replicas increases the capacity overhead to 200%, and so on. Availability increases, but so do capacity requirements and costs. Erasure coding offers customers another option by allowing them to define their own erasure coding configuration: the best combination of redundancy, performance, and cost can be chosen for their particular workloads.
An erasure coding configuration can be described with the formula M+N[/X], where M is the number of data blocks (stripe depth), N is the number of parity blocks, and X is the optional write tolerance (how many storage nodes can be down while the client is still allowed to write a file).
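As a small illustration of the M+N[/X] notation, the sketch below parses such a string into its three parts. The class and function names are hypothetical and exist only for this example. It is not part of any Virtuozzo tool or API.

```python
import re
from typing import NamedTuple, Optional


class ErasureConfig(NamedTuple):
    data_blocks: int                 # M
    parity_blocks: int               # N
    write_tolerance: Optional[int]   # X, None when omitted


# Matches "M+N" with an optional "/X" suffix, e.g. "5+2" or "5+2/1".
_PATTERN = re.compile(r"^(\d+)\+(\d+)(?:/(\d+))?$")


def parse_config(text: str) -> ErasureConfig:
    """Parse an 'M+N[/X]' string into its numeric parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in M+N[/X] form: {text!r}")
    m, n, x = match.groups()
    if int(m) < 1:
        raise ValueError("M must be at least 1")
    return ErasureConfig(int(m), int(n), int(x) if x is not None else None)


config = parse_config("5+2/1")
total_blocks = config.data_blocks + config.parity_blocks  # blocks per stripe
```

Here `parse_config("5+2/1")` yields 5 data blocks, 2 parity blocks, and a write tolerance of 1, while `parse_config("5+2")` leaves the write tolerance unset.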
The minimum number of nodes supported for erasure coding in Virtuozzo Storage is 5 (M=3).
Using a 5+2 erasure coding design as an example, the capacity overhead in a Virtuozzo Storage cluster is reduced to just 40% (2GB of parity data for every 5GB of application data).
That means that a workload requiring 100TB of usable storage needs only 140TB of raw storage capacity, compared with 200TB when two replicas are used. This reduction in raw storage capacity can help you lower costs and handle budget constraints, or allow enough storage capacity to be allocated within storage cluster environments where total storage capacity is limited by the number of storage nodes, drive slots, or drive capacity sizes. High availability is also maintained: in a 5+2 implementation (5 data elements plus 2 parity elements), if any two elements are lost, the remaining elements can rebuild the data without the application experiencing an interruption or data loss.
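The capacity arithmetic above can be checked with a short sketch. The function names are made up for this example. The only inputs are the figures already quoted in the text: two replicas for 500TB, and a 5+2 design for 100TB.

```python
def raw_capacity_replicas(usable_tb: float, replicas: int) -> float:
    """Raw capacity needed when every byte is stored `replicas` times."""
    return usable_tb * replicas


def raw_capacity_erasure(usable_tb: float, m: int, n: int) -> float:
    """Raw capacity needed for an M+N erasure coding design."""
    return usable_tb * (m + n) / m


def overhead_percent(usable_tb: float, raw_tb: float) -> float:
    """Extra raw capacity as a percentage of usable capacity."""
    return (raw_tb - usable_tb) / usable_tb * 100


# Two replicas for a 500TB workload: 1,000TB raw, 100% overhead.
replica_raw = raw_capacity_replicas(500, replicas=2)

# 5+2 erasure coding for a 100TB workload: 140TB raw, 40% overhead.
erasure_raw = raw_capacity_erasure(100, m=5, n=2)

print(replica_raw, round(overhead_percent(500, replica_raw)))
print(erasure_raw, round(overhead_percent(100, erasure_raw)))
```

For an M+N design, the overhead is simply N divided by M, which is why larger stripes with the same number of parity blocks cost less.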
Another advantage of software erasure coding in Virtuozzo is that while RAID unites disks into RAID volumes, Virtuozzo erasure coding unites files. This makes it possible to set various erasure coding schemas and replication for various files on top of one cluster (set of disks), balancing redundancy and performance requirements applicable for that data.
The following table shows the redundancy options available in Virtuozzo Storage. In a larger cluster, it is possible to decrease the redundancy overhead significantly. For example, a 17+3 erasure coding design has a storage overhead of only about 18% (3 parity blocks for every 17 data blocks). Virtuozzo Storage users have the option of using replicas or erasure coding for data protection without reconfiguring their storage clusters.
| Redundancy mode | Minimum number of nodes required | How many nodes can fail without data loss | Storage overhead, % | Raw space required to store 100GB of data |
Erasure Coding Features Infinite Journaling
One of the key technology benefits of erasure coding is that its design and implementation make use of infinite journaling. During a write operation, existing data is not overwritten. Instead, the changes are appended to the end of a log file. Infinite journaling allows the writes to be repeated if necessary. This feature is especially useful for asynchronous replication, because writes can be shipped over slower links without losing storage consistency or waiting for a response from the remote side to indicate the write completion. Therefore, the introduction of this scheme in Virtuozzo Storage opens some promising opportunities for further product development, such as data replication to remote datacenters. With erasure coding, performance may be a bit slower; however, greater storage efficiency without the loss of redundancy makes it a perfect fit for storing data where top-level performance is less of a concern.
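The append-only idea behind journaling can be sketched in a few lines. This is a toy in-memory model with hypothetical names (`Journal`, `replay`), not Virtuozzo Storage's on-disk format. It only shows that writes are appended rather than applied in place, and that replaying the log, even repeatedly, yields the same result.

```python
class Journal:
    """Toy append-only log of (offset, data) write records."""

    def __init__(self):
        self._entries = []

    def write(self, offset: int, data: bytes) -> int:
        """Append a write record and return its sequence number."""
        self._entries.append((offset, bytes(data)))
        return len(self._entries) - 1

    def entries_since(self, sequence: int) -> list:
        """Return records starting at `sequence`, e.g. to ship to a remote site."""
        return self._entries[sequence:]


def replay(entries, image: bytearray) -> bytearray:
    """Apply write records in order; applying them again gives the same image."""
    for offset, data in entries:
        if offset + len(data) > len(image):
            raise ValueError("write extends past end of image")
        image[offset:offset + len(data)] = data
    return image


journal = Journal()
journal.write(0, b"hello")
journal.write(0, b"J")        # a later change is appended, not applied in place

local = replay(journal.entries_since(0), bytearray(5))

# A remote copy that has only seen the first record catches up later.
remote = replay(journal.entries_since(0)[:1], bytearray(5))
remote = replay(journal.entries_since(1), remote)
```

Because records are only ever appended, a slow remote site can request everything after the last sequence number it applied and reach the same state as the local copy.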
Erasure Coding - The Pros and Cons
A drawback of erasure coding designs is that they are a bit more CPU-intensive during data rebuild operations. This is because the data must be rebuilt or calculated from the surviving elements, as opposed to just a simple read of a data replica. As a result, Virtuozzo Storage recommends reserving some CPU capacity (about 1 CPU core per 4 hard drives) for storage related services.
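The CPU guideline above translates into a simple estimate. The sketch below assumes the figure is rounded up to whole cores, which is an assumption of this example rather than a stated rule. The function name is illustrative.

```python
import math

DRIVES_PER_CORE = 4  # from the "about 1 CPU core per 4 hard drives" guideline


def reserved_cores(hard_drives: int) -> int:
    """Estimate CPU cores to reserve for storage services on one node."""
    if hard_drives < 0:
        raise ValueError("hard_drives must not be negative")
    # Rounding up is an assumption made for this sketch.
    return math.ceil(hard_drives / DRIVES_PER_CORE)


cores_for_12_drives = reserved_cores(12)
cores_for_10_drives = reserved_cores(10)
```

Under these assumptions, both a 10-drive node and a 12-drive node would reserve 3 cores.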
When choosing a redundancy scheme for your data, take the following into consideration:
- Erasure coding is more efficient (in terms of required raw capacity) than RAID or replication. Greater storage efficiency = less overall storage capacity = lower overall costs.
- Erasure coding can be configured to be more resilient than RAID or replication, with less underlying storage required.
- Erasure coding offers greater flexibility/choice in redundancy levels.
- Erasure coding is better suited for large, fairly static data sets.
- Files located on the storage using erasure coding support “punch hole” operations (discarding the area in the middle of a file to free up available disk space). Virtuozzo VMs use it for online compacting of virtual machine disks, performing this operation automatically when data is removed from inside the VMs.
- Erasure coding is more CPU-intensive than replication, which can lead to slower performance when CPU resources are low.
- Erasure coding may require more nodes in a cluster to fully benefit from it.
Examples of the data types where erasure coding is a good fit include:
- VM and container volumes storing data that does not change very often and/or has moderate I/O performance expectations (file servers, lightly loaded web servers, etc.)
- Scenarios where storage cost is more important than performance
- Cold data such as backup or archives
Storage with erasure coding can be a better and more efficient alternative to traditional RAID for data protection and recovery. With our latest release of Virtuozzo Storage, you have more choice than ever for moving to software-defined storage, whether integrating it as part of a hyperconverged infrastructure solution, using it to provide storage for Docker, Kubernetes and Rancher containers, or using it with leading virtualization solutions. This increased choice can help you lower your overall storage costs and move to a modern infrastructure today.
To learn more about Virtuozzo Storage, click here.