Diversification is a welcome word to anyone in financial management. And as data becomes more valuable, the same can be said for the cloud, at least according to researchers in Beijing whose new multi-cloud storage framework could improve storage reliability and reduce spending on cloud services by about 20 percent.

The scheme is called “CHARM” (cost-efficient data hosting scheme with high availability in heterogeneous multi-cloud). If implemented in software, it could help companies, especially ones that manage large amounts of unstructured data such as e-commerce retailers, skip the difficult and confusing process of choosing a cloud vendor. In other words, it offers to automatically leverage the best aspects of multiple commercial clouds, at the lowest cost.

The proposed strategy is a welcome departure from the one-and-done approach companies typically take when selecting cloud vendors. As data grows and attitudes toward information exchange shift to favor speed and volume rather than privacy, organizations are increasingly turning to cloud services such as Amazon S3, Windows Azure or Google Cloud Storage. These services are fine, but choosing just one leads to what the researchers describe as “vendor lock-in risk.” This means companies are at the mercy of a rapidly evolving cloud market in which a vendor can change prices abruptly, run into technical problems, or go out of business without warning (one example is Nirvanix, which suddenly shut its doors in September 2013 despite having thousands of customers including Top 500 organizations). Furthermore, once a company has committed to a cloud vendor, it’s extremely expensive to migrate data to another.

The researchers’ solution to cloud vendor lock-in? Build a heterogeneous cloud that intelligently distributes data into multiple clouds, diversifying risk and creating an automated ability to go with the most cost-efficient portfolio. The concept of multi-cloud hosting isn’t new to academia, but the researchers felt it needed to be streamlined into a more financially practical model for real-world business and IT managers, who often struggle to determine the best cloud-hosting strategy in a fiercely competitive and unstable market.

They built CHARM by combining replication and erasure coding into a single architecture, and designing an algorithm that chooses the optimal storage mode based on pricing and data access patterns. The scheme continually monitors changing cloud vendor policies and adaptively triggers data migration as necessary. Additionally, it’s designed to meet service providers’ availability requirements, and it guarantees data redundancy in the event one cloud fails.
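To make the trade-off concrete, here is a minimal Python sketch of how a storage mode might be picked from prices and access patterns. It is not the researchers' algorithm: the cloud names, prices, and availability figures are made up for illustration, the function names are hypothetical, clouds are assumed to fail independently, and the search is plain brute force. Replication is modeled as the case where any single cloud can serve the data (m = 1), and erasure coding as the case where any m of the n chosen clouds are needed (m > 1).

```python
from itertools import combinations

# Hypothetical clouds: storage and egress prices in $/GB per month, plus
# availability. These are illustrative numbers, not real vendor figures.
CLOUDS = {
    "A": {"storage": 0.030, "egress": 0.12, "avail": 0.995},
    "B": {"storage": 0.025, "egress": 0.15, "avail": 0.990},
    "C": {"storage": 0.040, "egress": 0.08, "avail": 0.999},
    "D": {"storage": 0.020, "egress": 0.20, "avail": 0.990},
}


def availability(clouds, m):
    """Probability that at least m of the chosen clouds are reachable,
    assuming independent failures."""
    total = 0.0
    for k in range(m, len(clouds) + 1):
        for up in combinations(clouds, k):
            p = 1.0
            for c in clouds:
                a = CLOUDS[c]["avail"]
                p *= a if c in up else 1.0 - a
            total += p
    return total


def monthly_cost(clouds, m, size_gb, reads_gb):
    """Monthly cost of an m-of-n layout: each cloud stores size/m, and a
    read pulls one chunk from each of the m clouds with the cheapest egress."""
    storage = sum(CLOUDS[c]["storage"] for c in clouds) * size_gb / m
    cheapest = sorted(CLOUDS[c]["egress"] for c in clouds)[:m]
    egress = sum(cheapest) * reads_gb / m
    return storage + egress


def choose_mode(size_gb, reads_gb, min_avail):
    """Brute-force search over cloud subsets and m values. Requires n > m so
    the data survives the loss of any one cloud. Returns (cost, clouds, m),
    or None if no layout meets the availability target."""
    best = None
    for n in range(2, len(CLOUDS) + 1):
        for clouds in combinations(sorted(CLOUDS), n):
            for m in range(1, n):
                if availability(clouds, m) < min_avail:
                    continue
                cost = monthly_cost(clouds, m, size_gb, reads_gb)
                if best is None or cost < best[0]:
                    best = (cost, clouds, m)
    return best


cold = choose_mode(size_gb=100, reads_gb=0, min_avail=0.999)    # rarely read
hot = choose_mode(size_gb=100, reads_gb=1000, min_avail=0.999)  # read often
print("cold data:", cold)
print("hot data:", hot)
```

With these invented numbers, rarely read data lands on an erasure-coded layout because it stores less in total, while frequently read data lands on replication because every read can be served from the cloud with the cheapest egress. Rerunning `choose_mode` whenever a price in the table changes is one simple way to picture the monitoring-and-migration step, although a real system would also have to weigh the cost of moving the data.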

CHARM architecture (R is replication, E is erasure coding)

“Although our algorithm hasn’t been implemented commercially, a company could easily implement our scheme and input their data into the system for high availability and cost-efficiency,” said Quanlu Zhang, lead researcher. “Beyond inherently ensuring redundancy and high reliability for data access, the coding constantly chooses the best pricing combinations from best-in-class cloud vendors. To us that’s the definition of optimizing.”

Of course, a holistic storage system involves several other factors beyond cost and data availability requirements, such as cache strategies, geographical data consistency, etc., and the team plans to address a more complete storage ecosystem in future work.

For now, CHARM is on a promising track. The team evaluated the scheme with trace-driven simulations and prototype experiments. They replayed samples for an entire month across four mainstream commercial clouds: Amazon S3, Windows Azure, Google Cloud Storage and Aliyun OSS. The results showed that CHARM can cut the monetary cost of cloud services by about 20 percent, and that it adapts well to changes in data and pricing. A charming prospect, indeed.

Read more about multi-cloud data storage in IEEE Xplore.