Cost-Efficient Repair For Cloud Storage Systems Using Progressive Engagement

Tech ID: 24805 / UC Case 2014-611-0

Brief Description

The technology is a coding process that enables efficient recovery from data failures in cloud storage systems. It offers greater flexibility in choosing the subset of storage nodes used for recovery and reduces the amount of data that must be transferred during recovery.

Full Description

Researchers at UCI have developed a new process that optimizes maximum distance separable (MDS) codes by examining two major factors that contribute to the cost of data recovery: accessing cost, the cost associated with accessing the selected nodes, and repair bandwidth cost, the cost associated with the amount of data that must be downloaded from surviving nodes to repair a failure.
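The two cost factors can be illustrated with a minimal sketch. The additive linear cost model, the `RepairPlan` type, the `repair_cost` function, and all numeric values below are assumptions made for illustration; the source does not specify how the two costs are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairPlan:
    """Hypothetical description of one repair: who is contacted and how much is read."""
    helpers: int             # number of surviving nodes accessed
    units_per_helper: float  # data units downloaded from each accessed node


def repair_cost(plan, access_cost_per_node, bandwidth_cost_per_unit):
    """Total cost under an assumed linear model: accessing cost plus bandwidth cost."""
    accessing_cost = plan.helpers * access_cost_per_node
    repair_bandwidth = plan.helpers * plan.units_per_helper
    return accessing_cost + bandwidth_cost_per_unit * repair_bandwidth


# Two illustrative plans: few helpers sending a lot, or many helpers sending a little.
few_helpers = RepairPlan(helpers=4, units_per_helper=1.0)
many_helpers = RepairPlan(helpers=8, units_per_helper=0.2)

cost_few = repair_cost(few_helpers, access_cost_per_node=0.5, bandwidth_cost_per_unit=2.0)
cost_many = repair_cost(many_helpers, access_cost_per_node=0.5, bandwidth_cost_per_unit=2.0)
print(cost_few, cost_many)
```

With these made-up prices the plan that accesses more nodes is cheaper overall, but raising `access_cost_per_node` enough reverses the ranking, which is why both metrics matter when choosing how many nodes to engage.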

The MDS coding technique proposed by UCI researchers is designed with both of these cost metrics in mind and has a property the researchers call “progressive engagement”. This property provides the flexibility to engage more surviving nodes in order to reduce the repair bandwidth cost, without redesigning the code structure or changing the content of existing nodes. This makes this type of MDS code a suitable solution for data recovery in dynamic cloud storage systems.

An MDS code is said to have “progressive engagement” if it:

  1. Provides flexibility in engaging any number of surviving nodes, without redesigning the code structure or changing the content of other nodes.

  2. Requires less repair bandwidth by engaging more surviving nodes.

Traditional MDS codes fail to satisfy the second condition of the definition above, while other recently proposed MDS codes that yield minimum repair bandwidth fail to meet the first condition.
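The second condition can be made concrete with a small sketch. It uses the cut-set lower bound on repair bandwidth at the minimum-storage point from the regenerating-codes literature, M·d / (k·(d − k + 1)) for a file of size M, as a stand-in; the source does not state the repair bandwidth achieved by the UCI code, so the formula, the function name `msr_repair_bandwidth`, and the parameters are assumptions for illustration only.

```python
from fractions import Fraction


def msr_repair_bandwidth(file_size, k, d):
    """Cut-set bound on total download to repair one node using d helpers.

    Assumes an (n, k) MDS code in which each node stores file_size / k.
    This is a known bound used for illustration, not the UCI code's formula.
    """
    if d < k:
        raise ValueError("at least k helper nodes are required")
    return Fraction(file_size) * d / (k * (d - k + 1))


# Hypothetical parameters: n = 10 nodes, any k = 6 recover a file of 6 units.
n, k, file_size = 10, 6, 6

# Engage between k and n - 1 surviving nodes and record the bandwidth for each choice.
bandwidth_by_helpers = {d: msr_repair_bandwidth(file_size, k, d) for d in range(k, n)}

for d, bandwidth in bandwidth_by_helpers.items():
    print(f"helpers={d}  repair bandwidth={bandwidth}")
```

With d = k the bound equals the whole file, and it shrinks as d grows toward n − 1. A code with progressive engagement aims to offer this kind of trade-off for every choice of d using one fixed code, rather than a different code per value of d.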

This new MDS coding technique can be used as part of a new coding structure, or to modify the recovery scheme of an existing MDS code so that it has the “progressive engagement” property defined above.

Fast and efficient data recovery is a challenge for cloud storage systems with a large number of storage nodes (or disks). Currently, cloud storage systems use erasure codes for data recovery. Erasure codes provide high failure tolerance with relatively low redundancy, leading to a more energy-efficient file storage system than mirroring, which works well on smaller, local storage systems. Among erasure codes, maximum distance separable (MDS) codes are optimal in terms of the redundancy-reliability trade-off.
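The redundancy advantage of erasure codes over mirroring can be checked with simple arithmetic. In the sketch below, the function names and the (14, 10) parameters are hypothetical choices for illustration; the relationships used are standard: an (n, k) MDS code tolerates n − k node failures while storing n/k times the original data, and mirroring needs f + 1 full copies to tolerate f failures.

```python
def mds_storage_overhead(n, k):
    """Stored data divided by original data for an (n, k) MDS code."""
    return n / k


def replication_storage_overhead(failures_tolerated):
    """Full copies needed by mirroring to survive the given number of failures."""
    return failures_tolerated + 1


# Hypothetical example: data split into k = 10 blocks and encoded onto n = 14 nodes.
n, k = 14, 10
failures_tolerated = n - k  # an MDS code survives any n - k node failures

mds_overhead = mds_storage_overhead(n, k)
mirror_overhead = replication_storage_overhead(failures_tolerated)
print(failures_tolerated, mds_overhead, mirror_overhead)
```

For the same tolerance of four failures, the MDS code stores 1.4 times the original data while mirroring stores 5 times as much.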

When a storage node fails, the lost data must be recovered and regenerated at a new node in order to maintain data integrity. The amount of data that must be downloaded from a set of surviving nodes to repair a single failure is called the repair bandwidth. The downside of traditional MDS codes is that the entire original data must be downloaded before any lost data can be recovered, resulting in high repair bandwidth. New MDS codes have been proposed to achieve minimum repair bandwidth, but the minimum possible repair bandwidth is achieved only when all the surviving nodes are engaged during data recovery. However, engaging all nodes can be too costly, if not infeasible, for many storage systems. Moreover, none of the current MDS codes provide flexibility in choosing the number of participating nodes without changing the code structure and the content of the nodes. In other words, for these codes, the number of participating nodes as well as the total number of nodes must be known when the codes are designed. This information may not be available in dynamic storage systems, where the number of available nodes is subject to change or the cost of accessing them can be too high.
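The high repair bandwidth of a traditional MDS code can be seen in the smallest possible case. The sketch below uses a (3, 2) single-parity XOR code, which is MDS because any two of the three nodes recover the data. The node names and block contents are made up for illustration, and this is a textbook code, not the UCI construction.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# Original data split into k = 2 blocks, plus one parity block (n = 3 nodes).
block_a = b"CLOUD"
block_b = b"STORE"
nodes = {"A": block_a, "B": block_b, "P": xor_bytes(block_a, block_b)}

# Node A fails; the replacement node downloads the full content of both survivors.
survivors = {name: data for name, data in nodes.items() if name != "A"}
downloaded = list(survivors.values())
repaired_a = xor_bytes(downloaded[0], downloaded[1])  # B xor (A xor B) = A

bytes_downloaded = sum(len(block) for block in downloaded)
original_size = len(block_a) + len(block_b)
print(repaired_a, bytes_downloaded, original_size)
```

Rebuilding one 5-byte block requires downloading 10 bytes, the size of the entire original data. Codes designed for low repair bandwidth aim to avoid this overhead by downloading only part of each helper node's content.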

This process addresses many of the issues described above. In addition to minimizing repair bandwidth, the code can be reused when the number of participating nodes varies, without changing the code structure or the content of existing nodes, and its repair bandwidth decreases as more nodes are engaged.

Suggested Uses

Data recovery in cloud file storage systems.

Advantages

  • An MDS code with progressive engagement can be reused without changing the entire code structure or the content of existing nodes.
  • Reduces the repair bandwidth by engaging more nodes.
  • Allows new storage nodes to be added without changing the content of existing nodes.

Patent Status

Country                     Type                     Number         Dated         Case
United States Of America    Published Application    20150303949    10/22/2015    2014-611

Lead Inventor

Hamid Jafarkhani
Department of Electrical Engineering and Computer Science
Henry Samueli School of Engineering
University of California, Irvine



Inventors

  • Hajiaghayi, Mahdi
  • Jafarkhani, Hamid
