Efficient Translation From Edit Distance To Hamming Distance, Allowing Faster And More Precise Analysis, Search And Indexing Of Data

Tech ID: 29990 / UC Case 2005-431-0

Summary

UCLA researchers have developed a new approach to more efficiently search and compare strings that approximately match, thus allowing for much more efficient search for applications.

Background

Many data-intensive applications require computationally intensive algorithms for approximate string matching.  Examples include text editors, database archiving, internet search-engines, and bioinformatics applications.  For example, sequences of DNA or proteins are routinely searched against one another to determine biological similarity.  The edit distance between two strings, the minimum number of character changes, insertions and deletes to map from one string to another, is usually hailed as the one of the best measures for accuracy.  Unfortunately, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient.

Many heuristic algorithms such as BLAST and FASTA have been developed to overcome this inefficiency.  However, the innovation disclosed here provides a faster way to handle edit distance (by transforming into a much simpler form) thus potentially speeding up a host of applications that need approximate matching using the edit distance.

Innovation

UCLA researchers have developed a new approach to more efficiently search and compare strings that approximately match, thus allowing for much more efficient search for applications.  They created a novel method for more efficient searching and comparing of letter sequences.  The innovation lies in a method for mapping the sequences to a new set of strings, whose similarity can be compared using hamming distances instead of edit distances.  The hamming distance similarity between these new strings is proportional the similarity of the original sequences, and more importantly, there are already existing far more efficient algorithms that can search and index strings using hamming distances.

Applications

  • More efficient data and string searching
  • Potentially faster music or video search engines
  • Improved data archiving routines
  • Shorter algorithm running times for proteomic and genomic searches
  • Potentially faster and more accurate ways to match internet viewable pages to appropriate advertisement pages that appear on the same page
  • Other biocomputing applications that require fast approximate matching
  • National security applications, where approximate patterns need to be discovered

Advantages

  • More efficient than edit distance
  • More efficiently streamlined with other search algorithms

Patent Status

Country Type Number Dated Case
United States Of America Issued Patent 8,060,808 11/15/2011 2005-431
 

Contact

Learn About UC TechAlerts - Save Searches and receive new technology matches

Inventors

  • Ostrovsky, Rafail

Other Information

Keywords

bioinformatics, music search, video search, data archive, HIS

Categorized As