UCLA researchers have developed a new approach to more efficiently search and compare strings that approximately match, thus allowing for much more efficient search for applications.
Many data-intensive applications require computationally intensive algorithms for approximate string matching. Examples include text editors, database archiving, internet search-engines, and bioinformatics applications. For example, sequences of DNA or proteins are routinely searched against one another to determine biological similarity. The edit distance between two strings, the minimum number of character changes, insertions and deletes to map from one string to another, is usually hailed as the one of the best measures for accuracy. Unfortunately, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient.
Many heuristic algorithms such as BLAST and FASTA have been developed to overcome this inefficiency. However, the innovation disclosed here provides a faster way to handle edit distance (by transforming into a much simpler form) thus potentially speeding up a host of applications that need approximate matching using the edit distance.
UCLA researchers have developed a new approach to more efficiently search and compare strings that approximately match, thus allowing for much more efficient search for applications. They created a novel method for more efficient searching and comparing of letter sequences. The innovation lies in a method for mapping the sequences to a new set of strings, whose similarity can be compared using hamming distances instead of edit distances. The hamming distance similarity between these new strings is proportional the similarity of the original sequences, and more importantly, there are already existing far more efficient algorithms that can search and index strings using hamming distances.
|United States Of America||Issued Patent||8,060,808||11/15/2011||2005-431|