More efficiently search and compare strings for approximate matching according to so-called edit distance, allowing for much more efficient search for applications.
About
Summary: UCLA researchers have developed a new approach to more efficiently search and compare strings for approximate matching according to so-called edit distance, allowing for much more efficient search for applications such as bioinformatics, music and video search, and data backup and retrieval. Background: Many data-intensive applications require computationally intensive algorithms for approximate string matching. Examples include text editors, database archiving, internet search-engines, and bioinformatics applications. For example, sequences of DNA or proteins are routinely searched against one another to determine biological similarity. The edit distance between two strings, the minimum number of character changes, insertions and deletes to map from one string to another, is usually hailed as the one of the best measures for accuracy. Unfortunately, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient.Many heuristic algorithms such as BLAST and FASTA have been developed to overcome this inefficiency. However, the innovation disclosed here provides a faster way to handle edit distance (by transforming into a much simpler form) thus potentially speeding up a host of applications that need approximate matching using the edit distance. Innovation: UCLA researchers have developed a novel method for more efficiently searching and comparing sequences. The innovation lies in a method for mapping the sequences to a new set of strings, whose similarity can be compared using hamming distances instead of edit distances. The hamming distance similarity between these new strings is proportional the similarity of the original sequences, and more importantly, there are already existing far more efficient algorithms that can search and index strings using hamming distances. Applications: More efficient data and string searching. Potentially faster music or video search engines. Improved data archiving routines. Shorter algorithm running times for proteomic and genomic searches. Potentially faster and more accurate ways to match internet viewable pages to appropriate advertisement pages that appear on the same page. Other biocomputing applications that require fast approximate matching. National security applications, where approximate patterns need to be discovered. State of Development: The mathematical method has been proven, and accepted by Symposium on Theory of Computing (STOC) 2005. Provisional patent has been filed.ABOUT THE LAB This innovation was created by the researchers associated with the Center for Information and Computation Security (CICS) at UCLA which is involved in a wide range of research involving cryptography and computer security. The web site for the lab is http://www.cs.ucla.edu/security/INVENTORS Dr. Rafail Ostrovsky is a Professor in the Department of Computer Science, UCLAs Henry Samueli School of Engineering and Applied Science, and is the director of the Center For Information And Computation Security (CICS) at UCLA. - See more at: https://techtransfer.universityofcalifornia.edu/NCD/20150.html#sthash.00QO04nR.dpuf