Researchers in the UCLA Department of Computer Science have developed and reduced to practice algorithms and methods for obtaining information, primarily medical information, from free text sources, such as patient medical records.
A large number of medical information systems have emerged on the web with comprehensive coverage of medical literature and teaching materials. However, the search interfaces of these web sites hinder users from fully utilizing them in health care applications. Users must manually search for medical literature and teaching materials, which is labor-intensive and time-consuming. Content correlation allows for the automatic creation of semantic links among documents from different collections, which results in easier navigation between documents for the user. Compared to manual searching, content correlation provides a more convenient method for the user to efficiently navigate among various information sources. However, challenges exist in designing a content correlation engine, including vocabulary mismatch (the use of different expressions for the same concept) and the need for scenario-specific information.
The inventors have developed an approach for scenario-specific content correlation via a combination of phrase-based indexing and knowledge-based query expansion. The technique involves three innovations: (1) keyword extraction and indexing (2003-358), (2) query expansion (described here), and (3) phrase-based vector space models of document retrieval (2003-510).
Query expansion is a technique in which the user's free-form query cascades into further, more general or more detailed concepts using known word associations. For a search of “cancer cure,” the query engine automatically expands the query to include “chemotherapy” or “radiation therapy” as well. This provides substantially stronger queries than simple “Google”-type inverted files, in which there must be an exact word match. The algorithms use novel indexing structures for matching text to UMLS concepts, as well as novel storage and searching techniques.
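As a rough illustration of the idea, the following minimal Python sketch expands a query through a small association table and contrasts the result with exact-match lookup. The table contents, the function names `expand_query` and `matches`, and the substring matching are all assumptions made for this example; they are not the patented indexing structures or algorithms.

```python
from typing import Dict, List

# Hypothetical association table for illustration only. A real system
# would derive associations from a knowledge source such as UMLS.
ASSOCIATIONS: Dict[str, List[str]] = {
    "cancer cure": ["chemotherapy", "radiation therapy"],
}


def expand_query(query: str,
                 associations: Dict[str, List[str]] = ASSOCIATIONS) -> List[str]:
    """Return the normalized query followed by its associated terms."""
    key = " ".join(query.lower().split())  # lowercase, collapse whitespace
    terms = [key]
    for related in associations.get(key, []):
        if related not in terms:
            terms.append(related)
    return terms


def matches(document: str, terms: List[str]) -> bool:
    """True if any term occurs in the document (simple substring test)."""
    text = document.lower()
    return any(term in text for term in terms)


note = "Patient began chemotherapy in March."
print(matches(note, ["cancer cure"]))                # exact match only
print(matches(note, expand_query("cancer cure")))    # with expansion
```

In this toy setting the exact-match query misses the note, while the expanded query retrieves it because “chemotherapy” is among the associated terms.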
This innovation takes a knowledge-based approach to scenario-specific content correlation by utilizing the Unified Medical Language System (UMLS) Metathesaurus as the knowledge source. Medical concepts such as “Surgery, Lung” have identifiers in the Metathesaurus (in this case, 38903). With proper indexing, query expansion techniques can be used to map query phrases to a variety of UMLS concept identifiers.
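One way to picture the mapping from query phrases to concept identifiers is a word-order-insensitive phrase index, sketched below in Python. The index layout, the `normalize` and `map_query_to_concepts` helpers, and the single-entry concept table are hypothetical and chosen for brevity; only the “Surgery, Lung” identifier comes from the description above, and the actual indexing structures are not reproduced here.

```python
import re
from typing import Dict, FrozenSet, List


def normalize(phrase: str) -> FrozenSet[str]:
    """Lowercase a phrase and reduce it to an unordered set of words."""
    return frozenset(re.findall(r"[a-z0-9]+", phrase.lower()))


# Toy concept table; the identifier is the one quoted in the text above.
CONCEPT_NAMES: Dict[str, int] = {
    "Surgery, Lung": 38903,
}

# Hypothetical index: normalized word set -> concept identifier.
PHRASE_INDEX: Dict[FrozenSet[str], int] = {
    normalize(name): cid for name, cid in CONCEPT_NAMES.items()
}


def map_query_to_concepts(query: str,
                          index: Dict[FrozenSet[str], int] = PHRASE_INDEX
                          ) -> List[int]:
    """Return identifiers of concepts whose words all appear in the query."""
    words = normalize(query)
    return sorted(cid for key, cid in index.items() if key <= words)


print(map_query_to_concepts("lung surgery complications"))
```

Because the index ignores word order and punctuation, the free-text phrase “lung surgery” maps to the same identifier as the Metathesaurus name “Surgery, Lung”, which is one simple way to address vocabulary mismatch.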
Query expansion has been implemented, with superior results for document relevance and for precision and recall.
Patent status: United States of America, Issued Patent 7,548,910, dated 06/16/2009 (case 2003-357).
keyword extraction, keyword indexing, query expansion, phrase-based vector space, document retrieval, unified medical language system, query phrase mapping, scenario-specific content correlation