A Method For Discovering And Downloading Hidden Web Content

Tech ID: 29989 / UC Case 2004-656-0

Summary

Researchers in the Computer Science Department at UCLA have developed a method for discovering and downloading hidden web content: content held in web-based databases that has previously been difficult for end users to gather.

Background

Current internet search engines are limited in their ability to search web-based databases that are accessible only by querying them directly. Typical search engines like Google use a crawling system to gather web content: the search engine recursively follows the links on each webpage to discover further pages until a stopping condition is met. Content that is returned only in response to a query on a search form has no static link pointing to it, so a link-following crawler never reaches it. Although other search engines and algorithms can query these databases, they do not actually download the database content that the end user wants. The ability to download this hidden web content can have high value for applications such as personalized web searching, creation of web directories, integration of hidden content in e-commerce stores, and providing key information for business decisions.
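The limitation described above can be illustrated with a minimal sketch of a link-following crawl. The site layout, the page names, and the `crawl` function are hypothetical and chosen only for illustration; the two `db/` records stand for database content that is served only in response to a search-form query, so no page links to them.

```python
from collections import deque

# Hypothetical toy site: page -> pages it links to.
# The "db/..." records are only ever returned by a search form,
# so no other page links to them.
LINKS = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],
    "db/record-1": [],
    "db/record-2": [],
}

def crawl(start, max_pages=100):
    """Breadth-first link crawl that stops when no unseen links remain
    or when max_pages pages have been discovered."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target in seen:
                continue
            if len(seen) >= max_pages:
                return seen
            seen.add(target)
            queue.append(target)
    return seen

reached = crawl("home")
hidden = set(LINKS) - reached
print("reached:", sorted(reached))
print("never reached:", sorted(hidden))
```

In this toy setup the crawl finds the search form itself but none of the records behind it, which is the gap the hidden-web method is meant to close.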

Innovation

Researchers in the Computer Science Department at UCLA have designed a system for searching the internet that interacts with web-based databases by automatically generating queries for their search pages. Unlike other approaches to searching web-based databases, which return only a summary of a database or identify its structure, this invention downloads the content of the database itself. It can also use the results of its previous queries to generate the next query so that it returns the most relevant content.
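As a rough illustration of the idea of generating each query from what earlier queries returned, the sketch below runs against a small in-memory stand-in for a web database. Everything in it is an assumption made for the example: the `DATABASE` contents, the `search` function that truncates its result list, and the greedy rule of choosing the most frequent not-yet-issued term. It is not the patented method, only one simple heuristic of the same general kind.

```python
from collections import Counter

# Hypothetical stand-in for a web database reachable only through a search form.
DATABASE = {
    1: "liver cancer treatment",
    2: "liver transplant surgery",
    3: "heart surgery recovery",
    4: "heart disease diet",
    5: "diet and exercise",
    6: "liver disease",
}

def search(term, limit=2):
    """Simulated search form: returns at most `limit` matching records,
    so a popular term yields only a partial result list."""
    hits = [(doc_id, text) for doc_id, text in sorted(DATABASE.items())
            if term in text.split()]
    return hits[:limit]

def pick_next_term(downloaded, issued):
    """Greedy heuristic (an assumption for this sketch): choose the term that
    appears in the most downloaded records and has not been issued yet."""
    counts = Counter()
    for text in downloaded.values():
        counts.update(set(text.split()))
    candidates = [(n, t) for t, n in counts.items() if t not in issued]
    if not candidates:
        return None
    # Highest count first; ties broken alphabetically for determinism.
    return min(candidates, key=lambda c: (-c[0], c[1]))[1]

def crawl_hidden(seed, max_queries=10):
    """Issue queries until the query budget is spent or no new terms remain."""
    downloaded, issued = {}, []
    term = seed
    while term is not None and len(issued) < max_queries:
        issued.append(term)
        for doc_id, text in search(term):
            downloaded[doc_id] = text
        term = pick_next_term(downloaded, issued)
    return downloaded, issued

docs, queries = crawl_hidden("liver")
print(len(docs), "records downloaded using queries:", queries)
```

The seed query "liver" matches three records but only two are returned; the third is recovered later through a different term learned from other downloaded records. A practical system would also need a way to estimate when further queries stop paying off, which this sketch replaces with a fixed query budget.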

Applications

  • Internet information searching

Advantages

  • Automatic generation of queries for web databases with no human interaction
  • Generated queries are more effective at discovering content within web databases
  • Robust system design can handle partial result lists returned by web databases
  • Maximizes amount of downloadable web content and uses resources efficiently

Patent Status

Country                   Type           Number     Dated       Case
United States Of America  Issued Patent  7,685,112  03/23/2010  2004-656
 

Inventors

  • Ntoulas, Alexandros

Other Information

Keywords

searching, search engine, Google, databases, query, algorithm
