Metis Focused Crawler
The system allows intelligent, ontology-focused discovery of distributed RDF-based metadata and Internet documents in parallel.
In the classical sense, a crawler is a program that retrieves Web pages by traversing the Internet, following links from one page to another, an approach commonly used by search engines. Focused crawling is a technique that crawls particular topical portions of the World Wide Web quickly and efficiently by following only the most promising links, so that it does not have to explore all Web pages.
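The idea of following only the most promising links can be sketched as a best-first traversal with a priority queue. This is a minimal illustration and not the Metis implementation: the in-memory `WEB` graph, the `TOPIC` set, the `relevance` function, and the `budget` parameter are all assumptions made for the example, and a real crawler would fetch pages over HTTP instead.

```python
import heapq

# Hypothetical toy Web: page id -> (page text, outgoing links). Not real data.
WEB = {
    "a": ("ontology rdf crawler", ["b", "c"]),
    "b": ("football results", ["d"]),
    "c": ("rdf metadata ontology", ["e"]),
    "d": ("weather report", []),
    "e": ("semantic web ontology", []),
}

# Hypothetical topic vocabulary used to judge relevance.
TOPIC = {"ontology", "rdf", "metadata"}


def relevance(text):
    """Fraction of the words in `text` that belong to the topic vocabulary."""
    words = text.split()
    return sum(w in TOPIC for w in words) / len(words) if words else 0.0


def focused_crawl(seed, budget):
    """Visit at most `budget` pages, always expanding the best-ranked link."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        text, links = WEB[page]
        visited.append(page)
        score = relevance(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page containing it.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    print(focused_crawl("a", budget=4))
```

With a budget of four pages, the page `d`, which is reachable only through the off-topic page `b`, is never fetched, whereas an exhaustive crawler would visit it.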
The focused crawler developed here uses the knowledge provided by an instantiated ontology, a notion that will be defined. This knowledge gives the crawler a better notion of which links are worth following: it builds sets of relevant entities and matches them against the Web documents. Matching is done on the basis of both RDF metadata and natural-language words, using different matching measures. The approach focuses mainly on the content of the documents, in contrast to other work that exploits the connectivity structure of the Internet. Another key point is the integration into an ontology-based system that allows the ontology to be maintained and extended. The system was implemented to allow an empirical evaluation of the framework.
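Matching a set of relevant entities against a document on two levels might look roughly like the sketch below. It is only an illustration under stated assumptions: the `RELEVANT_ENTITIES` table, the `ex:` identifiers, the two scoring functions, and the `metadata_weight` parameter are hypothetical and are not the matching measures used by the system.

```python
import re

# Hypothetical set of relevant entities derived from an instantiated ontology:
# entity identifier -> natural-language labels for that entity.
RELEVANT_ENTITIES = {
    "ex:Crawler": {"crawler", "spider"},
    "ex:Ontology": {"ontology"},
    "ex:Metadata": {"metadata"},
}


def metadata_score(triples):
    """Share of relevant entities that appear as objects of the RDF triples."""
    objects = {obj for (_subj, _pred, obj) in triples}
    hits = sum(1 for entity in RELEVANT_ENTITIES if entity in objects)
    return hits / len(RELEVANT_ENTITIES)


def text_score(text):
    """Share of relevant entities with at least one label found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = sum(1 for labels in RELEVANT_ENTITIES.values() if labels & words)
    return hits / len(RELEVANT_ENTITIES)


def document_relevance(triples, text, metadata_weight=0.5):
    """Weighted combination of the metadata-based and text-based measures."""
    return (metadata_weight * metadata_score(triples)
            + (1.0 - metadata_weight) * text_score(text))


if __name__ == "__main__":
    triples = [("doc:1", "dc:subject", "ex:Ontology")]
    text = "A spider that uses an ontology."
    print(document_relevance(triples, text))
```

Keeping the two measures separate lets the metadata match, which is exact, be weighted differently from the noisier word match; the equal weighting used here is arbitrary.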