Metis Focused Crawler
The system allows intelligent, ontology-focused
discovery of distributed RDF-based metadata and Internet documents
in parallel.
In its classical sense, a crawler is a program that retrieves Web
pages by wandering around the Internet, following links from one
page to another; this approach is commonly used by search engines.
Focused crawling is a technique for crawling particular topical
portions of the World Wide Web quickly and efficiently by following
only the most promising links instead of exploring all Web pages.
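A minimal sketch of focused crawling, assuming a best-first strategy: the frontier is a priority queue ordered by the relevance of the page on which a link was found, and links on pages scoring at or below a threshold are not followed. The `fetch` and `relevance` callables, the toy in-memory `WEB` graph, and the keyword-based scorer are hypothetical stand-ins chosen for illustration; this is not presented as Metis's own link-selection strategy.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical fetcher signature: URL -> (page text, outgoing links).
Fetcher = Callable[[str], Tuple[str, List[str]]]


def focused_crawl(
    seeds: List[str],
    fetch: Fetcher,
    relevance: Callable[[str], float],
    max_pages: int = 100,
    threshold: float = 0.0,
) -> List[str]:
    """Best-first crawl: always expand the most promising known URL next."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited.append(url)
        score = relevance(text)
        if score <= threshold:
            continue  # irrelevant page: its links are not followed
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return visited


# Toy in-memory "Web": URL -> (text, links). Purely illustrative.
WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("ontology crawler", ["a", "b"]),
    "a": ("ontology rdf metadata", ["c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("rdf schema", []),
    "d": ("football results", []),
}
KEYWORDS = {"ontology", "rdf", "metadata", "crawler", "schema"}


def keyword_relevance(text: str) -> float:
    """Fraction of words in the text that are topic keywords."""
    words = text.split()
    return sum(w in KEYWORDS for w in words) / len(words) if words else 0.0


visited = focused_crawl(["seed"], WEB.__getitem__, keyword_relevance)
print(visited)
```

In this toy graph, page `d` is never fetched because its only in-link sits on the irrelevant page `b`, whereas an unfocused breadth-first crawler would fetch it. Scoring a link by its parent page's relevance is just one simple choice; anchor text or URL features could serve as well.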
The developed focused crawler uses the knowledge provided by an
instantiated ontology, which is defined later. This knowledge gives
the crawler a better notion of which links are worth following: it
builds sets of relevant entities and matches them against the Web
documents. Matching is done on the basis of both RDF metadata and
natural-language words, using different matching measures. The
approach focuses mainly on the content of the documents, in contrast
to other work that exploits the link structure of the Web. Another
key point is the integration into an ontology-based system that
allows the ontology to be maintained and extended. The system was
implemented to allow an empirical evaluation of the framework.
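The entity-set and matching steps can be sketched as follows, under deliberately simplified assumptions: the instantiated ontology is a hypothetical dictionary mapping entity URIs to natural-language labels and related entities, the relevant set is everything reachable from a topic entity within a fixed number of relation hops, and the two matching measures (overlap of a document's RDF resources with the relevant set, and the share of words in its text that are entity labels) are combined with arbitrary equal weights. None of these names, URIs, or measures are taken from Metis itself.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: entity URI -> (natural-language labels, related entity URIs).
ONTOLOGY: Dict[str, Tuple[List[str], List[str]]] = {
    "ex:Crawler": (["crawler", "spider"], ["ex:Ontology", "ex:WebPage"]),
    "ex:Ontology": (["ontology"], ["ex:RDF"]),
    "ex:RDF": (["rdf", "metadata"], []),
    "ex:WebPage": (["page", "document"], []),
    "ex:Cooking": (["recipe"], []),
}


def relevant_entities(start: str, max_depth: int) -> Set[str]:
    """Collect entities reachable from `start` within `max_depth` hops (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in ONTOLOGY[entity][1]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


def rdf_match(doc_resources: Set[str], entities: Set[str]) -> float:
    """Share of the document's RDF resources that are relevant entities."""
    if not doc_resources:
        return 0.0
    return len(doc_resources & entities) / len(doc_resources)


def text_match(doc_text: str, entities: Set[str]) -> float:
    """Share of the document's words that are labels of relevant entities."""
    words = doc_text.lower().split()
    if not words:
        return 0.0
    labels = {label for e in entities for label in ONTOLOGY[e][0]}
    return sum(w in labels for w in words) / len(words)


def document_score(
    doc_resources: Set[str],
    doc_text: str,
    entities: Set[str],
    w_rdf: float = 0.5,
    w_text: float = 0.5,
) -> float:
    """Weighted combination of the two matching measures (weights are arbitrary)."""
    return w_rdf * rdf_match(doc_resources, entities) + w_text * text_match(doc_text, entities)


entities = relevant_entities("ex:Crawler", max_depth=1)
score = document_score({"ex:Ontology", "ex:Cooking"}, "An ontology for the crawler", entities)
print(sorted(entities), round(score, 2))
```

With a hop limit of 1, `ex:RDF` falls outside the relevant set, so raising `max_depth` widens the crawler's topical focus at the cost of precision. Keeping the RDF and text measures separate makes it easy to weight metadata-rich and plain-text documents differently.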