Metis Crawler 0.2a
Quickstart
Installation
Download the appropriate installer for your operating system from the Metis website (http://www.ontoware.org/projects/metis) and run it as you would any other application.
If you would like to build your own version, the source code will be made available at a later time.
Usage

When you start the application you will see the empty KAON Workbench.
The first thing you have to do is open an Ontology that will be used for weighting the crawler's hits.
To do this, go to the File menu and select the „Open OIModel“ entry. You will then be asked to select the KAON file containing the Ontology. You should get an OIModeler frame looking something like this:

The upper left window shows a tree view of the Ontology. To expand a node, click on the small icon next to it. You'll get something like this:

By left-clicking on an entity in this view you mark it as selected; the crawler can later retrieve this selection.
Next, you have to create a new Ontology that will contain the temporary data created while crawling. Simply select „Create new OIModel“ from the File menu.

Click the „Browse“ button to bring up a selection dialogue and either create a new KAON file or select a previously created one.
Attention: This file must be different from the Ontology file you opened first, because that Ontology would otherwise be corrupted.
Now you're set to go. Select „New Crawler Window“ from the „Metis“ menu.
You will be greeted by a screen similar to this:

There are a couple of things to do here:
Tell Metis what Ontology to use for weighting
To do this, click on the selector „OIModel for ranking“ and select the Ontology you opened earlier.
Tell Metis what Ontology to use for saving data
To do this, click on the selector „OIModel for data“ and select the Ontology you created earlier.
Tell Metis which entities to crawl for and the weights to use
To do this, click on the „Get selected Entities“ button. This queries the Ontology specified under „OIModel for ranking“ for its selected entities and displays them in the table at the bottom of the frame.

The „Set all to...“ button will set the weight for all entities in the table to the value to its right. Alternatively you can also specify individual weights by editing the appropriate value in the table.
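The weight table can be pictured as a simple mapping from entity to weight. The following Python sketch is only an illustration of that idea and is not part of Metis; the function name `set_all` and the entity names are hypothetical.

```python
def set_all(weights, value):
    """Return a copy of the weight table with every entity set to value."""
    return {entity: value for entity in weights}


# Hypothetical entities, standing in for the ones selected in the Ontology
weights = {"Person": 1.0, "Project": 1.0, "Publication": 1.0}

weights = set_all(weights, 0.5)  # comparable to the "Set all to..." button
weights["Project"] = 2.0         # comparable to editing a single table cell
```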
When everything is set up to your satisfaction, you can start the crawler by pressing the „Start Crawler“ button. The „Show Results“ button opens another tab with a table showing the crawl results ordered by relevance. To view a page from this table, right-click on the appropriate entry in the first column to get a context menu that lets you open the URL in a web browser.
If you want to save the results, click on the „Save Results“ button.
The table will then be saved in CSV text format.
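If you want to process the saved results outside Metis, the CSV file can be read with standard tools. The Python sketch below assumes a comma-separated file in which the first column holds the URL and the second column a numeric relevance score; the actual column layout written by Metis is not described here, so adjust the column indices to match your file. The function name `load_results` is hypothetical.

```python
import csv
import io


def load_results(text, url_col=0, score_col=1):
    """Parse CSV text into (url, score) pairs, highest score first.

    Assumption: the column positions and a numeric score column.
    Check your own file, since the layout Metis writes may differ.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        try:
            score = float(record[score_col])
        except (ValueError, IndexError):
            continue  # skip header rows and malformed lines
        rows.append((record[url_col], score))
    # Sort by relevance, most relevant entry first
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows
```

To apply it to a saved file, read the file contents first and pass them in as a string, for example `load_results(open("results.csv", encoding="utf-8").read())`, where the file name is a placeholder.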
The „Options“ tab will present you with this view:

Here you can configure the behaviour of the crawler. All information is stored in Hydra-XML format. By default Metis loads the file „Metis_Config.xml“, but you can load and store different configurations using the buttons at the bottom.
Any changes you make to the configuration will only be applied if you click on the „Apply Changes“ button.
The value or values that Metis uses for the selected option are displayed in the lower right window.
The upper right window shows the currently available values stored in the configuration file.
By clicking the „New...“ button you may add a new value to the list.
By highlighting a value in the list you can edit or delete it by clicking the appropriate button.
You can make the highlighted value the active value by pressing the „Select“ button. (For some options, multiple values can be selected by left-clicking on several list entries while holding down the „Strg/Ctrl“ key.)
Some options explained:
StartURLs
Here you can edit and select the URLs from which crawling starts (multiple selections allowed).
DeniedPatterns
Specifies patterns to be excluded from crawling, e.g. CGI scripts (multiple selections allowed).
ProxyHost/Port
Here you can specify the URL and port of a proxy to use.
Delay
Specifies the time (in ms) to wait for the reply from a web page.
StayOnHost
Specifies whether the crawler should restrict its crawling to the domain given by the StartURL(s).
NumPages
Specifies the number of pages to crawl before the crawler stops running. Setting this value to „-1“ switches the crawler into „infinite mode“, meaning it will only stop when you click the „Stop Crawler“ button.
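
To make the interplay of DeniedPatterns, StayOnHost and NumPages more concrete, here is a minimal Python sketch of how such options could be evaluated. It is not the Metis implementation. It assumes that denied patterns are regular expressions and that „staying on host“ means comparing host names with those of the start URLs; the function names `is_allowed` and `should_stop` are hypothetical.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, start_urls, denied_patterns, stay_on_host):
    """Return True if this sketch of a crawler would follow the URL."""
    # DeniedPatterns: skip any URL that matches one of the patterns
    # (assumed here to be regular expressions).
    if any(re.search(pattern, url) for pattern in denied_patterns):
        return False
    # StayOnHost: only follow URLs whose host matches a start URL's host
    # (assumed interpretation of "domain").
    if stay_on_host:
        allowed_hosts = {urlparse(start).hostname for start in start_urls}
        return urlparse(url).hostname in allowed_hosts
    return True


def should_stop(pages_crawled, num_pages):
    """NumPages: stop once num_pages pages are crawled; -1 never stops."""
    return num_pages != -1 and pages_crawled >= num_pages
```

With a start URL of `http://example.org/` and the denied pattern `cgi-bin`, this sketch would follow `http://example.org/a.html` but skip `http://example.org/cgi-bin/x`, and with StayOnHost enabled it would also skip pages on any other host.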