HTML Spider Protocol
This protocol is an alternative to OAI-PMH. It can be used for small or ad hoc repositories, or whenever the use of OAI-PMH is not possible.
It consists of a single plain HTML file containing links to metadata record files. The location of the metadata record files does not matter as long as they are remotely accessible. The HTML Spider reads the file and considers only the hyperlinks, more specifically the content of the href attribute of the a elements, and ignores the rest. It assumes that every link points to a metadata record file. If a link points to a different kind of resource, the Spider will not be able to parse it (as it is not a metadata record) and will output an exception in the harvest report. These exceptions are harmless and can be ignored.
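The link-collection step described above can be sketched in Python as follows. This is only an illustration, not the Spider's actual implementation: the names `LinkCollector` and `extract_record_links`, the sample page, and the `example.org` URLs are all hypothetical, and resolving relative links against the repository URL is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element and ignores everything else."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Assumption: relative links are resolved against the repository URL.
                self.links.append(urljoin(self.base_url, value))


def extract_record_links(html_text, base_url):
    """Return the link targets found in html_text, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical index page: only the two href values matter to the harvester.
SAMPLE_PAGE = """
<html><body>
  <h1>My records</h1>
  <p>Any other markup is ignored.</p>
  <a href="records/1.xml">Record 1</a>
  <a href="http://other.example.org/lom/2.xml">Record 2</a>
</body></html>
"""

links = extract_record_links(SAMPLE_PAGE, "http://example.org/repo/")
```

Each collected link would then be fetched and parsed as a metadata record. A link that does not point to a record would fail at that parsing step, which corresponds to the harmless exception mentioned above.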
The name of the HTML file does not matter as long as it is the default file served at the repository URL. Typically, a Linux or Windows server will use index.html or index.htm, respectively. This is important because when the administrator specifies the information of an HTML repository in the Harvest Definition, the URL of the repository must point to a directory, not to a file.
To determine the datestamp of a metadata record, the Spider parses its content and considers the following values (the numbers in parentheses are the corresponding LOM element numbers):
- //lom:lom/lom:lifeCycle/lom:contribute/lom:date/lom:dateTime (2.3.3)
- //lom:lom/lom:metaMetadata/lom:contribute/lom:date/lom:dateTime (3.2.3)
- //lom:lom/lom:annotation/lom:date/lom:dateTime (8.2)
A more recent date in any of these elements indicates that the record must be updated in the system.
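As a rough sketch of how such a datestamp could be derived, the Python example below reads the three elements listed above and keeps the most recent value. Several points are assumptions rather than documented behaviour: the `lom` prefix is bound to the IEEE LOM namespace `http://ltsc.ieee.org/xsd/LOM`, values that are not full ISO 8601 dates are skipped, dates without a time zone are treated as UTC, and the comparison is made against a previously stored datestamp. The function names and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the "lom" prefix is bound to the IEEE LOM namespace.
LOM_URI = "http://ltsc.ieee.org/xsd/LOM"
NS = {"lom": LOM_URI}

# The three locations listed above, relative to each lom:lom element.
DATE_PATHS = [
    "lom:lifeCycle/lom:contribute/lom:date/lom:dateTime",
    "lom:metaMetadata/lom:contribute/lom:date/lom:dateTime",
    "lom:annotation/lom:date/lom:dateTime",
]


def parse_datetime(text):
    """Parse an ISO 8601 value; return None if it cannot be parsed."""
    value = text.strip()
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: values without a time zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def record_datestamp(xml_text):
    """Return the most recent date found in the record, or None."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() also yields the root itself when the root is lom:lom.
    for lom in root.iter("{%s}lom" % LOM_URI):
        for path in DATE_PATHS:
            for element in lom.findall(path, NS):
                parsed = parse_datetime(element.text or "")
                if parsed is not None:
                    found.append(parsed)
    return max(found) if found else None


def needs_update(xml_text, stored_datestamp):
    """True when the record carries a date newer than the stored one."""
    latest = record_datestamp(xml_text)
    return latest is not None and latest > stored_datestamp


# Hypothetical record with one date in each of the three locations.
SAMPLE_RECORD = """
<lom xmlns="http://ltsc.ieee.org/xsd/LOM">
  <lifeCycle><contribute><date><dateTime>2019-05-01</dateTime></date></contribute></lifeCycle>
  <metaMetadata><contribute><date><dateTime>2021-03-15T08:30:00Z</dateTime></date></contribute></metaMetadata>
  <annotation><date><dateTime>2020-01-01</dateTime></date></annotation>
</lom>
"""

latest = record_datestamp(SAMPLE_RECORD)
stale = needs_update(SAMPLE_RECORD, datetime(2020, 6, 1, tzinfo=timezone.utc))
```

In this sample the metaMetadata date is the most recent of the three, so it is the one compared against the stored datestamp.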