Determining What Constitutes a Document
Timothy tells Philip that one of the harder decisions is determining what constitutes a document. Philip explains: SES has the concept of a document, which is one unit of indexing. It doesn't necessarily have to be a document in the traditional sense, such as a Word document or a complete PowerPoint presentation. It might, for example, be all the collected information about a single person from an HR system. When you perform a search, a document is what is returned as one entry in the hitlist. For some systems, what constitutes a document will be simple and obvious. For other systems it will be far less so. Associated with this is the consideration of Display URLs. A Display URL is provided by a crawler plugin as a metadata item, along with the actual data to be indexed. It is the URL that appears in the hitlist for the user to click on, and it will normally take the user directly to the source of the document information. Display URLs must be unique for each document.
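
Philip's HR example can be made concrete with a small sketch. The code below is illustrative only and does not use the SES crawler plugin API: the `IndexDocument` class, the `build_person_documents` function, the row layout, and the `https://hr.example.com` base URL are all assumptions made for this example. It collects every HR row about one person into a single document and derives a Display URL from the employee ID, rejecting any duplicate.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class IndexDocument:
    """One unit of indexing, i.e. one hitlist entry (hypothetical structure)."""
    display_url: str
    content: str
    metadata: dict = field(default_factory=dict)


def build_person_documents(hr_rows, base_url):
    """Group HR rows by employee ID so that each person becomes one document."""
    grouped = {}
    for row in hr_rows:
        grouped.setdefault(row["employee_id"], []).append(row)

    documents = []
    seen_urls = set()
    for employee_id, rows in grouped.items():
        # Assumed URL scheme: one page per person in the source system.
        display_url = f"{base_url}/person/{quote(str(employee_id), safe='')}"
        # Display URLs must be unique for each document.
        if display_url in seen_urls:
            raise ValueError(f"duplicate Display URL: {display_url}")
        seen_urls.add(display_url)

        content = "\n".join(f"{r['field']}: {r['value']}" for r in rows)
        documents.append(
            IndexDocument(display_url, content, {"employee_id": employee_id})
        )
    return documents


# Made-up sample rows, as they might come from several HR tables.
rows = [
    {"employee_id": 101, "field": "name", "value": "A. Example"},
    {"employee_id": 101, "field": "department", "value": "Finance"},
    {"employee_id": 102, "field": "name", "value": "B. Example"},
]
docs = build_person_documents(rows, "https://hr.example.com")
for doc in docs:
    print(doc.display_url)
```

Three source rows become two documents here, one per person. If the unit of indexing were a single HR row instead, the same URL scheme would produce duplicate Display URLs, so the choice of document and the choice of Display URL have to be made together.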