Determining What Constitutes a Document

Timothy remarks to Philip that one of the harder decisions to make is determining what constitutes a document.

Philip explains this to Timothy:

SES has the concept of a document, which is one "unit of indexing". It doesn't have to be a document in the traditional sense of a Word file or a complete PowerPoint presentation. It might, for example, be all the collected information about a single person from an HR system. When you perform a search, each document is returned as one entry in the hitlist.
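To make the HR example concrete, here is a minimal sketch of how many raw rows about one person might be collapsed into a single unit of indexing before being handed to a crawler. It does not use the SES crawler plugin API; the row layout (`employee_id`, `name`, `field`, `value`), the `IndexDocument` class and `build_person_documents` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IndexDocument:
    """Hypothetical container for one unit of indexing (one hitlist entry)."""
    key: str
    title: str
    body: str
    metadata: dict = field(default_factory=dict)


def build_person_documents(rows):
    """Group hypothetical HR rows so that everything known about one
    person becomes a single document rather than one document per row."""
    grouped = defaultdict(list)
    names = {}
    for row in rows:
        emp_id = row["employee_id"]
        names[emp_id] = row["name"]
        grouped[emp_id].append(f'{row["field"]}: {row["value"]}')
    # One document per person, in a stable order.
    return [
        IndexDocument(key=emp_id, title=names[emp_id], body="\n".join(grouped[emp_id]))
        for emp_id in sorted(grouped)
    ]


# Usage: three rows about two people yield two documents.
rows = [
    {"employee_id": "E100", "name": "Ann Lee", "field": "Department", "value": "Finance"},
    {"employee_id": "E100", "name": "Ann Lee", "field": "Phone", "value": "x1234"},
    {"employee_id": "E200", "name": "Raj Patel", "field": "Department", "value": "Sales"},
]
docs = build_person_documents(rows)
```

The design choice here is the grouping key: choosing `employee_id` is what makes "one person" the document, so a search matching Ann Lee's department or phone number returns one hitlist entry for her rather than two.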

For some systems, what constitutes a document will be simple and obvious. For other systems it will be far less so.

Closely related is the question of Display URLs. A crawler plugin supplies a Display URL as a metadata item, along with the actual data to be indexed. The Display URL is the URL that appears in the hitlist for the user to click on; it normally takes the user directly to the source of the document's information. Display URLs must be unique for each document.
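The uniqueness rule can be enforced at the point where each document is given its Display URL. The sketch below is illustrative only and is not the SES plugin API: the base URL `https://hr.example.com/person?id=`, the `display_url` metadata key and the helper names are hypothetical. It derives one Display URL per document from the document's key and rejects any duplicate.

```python
from urllib.parse import quote

# Hypothetical "view person" page in the source HR application.
HR_VIEW_URL = "https://hr.example.com/person?id="


def display_url_for(key):
    """Build a Display URL pointing straight at the source record."""
    return HR_VIEW_URL + quote(key, safe="")


def attach_display_urls(documents):
    """Attach a hypothetical 'display_url' metadata item to each document
    dict, raising ValueError if two documents would share a Display URL."""
    seen = {}
    for doc in documents:
        url = display_url_for(doc["key"])
        if url in seen:
            raise ValueError(f"Display URL {url} used by {seen[url]} and {doc['key']}")
        seen[url] = doc["key"]
        doc.setdefault("metadata", {})["display_url"] = url
    return documents


# Usage: two distinct documents get two distinct Display URLs.
docs = attach_display_urls([{"key": "E100"}, {"key": "E200"}])
```

Deriving the URL from the same key that defines the document ties the two decisions together: if the grouping key is unique per document, the Display URL will be too, and a duplicate usually signals that the document boundaries were chosen badly.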