The Content Acquisition System can write crawl output either to a Record
Store instance (the default) or to a record output file. The configuration of
the Deployment Template varies depending on which of these two storage types a
crawl uses. This topic describes the main differences between the storage
types as they affect the operations of the Deployment Template.
Note: Although both storage types are fully supported, Oracle recommends
configuring a CAS crawl to write to a Record Store instance rather than a
record output file. This approach simplifies both the operational model and
Deployment Template configuration.
Characteristics of a Record Store instance versus record output files
- Files - In a Record Store instance, there are no individual files to
manipulate. For crawls that write to record output files, there are one or
more files to manipulate for both full and incremental crawls. You do not need
to fetch files for a baseline crawl that is configured to write Endeca records
to a Record Store instance. Instead, the baseline pipeline uses a custom
record adapter to read records directly from the Record Store instance, so no
record output files are fetched or copied. For details about configuring a
custom record adapter to read from a Record Store instance, see the CAS
Developer's Guide.
- Operational instructions - A crawl that writes output to a Record Store
instance does not require Deployment Template configuration for output
destinations or file names, and does not require instructions to move, copy,
or fetch files. A crawl that writes to record output files requires
configuration of output destinations and file names, as well as instructions
to move, copy, or fetch files.
- Configuration properties - For crawls that write to a Record Store instance,
the CAS server configuration properties in the custom-component need to
specify the host and port of the CAS Server. No properties are required for
output destinations. For crawls that write to a record output file, the CAS
server configuration properties in the custom-component need to specify the
host and port of the CAS Server, and output destinations for full and
incremental crawl files.
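
The difference in required configuration properties can be summarized in a short sketch. The Python below is illustrative only and is not part of the Deployment Template: the property names (cas_host, cas_port, full_output_dir, incremental_output_dir) and the helper functions are hypothetical stand-ins for the properties described above, not the actual names used in the custom-component.

```python
from enum import Enum


class Storage(Enum):
    """The two storage types a CAS crawl can write to."""
    RECORD_STORE = "record_store"
    OUTPUT_FILE = "output_file"


# Hypothetical property names, chosen for illustration only.
# Both storage types need the host and port of the CAS Server.
BASE_PROPS = {"cas_host", "cas_port"}
# Only record output files need output destinations.
FILE_PROPS = {"full_output_dir", "incremental_output_dir"}


def required_properties(storage):
    """Return the set of properties a crawl of this storage type needs."""
    if storage is Storage.RECORD_STORE:
        return set(BASE_PROPS)
    return BASE_PROPS | FILE_PROPS


def missing_properties(storage, props):
    """Return a sorted list of required properties absent from props."""
    return sorted(required_properties(storage) - set(props))


if __name__ == "__main__":
    # Sample values only; substitute the host and port of your CAS Server.
    props = {"cas_host": "localhost", "cas_port": 8500}
    print(missing_properties(Storage.RECORD_STORE, props))
    print(missing_properties(Storage.OUTPUT_FILE, props))
```

With only a host and port supplied, the Record Store case reports nothing missing, while the record output file case reports the two output destinations. This mirrors the simpler configuration that the Record Store approach allows.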