About storage types for CAS crawls

The Content Acquisition System can write crawl output either to a Record Store instance (the default) or to a record output file. The Deployment Template configuration differs depending on which storage type a crawl uses. This topic describes the main differences between the storage types as they affect the operations of the Deployment Template.

Note: Although both storage types are fully supported, Oracle recommends configuring a CAS crawl to write to a Record Store instance rather than a record output file. This approach simplifies both the operational model and Deployment Template configuration.

Characteristics of a Record Store instance versus record output files

  • Files - A Record Store instance has no individual files to manipulate. For crawls that write to record output files, there are one or more files to manipulate for both full and incremental crawls. For a baseline crawl that is configured to write Endeca records to a Record Store instance, you do not need to fetch files: the baseline pipeline uses a custom record adapter to read records directly from the Record Store instance, so no record output files are fetched or copied. For details about configuring a custom record adapter to read from a Record Store instance, see the CAS Developer's Guide.
  • Operational instructions - A crawl that writes output to a Record Store instance requires neither Deployment Template configuration for output destinations and file names nor instructions to move, copy, or fetch files. A crawl that writes to record output files requires configuration of output destinations and file names, as well as instructions to move, copy, or fetch files.
  • Configuration properties - For crawls that write to a Record Store instance, the CAS Server configuration properties in the custom-component must specify the host and port of the CAS Server. No properties are required for output destinations. For crawls that write to a record output file, the CAS Server configuration properties in the custom-component must specify the host and port of the CAS Server, as well as output destinations for full and incremental crawl files.
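
The difference in required configuration properties can be illustrated with a small check. The following Java sketch is not part of the Deployment Template. The class name, the property names (casHost, casPort, fullOutputDir, incrementalOutputDir), and the example host and port values are placeholders assumed for illustration, and they may not match the property names used in an actual custom-component definition. The sketch only encodes the rule described above: both storage types need the CAS Server host and port, and only record output files also need output destinations for full and incremental crawl files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which properties a configuration is missing
// for a given storage type. Not an Endeca or Deployment Template API.
class CasStorageConfigCheck {

    enum StorageType { RECORD_STORE, RECORD_OUTPUT_FILE }

    // Placeholder property names, assumed for illustration only.
    static final String HOST = "casHost";
    static final String PORT = "casPort";
    static final String FULL_DIR = "fullOutputDir";
    static final String INCR_DIR = "incrementalOutputDir";

    // Both storage types need the CAS Server host and port; only record
    // output files also need destinations for full and incremental output.
    static List<String> requiredProperties(StorageType type) {
        List<String> required = new ArrayList<>(List.of(HOST, PORT));
        if (type == StorageType.RECORD_OUTPUT_FILE) {
            required.add(FULL_DIR);
            required.add(INCR_DIR);
        }
        return required;
    }

    // Returns the required properties that are absent or blank in props.
    static List<String> missingProperties(StorageType type, Map<String, String> props) {
        List<String> missing = new ArrayList<>();
        for (String name : requiredProperties(type)) {
            String value = props.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example values only; substitute the host and port of your CAS Server.
        Map<String, String> props = Map.of(HOST, "localhost", PORT, "8500");
        System.out.println(missingProperties(StorageType.RECORD_STORE, props));
        System.out.println(missingProperties(StorageType.RECORD_OUTPUT_FILE, props));
    }
}
```

With only a host and port supplied, the check reports nothing missing for a Record Store crawl, and reports the two output destination properties as missing for a crawl that writes to record output files.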