This topic describes the files you modify in order to integrate and run CAS crawls as part of the Deployment Template.
Deployment Template files for CAS crawls that write to a Record Store instance
For crawls that write output to a Record Store instance, there are no additional configuration or script files. This is because the operational model for a Record Store is relatively simple: there are no output files to fetch, move, or copy. Instead, you edit AppConfig.xml to specify the required CAS Server host, the pipeline, and the baseline or incremental crawl that you want to run.
Deployment Template files for CAS crawls that write to record output files
For crawls that write output to record output files, the associated configuration and script files are in the [appdir]/config/script directory and their purpose is as follows:
- The [appdir]/config/script/fetchCasCrawlDataConfig.xml file is the global CAS crawl configuration for the application. The file serves two major functions. First, it provides a set of global configuration settings that are used for all file system and CMS crawls, such as the location of the CAS Server. Second, it provides two scripts for fetching baseline and incremental output files (that is, transferring the crawl output files to the source data destination directories).
- There is a [crawlname]CasCrawlConfig.xml document for each configured CAS crawl. For example, the sample EndecaCasCrawlConfig.xml is for a crawl that was created with a name of "Endeca". The document contains two scripts: one for baseline crawls and one for incremental crawls.
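
The file layout described above can be checked with a small script. The following is a minimal sketch, not part of the Deployment Template: the function names expected_cas_config_files and missing_cas_config_files are hypothetical, and the sketch assumes only the naming convention stated in this topic (one global fetchCasCrawlDataConfig.xml plus one [crawlname]CasCrawlConfig.xml per crawl, all under [appdir]/config/script).

```python
from pathlib import Path


def expected_cas_config_files(appdir, crawl_names):
    """List the config files this topic describes for record-output-file crawls.

    Hypothetical helper: it only encodes the naming convention from the text.
    """
    script_dir = Path(appdir) / "config" / "script"
    # Global CAS crawl configuration shared by every crawl in the application.
    files = [script_dir / "fetchCasCrawlDataConfig.xml"]
    # One [crawlname]CasCrawlConfig.xml document per configured CAS crawl.
    files += [script_dir / f"{name}CasCrawlConfig.xml" for name in crawl_names]
    return files


def missing_cas_config_files(appdir, crawl_names):
    """Return the expected files that do not exist on disk."""
    return [
        path
        for path in expected_cas_config_files(appdir, crawl_names)
        if not path.is_file()
    ]
```

For example, calling missing_cas_config_files with your application directory and the list ["Endeca"] would report whether fetchCasCrawlDataConfig.xml and EndecaCasCrawlConfig.xml are both present before you run a crawl script.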