Running a baseline or incremental CAS crawl to record output files

Use one of the scripts located in the [appdir]/control/cas directory to run a CAS crawl.

To run a baseline or an incremental CAS crawl:

  1. To run a baseline crawl, run the baseline_cas_crawl script and specify the name of the crawl. A baseline crawl removes any existing incremental crawl output files. After a baseline crawl, you can run any number of incremental crawls, because an incremental crawl does not remove previous incremental output.
  2. To run an incremental crawl, run the incremental_cas_crawl script and specify the name of the crawl to run.
Note: You can specify only one crawl name per invocation. To run multiple crawls, invoke the script once per crawl. With this approach, crawls can be run simultaneously, sequentially, or completely independently of one another.
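Because each invocation accepts a single crawl name, running several crawls comes down to calling the script once per name. The Python sketch below shows the sequential case. It is illustrative only: the `.bat`/`.sh` script suffixes are an assumption about how the scripts are named on each platform, and `crawl_command` and `run_crawls_sequentially` are hypothetical helpers, not part of the application's control scripts.

```python
import subprocess
import sys
from pathlib import Path


def crawl_command(control_cas_dir, mode, crawl_name):
    """Build the command line for one crawl (the scripts take one crawl name each)."""
    if mode not in ("baseline", "incremental"):
        raise ValueError("mode must be 'baseline' or 'incremental'")
    # Assumption: the scripts end in .bat on Windows and .sh elsewhere.
    suffix = ".bat" if sys.platform.startswith("win") else ".sh"
    script = Path(control_cas_dir) / f"{mode}_cas_crawl{suffix}"
    return [str(script), crawl_name]


def run_crawls_sequentially(control_cas_dir, mode, crawl_names, runner=subprocess.run):
    """Invoke the crawl script once per crawl name, waiting for each to finish.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so a failed crawl stops the remaining ones.
    """
    results = []
    for name in crawl_names:
        cmd = crawl_command(control_cas_dir, mode, name)
        results.append(runner(cmd, check=True))
    return results
```

For example, `run_crawls_sequentially(r"C:\Endeca\Apps\WineApp\control\cas", "incremental", ["MyCrawl", "OtherCrawl"])` would run two crawls one after the other (`OtherCrawl` is a hypothetical second crawl). To run crawls simultaneously instead, you could start each command with `subprocess.Popen` and call `wait()` on every process afterwards.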
When a baseline crawl runs, the following happens:
  1. The script deletes the previous baseline and incremental output files from the crawl output directory configured for the CAS crawl.
  2. The CAS Server writes the output file to the output directory configured for the crawl. The filename uses the naming convention specified for the crawl (for example, CrawlerOutput-FULL-sgmt000.bin.gz).
  3. The script prepends the crawl name to the output filename (for example, the name becomes MyCrawl_CrawlerOutput-FULL-sgmt000.bin.gz).
  4. The script deletes the previous baseline and incremental output files from the destination directories, that is, the directories specified by the casCrawlFullOutputDestDir and casCrawlIncrementalOutputDestDir properties.
  5. The script copies the baseline output file to the directory specified by the casCrawlFullOutputDestDir property.
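The file handling in steps 3 through 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script's actual implementation: it assumes that every regular file in the crawl output directory is crawl output, and `clear_files` and `publish_baseline` are hypothetical names. The same `clear_files` helper could stand in for step 1, run against the crawl output directory before the crawl starts.

```python
import shutil
from pathlib import Path


def clear_files(directory):
    """Delete every regular file in a directory.

    Assumption: all files in the directory are crawl output files.
    """
    for path in Path(directory).iterdir():
        if path.is_file():
            path.unlink()


def publish_baseline(crawl_name, crawl_output_dir, full_dest_dir, incr_dest_dir):
    """Mirror steps 3-5 of a baseline crawl and return the published paths."""
    # Step 3: prefix the crawl name, e.g. CrawlerOutput-FULL-sgmt000.bin.gz
    # becomes MyCrawl_CrawlerOutput-FULL-sgmt000.bin.gz.
    renamed = []
    for path in sorted(Path(crawl_output_dir).iterdir()):
        if path.is_file():
            target = path.with_name(f"{crawl_name}_{path.name}")
            path.rename(target)
            renamed.append(target)

    # Step 4: remove previous baseline and incremental files from both destinations.
    clear_files(full_dest_dir)
    clear_files(incr_dest_dir)

    # Step 5: copy the renamed baseline output into the full destination directory.
    # shutil.copy2 returns the path of the new copy.
    return [Path(shutil.copy2(path, full_dest_dir)) for path in renamed]
```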
When an incremental crawl runs, the following happens:
  1. The script deletes the previous baseline and incremental output files from the crawl output directory configured for the CAS crawl.
  2. The CAS Server writes the output file to the output directory configured for the crawl. The filename uses the naming convention specified in the crawl configuration (for example, CrawlerOutput-INCR-sgmt000.bin.gz).
  3. The script prepends both the crawl name and a timestamp to the output filename (for example, MyCrawl_2008.02.26.04.49.39_CrawlerOutput-INCR-sgmt000.bin.gz).
  4. The script copies the incremental output file to the directory specified by the casCrawlIncrementalOutputDestDir property.
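Note: Because each incremental output filename includes a timestamp, a new incremental file does not overwrite earlier ones, and previous incremental output files are not deleted from the casCrawlIncrementalOutputDestDir directory.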
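A comparable sketch for the incremental case shows why earlier incremental files survive: each published name embeds a timestamp, and nothing is deleted from the incremental destination directory. The timestamp format `%Y.%m.%d.%H.%M.%S` is inferred from the single example filename above, and `incremental_name` and `publish_incremental` are hypothetical helpers, not the script's own code.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Inferred from the example name MyCrawl_2008.02.26.04.49.39_CrawlerOutput-INCR-sgmt000.bin.gz
TIMESTAMP_FORMAT = "%Y.%m.%d.%H.%M.%S"


def incremental_name(crawl_name, cas_file_name, when):
    """Step 3: prepend the crawl name and a timestamp to the CAS output filename."""
    return f"{crawl_name}_{when.strftime(TIMESTAMP_FORMAT)}_{cas_file_name}"


def publish_incremental(crawl_name, crawl_output_dir, incr_dest_dir, when=None):
    """Steps 3-4: rename each output file and copy it to the incremental destination.

    Existing files in incr_dest_dir are left untouched, so earlier incrementals
    accumulate there alongside the new one.
    """
    if when is None:
        when = datetime.now()
    published = []
    for path in sorted(Path(crawl_output_dir).iterdir()):
        if path.is_file():
            target = path.with_name(incremental_name(crawl_name, path.name, when))
            path.rename(target)
            published.append(Path(shutil.copy2(target, incr_dest_dir)))
    return published
```

For example, `incremental_name("MyCrawl", "CrawlerOutput-INCR-sgmt000.bin.gz", datetime(2008, 2, 26, 4, 49, 39))` produces `MyCrawl_2008.02.26.04.49.39_CrawlerOutput-INCR-sgmt000.bin.gz`, matching the example in step 3.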
The following example shows a baseline crawl being run on a Windows machine:
C:\Endeca\Apps\WineApp\control\cas>baseline_cas_crawl MyCrawl
[07.17.08 14:42:40] INFO: Checking definition from AppConfig.xml, MyCrawlCasCrawlConfig.xml, fetchCasCrawlDataConfig.xml against existing EAC provisioning.
[07.17.08 14:42:47] INFO: Setting definition for host 'CASHost'.
[07.17.08 14:43:17] INFO: Setting definition for script 'MyCrawl_baselineCasCrawl'.
[07.17.08 14:43:18] INFO: Setting definition for script 'MyCrawl_incrementalCasFileSystemCrawl'.
[07.17.08 14:43:20] INFO: Setting definition for custom component 'CAS'.
[07.17.08 14:43:20] INFO: Updating provisioning for host 'CASHost'.
[07.17.08 14:43:20] INFO: Updating definition for host 'CASHost'.
[07.17.08 14:43:22] INFO: [CASHost] Starting shell utility 'mkpath_-data-complete-cas-crawl-output-incremental'.
[07.17.08 14:43:27] INFO: [CASHost] Starting shell utility 'mkpath_-data-complete-cas-crawl-output-full'.
[07.17.08 14:43:30] INFO: Setting definition for script 'fetchFullCasCrawlData'.
[07.17.08 14:43:32] INFO: Setting definition for script 'fetchIncrementalCasCrawlData'.
[07.17.08 14:43:32] INFO: Definition updated.
[07.17.08 14:43:33] INFO: Starting full CAS crawl 'MyCrawl'.
[07.17.08 14:43:33] INFO: Acquired lock 'crawl_lock_MyCrawl'.
[07.17.08 14:43:33] INFO: Starting baseline CAS crawl with id 'MyCrawl'.
[07.17.08 14:45:05] INFO: Acquired lock 'complete_cas_crawl_data_lock'.
[07.17.08 14:45:06] INFO: [CASHost] Starting copy utility 'copy_MyCrawl_crawl_output_to_dest_dir'.
[07.17.08 14:45:07] INFO: Released lock 'complete_cas_crawl_data_lock'.
[07.17.08 14:45:08] INFO: Released lock 'crawl_lock_MyCrawl'.
[07.17.08 14:45:08] INFO: Finished full CAS crawl 'MyCrawl'.
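The log shows the script acquiring a per-crawl lock (crawl_lock_MyCrawl) and a shared lock (complete_cas_crawl_data_lock), and releasing both before the crawl finishes. If you capture this output, a small checker like the Python sketch below could report any lock that was acquired but never released. It assumes the `[MM.DD.YY HH:MM:SS] LEVEL: message` layout seen in this one example (reading `07.17.08` as month, day, year) and treats any non-matching line that follows an entry as a wrapped continuation of that entry; `parse_log` and `unreleased_locks` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed entry layout, based only on the example output above.
ENTRY = re.compile(r"^\[(\d\d\.\d\d\.\d\d \d\d:\d\d:\d\d)\] (\w+): (.*)$")
LOCK = re.compile(r"^(Acquired|Released) lock '([^']+)'\.$")


def parse_log(lines):
    """Return (timestamp, level, message) tuples, joining wrapped lines."""
    entries = []
    for raw in lines:
        line = raw.strip()
        match = ENTRY.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%m.%d.%y %H:%M:%S")
            entries.append([stamp, match.group(2), match.group(3)])
        elif entries and line:
            # A line without a timestamp continues the previous message.
            entries[-1][2] += " " + line
        # Lines before the first entry (such as the command prompt) are skipped.
    return [tuple(entry) for entry in entries]


def unreleased_locks(entries):
    """Return the names of locks that were acquired but not released."""
    held = set()
    for _, _, message in entries:
        match = LOCK.match(message)
        if match:
            action, name = match.groups()
            if action == "Acquired":
                held.add(name)
            else:
                held.discard(name)
    return held
```

For the run shown above, `unreleased_locks(parse_log(lines))` would return an empty set, and the difference between the first and last timestamps gives the elapsed time of the run.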