This section describes the high-level steps for integrating and running a CAS crawl that writes output to a record output file.
This section assumes you have done the following:
- Installed and started the CAS Service and CAS Console.
- Installed an EAC Agent and started it on the server running the CAS Service.
- Deployed an application using the template.
- Configured an application by editing AppConfig.xml.
Each of the steps below is described in its own topic.
- Create a CAS crawl.
- Specify a CAS Server host in AppConfig.xml.
- Specify a CAS Server as a custom component for any CAS crawl that writes to record output files.
- Specify a pipeline to run in AppConfig.xml.
- Edit fetchCasCrawlDataConfig.xml to reflect the details of your crawling environment.
- Create a CAS crawl script for the crawl you created in step 1 by running the [appdir]/control/cas/make_cas_crawl_scripts script.
- Run a baseline CAS crawl (a full crawl) using the sample CAS crawl pipeline. If the baseline update runs without failing, you can start to make further modifications to your deployment, such as using your custom pipeline.
- Optionally, run an incremental CAS crawl. These steps verify that your configuration files are correct.
- Load the crawl files generated in the previous step for processing by the sample CAS crawl pipeline.
- Run a baseline update using the sample CAS crawl pipeline with the new crawl record output files.
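Several of the steps above involve edits to AppConfig.xml: identifying the CAS Server host and declaring the CAS Server as a custom component. The fragment below is a minimal sketch of what such a declaration might look like. The element names, the component class, the property names (casHost, casPort), and the host and port values are all illustrative assumptions, not verbatim Deployment Template configuration; confirm the exact syntax against your Deployment Template reference before using it.

```
<!-- Hedged sketch of a CAS Server custom-component declaration in
     AppConfig.xml. The class name, property names, and values below
     are assumptions; verify them against your Deployment Template
     documentation. -->
<custom-component id="CAS" host-id="ITLHost"
    class="com.endeca.soleng.eac.toolkit.component.cas.ContentAcquisitionServerComponent">
  <properties>
    <!-- Host and port where the CAS Service is listening
         (localhost and 8500 are assumed defaults) -->
    <property name="casHost" value="localhost"/>
    <property name="casPort" value="8500"/>
  </properties>
</custom-component>
```

With a declaration along these lines in place, the CAS crawl scripts generated by make_cas_crawl_scripts can reference the CAS Server component by its id when they run the crawl.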
Note: The instructions in this section apply to the Dgraph deployment type. If you are using an Agraph, all of the file system crawl integration components are deployed and work the same way, but you must customize cas_crawl_pipeline (or create your own pipeline) to process the crawl data into an Agraph; the Deployment Template does not provide a sample pipeline for the Agraph case.