After running a CAS crawl, you run a baseline or partial update that incorporates the records from a Record Store instance. How you coordinate your CAS crawls with baseline or partial updates depends on the complexity of your environment. This topic describes several scenarios.
<!--
########################################################################
# Baseline update script
#
-->
<script id="BaselineUpdate">
<log-dir>./logs/provisioned_scripts</log-dir>
<provisioned-script-command>./control/baseline_update.bat</provisioned-script-command>
<bean-shell-script>
<![CDATA[
log.info("Starting baseline update script.");
// obtain lock
if (LockManager.acquireLock("update_lock")) {
    // call the baseline crawl script to run a full CAS crawl
    CAS.runBaselineCasCrawl("MyCrawl");
    if (ConfigManager.isWebStudioEnabled()) {
        // get Web Studio config, merge with Dev Studio config
        ConfigManager.downloadWsConfig();
        ConfigManager.fetchMergedConfig();
    } else {
        ConfigManager.fetchDsConfig();
    }
    // clean directories
    Forge.cleanDirs();
    PartialForge.cleanCumulativePartials();
    Dgidx.cleanDirs();
    // fetch extracted data files to forge input
    Forge.getIncomingData();
    LockManager.removeFlag("baseline_data_ready");
    // fetch config files to forge input
    Forge.getConfig();
    // archive logs and run ITL
    Forge.archiveLogDir();
    Forge.run();
    Dgidx.archiveLogDir();
    Dgidx.run();
    // distribute index, update Dgraphs
    DistributeIndexAndApply.run();
    // if Web Studio is integrated, update Web Studio with latest
    // dimension values
    if (ConfigManager.isWebStudioEnabled()) {
        ConfigManager.cleanDirs();
        Forge.getPostForgeDimensions();
        ConfigManager.updateWsDimensions();
    }
    // archive state files, index
    Forge.archiveState();
    Dgidx.archiveIndex();
    // (start or) cycle the LogServer
    LogServer.cycle();
    // release lock
    LockManager.releaseLock("update_lock");
    log.info("Baseline update script finished.");
} else {
    log.warning("Failed to obtain lock.");
}
]]>
</bean-shell-script>
</script>
You run the baseline update by running the baseline_update script in the apps/[appDir]/control directory:
C:\Endeca\apps\DocApp\control>baseline_update.bat
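The script above works because LockManager.acquireLock returns false when another update already holds the lock, so concurrent baseline and partial updates cannot interleave. As a minimal sketch of that guard pattern in plain Java (LockFlags is a hypothetical stand-in for illustration, not an Endeca class):

```java
import java.util.concurrent.ConcurrentHashMap;

public class LockFlags {
    private static final ConcurrentHashMap<String, Boolean> locks =
            new ConcurrentHashMap<>();

    // Returns true only if the lock was free, mirroring acquireLock.
    public static boolean acquireLock(String name) {
        return locks.putIfAbsent(name, Boolean.TRUE) == null;
    }

    public static void releaseLock(String name) {
        locks.remove(name);
    }

    // Runs the update body only when the lock is obtained, and always
    // releases the lock afterward, even if the body throws.
    public static boolean runGuarded(String name, Runnable body) {
        if (!acquireLock(name)) {
            return false; // another update already holds the lock
        }
        try {
            body.run();
            return true;
        } finally {
            releaseLock(name);
        }
    }
}
```

Note that runGuarded releases the lock in a finally block, so a failure partway through the update body cannot leave the lock held; the BeanShell scripts in this topic release the lock explicitly at the end instead.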
<!--
########################################################################
# Partial update script
#
-->
<script id="PartialUpdate">
<log-dir>./logs/provisioned_scripts</log-dir>
<provisioned-script-command>./control/partial_update.bat</provisioned-script-command>
<bean-shell-script>
<![CDATA[
log.info("Starting partial update script.");
// obtain lock
if (LockManager.acquireLock("update_lock")) {
    // call the partial crawl script to run an incremental CAS crawl
    CAS.runIncrementalCasCrawl("MyCrawl");
    // archive logs
    PartialForge.archiveLogDir();
    // clean directories
    PartialForge.cleanDirs();
    // fetch config files to forge input
    PartialForge.getConfig();
    // run ITL
    PartialForge.run();
    // timestamp partial, save to cumulative partials dir
    PartialForge.timestampPartials();
    PartialForge.fetchPartialsToCumulativeDir();
    // distribute partial update, update Dgraphs
    DgraphCluster.cleanLocalPartialsDirs();
    DgraphCluster.copyPartialUpdateToDgraphServers();
    DgraphCluster.applyPartialUpdates();
    // archive partials
    PartialForge.archiveCumulativePartials();
    // release lock
    LockManager.releaseLock("update_lock");
    log.info("Partial update script finished.");
} else {
    log.warning("Failed to obtain lock.");
}
]]>
</bean-shell-script>
</script>
You run the partial update by running the partial_update script in the apps/[appDir]/control directory:
C:\Endeca\apps\DocApp\control>partial_update.bat
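The PartialForge.timestampPartials step exists so that successive partial update files sort, and therefore apply, in chronological order. The sketch below illustrates that idea only; the naming scheme and the PartialNaming helper are assumptions for illustration, not the convention PartialForge actually uses:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class PartialNaming {
    // Builds a timestamped file name so that successive partials sort
    // lexicographically in the order they were produced,
    // e.g. partial-20240101120000.records.xml
    public static String timestampedName(String base, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        return "partial-" + fmt.format(when) + "." + base;
    }
}
```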
In a more complicated case, multiple CAS crawls and the updates that consume them each run on their own schedules. To coordinate this asynchronous workflow of CAS crawls and baseline or partial updates, you add code that calls methods in ContentAcquisitionServerComponent.
For details about ContentAcquisitionServerComponent, see EAC Component API Reference for CAS Server (Javadoc) installed in <Endeca installation path>\CAS\<version>\doc\cas-dt-javadoc.
In your AppConfig.xml code, the main coordination task is determining when to run CAS crawls and when to run the baseline or partial updates that consume records from those crawls. For example, suppose you have an application that runs three full CAS crawls whose records are consumed by a single baseline update. In that scenario, each of the three full crawls has its own full crawl script in AppConfig.xml that runs on a nightly schedule, and AppConfig.xml contains a baseline update that runs nightly to consume the latest generation of records from each of the three crawls. The Forge.isDataReady check is not required in the baseline update script because the source data is not locked.
<!--
########################################################################
# full crawl script
#
-->
<script id="endeca_fullCasCrawldoc">
<log-dir>./logs/provisioned_scripts</log-dir>
<provisioned-script-command>./control/runcommand.bat endeca_fullCasCrawldoc</provisioned-script-command>
<bean-shell-script>
<![CDATA[
crawlName = "endeca";
log.info("Starting full CAS crawl '" + crawlName + "'.");
// obtain lock
if (LockManager.acquireLock("crawl_lock_" + crawlName)) {
    CAS.runBaselineCasCrawl(crawlName);
    LockManager.releaseLock("crawl_lock_" + crawlName);
} else {
    log.warning("Failed to obtain lock.");
}
log.info("Finished full CAS crawl '" + crawlName + "'.");
]]>
</bean-shell-script>
</script>
<!--
########################################################################
# Baseline update script
#
-->
<script id="BaselineUpdate">
<log-dir>./logs/provisioned_scripts</log-dir>
<provisioned-script-command>./control/baseline_update.bat</provisioned-script-command>
<bean-shell-script>
<![CDATA[
log.info("Starting baseline update script.");
// obtain lock
if (LockManager.acquireLock("update_lock")) {
    if (ConfigManager.isWebStudioEnabled()) {
        // get Web Studio config, merge with Dev Studio config
        ConfigManager.downloadWsConfig();
        ConfigManager.fetchMergedConfig();
    } else {
        ConfigManager.fetchDsConfig();
    }
    // clean directories
    Forge.cleanDirs();
    PartialForge.cleanCumulativePartials();
    Dgidx.cleanDirs();
    // fetch extracted data files to forge input
    Forge.getIncomingData();
    LockManager.removeFlag("baseline_data_ready");
    // fetch config files to forge input
    Forge.getConfig();
    // archive logs and run ITL
    Forge.archiveLogDir();
    Forge.run();
    Dgidx.archiveLogDir();
    Dgidx.run();
    // distribute index, update Dgraphs
    DistributeIndexAndApply.run();
    // if Web Studio is integrated, update Web Studio with latest
    // dimension values
    if (ConfigManager.isWebStudioEnabled()) {
        ConfigManager.cleanDirs();
        Forge.getPostForgeDimensions();
        ConfigManager.updateWsDimensions();
    }
    // archive state files, index
    Forge.archiveState();
    Dgidx.archiveIndex();
    // (start or) cycle the LogServer
    LogServer.cycle();
    // release lock
    LockManager.releaseLock("update_lock");
    log.info("Baseline update script finished.");
} else {
    log.warning("Failed to obtain lock.");
}
}
]]>
</bean-shell-script>
</script>
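When crawls and updates run on independent schedules, one common way to coordinate them is per-crawl readiness flags: each crawl script raises a flag when fresh data is available, and the baseline update proceeds only when every crawl it consumes has reported, clearing the flags for the next cycle. The sketch below illustrates that gating idea only; the CrawlGate class and its flag names are hypothetical, not part of the Endeca API:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CrawlGate {
    private final Set<String> readyFlags = new HashSet<>();

    // A crawl script would set its flag after its full crawl succeeds.
    public void setFlag(String crawlName) {
        readyFlags.add(crawlName + "_data_ready");
    }

    // The baseline update proceeds only when every crawl it consumes
    // has reported fresh data; it then clears the flags for the next cycle.
    public boolean tryConsume(List<String> crawlNames) {
        for (String crawl : crawlNames) {
            if (!readyFlags.contains(crawl + "_data_ready")) {
                return false; // at least one crawl has not finished yet
            }
        }
        for (String crawl : crawlNames) {
            readyFlags.remove(crawl + "_data_ready");
        }
        return true;
    }
}
```

In the three-crawl scenario above, the baseline update script would test the gate before running Forge, and skip the update (or retry later) if any crawl's data is not yet ready.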