Agraph with parallel Forge baseline update script

The baseline update script defined in the AppConfig.xml document for an Agraph with parallel Forge deployment is included in this section, with numbered steps indicating the actions performed at each point in the script.

<script id="BaselineUpdate">
  <bean-shell-script>
    <![CDATA[
  log.info("Starting baseline update script.");
  1. Obtain lock. The baseline update attempts to set an "update_lock" flag in the EAC to serve as a lock or mutex. If the flag is already set, this step fails, ensuring that the update cannot be started more than once simultaneously, as this would interfere with data processing. The flag is removed in the case of an error or when the script completes successfully.
      // obtain lock
      if (LockManager.acquireLock("update_lock")) {
    
  2. Validate data readiness. Check that a flag called "baseline_data_ready" has been set in the EAC. This flag is set as part of the data extraction process to indicate that files are ready to be processed (or, in the case of an application that uses direct database access, the flag indicates that a database staging table has been loaded and is ready for processing). This flag is removed as soon as the script copies the data out of the data/incoming directory, indicating that new data may be extracted.
        // test if data is ready for processing
        if (ForgeSplitData.isDataReady())
        {
    
  3. If Workbench integration is enabled, download and merge Oracle Endeca Workbench configuration. The ConfigManager first copies all Developer Studio configuration files to the complete_index_config directory, then downloads all Workbench-maintained configuration files. Any file configured in the ConfigManager component as maintained by Oracle Endeca Workbench is copied into the complete_index_config directory, overwriting the Developer Studio copy of the same file, if one exists. The result is a complete set of configuration files for Forge to use. If Workbench integration is not enabled, the ConfigManager simply copies all Developer Studio configuration files to the complete_index_config directory.
          if (ConfigManager.isWebStudioEnabled()) {
            // get Web Studio config, merge with Dev Studio config
            ConfigManager.downloadWsConfig();
            ConfigManager.fetchMergedConfig();
          } else {
            ConfigManager.fetchDsConfig();
          }
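The overlay behavior described in step 3 — base Developer Studio files copied first, then any Workbench-maintained files of the same name overwriting them — can be sketched as a simple two-pass copy. This is an illustrative sketch only; the directory layout and the `merge` helper are assumptions, not part of the EAC or ConfigManager API:

```java
import java.io.IOException;
import java.nio.file.*;

public class ConfigOverlay {
    // Copy every regular file from src into dst, replacing files that
    // already exist there (the "overlay" half of the merge).
    static void copyAll(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, dst.resolve(f.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Base config first, then Workbench-maintained files win any conflict.
    static void merge(Path devStudio, Path workbench, Path complete)
            throws IOException {
        copyAll(devStudio, complete);
        copyAll(workbench, complete);
    }
}
```

Because the Workbench pass runs last, a file maintained in both places always ends up in its Workbench-maintained form, which matches the precedence the step describes.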
    
  4. Clean processing directories. Files from the previous update are removed from the data/processing, data/forge_output, data/temp, data/forges/[ForgeID]/processing, data/forges/[ForgeID]/forge_output, data/forges/[ForgeID]/temp, data/dgidxs/[DgidxID]/dgidx_output, and data/agidx_output directories.
          // clean directories
          ForgeSplitData.cleanDirs();
          ForgeCluster.cleanDirs();
          IndexingCluster.cleanDirs();
    
  5. Copy data to processing directory. Extracted data in data/incoming is copied to data/processing.
          // fetch extracted data files to forge input
          ForgeSplitData.getIncomingData();
    
  6. Release data-ready flag. The "baseline_data_ready" flag is removed from the EAC, indicating that the incoming data has been retrieved for baseline processing and that new data may be extracted.
          // release data-ready flag
          LockManager.releaseLock("baseline_data_ready");
  7. Copy config to processing directory. Configuration files are copied from data/complete_index_config to data/processing.
          // fetch config files to forge input
          ForgeSplitData.getConfig();
    
  8. Archive Forge logs. The logs/forges/ForgeSplitData directory is archived, to create a fresh logging directory for the Forge process and to save the previous Forge's logs.
          // archive logs and run ITL
          ForgeSplitData.archiveLogDir();
    
  9. Forge. The Forge process that splits the extracted data executes.
          ForgeSplitData.run();
  10. Copy split data to processing directory. Split data in data/forge_output is copied to data/forges/[ForgeID]/processing for each Forge in the Parallel Forge cluster.
          ForgeCluster.getData();
  11. Archive Parallel Forge logs. The logs/forges/[ForgeID] directories are archived, to create a fresh logging directory for the Forge processes and to save the previous Forge processes' logs.
          ForgeCluster.archiveLogDir();
  12. Forge in parallel. The Forge server and Forge client processes execute in parallel.
          ForgeCluster.run();
  13. Copy data to dgidx_input directories. Forged data in data/forges/[ForgeID]/forge_output is copied to data/dgidxs/[DgidxID]/dgidx_input directories.
          IndexingCluster.getDgidxIncomingData();
  14. Copy config to dgidx_input directories. Configuration files in data/forges/[ForgeID]/forge_output are copied to data/dgidxs/[DgidxID]/dgidx_input directories.
          IndexingCluster.getDgidxConfig();
  15. Archive Dgidx and Agidx logs. The logs/dgidx/[DgidxID] and logs/agidx/[AgidxID] directories are archived to create a fresh logging directory for the indexing processes and to save the previous processes' logs.
          IndexingCluster.archiveLogDir();
  16. Dgidx and Agidx. The Dgidx processes and Agidx process execute.
          IndexingCluster.run();
  17. Distribute index to each server. A single copy of the new index is distributed to each server that hosts a graph. If multiple graphs are located on the same server but specify different srcIndexDir attributes, multiple copies of the index will be delivered to that server.
  18. Update MDEX Engines. The graphs are updated according to the restartGroup property specified for each graph. The update process for each graph in a restart group is as follows:
    1. Create agraph_input_new and create a local copy of the new index in agraph_input_new for each Agraph.
    2. Create dgraph_input_new and create a local copy of the new index in dgraph_input_new for each Dgraph.
    3. Stop each Agraph.
    4. Stop each Dgraph.
    5. Rename dgraph_input to dgraph_input_old for each Dgraph.
    6. Rename dgraph_input_new to dgraph_input for each Dgraph.
    7. Rename agraph_input to agraph_input_old for each Agraph.
    8. Rename agraph_input_new to agraph_input for each Agraph.
    9. Archive each Dgraph's log directory (for example, logs/dgraphs/Dgraph1).
    10. Archive each Agraph's log directory (for example, logs/agraphs/Agraph1).
    11. Start each Dgraph.
    12. Start each Agraph.
    13. Remove dgraph_input_old for each Dgraph.
    14. Remove agraph_input_old for each Agraph.
    This somewhat complex update sequence is implemented to minimize the amount of time that a graph is stopped: each graph is down only long enough to rename two directories.
          // distributed index, update graphs
          DistributeIndexAndApply.run();
    
      <script id="DistributeIndexAndApply">
        <bean-shell-script>
          <![CDATA[ 
        AgraphCluster.cleanDirs();
        AgraphCluster.copyIndexToAgraphServers();
        AgraphCluster.copyIndexToDgraphServers();
        AgraphCluster.applyIndex();
          ]]>
        </bean-shell-script>
      </script>
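The low-downtime property of step 18 comes from the fact that the new index is staged beforehand and swapped into place with two directory renames, which are near-instant compared with copying an index. A minimal sketch of that rename swap, assuming local renames on one filesystem (the path names and the stop/start hooks here are illustrative, not the product API):

```java
import java.io.IOException;
import java.nio.file.*;

public class IndexSwap {
    // Swap a staged index directory into place. The graph process is
    // stopped only for the two renames, which is what keeps downtime low.
    static void swap(Path current, Runnable stopGraph, Runnable startGraph)
            throws IOException {
        Path staged = current.resolveSibling(current.getFileName() + "_new");
        Path old = current.resolveSibling(current.getFileName() + "_old");

        stopGraph.run();                 // steps 3-4: stop the graph
        Files.move(current, old);        // steps 5/7: current -> *_old
        Files.move(staged, current);     // steps 6/8: *_new -> current
        startGraph.run();                // steps 11-12: restart the graph

        // steps 13-14: clean up after the graph is serving the new index
        deleteRecursively(old);
    }

    static void deleteRecursively(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path e : entries) {
                if (Files.isDirectory(e)) deleteRecursively(e);
                else Files.delete(e);
            }
        }
        Files.delete(dir);
    }
}
```

Note that the old directory is removed only after the graph has restarted, so a failed restart still leaves the previous index available for rollback.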
    
  19. If Workbench integration is enabled, upload post-Forge dimensions to Oracle Endeca Workbench. The latest dimensions generated by the Forge process are uploaded to Oracle Endeca Workbench, to ensure that any new dimensions (including autogen dimensions and external dimensions) are available to Oracle Endeca Workbench for use in, for example, dynamic business rule triggers.
          // if Workbench is integrated, update Workbench with latest 
          // dimension values
          if (ConfigManager.isWebStudioEnabled()) {
            ConfigManager.cleanDirs();
            ForgeServer.getPostForgeDimensions();
            ConfigManager.updateWsDimensions();
          }
    
    
  20. Archive index and Forge state. The newly created index and the state files in Forge's state directory are archived on the indexing servers.
          // archive state files, index
          ForgeCluster.archiveState();
          IndexingCluster.archiveIndex();
    
  21. Cycle LogServer. The LogServer is stopped and restarted. During the downtime, the LogServer's error and output logs are archived.
          // cycle LogServer
          LogServer.cycle();  
        }
        else
        {
          log.warning("Baseline data not ready for processing.");
        }
    
  22. Release Lock. The "update_lock" flag is removed from the EAC, indicating that another update may be started.
        // release lock
        LockManager.releaseLock("update_lock");
    
        log.info("Baseline update script finished.");
      }
      else {
        log.warning("Failed to obtain lock.");
      }
        ]]>
      </bean-shell-script>
    </script>
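The control flow of the whole script — an outer lock guarding against concurrent updates (steps 1 and 22), an inner data-readiness check (step 2), and release of each flag on every exit path — can be sketched as follows. `FlagStore`-style state here is an illustrative in-memory stand-in for the EAC flag service, not part of the product API:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BaselineLock {
    // Illustrative stand-in for the EAC flag service: a flag either
    // exists or it does not, and setting one is atomic.
    static final Set<String> flags = ConcurrentHashMap.newKeySet();

    static boolean acquire(String flag) { return flags.add(flag); }
    static void release(String flag)    { flags.remove(flag); }

    // Mirrors the script's control flow: refuse to run concurrently,
    // refuse to run without data, and always release the update lock.
    static String runBaselineUpdate() {
        if (!acquire("update_lock")) {
            return "Failed to obtain lock.";
        }
        try {
            if (!flags.contains("baseline_data_ready")) {
                return "Baseline data not ready for processing.";
            }
            release("baseline_data_ready"); // data consumed; extraction may resume
            // ... Forge, Dgidx/Agidx, and index distribution would run here ...
            return "Baseline update script finished.";
        } finally {
            release("update_lock");
        }
    }
}
```

The try/finally mirrors the script's guarantee that "update_lock" is removed on error or success, while a failed acquisition leaves the flag untouched because it belongs to the run that set it.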