Dgraph partial update script

The partial update script defined in the AppConfig.xml document for a Dgraph deployment is included in this section, with numbered steps indicating the actions performed at each point in the script.

<script id="PartialUpdate">
  <bean-shell-script>
    <![CDATA[
  1. Obtain lock. The partial update script attempts to set an "update_lock" flag in the EAC to serve as a lock or mutex. If the flag is already set, this step fails, ensuring that two updates cannot run simultaneously, which would interfere with data processing. The flag is removed if an error occurs or when the script completes successfully.
        log.info("Starting partial update script.");
          // obtain lock
          if (LockManager.acquireLock("update_lock")) {
  2. Validate data readiness. Test that the EAC contains at least one flag with the prefix "partial_extract::". One of these flags should be created for each successfully and completely extracted file, with the prefix "partial_extract::" prepended to the extracted file name (e.g. "partial_extract::adds.txt.gz"). These flags are deleted during data processing and must be created as new files are extracted.
        // test if data is ready for processing
        if (PartialForge.isPartialDataReady()) {
    
  3. Archive partial logs. The logs/partial directory is archived, to create a fresh logging directory for the partial update process and to save the previous run's logs.
        // archive logs
        PartialForge.archiveLogDir();
    
  4. Clean processing directories. Files from the previous update are removed from the data/partials/processing, data/partials/forge_output, and data/temp directories.
        // clean directories
        PartialForge.cleanDirs();
    
  5. Move data and config to processing directory. Extracted files in data/partials/incoming with matching "partial_extract::" flags in the EAC are moved to data/partials/processing. Configuration files are copied from config/pipeline to data/processing.
        // fetch extracted data files to forge input
        PartialForge.getPartialIncomingData();
    
        // fetch config files to forge input
        PartialForge.getConfig();
    
  6. Forge. The partial update Forge process executes.
        // run ITL
        PartialForge.run();
    
  7. Apply timestamp to updates. The output XML file generated by the partial update pipeline is renamed to include a timestamp, to ensure it is processed in the correct order relative to files generated by previous or following partial update processes.
        // timestamp partial, save to cumulative partials dir
        PartialForge.timestampPartials();
    
  8. Copy updates to cumulative updates. The timestamped XML file is copied into the cumulative updates directory.
        PartialForge.fetchPartialsToCumulativeDir();
  9. Distribute update to each server. A single copy of the partial update file is distributed to each server specified in the configuration.
        // distribute partial update, update Dgraphs
        DgraphCluster.copyPartialUpdateToDgraphServers();
    
  10. Update MDEX Engines. The Dgraph processes are updated, according to the updateGroup property specified for each Dgraph. The update process for each Dgraph is as follows:
    1. Copy update files into the dgraph_input/updates directory.
    2. Trigger a configuration update in the Dgraph by calling the URL admin?op=update.
        DgraphCluster.applyPartialUpdates();
  11. Archive cumulative updates. The newly generated update file (and files generated by all partial updates processed since the last baseline) are archived on the indexing server.
        // archive partials
        PartialForge.archiveCumulativePartials();
        }
    
  12. Release Lock. The "update_lock" flag is removed from the EAC, indicating that another update may be started.
        // release lock
        LockManager.releaseLock("update_lock");
        log.info("Partial update script finished.");
          }
          else {
            log.warning("Failed to obtain lock.");
          }
        ]]>
      </bean-shell-script>
    </script>
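The lock-and-readiness pattern in steps 1, 2, and 12 can be sketched in plain Java. FlagStore below is a hypothetical stand-in for the EAC flag API (the real LockManager and PartialForge beans may behave differently internally); the sketch only illustrates the atomic test-and-set lock, the "partial_extract::" readiness check, and guaranteed lock release.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the EAC flag store: named flags that can be
// set atomically (test-and-set) and removed.
class FlagStore {
    private final Set<String> flags = ConcurrentHashMap.newKeySet();

    // Returns true only if the flag was not already set.
    boolean setFlag(String name) { return flags.add(name); }

    void removeFlag(String name) { flags.remove(name); }

    boolean anyFlagWithPrefix(String prefix) {
        for (String f : flags) {
            if (f.startsWith(prefix)) return true;
        }
        return false;
    }
}

public class PartialUpdateSketch {
    static final String LOCK = "update_lock";
    static final String EXTRACT_PREFIX = "partial_extract::";

    // Mirrors steps 1, 2, and 12: acquire the lock flag, check that at
    // least one extracted file is flagged as ready, run the update body,
    // and release the lock whether or not the body throws.
    static boolean runPartialUpdate(FlagStore eac, Runnable updateBody) {
        if (!eac.setFlag(LOCK)) {
            return false; // step 1 failed: another update is already running
        }
        try {
            if (!eac.anyFlagWithPrefix(EXTRACT_PREFIX)) {
                return false; // step 2 failed: no extracted data is ready
            }
            updateBody.run(); // steps 3 through 11
            return true;
        } finally {
            eac.removeFlag(LOCK); // step 12: release the lock in all cases
        }
    }
}
```

The try/finally makes the release unconditional, which matches step 1's statement that the flag is removed on error as well as on success.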
    
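Step 7's ordering requirement can also be illustrated with a small sketch. The exact naming scheme used by PartialForge.timestampPartials() is not shown in this section; the format below is an assumption chosen only so that file names sort chronologically.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampSketch {
    // Prefixes a file name with a sortable timestamp so that cumulative
    // partial files apply in generation order when sorted by name.
    static String timestampName(String fileName, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // stable across hosts
        return fmt.format(when) + "-" + fileName;
    }
}
```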

Preventing non-nillable element exceptions

When running the partial updates script, you may see a Java exception similar to this example:
INFO: Starting copy utility 'copy_partial_update_to_host_MDEXHost1'.
Oct 20, 2008 11:46:37 AM org.apache.axis.encoding.ser.BeanSerializer serialize
SEVERE: Exception:
java.io.IOException: Non nillable element 'fromHostID' is null.
...
If this occurs, make sure that the following properties are defined in the AppConfig.xml configuration file:
<dgraph-defaults>
  <properties>
      ...
      <property name="srcPartialsDir" value="./data/partials/forge_output" />
      <property name="srcPartialsHostId" value="ITLHost" />
      <property name="srcCumulativePartialsDir" value="./data/partials/cumulative_partials" />
      <property name="srcCumulativePartialsHostId" value="ITLHost" />
      ...
    </properties>
  ...
</dgraph-defaults>

This is necessary because the script obtains the fromHostID value from these properties.

Running partial updates with parallel Forge

If you have a configuration with two MDEX Engine servers, each hosting two Dgraphs, you may want to run partial updates on both servers in parallel. This requires customizing the jobs that are sent to the EAC.

When you customize more than one job for the EAC Server, ensure that each job is created with a unique name. Even though each job runs on a separate MDEX Engine server and is unique on that server, the EAC Server expects all jobs sent to it to be unique across all servers.
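For example, the per-server jobs might be given distinct script ids. The ids and host names below are hypothetical; only the uniqueness requirement comes from the EAC.

```xml
<!-- Hypothetical per-server partial update scripts. The ids must differ,
     because the EAC Server requires job names to be unique across all
     servers, even when each job targets a different MDEX Engine server. -->
<script id="PartialUpdateMDEXHost1">
  ...
</script>
<script id="PartialUpdateMDEXHost2">
  ...
</script>
```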