Sending records to the data store

The BulkIngester class is the primary entry point of the client-side Bulk Load Interface, which loads data into an Endeca data store.

BulkIngester makes a socket connection to the Endeca data store and spawns a thread to handle replies. Its sendRecord() method sends the provided record over the wire to the data store.

Clients of this interface must:
  1. Define classes that implement the four callback interfaces (ErrorCallback, FinishedCallback, AbortCallback, and StatusCallback), and perform the appropriate action when their handler methods are called (which happens in the response thread).
  2. Instantiate a BulkIngester object with the appropriate parameters required by the constructor.
  3. Call the begin() method to start the response thread. If begin() is not called first, an IOException is thrown.
  4. Call sendRecord() repeatedly to send Record objects to the Endeca data store.
  5. When finished sending records, call endIngest() to terminate the response thread and close the socket.
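
The call order in steps 3 through 5 can be illustrated with a small stand-in class. This is only a sketch: LifecycleSketch is a hypothetical name, it is not the Endeca BulkIngester, it opens no socket, and it uses plain strings in place of Record objects. It also assumes that the IOException mentioned in step 3 is raised by the first sendRecord() call made before begin(), which this section does not state explicitly.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in that mimics only the begin/send/end call order.
 * It is NOT the Endeca BulkIngester and performs no network I/O.
 */
final class LifecycleSketch {
    private final List<String> sent = new ArrayList<>();
    private boolean started;

    /** Step 3: must be called before any record is sent. */
    void begin() {
        started = true;
    }

    /** Step 4: fails if begin() was skipped (assumed behavior). */
    void sendRecord(String record) throws IOException {
        if (!started) {
            throw new IOException("begin() must be called before sendRecord()");
        }
        sent.add(record);
    }

    /** Step 5: ends the session and returns how many records were sent. */
    int endIngest() {
        started = false;
        return sent.size();
    }

    public static void main(String[] args) throws IOException {
        LifecycleSketch ingester = new LifecycleSketch();
        ingester.begin();
        ingester.sendRecord("Widget");
        ingester.sendRecord("Thing");
        System.out.println("Records sent: " + ingester.endIngest());
    }
}
```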

Defining callback interfaces

The BulkIngester constructor requires the four callback interfaces as parameters:
  • ErrorCallback handles error conditions. The handleError() method receives the reason for the error and the record that was rejected.
  • FinishedCallback is called when the Dgraph reports that it has finished with the ingestion. No further records will be accepted without calling begin() again.
  • AbortCallback handles abort conditions. An abort condition can happen either in BulkIngester or on the Dgraph.
  • StatusCallback handles status updates, including the number of successfully ingested records and the number of rejected records.
ErrorCallback is especially useful as it reports the reason that a record was rejected. The sample program defines this callback as:
ErrorCallback errorCallback = new ErrorCallback() {
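    // Interface methods are implicitly public, so the implementation must be declared public.
    @Override
    public void handleError(String reason, Record reject) {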
        System.out.println("Record "
                + reject.getSpec().getName()
                + " rejected: " + reason);
    }
};
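
Because the handler methods run in the response thread, any state that a callback shares with the thread that sends records should be thread-safe. The following is a minimal sketch of that idea, not part of the Bulk Load Interface: RejectHandler and RejectCollector are hypothetical names, the handler takes a record name as a String rather than a Record, and a plain Thread simulates the response thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for a callback shaped like handleError(). */
interface RejectHandler {
    void handleError(String reason, String rejectedRecordName);
}

/** Collects rejections reported on another thread so that any thread can read them safely. */
final class RejectCollector implements RejectHandler {
    // A concurrent queue allows writes from the response thread and reads from the main thread.
    private final Queue<String> rejections = new ConcurrentLinkedQueue<>();

    @Override
    public void handleError(String reason, String rejectedRecordName) {
        rejections.add(rejectedRecordName + ": " + reason);
    }

    int count() {
        return rejections.size();
    }

    /** Returns the oldest rejection message, or null if there is none. */
    String first() {
        return rejections.peek();
    }
}

final class RejectCollectorDemo {
    public static void main(String[] args) throws InterruptedException {
        RejectCollector collector = new RejectCollector();
        // Simulated response thread; the real one is started by begin().
        Thread responseThread = new Thread(
                () -> collector.handleError("missing primary key", "Widget"));
        responseThread.start();
        responseThread.join();
        System.out.println(collector.count() + " rejected; first: " + collector.first());
    }
}
```

Collecting rejections rather than only printing them lets the sending code decide, after the ingest ends, whether to retry or report the failed records.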

Instantiating a BulkIngester object

The BulkIngester constructor requires ten parameters, in this order:
  • host – the name (a String) of the machine on which the Endeca data store is running.
  • port – the bulk load port (an int) of the Endeca data store.
  • useSSL – a boolean to specify whether to use SSL for the connection.
  • doFinalMerge – a boolean that specifies whether a merge is forced immediately after ingest.
  • doUpdateDictionary – a boolean that specifies whether the aspell dictionary is updated immediately after ingest.
  • timeout – the timeout in milliseconds (an int) for connecting to the Endeca data store.
  • errorCallback – the ErrorCallback object.
  • finishedCallback – the FinishedCallback object.
  • abortCallback – the AbortCallback object.
  • statusCallback – the StatusCallback object.
The sample program constructs the BulkIngester as follows:
BulkIngester ingester = new BulkIngester(
        "endecaserver.example.com",  // host
        1234,        // port
        false,       // useSSL
        true,        // doFinalMerge
        true,        // doUpdateDictionary
        90000,       // timeout in ms
        errorCallback,
        finishedCallback,
        abortCallback,
        statusCallback);

Beginning and ending the ingest

After the client program has made a connection to the Endeca data store, the ingest process requires the use of these BulkIngester methods in this order:
  1. The begin() method starts the ingest process.
  2. A series of sendRecord() calls actually sends the Record objects to the data store.
  3. The endIngest() method terminates the ingest process.
The sample program, which ingests only two records, is coded as follows:
Record widget = makeProductRecord("Widget", 12, 99.95);
Record thing = makeProductRecord("Thing", 110, 3.14); 
        
ingester.begin();
ingester.sendRecord(widget);
ingester.requestStatusUpdate();
ingester.sendRecord(thing);    
ingester.endIngest();

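Note that the requestStatusUpdate() method requests the current status of the ingest operation. Status updates, including the counts of successfully ingested and rejected records, are handled by the StatusCallback.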
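
If the sending thread needs to wait for a status update before continuing, one possible approach is to have the status handler release a latch. The sketch below assumes that the status arrives on a different thread from the one that requested it. StatusWaiter and its onStatus() method are hypothetical: the actual StatusCallback method signature is not shown in this section, and a plain Thread stands in for the response thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical status listener; not the real StatusCallback interface. */
final class StatusWaiter {
    private final CountDownLatch received = new CountDownLatch(1);
    private final AtomicLong ingested = new AtomicLong();
    private final AtomicLong rejected = new AtomicLong();

    /** Called by the (simulated) response thread when a status reply arrives. */
    void onStatus(long ingestedCount, long rejectedCount) {
        ingested.set(ingestedCount);
        rejected.set(rejectedCount);
        received.countDown();
    }

    /** Blocks until a status has arrived or the timeout expires; returns true if one arrived. */
    boolean await(long timeoutMillis) throws InterruptedException {
        return received.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    long ingested() {
        return ingested.get();
    }

    long rejected() {
        return rejected.get();
    }
}

final class StatusWaiterDemo {
    public static void main(String[] args) throws InterruptedException {
        StatusWaiter waiter = new StatusWaiter();
        // Simulated reply: two records ingested, none rejected.
        new Thread(() -> waiter.onStatus(2, 0)).start();
        if (waiter.await(5000)) {
            System.out.println("Ingested: " + waiter.ingested()
                    + ", rejected: " + waiter.rejected());
        }
    }
}
```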