
Oracle Internet File System Developer Reference
Release 9.0.1.1.0

Part Number A90093-02

9
Creating a Custom Parser

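This chapter discusses the creation and use of custom parsers in Oracle 9iFS. Topics include what a parser is, how to use a custom parser, the components of a parser application, how to write a parser application, sample code for a custom parser, and how to use the ClassSelectionParser.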

What Is a Parser?

A parser is a program that accepts a string of information and breaks it into its constituent elements. In Oracle 9iFS, a parser is a Java class that extracts structured data elements from a file as it is being inserted into the repository. In the case of a document, a parser:

In some cases, a parser does not work with the data stream at all, and acts as a preprocessor for the document as it is inserted into Oracle 9iFS. An example is the ClassSelectionParser, described in "Using the ClassSelectionParser".

In most cases, the most convenient way to insert structured data into Oracle 9iFS is to supply it as an XML file and let the built-in XML parsing framework extract the data elements from the document. This is covered in Chapter 10, "XML and the Oracle Internet File System".

Using a Custom Parser

If you want to parse non-XML documents, such as .doc or .xls documents, or if you have defined a custom type, you must write a custom parser to create database objects from these documents. To create a custom parser, you can either subclass an existing Oracle 9iFS parser, or create a custom class implementing the oracle.ifs.beans.parsers.Parser interface.

The Parser class creates one or more objects. In most cases, the Parser class is used to create the following objects:

A parser determines which type of object to create based on the InputStream or Reader object passed to it. If the InputStream or Reader describes more than one type of object, the parser can either:

Overview of a Parser Application

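A parser application includes four components: the application, the parser, the ParserCallback, and the ParserLookupByFileExtension PropertyBundle.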

Table 9-1 describes each component.

Table 9-1 Parser Components

Component  Description/Sample

Application  The application creates an instance of the parser required, then calls the parser, specifying the document representation (required), the name of the ParserCallback object (optional), and the Options object (optional).

Parser  The parser executes whatever custom code is needed to create the parsed object, then stores the parsed object in the repository.

ParserCallback  The application may optionally specify a ParserCallback object. The ParserCallback object's preOperation() or postOperation() methods specify additional processing that is executed before, after, or both before and after the parsing operation takes place.

ParserLookupByFileExtension PropertyBundle  Oracle 9iFS looks up the name of the parser for this document class in the ParserLookupByFileExtension PropertyBundle.

Writing a Parser Application

Writing a parser application includes the following tasks:

  1. Write the Parser Class

  2. Deploy the Parser

  3. Invoke the Parser (in the parser application)

  4. Write a ParserCallback (optional)

Write the Parser Class

The purpose of a parser is to identify data elements in a file, and use the elements to populate attributes in a database object.

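When creating a custom parser, you can choose from two approaches: subclass an existing Oracle 9iFS parser, or write a class that implements the oracle.ifs.beans.parsers.Parser interface directly.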

Whichever approach you choose, writing a custom parser means implementing the Parser interface, either directly or indirectly. The Parser interface includes one overloaded method, parse(), which accepts two types of input: an InputStream or a Reader.

Once the parse() method has been called, the balance of the code of the parser itself examines each line and places its content into the appropriate attribute of the object the parser is creating. The syntax and arguments for parse() are described below.
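The line-by-line approach described above can be sketched in plain Java, without the Oracle 9iFS classes. This is a minimal sketch under stated assumptions: the LineAttributeSketch class, the "name: value" line format, and the use of a Map in place of a repository object definition are all hypothetical and chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of the Oracle 9iFS API.
final class LineAttributeSketch {

    // Reads "name: value" lines and returns them as attribute/value pairs,
    // in the order they appear. Lines without a name before the colon are skipped.
    static Map<String, String> parseAttributes(Reader reader) {
        Map<String, String> attributes = new LinkedHashMap<>();
        try {
            BufferedReader lines = new BufferedReader(reader);
            for (String line = lines.readLine(); line != null; line = lines.readLine()) {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // no separator on this line
                }
                String name = line.substring(0, colon).trim();
                if (name.isEmpty()) {
                    continue; // a value without an attribute name
                }
                attributes.put(name, line.substring(colon + 1).trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return attributes;
    }

    public static void main(String[] args) {
        Reader input = new StringReader("title: Quarterly Report\nauthor: jsmith\n");
        System.out.println(parseAttributes(input));
    }
}
```

A real parser would copy each value into the corresponding attribute of the definition object it is building, rather than into a Map.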

To write a custom parser, you must write two methods: a constructor and a parse() method.

Write a Constructor

Every parser must implement the standard constructor for a parser. The standard constructor takes one parameter, as shown in Table 9-2.

Table 9-2 Constructor Parameters

Parameter  Datatype  Description

session  LibrarySession  The LibrarySession of the current user.

Example 9-1 A Constructor

public SimplestParser(LibrarySession session) throws IfsException

Write a parse() Method

Table 9-3 describes the parameters of the parse() method.

Table 9-3 parse() Method Parameters

Parameter  Datatype  Description

stream  InputStream  An InputStream for the parser to read. Use an InputStream for data that is not character-based, such as audio and video data.

reader  Reader  Alternatively, a Reader for the parser to read. A Reader should be used for character-based data.

callback  ParserCallback  Optional parameter. May be null. If specified, the ParserCallback object includes methods that specify processing to be implemented before parsing, after parsing, or both.

options  Hashtable  Optional parameter. May be null. If specified, the Options parameter further controls the behavior of the parser through a set of optional name/value pairs. Commonly used to specify character encoding.

For sample code for the parse() method, see "Sample Code: A Custom Parser" in this chapter.
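The relationship between the stream, reader, and options parameters can be illustrated with standard Java I/O. The following is a sketch only: the StreamToReaderSketch class, the "example.encoding" option key, and the UTF-8 fallback are assumptions made for this illustration, since this chapter does not name the key that Oracle 9iFS uses for character encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;

// Hypothetical sketch; not part of the Oracle 9iFS API.
final class StreamToReaderSketch {

    // Hypothetical option key, used here only for illustration.
    static final String ENCODING_OPTION = "example.encoding";

    // Wraps a byte stream in a Reader, using the encoding named in the
    // options table when one is present, and UTF-8 otherwise (an assumption).
    static Reader toReader(InputStream stream, Hashtable<String, Object> options) {
        Charset charset = StandardCharsets.UTF_8;
        if (options != null && options.get(ENCODING_OPTION) != null) {
            charset = Charset.forName(options.get(ENCODING_OPTION).toString());
        }
        return new InputStreamReader(stream, charset);
    }

    // Reads the whole stream as text, so that the effect of the encoding is visible.
    static String decode(InputStream stream, Hashtable<String, Object> options) {
        StringBuilder text = new StringBuilder();
        try {
            Reader reader = toReader(stream, options);
            char[] buffer = new char[1024];
            for (int n = reader.read(buffer); n != -1; n = reader.read(buffer)) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Because the options argument may be null, the sketch checks for null before looking up any key.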

Deploy the Parser

For protocol servers and other standard Oracle 9iFS components to access your custom parser, the folder tree containing the class for the parser must reside in the Oracle 9iFS CLASSPATH. Oracle 9iFS includes a special directory for this purpose. This directory, called custom_classes, is defined in the CLASSPATH environment variable that the Oracle 9iFS server software uses.

To deploy a parser:

  1. Compile the parser, creating a .class file.

  2. Place the folder tree that contains the resulting .class file in the directory $ORACLE_HOME/ifs/custom_classes on the server where Oracle 9iFS is installed.



    Note:

    The compiled Java code must be copied to the native file system of the server, not to the Oracle 9iFS repository. 
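The "folder tree" in step 2 follows the usual Java rule that the directory layout mirrors the package name. As a small illustration (the DeployPathSketch class and its method are hypothetical helpers, not Oracle 9iFS utilities), the expected location of a class file can be derived from its fully qualified name:

```java
// Hypothetical helper; not part of the Oracle 9iFS API.
final class DeployPathSketch {

    // Maps a fully qualified class name to the path of its .class file
    // under the given class directory, following standard Java package layout.
    static String classFilePath(String customClassesDir, String className) {
        return customClassesDir + "/" + className.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFilePath(
            "$ORACLE_HOME/ifs/custom_classes",
            "ifs.demo.SimplestParser.parser.SimplestParser"));
    }
}
```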


Register the Parser

The purpose of registering a parser is to map a certain file extension to a specific parser. Once this mapping is created, whenever a file with that extension is uploaded by an Oracle 9iFS client or protocol, the file will be passed to the custom parser before it is stored in the repository. You can register a parser in either of two ways:


Facility  Advantages/Restrictions

Oracle 9iFS Manager  Use Oracle 9iFS Manager for simplicity and ease-of-use. Using Oracle 9iFS Manager, you can only register a parser that exists on the same instance of Oracle 9iFS as the Oracle 9iFS Manager facility.

XML  Use XML if you prefer to register a parser using a script, or if you need to deploy the parser on a separate Oracle 9iFS instance.

Each registered parser has two attributes:


Attribute  Datatype  Description  Example

Extension  String  File extension.  cus

ClassName  String  Fully qualified classname of the parser.  ifs.demo.SimplestParser.parser.SimplestParser


The underlying mechanism for storing the mappings between file extensions and parsers is a PropertyBundle object called "ParserLookupByFileExtension." A PropertyBundle is a list of name/value pairs stored as an array of Property objects. Each Property object in this PropertyBundle stores the mapping between a file extension and a parser as a Name/Value pair: the Name is the file extension, and the Value is the fully qualified class name of the parser.
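The lookup that this PropertyBundle supports can be pictured as a search through an array of name/value pairs. The sketch below uses stand-in classes (PropertySketch and ParserLookupSketch are hypothetical, and exact-match comparison of the extension is an assumption); it is not the Oracle 9iFS Property or PropertyBundle API.

```java
// Hypothetical stand-in for a Property: one name/value pair.
final class PropertySketch {
    final String name;   // file extension, for example "cus"
    final String value;  // fully qualified class name of the parser

    PropertySketch(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical stand-in for the ParserLookupByFileExtension bundle.
final class ParserLookupSketch {
    private final PropertySketch[] properties;

    ParserLookupSketch(PropertySketch[] properties) {
        this.properties = properties.clone();
    }

    // Returns the parser class name registered for the file's extension,
    // or null when the file has no extension or no parser is registered.
    String parserFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String extension = fileName.substring(dot + 1);
        for (PropertySketch property : properties) {
            if (property.name.equals(extension)) {
                return property.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ParserLookupSketch lookup = new ParserLookupSketch(new PropertySketch[] {
            new PropertySketch("cus", "ifs.demo.SimplestParser.parser.SimplestParser")
        });
        System.out.println(lookup.parserFor("report.cus"));
    }
}
```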

Registering a Parser Using Oracle 9iFS Manager

To register a parser using Oracle 9iFS Manager, follow these steps:

  1. From the Oracle 9iFS Manager Object menu, choose Register.

  2. From the Select Object Type window, choose Parser Lookup.

  3. From the Parser Lookup Registry window, choose Add.

  4. In the Parser Lookup Entry window, fill in the text boxes for the attributes.

  5. Click OK.

Registering a Parser Using XML

To register a parser using XML, write an XML file to add a new Property object to the ParserLookupByFileExtension PropertyBundle, specifying the file extension and class name of the parser.

<?xml version="1.0" standalone="yes"?>
<!--SimplestParser.xml-->
<PROPERTYBUNDLE>
   <UPDATE RefType="valuedefault">ParserLookupByFileExtension</UPDATE>
   <PROPERTIES>
      <PROPERTY ACTION="add">
         <NAME>cus</NAME>
         <VALUE DataType="String">
           ifs.demo.SimplestParser.parser.SimplestParser
         </VALUE>
      </PROPERTY>
   </PROPERTIES>
</PROPERTYBUNDLE>

Invoke the Parser

In the SimplestParser example ("Registering a Parser Using XML"), the protocol server automatically calls the parser when a document with the correct extension (.cus) is inserted into Oracle 9iFS.

When an application program inserts content into the repository, the application is responsible for invoking the appropriate parser, either a standard Oracle 9iFS parser or a custom parser.

In order to parse a document, an application must:

Write a ParserCallback

When a custom application calls a parser, the application may, optionally, pass in a ParserCallback object. A ParserCallback allows an application to provide additional processing before or after the data stream is processed.

The ParserCallback interface specifies three methods that allow an application to interact with a parser: preOperation(), postOperation(), and signalException().

The preOperation() Method


The application can use preOperation() to alter the LibraryObjectDefinition before the parser uses it to update the repository, in the following ways:

Example 9-2 preOperation()

public LibraryObjectDefinition preOperation (LibraryObject lo,
LibraryObjectDefinition def)
throws IfsException

The postOperation() Method


The application can use postOperation() to access the repository object that was created or updated by the parser.

Parameter Name  Datatype  Description

lo  LibraryObject  The LibraryObject that was created or updated by the parse operation.

Example 9-3 postOperation()

public void postOperation (LibraryObject lo)
throws IfsException

The signalException() Method


The application can implement signalException() to intercept any exceptions that occur during parsing. The options are:

Example 9-4 signalException()

public void signalException(IfsException e)
throws IfsException

Example 9-5 ParserCallback Implementation

Example 9-5 provides a brief example of implementing the ParserCallback interface:
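A minimal, self-contained sketch of the callback pattern follows. It does not use the oracle.ifs classes: CallbackSketch is a stand-in for ParserCallback with simplified argument types (a Map in place of LibraryObject and LibraryObjectDefinition, RuntimeException in place of IfsException), and LoggingCallback, CallbackDriverSketch, and the DESCRIPTION default are hypothetical. The order in which a real parser invokes the callback methods is also an assumption here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for ParserCallback. The real methods take LibraryObject and
// LibraryObjectDefinition arguments and can throw IfsException.
interface CallbackSketch {
    Map<String, String> preOperation(Map<String, String> definition);
    void postOperation(Map<String, String> storedObject);
    void signalException(RuntimeException e);
}

// A callback that fills in a default attribute and records what happened.
final class LoggingCallback implements CallbackSketch {
    final List<String> log = new ArrayList<>();

    @Override
    public Map<String, String> preOperation(Map<String, String> definition) {
        Map<String, String> altered = new HashMap<>(definition);
        if (!altered.containsKey("DESCRIPTION")) {
            altered.put("DESCRIPTION", "Parsed document"); // hypothetical default
        }
        log.add("pre");
        return altered;
    }

    @Override
    public void postOperation(Map<String, String> storedObject) {
        log.add("post:" + storedObject.get("NAME"));
    }

    @Override
    public void signalException(RuntimeException e) {
        log.add("error:" + e.getMessage());
    }
}

// Plays the role of the parser: calls the callback around a simulated store step.
final class CallbackDriverSketch {
    static Map<String, String> store(Map<String, String> definition, CallbackSketch callback) {
        try {
            Map<String, String> toStore =
                (callback == null) ? definition : callback.preOperation(definition);
            if (!toStore.containsKey("NAME")) {
                throw new IllegalArgumentException("NAME is required");
            }
            Map<String, String> stored = new HashMap<>(toStore); // simulated repository write
            if (callback != null) {
                callback.postOperation(stored);
            }
            return stored;
        } catch (RuntimeException e) {
            if (callback == null) {
                throw e; // nobody to notify, so let the caller see the failure
            }
            callback.signalException(e);
            return null;
        }
    }
}
```

In this sketch the callback is optional, matching the description above: passing null for the callback skips the extra processing, and failures then propagate directly to the caller.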

Sample Code: A Custom Parser

This SimplestParser extracts the text between the <TITLE> tags of an HTML document and stores that information in a custom field. This requires that a subclass of Document, named CUSTOM with the attribute TITLE, be registered on the server with the file extension .cus. This example happens to use a custom file extension, but this is not required: you could register this parser with the file extensions .htm and .html, and have the parser process all HTML documents.

This is a simplified example and does not take into consideration versioned documents. Nor does it address issues concerning local character sets.

package oracle.ifs.examples.documentation.parser;

// These classes provide the building blocks for a 9iFS document.

import oracle.ifs.beans.Attribute;
import oracle.ifs.beans.Document;
import oracle.ifs.beans.DocumentDefinition;
import oracle.ifs.beans.Format;
import oracle.ifs.beans.LibraryObject;
import oracle.ifs.common.Collection;
import oracle.ifs.common.AttributeValue;

// These classes are used to instantiate a folder object to store the document.

import oracle.ifs.beans.Folder;
import oracle.ifs.beans.FolderPathResolver;

// These classes are used to obtain information about the user at runtime.

import oracle.ifs.beans.DirectoryUser;
import oracle.ifs.beans.PrimaryUserProfile;
import oracle.ifs.beans.LibrarySession;

// These classes are the base classes for creating a parser.

import oracle.ifs.beans.parsers.Parser;
import oracle.ifs.beans.parsers.ParserCallback;
import java.util.Hashtable;

// These are standard Java objects used to process the document content.

import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.BufferedReader;

// This class is used to report exceptions in iFS methods.

import oracle.ifs.common.IfsException;

public class SimplestParser implements Parser
{
  private String title;
  private LibrarySession m_librarySession;
  private Document newDoc;
  private Folder currentFolder;
  private Folder homeFolder;

// The constructor argument captures the current library session, which is
// used to pass information about the user and environment at runtime.

  public SimplestParser(LibrarySession session) throws IfsException
  {
    m_librarySession = session;
  }

/* This parser is called by the host protocol at runtime, passing a Reader
 * object with the contents of the document being parsed. The callback is
 * an optional argument that enables the parser to respond to the calling
 * method. The Hashtable is used to store three key parameters:
 * CURRENT_PATH_OPTION: the current working directory.
 * CURRENT_NAME_OPTION: the name of the file being parsed.
 * UPDATE_OBJECT_OPTION: indicates if the document being parsed is
 *                       replacing an object that already exists.
 */
  public LibraryObject parse(Reader htmlStream, ParserCallback callback,
        Hashtable options) throws IfsException
  {
  try
  {

/*  Instantiate a FolderPathResolver, then pass it the CURRENT_PATH_OPTION as
 *  a string. Set the currentFolder variable to the path where the document
 *  to be parsed was inserted.
 */

    FolderPathResolver fpr = new FolderPathResolver(m_librarySession);
    fpr.setRelativePath(options.get(Parser.CURRENT_PATH_OPTION).toString());
    currentFolder = fpr.getCurrentDirectory();

/*  Instantiate the string variable documentContent.
 *  Instantiate a BufferedReader object named dataStream and populate it
 *  with the document content passed in to the method.
 */

    String documentContent = "";
    BufferedReader dataStream =
                           new BufferedReader(htmlStream);

// Read the buffered data into the documentContent variable one line at a time.

    for (String line = dataStream.readLine();line != null;
                           line = dataStream.readLine())
    {
      documentContent = documentContent + line + "\n";
    }

//  Send the resulting string to the parseTitle method to extract the title.

    String docTitle = parseTitle(documentContent);

//  Instantiate a DocumentDefinition object.

    DocumentDefinition docDef = new DocumentDefinition(m_librarySession);

//  Instantiate a Collection object and populate it with the list of
//  format extensions. Set the format in the document definition.

    Collection allFormats = m_librarySession.getFormatExtensionCollection();
    docDef.setFormat((Format) allFormats.getItems("cus"));

//  The Classname is the name of the subclass we've defined (CUSTOM).

    docDef.setClassname("Custom");

//  Set the Name attribute in the document definition to the variable passed
//  to the parser in the options Hashtable.

    docDef.setAttribute("NAME", AttributeValue.newAttributeValue
                    (options.get(Parser.CURRENT_NAME_OPTION)));

//  Set the custom attribute "TITLE" to the docTitle variable returned by the
//  parseTitle method.

    docDef.setAttribute("TITLE", AttributeValue.newAttributeValue(docTitle));

//  Set the content of the document to the String documentContent.

    docDef.setContent(documentContent);

/*  Check to see if the UPDATE_OBJECT_OPTION variable is set. If so, update
 *  the document (update). If not, create a new document (addItem).
 *  In both cases, assign the result to the newDoc field so that the method
 *  returns the document instead of null.
*/
    if(options.get(UPDATE_OBJECT_OPTION) != null)
    {
      Document currentDoc = (Document) currentFolder.findPublicObjectByPath
                      (docDef.getAttribute("NAME").toString());
      currentDoc.update(docDef);
      newDoc = currentDoc;
    }
    else
    {

      //  Instantiate a new Document using the DocumentDefinition just defined.
      //  Assign it to the newDoc field; declaring a local variable with the
      //  same name here would hide the field that is returned below.

      newDoc = (Document) m_librarySession.createPublicObject(docDef);
      currentFolder.addItem(newDoc);
    }
  }

// Catch any exceptions. Set VerboseMessage to true to get a more complete
// report of the methods that threw the exception.

  catch (IfsException ifsExceptionCaught)
  {
    ifsExceptionCaught.setVerboseMessage(true);
    ifsExceptionCaught.printStackTrace();
  }
  catch (Exception exceptionCaught)
  {
    exceptionCaught.printStackTrace();
  }
  return newDoc;
  }

/* parse method called when the protocol sends the file content as an
 * InputStream. This method converts the InputStream to a BufferedReader
 * and forwards it to the first parse method (keeps code concise).
*/

  public LibraryObject parse(InputStream htmlStream, ParserCallback callback,
                             Hashtable options)
  {

// Convert the InputStream htmlStream to the BufferedReader named redirect.

    BufferedReader redirect =
        new BufferedReader(new InputStreamReader(htmlStream));

// Send the resulting BufferedReader to the first parse method. Store the
// result in the newDoc field (not a local variable) so it can be returned.

    try {
      newDoc = (Document) parse(redirect,callback,options);
    }

// Catch and report (in verbose mode) any exceptions.

    catch (IfsException ifsExceptionCaught)
    {
      ifsExceptionCaught.setVerboseMessage(true);
      ifsExceptionCaught.printStackTrace();
    }
    catch (Exception exceptionCaught)
    {
      exceptionCaught.printStackTrace();
    }
    return newDoc;
  }

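/*  This is the actual custom parsing routine. It searches the text String
 *  for the tag <TITLE>, skips the 7 characters of the <TITLE> tag itself,
 *  and extracts a substring of all the information through the last
 *  character before the </TITLE> tag. The search is case-sensitive. If the
 *  substring call fails (for example, when there is no </TITLE> tag), the
 *  title defaults to "Untitled".
 */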

  private String parseTitle (String parseString){
    try
    {
      title = parseString.substring((parseString.indexOf("<TITLE>")+ 7),
                  parseString.indexOf("</TITLE>"));
    }
    catch (Exception e)
    {
      title = "Untitled";
      e.printStackTrace();
    }
    return title;
  }
}
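The parseTitle() method above matches only the uppercase tags <TITLE> and </TITLE>. As a hedged alternative, the following standalone sketch uses java.util.regex to match the tags in any letter case and to fall back to "Untitled" when no complete title element is found. The TitleExtractorSketch class is hypothetical, and treating an empty title as "Untitled" is an assumption that differs from the sample above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of the Oracle 9iFS sample code.
final class TitleExtractorSketch {

    // Matches the text between <title> and </title> in any letter case.
    // DOTALL lets the title span more than one line.
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or "Untitled" when there is no
    // complete title element or the title is empty.
    static String extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        if (!matcher.find()) {
            return "Untitled";
        }
        String title = matcher.group(1).trim();
        return title.isEmpty() ? "Untitled" : title;
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><head><title>My Page</title></head></html>"));
    }
}
```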

 

Using the ClassSelectionParser

The ClassSelectionParser uses a PropertyBundle to identify the correct ClassObject to use when a document is inserted into Oracle 9iFS. Unlike other parsers, it does not process the content of the document; it works only with the document's metadata (its name). To use the ClassSelectionParser, perform these tasks:

  1. Create a Class Definition

  2. Register the Parser

  3. Register the Class

Create a Class Definition

The first step in using the ClassSelectionParser is to create the custom class definition. This example uses an XML configuration file to define a custom class for presentation slides, and describes one additional attribute, NumberOfSlides, to be added to all files of the new type Presentation.

<?xml version = '1.0' standalone = 'yes'?>
<!--Presentation.xml-->
<ClassObject>
  <Name>Presentation</Name>
  <Superclass RefType='name'> Document </Superclass>
  <Description>Custom Class for Presentations</Description>
  <Attributes>
    <Attribute>
      <Name>NumberOfSlides</Name>
      <DataType>INTEGER</DataType>
    </Attribute>
  </Attributes>
</ClassObject>

Register the Parser

Once the custom ClassObject has been created, you then associate the file extension (in this example, .ppt) with the ClassSelectionParser in the ParserLookupByFileExtension ValueDefault PropertyBundle. You can register the parser using Oracle 9iFS Manager or XML.

<?xml version = '1.0' standalone = 'yes'?>
<!--RegisterPPTParser.xml-->
<PropertyBundle>
  <update reftype='valuedefault'>  ParserLookupByFileExtension </update>
  <Properties>
    <Property action = 'add'>
      <Name> ppt </Name>
      <Value datatype='String'>oracle.ifs.beans.parsers.ClassSelectionParser</Value>
    </Property>
  </Properties>
</PropertyBundle>

Register the Class

Once the parser has been registered, you must register the custom class by adding an entry to the IFS.PARSER.ObjectTypeLookupByFileExtension PropertyBundle. In this case, the registration process associates the file extension (ppt) with the custom class (Presentation).

Registering the class completes the process necessary to invoke the ClassSelectionParser. If this step is omitted, the class associated with the ClassSelectionParser defaults to Document; no parsing will occur.

<?xml version = '1.0' standalone = 'yes'?>
<!--RegisterPPTObjectType.xml-->
<PropertyBundle>
  <update reftype='valuedefault'>  IFS.PARSER.ObjectTypeLookupByFileExtension </update>
  <Properties>
    <Property action = 'add'>
      <Name> ppt </Name>
      <Value datatype='String'>Presentation</Value>
    </Property>
  </Properties>
</PropertyBundle>

Once you have registered the file extension with the corresponding custom class, documents with that extension are created as instances of the ClassObject you defined.
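The selection behavior described in this section, including the fallback to Document when no class is registered for an extension, can be sketched as a lookup with a default. The ClassSelectionSketch class below is a hypothetical illustration that uses a plain Map; it is not the ClassSelectionParser implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Oracle 9iFS ClassSelectionParser.
final class ClassSelectionSketch {

    // Class used when no entry exists for a file's extension.
    static final String DEFAULT_CLASS = "Document";

    private final Map<String, String> objectTypeByExtension = new HashMap<>();

    // Associates a file extension (without the dot) with a class name.
    void register(String extension, String className) {
        objectTypeByExtension.put(extension, className);
    }

    // Chooses a class from the file name alone; the content is never read.
    String classFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = (dot < 0) ? "" : fileName.substring(dot + 1);
        String className = objectTypeByExtension.get(extension);
        return (className == null) ? DEFAULT_CLASS : className;
    }

    public static void main(String[] args) {
        ClassSelectionSketch selector = new ClassSelectionSketch();
        selector.register("ppt", "Presentation");
        System.out.println(selector.classFor("q3-review.ppt"));
        System.out.println(selector.classFor("notes.txt"));
    }
}
```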


Oracle
Copyright © 2001 Oracle Corporation.

All Rights Reserved.