Oracle Internet File System Developer Reference, Release 9.0.1.1.0, Part Number A90093-02
This chapter discusses the creation and use of custom parsers in Oracle 9iFS. Topics include:
A parser is a program that accepts a string of information and breaks it into its constituent elements. In Oracle 9iFS, a parser is a Java class that extracts structured data elements from a file as it is being inserted into the repository. In the case of a document, a parser:
In some cases, a parser doesn't work with the data stream at all and instead acts as a preprocessor for the document as it is inserted into Oracle 9iFS. One example is the ClassSelectionParser, described in "Using the ClassSelectionParser".
In most cases, the most convenient way to insert structured data into Oracle 9iFS is to supply an XML file and let the out-of-the-box XML parsing framework extract the data elements from the document. This is covered in Chapter 10, "XML and the Oracle Internet File System".
If you want to parse non-XML documents, such as .doc or .xls documents, or if you have defined a custom type, you must write a custom parser to create database objects from these documents. To create a custom parser, you can either subclass an existing Oracle 9iFS parser, or create a custom class implementing the oracle.ifs.beans.parsers.Parser interface.
The Parser class creates one or more objects. In most cases, the Parser class is used to create the following objects:
A parser determines which type of object to create based on the InputStream or Reader object passed to it. If the InputStream or Reader describes more than one type of object, the parser can either:
A parser application includes four components:
Table 9-1 describes each component.
Table 9-1 Parser Components
Writing a parser application includes the following tasks:
The purpose of a parser is to identify data elements in a file, and use the elements to populate attributes in a database object.
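As a rough illustration of "identify data elements, then populate attributes", the sketch below reads `NAME: value` lines from a Reader and collects them into a map. The class `HeaderAttributeSketch`, the line format, and the use of a plain map in place of a repository object are all assumptions made for illustration; none of them is part of the Oracle 9iFS API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in: collects "NAME: value" lines into an attribute map.
// A real parser would set these values on a repository object definition.
final class HeaderAttributeSketch {

    static Map<String, String> extractAttributes(Reader content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        try (BufferedReader lines = new BufferedReader(content)) {
            for (String line = lines.readLine(); line != null; line = lines.readLine()) {
                int colon = line.indexOf(':');
                if (colon <= 0) {
                    continue; // not a "NAME: value" line, so it carries no attribute
                }
                String name = line.substring(0, colon).trim().toUpperCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                attributes.put(name, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return attributes;
    }
}
```

Calling `HeaderAttributeSketch.extractAttributes(new java.io.StringReader("Title: Q3 Report\nAuthor: scott\n"))` yields a map with the keys `TITLE` and `AUTHOR`.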
When creating a custom parser, you can choose from two approaches:
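Subclass an existing Oracle 9iFS parser. (The existing parsers already implement the oracle.ifs.beans.parsers.Parser interface.)
Create a custom class that directly implements oracle.ifs.beans.parsers.Parser.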
Whichever approach you choose, writing a custom parser means implementing the Parser interface, either directly or indirectly. The Parser interface includes one overloaded method, parse(), which accepts two types of input: an InputStream or a Reader.
Once the parse() method has been called, the rest of the parser code examines each line and places its content into the appropriate attribute of the object the parser is creating. The syntax and arguments for parse() are described below.
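The overloading pattern can be sketched without the Oracle 9iFS classes: one entry point takes characters, the other takes bytes and delegates after decoding. `OverloadedParseSketch` is a hypothetical stand-in that returns the text it read instead of creating a repository object, and the explicit `Charset` argument is an added assumption, not a parameter of the real parse() method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical stand-in for a parser with two parse() overloads.
final class OverloadedParseSketch {

    // Character-oriented entry point: all of the real work happens here.
    String parse(Reader content) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader lines = new BufferedReader(content)) {
            for (String line = lines.readLine(); line != null; line = lines.readLine()) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Byte-oriented entry point: decode with an explicit charset, then delegate.
    String parse(InputStream content, Charset charset) {
        return parse(new InputStreamReader(content, charset));
    }
}
```

Keeping the logic in the Reader overload means the byte-oriented overload only has to decide how bytes become characters, which is where character-set handling belongs.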
To write a custom parser, you must write two methods: the standard constructor and the parse() method.
Every parser must implement the standard constructor for a parser. The standard constructor takes one parameter, as shown in Table 9-2.
Table 9-2 Constructor Parameters
| Parameter | Datatype | Description |
|---|---|---|
| session | LibrarySession | The LibrarySession of the current user. |
public SimplestParser(LibrarySession session) throws IfsException
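The source does not say how Oracle 9iFS instantiates a registered parser. One plausible reason for requiring a standard constructor is that a framework can then create any parser from its class name alone. The sketch below shows that general Java technique under that assumption; `SketchSession`, `SketchParser`, `SketchCusParser`, and `ParserLoaderSketch` are hypothetical names, not Oracle 9iFS classes.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for the session type passed to every parser.
final class SketchSession {
    final String userName;
    SketchSession(String userName) { this.userName = userName; }
}

// Hypothetical stand-in for the parser interface.
interface SketchParser {
    String describe();
}

// A parser that exposes the one-argument constructor the loader looks for.
final class SketchCusParser implements SketchParser {
    private final SketchSession session;
    public SketchCusParser(SketchSession session) { this.session = session; }
    public String describe() { return "parser for " + session.userName; }
}

final class ParserLoaderSketch {
    // Instantiate a parser from its class name by calling its (session) constructor.
    static SketchParser load(String className, SketchSession session) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor = type.getDeclaredConstructor(SketchSession.class);
            ctor.setAccessible(true);
            return (SketchParser) ctor.newInstance(session);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate parser " + className, e);
        }
    }
}
```

A class that lacks the expected constructor, or that is not on the classpath, fails at load time with an `IllegalStateException` in this sketch.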
Table 9-3 describes the parameters of the parse() method.
Table 9-3 Parse() method Parameters
For sample code for the parse() method, see "Sample Code: A Custom Parser" in this chapter.
For protocol servers and other standard Oracle 9iFS components to access your custom parser, the folder tree containing the class for the parser must reside in the Oracle 9iFS CLASSPATH. Oracle 9iFS includes a special directory for this purpose. This directory, called custom_classes, is defined in the CLASSPATH environment variable that the Oracle 9iFS server software uses.
To deploy a parser:
Compile the parser source into a .class file.
Place the .class file in the directory $ORACLE_HOME/ifs/custom_classes on the server where Oracle 9iFS is installed.
Registering a parser maps a file extension to a specific parser. Once the mapping exists, any file with that extension that is uploaded through an Oracle 9iFS client or protocol server is passed to the custom parser before it is stored in the repository. You can register a parser in either of two ways: with Oracle 9iFS Manager or with an XML file.
Each registered parser has two attributes:
| Attribute | Datatype | Description | Example |
|---|---|---|---|
|  | String | File extension. |  |
|  | String | Fully qualified classname of the parser. |  |
The underlying mechanism for storing the mappings between file extensions and parsers is a PropertyBundle object called "ParserLookupByFileExtension." A PropertyBundle is a list of name/value pairs stored as an array of Property objects. Each Property object in this PropertyBundle stores the mapping between a file extension and a parser as a Name/Value pair:
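The Name is the file extension, for example, cus.
The Value is the fully qualified class name of the parser, for example, ifs.demo.simplestparser.parser.SimplestParser.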
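The extension-to-parser mapping can be pictured as a simple lookup table. The sketch below is an in-memory stand-in for the name/value pairs, not the PropertyBundle API. `ParserLookupSketch` is a hypothetical class, and treating extensions as case-insensitive is an assumption made here for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the extension-to-parser name/value pairs.
final class ParserLookupSketch {
    private final Map<String, String> parserByExtension = new HashMap<>();

    // Store one name/value pair: file extension -> fully qualified parser class name.
    void register(String extension, String parserClassName) {
        parserByExtension.put(normalize(extension), parserClassName.trim());
    }

    // Returns the parser class name for a file name, or null if none is registered.
    String parserFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // the file name has no extension
        }
        return parserByExtension.get(normalize(fileName.substring(dot + 1)));
    }

    // Assumption: extensions are compared without regard to case or padding.
    private static String normalize(String extension) {
        return extension.trim().toLowerCase(Locale.ROOT);
    }
}
```

After `register("cus", "ifs.demo.simplestparser.parser.SimplestParser")`, a call to `parserFor("report.cus")` returns that class name, while `parserFor("report.txt")` returns null.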
To register a parser using Oracle 9iFS Manager, follow these steps:
To register a parser using XML, write an XML file to add a new Property object to the ParserLookupByFileExtension PropertyBundle, specifying the file extension and class name of the parser.

    <?xml version="1.0" standalone="yes"?>
    <!--SimplestParser.xml-->
    <PROPERTYBUNDLE>
      <UPDATE RefType="valuedefault">ParserLookupByFileExtension</UPDATE>
      <PROPERTIES>
        <PROPERTY ACTION="add">
          <NAME>cus</NAME>
          <VALUE DataType="String">
            ifs.demo.SimplestParser.parser.SimplestParser
          </VALUE>
        </PROPERTY>
      </PROPERTIES>
    </PROPERTYBUNDLE>

In the SimplestParser example ("Registering a Parser Using XML"), the protocol server automatically calls the parser when a document with the correct extension (.cus) is inserted into Oracle 9iFS.
When an application program inserts content into the repository, the application is responsible for invoking the appropriate parser, either a standard Oracle 9iFS parser or a custom parser.
In order to parse a document, an application must:
When a custom application calls a parser, the application may, optionally, pass in a ParserCallback object. A ParserCallback allows an application to provide additional processing before or after the data stream is processed.
The ParserCallback interface specifies three methods that allow an application to interact with a parser: preOperation(), postOperation(), and signalException().
public LibraryObjectDefinition preOperation (LibraryObject lo,
LibraryObjectDefinition def)
throws IfsException
| Parameter Name | Datatype | Description |
|---|---|---|
| lo | LibraryObject | The LibraryObject that was created or updated by the parse operation. |
public void postOperation (LibraryObject lo)
throws IfsException
| Parameter Name | Datatype | Description |
|---|---|---|
| e | IfsException | The potential exception. |
public void signalException(IfsException e)
throws IfsException
Example 9-5 provides a brief example of implementing the ParserCallback interface:

    /*---FolderParsedObject.java---*/
    package oracle.ifs.examples.documentation.parser;

    import oracle.ifs.beans.Folder;
    import oracle.ifs.beans.LibraryObject;
    import oracle.ifs.beans.LibraryObjectDefinition;
    import oracle.ifs.beans.PublicObject;
    import oracle.ifs.beans.parsers.ParserCallback;
    import oracle.ifs.common.IfsException;

    // A callback that files every newly parsed object into a target folder.
    public class FolderParsedObject implements ParserCallback
    {
      private Folder m_TargetFolder;

      public FolderParsedObject(Folder f)
      {
        m_TargetFolder = f;
      }

      // No preprocessing: return the definition unchanged.
      public LibraryObjectDefinition preOperation(LibraryObject parm1,
                                                  LibraryObjectDefinition parm2)
        throws IfsException
      {
        return parm2;
      }

      // After the parse operation, add the new object to the target folder.
      public void postOperation(LibraryObject newObject)
        throws IfsException
      {
        m_TargetFolder.addItem((PublicObject) newObject);
      }

      // Rethrow any exception raised during parsing.
      public void signalException(IfsException e)
        throws IfsException
      {
        throw e;
      }
    }

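From the parser's side, the callback is invoked around the create or update step. The sketch below shows one plausible call order using plain strings in place of repository objects. `SketchCallback`, `RecordingCallback`, and `CallbackFlowSketch` are hypothetical stand-ins, and the exact order in which Oracle 9iFS parsers invoke the three methods is an assumption inferred from the method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ParserCallback, with String in place of repository types.
interface SketchCallback {
    String preOperation(String existing, String definition);
    void postOperation(String created);
    void signalException(RuntimeException e);
}

// Records each callback invocation so the call order can be inspected.
final class RecordingCallback implements SketchCallback {
    final List<String> events = new ArrayList<>();

    public String preOperation(String existing, String definition) {
        events.add("pre");
        return definition; // no changes to the definition
    }

    public void postOperation(String created) {
        events.add("post:" + created);
    }

    public void signalException(RuntimeException e) {
        events.add("error:" + e.getMessage());
    }
}

final class CallbackFlowSketch {
    // Runs one create operation, invoking the optional callback around it.
    static String create(String definition, SketchCallback callback) {
        try {
            if (callback != null) {
                definition = callback.preOperation(null, definition);
            }
            if (definition.isEmpty()) {
                throw new IllegalArgumentException("empty definition");
            }
            String created = "object[" + definition + "]";
            if (callback != null) {
                callback.postOperation(created);
            }
            return created;
        } catch (RuntimeException e) {
            if (callback == null) {
                throw e; // no callback to notify, so propagate
            }
            callback.signalException(e);
            return null;
        }
    }
}
```

Because the callback is optional, every invocation is guarded by a null check, and a failure is either reported through `signalException` or rethrown when no callback was supplied.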
This SimplestParser extracts the text between the <TITLE> tags of an HTML document and stores that information in a custom field. This requires that a subclass of Document, named CUSTOM with the attribute TITLE, be registered on the server with the file extension .cus. This example happens to use a custom file extension, but this is not required: you could register this parser with the file extensions .htm and .html, and have the parser process all HTML documents.
This is a simplified example and does not take into consideration versioned documents. Nor does it address issues concerning local character sets.

    package oracle.ifs.examples.documentation.parser;

    // These classes provide the building blocks for a 9iFS document.
    import oracle.ifs.beans.Attribute;
    import oracle.ifs.beans.Document;
    import oracle.ifs.beans.DocumentDefinition;
    import oracle.ifs.beans.Format;
    import oracle.ifs.beans.LibraryObject;
    import oracle.ifs.common.Collection;
    import oracle.ifs.common.AttributeValue;

    // These classes are used to instantiate a folder object to store the document.
    import oracle.ifs.beans.Folder;
    import oracle.ifs.beans.FolderPathResolver;

    // These classes are used to obtain information about the user at runtime.
    import oracle.ifs.beans.DirectoryUser;
    import oracle.ifs.beans.PrimaryUserProfile;
    import oracle.ifs.beans.LibrarySession;

    // These classes are the base classes for creating a parser.
    import oracle.ifs.beans.parsers.Parser;
    import oracle.ifs.beans.parsers.ParserCallback;
    import java.util.Hashtable;

    // These are standard Java objects used to process the document content.
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.io.BufferedReader;

    // This class is used to report exceptions in iFS methods.
    import oracle.ifs.common.IfsException;

    public class SimplestParser implements Parser
    {
      private String title;
      private LibrarySession m_librarySession;
      private Document newDoc;
      private Folder currentFolder;
      private Folder homeFolder;

      // The constructor argument captures the current library session, which is
      // used to pass information about the user and environment at runtime.
      public SimplestParser(LibrarySession session) throws IfsException
      {
        m_librarySession = session;
      }

      /* This parser is called by the host protocol at runtime, passing a Reader
       * object with the contents of the document being parsed. The callback is
       * an optional argument that enables the parser to respond to the calling
       * method. The Hashtable is used to store three key parameters:
       *   CURRENT_PATH_OPTION: the current working directory.
       *   CURRENT_NAME_OPTION: the name of the file being parsed.
       *   UPDATE_OBJECT_OPTION: indicates if the document being parsed is
       *     replacing an object that already exists.
       */
      public LibraryObject parse(Reader htmlStream, ParserCallback callback,
                                 Hashtable options)
        throws IfsException
      {
        try
        {
          /* Instantiate a FolderPathResolver, then pass it the CURRENT_PATH_OPTION as
           * a string. Set the currentFolder variable to the path where the document
           * to be parsed was inserted.
           */
          FolderPathResolver fpr = new FolderPathResolver(m_librarySession);
          fpr.setRelativePath(options.get(Parser.CURRENT_PATH_OPTION).toString());
          currentFolder = fpr.getCurrentDirectory();

          /* Instantiate the string variable documentContent.
           * Instantiate a BufferedReader object named dataStream and populate it
           * with the document content passed in to the method.
           */
          String documentContent = "";
          BufferedReader dataStream = new BufferedReader(htmlStream);

          // Read the buffered data into the documentContent variable one line at a time.
          for (String line = dataStream.readLine(); line != null;
               line = dataStream.readLine())
          {
            documentContent = documentContent + line + "\n";
          }

          // Send the resulting string to the parseTitle method to extract the title.
          String docTitle = parseTitle(documentContent);

          // Instantiate a DocumentDefinition object.
          DocumentDefinition docDef = new DocumentDefinition(m_librarySession);

          // Instantiate a Collection object and populate it with the list of
          // format extensions. Set the format in the document definition.
          Collection allFormats = m_librarySession.getFormatExtensionCollection();
          docDef.setFormat((Format) allFormats.getItems("cus"));

          // The Classname is the name of the subclass we've defined (CUSTOM).
          docDef.setClassname("Custom");

          // Set the Name attribute in the document definition to the variable passed
          // to the parser in the options Hashtable.
          docDef.setAttribute("NAME", AttributeValue.newAttributeValue
            (options.get(Parser.CURRENT_NAME_OPTION)));

          // Set the custom attribute "TITLE" to the docTitle variable returned by the
          // parseTitle method.
          docDef.setAttribute("TITLE", AttributeValue.newAttributeValue(docTitle));

          // Set the content of the document to the String documentContent.
          docDef.setContent(documentContent);

          /* Check to see if the UPDATE_OBJECT_OPTION variable is set. If so, update
           * the document (update). If not, create a new document (addItem).
           */
          if (options.get(Parser.UPDATE_OBJECT_OPTION) != null)
          {
            Document currentDoc = (Document) currentFolder.findPublicObjectByPath
              (docDef.getAttribute("NAME").toString());
            currentDoc.update(docDef);
            // Keep a reference so the method returns the updated document.
            newDoc = currentDoc;
          }
          else
          {
            // Instantiate a new Document using the DocumentDefinition just defined.
            // Assign to the field (not a local) so the method returns the new document.
            newDoc = (Document) m_librarySession.createPublicObject(docDef);
            currentFolder.addItem(newDoc);
          }
        }
        // Catch any exceptions. Set VerboseMessage to true to get a more complete
        // report of the methods that threw the exception.
        catch (IfsException ifsExceptionCaught)
        {
          ifsExceptionCaught.setVerboseMessage(true);
          ifsExceptionCaught.printStackTrace();
        }
        catch (Exception exceptionCaught)
        {
          exceptionCaught.printStackTrace();
        }
        return newDoc;
      }

      /* parse method called when the protocol sends the file content as an
       * InputStream. This method converts the InputStream to a BufferedReader
       * and forwards it to the first parse method (keeps code concise).
       */
      public LibraryObject parse(InputStream htmlStream, ParserCallback callback,
                                 Hashtable options)
      {
        // Convert the InputStream htmlStream to the BufferedReader named redirect.
        BufferedReader redirect =
          new BufferedReader(new InputStreamReader(htmlStream));

        // Send the resulting BufferedReader to the first parse method.
        try
        {
          newDoc = (Document) parse(redirect, callback, options);
        }
        // Catch and report (in verbose mode) any exceptions.
        catch (IfsException ifsExceptionCaught)
        {
          ifsExceptionCaught.setVerboseMessage(true);
          ifsExceptionCaught.printStackTrace();
        }
        catch (Exception exceptionCaught)
        {
          exceptionCaught.printStackTrace();
        }
        return newDoc;
      }

      /* This is the actual custom parsing routine. It searches the text String
       * for the tag <TITLE>, skips 7 characters (the length of the <TITLE> tag),
       * and extracts a substring of all the information through the last
       * character before the </TITLE> tag.
       */
      private String parseTitle(String parseString)
      {
        try
        {
          title = parseString.substring((parseString.indexOf("<TITLE>") + 7),
                                        parseString.indexOf("</TITLE>"));
        }
        catch (Exception e)
        {
          title = "Untitled";
          e.printStackTrace();
        }
        return title;
      }
    }

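The parseTitle routine in the sample matches only an uppercase `<TITLE>` tag and relies on an exception to detect a missing title. A more tolerant variant might look like the sketch below, which uses a regular expression and keeps the sample's "Untitled" fallback. `TitleExtractorSketch` is a hypothetical helper, not part of the sample or of Oracle 9iFS.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a more tolerant take on the sample's parseTitle routine.
final class TitleExtractorSketch {

    // Case-insensitive; tolerates attributes on the tag and line breaks in the title.
    private static final Pattern TITLE = Pattern.compile(
        "<title\\b[^>]*>(.*?)</title\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return "Untitled"; // same fallback value as the sample parser
        }
        // Collapse internal whitespace so a wrapped title becomes one line.
        String title = m.group(1).trim().replaceAll("\\s+", " ");
        return title.isEmpty() ? "Untitled" : title;
    }
}
```

This is still pattern matching rather than real HTML parsing, so it is suitable only for simple, well-formed title elements.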
The ClassSelectionParser uses a PropertyBundle to identify the correct ClassObject to use when a document is inserted into Oracle 9iFS. Unlike other parsers, it does not process the content of the document, only its metadata (the name). To use the ClassSelectionParser you need to perform these tasks:
The first step in using the ClassSelectionParser is to create the custom class definition. This example uses an XML configuration file to define a custom class for presentation slides, and describes one additional attribute, NumberOfSlides, to be added to all files of the new type Presentation.

    <?xml version = '1.0' standalone = 'yes'?>
    <!--Presentation.xml-->
    <ClassObject>
      <Name>Presentation</Name>
      <Superclass RefType='name'> Document </Superclass>
      <Description>Custom Class for Presentations</Description>
      <Attributes>
        <Attribute>
          <Name>NumberOfSlides</Name>
          <DataType>INTEGER</DataType>
        </Attribute>
      </Attributes>
    </ClassObject>

Once the custom ClassObject has been created, you then associate the file extension (in this example, .ppt) with the ClassSelectionParser in the ParserLookupByFileExtension ValueDefault PropertyBundle. You can register the parser using Oracle 9iFS Manager or XML.

    <?xml version = '1.0' standalone = 'yes'?>
    <!--RegisterPPTParser.xml-->
    <PropertyBundle>
      <update reftype='valuedefault'>
        ParserLookupByFileExtension
      </update>
      <Properties>
        <Property action = 'add'>
          <Name> ppt </Name>
          <Value datatype='String'>
            oracle.ifs.beans.parsers.ClassSelectionParser
          </Value>
        </Property>
      </Properties>
    </PropertyBundle>

Once the parser has been registered, you must register the custom class by adding an entry to the IFS.PARSER.ObjectTypeLookupByFileExtension PropertyBundle. In this case, the registration process associates the file extension (ppt) with the custom class (Presentation).
Registering the class completes the process necessary to invoke the ClassSelectionParser. If this step is omitted, the class associated with the ClassSelectionParser defaults to Document; no parsing will occur.

    <?xml version = '1.0' standalone = 'yes'?>
    <!--RegisterPPTObjectType.xml-->
    <PropertyBundle>
      <update reftype='valuedefault'>
        IFS.PARSER.ObjectTypeLookupByFileExtension
      </update>
      <Properties>
        <Property action = 'add'>
          <Name> ppt </Name>
          <Value datatype='String'>Presentation</Value>
        </Property>
      </Properties>
    </PropertyBundle>

Once you have registered the file extension with the corresponding file type, documents with that extension will be created thereafter as instances of the ClassObject you define.
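The selection rule described above, including the fallback to Document when no class is registered for an extension, can be sketched as follows. `ClassSelectionSketch` is a hypothetical in-memory stand-in that illustrates the rule only; it is not the ClassSelectionParser implementation, and the case-insensitive comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the extension-to-class lookup with a Document default.
final class ClassSelectionSketch {
    static final String DEFAULT_CLASS = "Document";

    private final Map<String, String> classByExtension = new HashMap<>();

    // Associate a file extension (for example, "ppt") with a class name.
    void register(String extension, String className) {
        classByExtension.put(extension.trim().toLowerCase(Locale.ROOT), className.trim());
    }

    // Choose the class from the file name alone; the content is never read.
    String classFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return DEFAULT_CLASS;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return classByExtension.getOrDefault(extension, DEFAULT_CLASS);
    }
}
```

With `register("ppt", "Presentation")` in place, `classFor("kickoff.ppt")` returns `Presentation`, and any unregistered extension falls back to `Document`.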
Copyright © 2001 Oracle Corporation. All Rights Reserved.