com.adobe.pdf
Interface PDFWord


public interface PDFWord

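PDFWord defines a public interface for accessing one word from a list of the static-text words in a PDF document. Custom applications that use the XML/PDF Access API for Java can use the PDFWord interface to get the Unicode characters that represent a word, the number of the page on which the word occurs, and the list of Quad objects that describe the areas occupied by the word's characters.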

You can obtain a list of the words in a PDF document through a java.util.ListIterator object, which is created by calling the getWords method of a PDFDocument object. For information about obtaining a list of the words and using a list iterator to step through the words, see the Developer Guide.
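
As a rough illustration of stepping through words with a list iterator, the following sketch formats each word together with its page number. It is not taken from the Developer Guide. The nested PDFWord interface is a local stand-in that mirrors only the three methods in the Method Summary, and the stub helper and the sample page numbers are hypothetical. In a real application the iterator would come from the getWords method of a PDFDocument object, whose exact signature is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class WordIterationSketch {

    // Hypothetical local stand-in mirroring the Method Summary; a real
    // application would use com.adobe.pdf.PDFWord instead.
    interface PDFWord {
        String getString();
        int getPageNumber();
        List getQuadList();
    }

    // Hypothetical stub that exists only to make this sketch runnable.
    static PDFWord stub(final String text, final int page) {
        return new PDFWord() {
            public String getString() { return text; }
            public int getPageNumber() { return page; }
            public List getQuadList() { return new ArrayList<Object>(); }
        };
    }

    // Steps through the iterator and formats each word as "page: text".
    static List<String> describe(ListIterator<PDFWord> words) {
        List<String> lines = new ArrayList<String>();
        while (words.hasNext()) {
            PDFWord word = words.next();
            lines.add(word.getPageNumber() + ": " + word.getString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample data; the page values are arbitrary, because this page does
        // not state whether page numbering starts at 0 or 1.
        List<PDFWord> sample = Arrays.asList(stub("Hello", 1), stub("world", 2));
        for (String line : describe(sample.listIterator())) {
            System.out.println(line);
        }
    }
}
```

Because a ListIterator also supports hasPrevious and previous, the same loop could be run backward if an application needs to step through the words in reverse.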


Method Summary
 int getPageNumber()
          Gets the number of the page on which a particular word occurs.
 java.util.List getQuadList()
          Gets the list of Quad objects associated with a word.
 java.lang.String getString()
          Gets the Unicode characters that represent a particular word.
 

Method Detail

getString

public java.lang.String getString()
Gets the Unicode characters that represent a particular word.

Returns:
A java.lang.String object that contains the Unicode characters of the word.

getPageNumber

public int getPageNumber()
Gets the number of the page on which a particular word occurs.

Returns:
An integer that represents the page number.
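
One possible use of getPageNumber is to group the text of the words by page. The sketch below is only an illustration under stated assumptions: the nested PDFWord interface is a local stand-in for com.adobe.pdf.PDFWord, and the stub helper, the groupByPage method, and the sample page numbers are hypothetical names and values introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordsByPageSketch {

    // Hypothetical local stand-in mirroring the Method Summary.
    interface PDFWord {
        String getString();
        int getPageNumber();
        List getQuadList();
    }

    // Hypothetical stub that exists only to make this sketch runnable.
    static PDFWord stub(final String text, final int page) {
        return new PDFWord() {
            public String getString() { return text; }
            public int getPageNumber() { return page; }
            public List getQuadList() { return new ArrayList<Object>(); }
        };
    }

    // Collects the text of each word under the page number it reports.
    // A TreeMap keeps the pages in ascending order.
    static Map<Integer, List<String>> groupByPage(List<PDFWord> words) {
        Map<Integer, List<String>> byPage = new TreeMap<Integer, List<String>>();
        for (PDFWord word : words) {
            Integer page = Integer.valueOf(word.getPageNumber());
            List<String> onPage = byPage.get(page);
            if (onPage == null) {
                onPage = new ArrayList<String>();
                byPage.put(page, onPage);
            }
            onPage.add(word.getString());
        }
        return byPage;
    }

    public static void main(String[] args) {
        List<PDFWord> sample = Arrays.asList(
            stub("alpha", 1), stub("beta", 1), stub("gamma", 2));
        System.out.println(groupByPage(sample));
    }
}
```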

getQuadList

public java.util.List getQuadList()
Gets the list of Quad objects associated with a word. The items in the list represent the areas occupied by the characters in a word. A single Quad object may describe the area occupied by more than one character, depending on the attributes and orientations of the characters that make up the word. The characters could all be the same size and font, but that is not always the case. Also, because text can be aligned to a path, the orientation of each character could change from one character to the next.

The following conceptual diagram illustrates two examples of the same word and indicates how many Quad objects could be referenced in the list, depending on the attributes and orientation of each character in the word.

[Figure: Quadrilateral areas bounding the characters of a word.]

In the first example, the baseline of the text is aligned to a path. A path can be open (like an arc) or closed (like a circle). The characters are the same size and font; however, because the baseline associated with each character has a different orientation, more than one Quad object is required to accurately describe the orientation of each character.

In the second example, one of the characters is larger than the other characters in the word; therefore, three Quad objects are needed to accurately describe the overall area occupied by the word. If all the characters in a word have the same size, font, and orientation, a list referencing only one Quad object is returned.

Returns:
A java.util.List object that references one or more com.adobe.pdf.Quad objects. You can access a Quad object through the Quad interface. For example code, see Quad.
See Also:
Quad
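
Because getQuadList returns an untyped java.util.List, each item has to be cast to Quad before use. The following sketch shows one way to do that and to check whether a word is described by a single Quad object. It makes several assumptions: the nested PDFWord and Quad interfaces are local stand-ins for the com.adobe.pdf types, Quad is left empty because its methods are not described on this page, and the stub, quadsOf, and isSingleQuad names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class QuadListSketch {

    // Hypothetical marker standing in for com.adobe.pdf.Quad. Its methods are
    // not shown on this page, so none are assumed here.
    interface Quad {}

    // Hypothetical local stand-in mirroring the Method Summary.
    interface PDFWord {
        String getString();
        int getPageNumber();
        List getQuadList();
    }

    // Hypothetical stub that reports the requested number of Quad items.
    static PDFWord stub(final String text, final int quadCount) {
        final List<Quad> quads = new ArrayList<Quad>();
        for (int i = 0; i < quadCount; i++) {
            quads.add(new Quad() {});
        }
        return new PDFWord() {
            public String getString() { return text; }
            public int getPageNumber() { return 1; }
            public List getQuadList() { return quads; }
        };
    }

    // Copies the untyped list into a typed one, casting each item to Quad.
    static List<Quad> quadsOf(PDFWord word) {
        List<Quad> typed = new ArrayList<Quad>();
        for (Iterator<?> it = word.getQuadList().iterator(); it.hasNext();) {
            typed.add((Quad) it.next());
        }
        return typed;
    }

    // True when the whole word is described by exactly one Quad object.
    static boolean isSingleQuad(PDFWord word) {
        return quadsOf(word).size() == 1;
    }

    public static void main(String[] args) {
        PDFWord onPath = stub("curve", 5);  // e.g. text aligned to a path
        PDFWord plain = stub("plain", 1);   // same size, font, and orientation
        System.out.println(onPath.getString() + ": " + quadsOf(onPath).size());
        System.out.println(plain.getString() + ": single quad = " + isSingleQuad(plain));
    }
}
```

Copying into a typed list keeps the cast in one place, so the rest of an application can work with List<Quad> rather than the raw list.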