Hi,
There are quite a few options for reporting data for built-in and
application-defined encoding algorithms:
Text content
------------
built-in
  ContentHandler
    Convert to characters
  PrimitiveTypeContentHandler
    Array of primitive type
  EncodingAlgorithmContentHandler.object
    Object of array of primitive type
  EncodingAlgorithmContentHandler.octets
    Raw encoded octets
application-defined
  ContentHandler
    Convert to characters from EncodingAlgorithm.convertToCharacters
  EncodingAlgorithmContentHandler.object
    Object returned from registered encoding algorithm using
    EncodingAlgorithm.decode method
  EncodingAlgorithmContentHandler.octets
    Raw encoded octets
Attribute value
---------------
built-in
  Attributes.getValue
    Convert to characters
  EncodingAlgorithmAttributes.getAlgorithmData
    Object of primitive type or
    raw encoded octets
application-defined
  Attributes.getValue
    Convert to characters from EncodingAlgorithm.convertToCharacters
  EncodingAlgorithmAttributes.getAlgorithmData
    Object returned from registered encoding algorithm using
    EncodingAlgorithm.decode method or
    raw encoded octets
Currently the choices for the SAX parser implementation are reduced by
the following:
- characters are not reported for application-defined algorithms
- primitive types are never reported as raw encoded octets
- the registering of handlers specifies the precedence of reporting for
  text content and attribute values.
Then we need to add:
- application-defined data is reported as raw encoded octets unless an
  encoding algorithm is registered.
That should cover most use cases, and we can tweak as required for
additional edge cases with further properties.
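
The rules above for text content can be sketched as a small decision
function. This is only an illustration, not the parser's code: the class
ReportingRules, its enums and the NOT_REPORTED outcome are hypothetical
names, and the handler passed in is assumed to be whichever handler has
already won the precedence given by registration. What happens when only a
ContentHandler is available for application-defined data is not stated
above, so the sketch simply marks it as not reported.

```java
// Hypothetical sketch of the reporting rules; none of these names come
// from the Fast Infoset API.
public class ReportingRules {
    public enum Kind { BUILT_IN, APPLICATION_DEFINED }

    // The handler that takes precedence for this piece of text content.
    public enum Handler { CONTENT, PRIMITIVE_TYPE, ENCODING_ALGORITHM }

    public enum Report { CHARACTERS, PRIMITIVE_ARRAY, OBJECT, OCTETS, NOT_REPORTED }

    public static Report choose(Kind kind, Handler handler, boolean algorithmRegistered) {
        if (kind == Kind.BUILT_IN) {
            switch (handler) {
                case CONTENT:
                    return Report.CHARACTERS;
                case PRIMITIVE_TYPE:
                    return Report.PRIMITIVE_ARRAY;
                default:
                    // Primitive types are never reported as raw encoded octets.
                    return Report.OBJECT;
            }
        }
        // Application-defined: decoded object if an algorithm is registered,
        // raw encoded octets otherwise.
        if (handler == Handler.ENCODING_ALGORITHM) {
            return algorithmRegistered ? Report.OBJECT : Report.OCTETS;
        }
        // Assumption: characters are not reported for application-defined
        // algorithms, so other handlers receive nothing here.
        return Report.NOT_REPORTED;
    }

    public static void main(String[] args) {
        // Unregistered application-defined algorithm: prints OCTETS
        System.out.println(choose(Kind.APPLICATION_DEFINED, Handler.ENCODING_ALGORITHM, false));
    }
}
```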
Paul.
--
| ? + ? = To question
----------------\
Paul Sandoz
x38109
+33-4-76188109