Re: Bugs in Decoder.java

From: Arman Djusupov <arman_at_noemax.com>
Date: Thu, 05 Nov 2009 09:59:08 +0200

Hello Andrzej,

In the case where an FI document contains an initial vocabulary but no root element, it cannot be read as a valid FI document (since it has no XML infoset representation), only as a proprietary vocabulary format.

Section 7.2.13 doesn't specify any format for an external vocabulary. It just gives a recommendation on how an external vocabulary can be created and stored.

We can use this opportunity to agree on a format that will be interoperable between our implementations, as an informal extension of the standard. Then we can discuss it on fi-interop with other vendors to reach a wider agreement. The key point here is ensuring interoperability.

For example, we could agree that an FI document with an initial vocabulary, followed by a single terminator and no root element, is one of the recognizable FI vocabulary formats. In that case we could even agree on a file extension, such as ".fivoc".
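As a rough illustration of what "recognizable" could mean, a reader might first check the fixed FI binary header and the initial-vocabulary presence bit in the octet that follows it. This is a hypothetical sketch: the class name is invented, it assumes the file starts directly with the binary header (no optional XML declaration in front of it), and it assumes the flag value 0x20 matches the Java implementation's initial-vocabulary flag. It does not verify the rest of the proposal (a single terminator and no root element), which would require parsing the vocabulary itself.

```java
/**
 * Hypothetical helper for the proposed vocabulary-file format. It only checks
 * that the bytes start with the FI binary header and that the
 * initial-vocabulary flag is set in the octet that follows the header.
 */
final class FiVocabularySniffer {
    // Identification '11100000 00000000' followed by version '00000000 00000001'
    private static final byte[] FI_HEADER = {(byte) 0xE0, 0x00, 0x00, 0x01};
    // Assumed to match EncodingConstants.INITIAL_VOCABULARY_FLAG in the Java code
    private static final int INITIAL_VOCABULARY_FLAG = 0x20;

    private FiVocabularySniffer() {}

    static boolean hasInitialVocabulary(byte[] data) {
        // Need the four header octets plus the optional-components octet
        if (data.length <= FI_HEADER.length) {
            return false;
        }
        for (int i = 0; i < FI_HEADER.length; i++) {
            if (data[i] != FI_HEADER[i]) {
                return false;
            }
        }
        return (data[FI_HEADER.length] & INITIAL_VOCABULARY_FLAG) != 0;
    }
}
```

A positive result only means the file is a candidate; a document with an initial vocabulary and a normal root element passes the same check.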

With best regards,
Arman

Andrzej Gladkowski wrote:
> On Wed, Nov 4, 2009 at 3:16 PM, Arman Djusupov <arman_at_noemax.com> wrote:
>
>> Hi Andrzej,
>>
>> I have fixed decodeIntegerIndexOnSecondBit() in the following manner (note
>> that this is C# code):
>> /*
>>  * C.25
>>  */
>> protected int decodeIntegerIndexOnSecondBit()
>> {
>>     // Force the first bit to '1' so that the ISTRING table classifies the
>>     // octet as an index, whether the first bit was the C.13.4 discriminant
>>     // '1' or the C.16.5-7 padding '0'.
>>     int b = read() | 0x80;
>>
>>     switch (DecoderStateTables.ISTRING[b])
>>     {
>>         case DecoderStateTables.ISTRING_INDEX_SMALL:
>>             return b & EncodingConstants.INTEGER_2ND_BIT_SMALL_MASK;
>>         case DecoderStateTables.ISTRING_INDEX_MEDIUM:
>>             return (((b & EncodingConstants.INTEGER_2ND_BIT_MEDIUM_MASK) << 8) | read())
>>                 + EncodingConstants.INTEGER_2ND_BIT_SMALL_LIMIT;
>>         case DecoderStateTables.ISTRING_INDEX_LARGE:
>>             return (((b & EncodingConstants.INTEGER_2ND_BIT_LARGE_MASK) << 16) | (read() << 8) | read())
>>                 + EncodingConstants.INTEGER_2ND_BIT_MEDIUM_LIMIT;
>>         case DecoderStateTables.ISTRING_SMALL_LENGTH:
>>         case DecoderStateTables.ISTRING_MEDIUM_LENGTH:
>>         case DecoderStateTables.ISTRING_LARGE_LENGTH:
>>         default:
>>             throw new FastInfosetException(Strings.message_decodingIndexOnSecondBit);
>>     }
>> }
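The effect of the 'read() | 0x80' fix can be reproduced outside the library. Below is a minimal, self-contained Java sketch of C.25 decoding, assuming the 0-based result convention of the Java vocabulary tables (the value returned is the integer minus 1). The class and method names are hypothetical, and the explicit bit tests stand in for the DecoderStateTables.ISTRING lookup; this is not the Fast Infoset API.

```java
/**
 * Hypothetical sketch of X.891 C.25 decoding: an integer in [1, 2^20] that
 * starts on the second bit of an octet. Returns the 0-based index.
 */
final class SecondBitIndexDecoder {
    private static final int SMALL_LIMIT = 64;    // integers [1, 64]
    private static final int MEDIUM_LIMIT = 8256; // integers [65, 8256]

    private SecondBitIndexDecoder() {}

    static int decode(byte[] data, int offset) {
        // Drop the first bit: '1' (C.13.4 discriminant) and '0' (C.16.5-7
        // padding) must decode identically, which is what '| 0x80' achieves.
        final int b = octet(data, offset) & 0x7F;
        if ((b & 0x40) == 0) {
            // '0' + 6 bits holding (integer - 1)
            return b & 0x3F;
        }
        if ((b & 0x20) == 0) {
            // '10' + 13 bits holding (integer - 65); add 64 for a 0-based index
            return (((b & 0x1F) << 8) | octet(data, offset + 1)) + SMALL_LIMIT;
        }
        if ((b & 0x10) == 0) {
            // '110' + 20 bits holding (integer - 8257); add 8256 for a 0-based index
            return (((b & 0x0F) << 16) | (octet(data, offset + 1) << 8) | octet(data, offset + 2))
                    + MEDIUM_LIMIT;
        }
        throw new IllegalArgumentException("not a C.25 index prefix: 0x" + Integer.toHexString(b));
    }

    // Reads one octet as an unsigned value, rejecting truncated input
    private static int octet(byte[] data, int i) {
        if (i >= data.length) {
            throw new IllegalArgumentException("truncated input");
        }
        return data[i] & 0xFF;
    }
}
```

For example, decode(new byte[] {0x41, 0x01}, 0) and decode(new byte[] {(byte) 0xC1, 0x01}, 0) both return 321, so the result no longer depends on the first bit.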
>>
>> So now I can successfully read the initial vocabulary encoded at the
>> beginning of your document. However, your document seems to end with a
>> single terminator right after the initial vocabulary, so is there no root
>> element?
>>
>>
>
> # No, there is no root element. This is an entirely external vocabulary,
> stored as a separate file (point 7.2.13 of the specification).
> # That external file is used when decoding the actual document.
>
>
>
>> With best regards,
>> Arman
>>
>>
>> Andrzej Gladkowski wrote:
>>
>>> On Tue, Nov 3, 2009 at 3:26 PM, Arman Djusupov <arman_at_noemax.com> wrote:
>>>
>>>> Hello Andrzej,
>>>>
>>>> It seems that the decodeNumberOfItemsOfSequence() method indeed has a
>>>> problem, since it doesn't add the lower boundary of the range after
>>>> reading the value.
>>>>
>>>> But why do you think that C.25.2 is implemented the wrong way?
>>>>
>>>> As far as I can see, the C.25.2 implementation is correct. It doesn't add 1
>>>> when reading the 1-64 value range, because the vocabulary tables in the
>>>> Java implementation are 0-based, so adding and subtracting 1 while
>>>> reading/writing is unnecessary. The same applies to the other ranges: it
>>>> adds 64 instead of 65 as the lower boundary for medium-range values, and
>>>> 8256 instead of 8257 for large-range values.
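The offsets described above follow from a little arithmetic. Writing n for the integer defined by C.25, p for the value stored in the bit field, and i = n - 1 for the 0-based table index:

```latex
\begin{aligned}
n \in [1, 64]:        &\quad p = n - 1 = i \\
n \in [65, 8256]:     &\quad p = n - 65   \;\Rightarrow\; i = p + 64 \\
n \in [8257, 2^{20}]: &\quad p = n - 8257 \;\Rightarrow\; i = p + 8256
\end{aligned}
```

For example, the medium-range octets 0x41 0x01 carry p = 257, so n = 322 and the 0-based index is i = 321.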
>>>> With best regards,
>>>> Arman
>>>>
>>>>
>>> # Yes, I agree that subtracting 1 while reading/writing is not necessary.
>>> # No, I think all of C.25 is implemented correctly; the problem is
>>> something else. Please read my further comments.
>>>
>>> # I think the confusion is caused by the following clauses:
>>>
>>> C.13.4 If the alternative string-index is present, then the bit '1'
>>> (discriminant) is appended to the bit stream, and the string-index is
>>> encoded as described in C.25.
>>> C.16.5 If the optional component prefix-string-index is present, then
>>> the bit '0' (padding) is appended to the bit stream, and the component
>>> is encoded as described in C.25.
>>> C.16.6 If the optional component namespace-name-string-index is present,
>>> then the bit '0' (padding) is appended to the bit stream, and the
>>> component is encoded as described in C.25.
>>> C.16.7 The bit '0' (padding) is appended to the bit stream, and the
>>> component local-name-string-index is encoded as described in C.25.
>>>
>>> # The function 'decodeNumberOfItemsOfSequence(..)' correctly implements
>>> clause C.13.4, where the octet starts with '1',
>>> # but it fails if the octet starts with '0'!
>>> # I have looked into Encoder.java, and there are two separate encoding
>>> methods:
>>> - Encoder.encodeNonZeroIntegerOnSecondBitFirstBitZero(..)
>>> - Encoder.encodeNonZeroIntegerOnSecondBitFirstBitOne(..)
>>> # We could fix Decoder.decodeNumberOfItemsOfSequence(..) either by adding
>>> another method to handle octets starting with '0', or simply by always
>>> ignoring the first bit of the octet.
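The two Encoder methods listed above differ only in the first bit they write, so the encoding side can be sketched as one routine with the first bit as a parameter. This is a minimal sketch assuming the caller passes the 0-based index (the integer minus 1), as in the Java vocabulary tables; the class and method names are hypothetical and do not reproduce the Encoder.java API.

```java
/**
 * Hypothetical sketch of X.891 C.25 encoding starting on the second bit.
 * firstBitOne = true writes the C.13.4 discriminant '1';
 * firstBitOne = false writes the C.16.5-7 padding '0'.
 */
final class SecondBitIndexEncoder {
    private SecondBitIndexEncoder() {}

    static byte[] encode(int index, boolean firstBitOne) {
        if (index < 0 || index >= (1 << 20)) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        final int first = firstBitOne ? 0x80 : 0x00;
        if (index < 64) {
            // '0' + 6 bits: (integer - 1) == index
            return new byte[] {(byte) (first | index)};
        }
        if (index < 8256) {
            // '10' + 13 bits: (integer - 65) == index - 64
            final int v = index - 64;
            return new byte[] {(byte) (first | 0x40 | (v >> 8)), (byte) v};
        }
        // '110' + 20 bits: (integer - 8257) == index - 8256
        final int v = index - 8256;
        return new byte[] {(byte) (first | 0x60 | (v >> 16)), (byte) (v >> 8), (byte) v};
    }
}
```

encode(321, false) yields the octets 0x41 0x01 used in testIntegerIndex321, while encode(321, true) yields 0xC1 0x01; a decoder that ignores the first bit maps both back to 321.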
>>>
>>> # Here is a JUnit test that covers both scenarios (uncomment the
>>> appropriate FIRST_BIT constant in the code):
>>> =========================================================================
>>> import java.io.IOException;
>>> import org.jvnet.fastinfoset.FastInfosetException;
>>> import com.sun.xml.fastinfoset.Decoder;
>>> import junit.framework.TestCase;
>>>
>>> public class DecoderTest extends TestCase {
>>>     /* Uncomment the appropriate FIRST_BIT line to test an octet whose
>>>      * first bit is '1' or '0'. */
>>>     //private static final byte FIRST_BIT = (byte) 0x80; // 1000 0000, C.13.4
>>>     private static final byte FIRST_BIT = (byte) 0x00;   // 0000 0000, C.16.5-7
>>>
>>>     private TestDecoder decoder;
>>>     private byte[] buffer;
>>>
>>>     // Exposes the protected decoder method over a fixed in-memory buffer
>>>     private class TestDecoder extends Decoder {
>>>         public TestDecoder(byte[] buffer) {
>>>             this._octetBufferOffset = 0;
>>>             this._octetBufferEnd = 15;
>>>             this._octetBuffer = buffer;
>>>         }
>>>
>>>         public int decodeIntegerIndexOnSecondBitTest()
>>>                 throws FastInfosetException, IOException {
>>>             return decodeIntegerIndexOnSecondBit();
>>>         }
>>>     }
>>>
>>>     protected void setUp() throws java.lang.Exception {
>>>         buffer = new byte[16];
>>>         decoder = new TestDecoder(buffer);
>>>     }
>>>
>>>     // integer in range [1, 64], index [0, 63]: 6 bits
>>>     public void testIntegerIndex0() throws IOException, FastInfosetException {
>>>         buffer[0] = 0x00 | FIRST_BIT;
>>>         final int result = decoder.decodeIntegerIndexOnSecondBitTest();
>>>
>>>         assertEquals(0x00, result);
>>>     }
>>>
>>>     // integer in range [65, 8256], index [64, 8255]: 13 bits
>>>     public void testIntegerIndex321() throws IOException, FastInfosetException {
>>>         buffer[0] = 0x41 | FIRST_BIT; // x100 0001: '10' prefix, then the top five bits
>>>         buffer[1] = 0x01;             // 0000 0001: the remaining eight bits
>>>         final int result = decoder.decodeIntegerIndexOnSecondBitTest();
>>>
>>>         assertEquals(257 + 64, result);
>>>     }
>>>
>>>     // integer in range [8257, 1048576], index [8256, 1048575]: 20 bits
>>>     public void testIntegerIndex73793() throws IOException, FastInfosetException {
>>>         buffer[0] = 0x61 | FIRST_BIT; // x110 0001: '110' prefix, then the top four bits
>>>         buffer[1] = 0x00;             // 0000 0000: the next eight bits
>>>         buffer[2] = 0x01;             // 0000 0001: the last eight bits
>>>         final int result = decoder.decodeIntegerIndexOnSecondBitTest();
>>>
>>>         assertEquals(65537 + 8256, result);
>>>     }
>>> }
>>> =========================================================================
>>>
>>> # Another small issue can be found in
>>> Decoder.decodeTableItems(QualifiedNameArray array, boolean isAttribute).
>>> # Wrong:
>>>
>>> String namespaceName = "";
>>> int namespaceNameIndex = -1;
>>> if ((b & EncodingConstants.NAME_SURROGATE_NAME_FLAG) > 0) {
>>>     namespaceNameIndex = decodeIntegerIndexOnSecondBit();
>>>     namespaceName = _v.prefix.get(prefixIndex); // wrong table and index
>>> }
>>>
>>> # Correct: _v.prefix.get(prefixIndex) should be changed to
>>> _v.namespaceName.get(namespaceNameIndex):
>>>
>>> String namespaceName = "";
>>> int namespaceNameIndex = -1;
>>> if ((b & EncodingConstants.NAME_SURROGATE_NAME_FLAG) > 0) {
>>>     namespaceNameIndex = decodeIntegerIndexOnSecondBit();
>>>     namespaceName = _v.namespaceName.get(namespaceNameIndex); // fixed
>>> }
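To see why each index must go to its own table, here is a minimal sketch of reading one qualified-name entry as laid out in C.16. It assumes the first octet holds six padding bits followed by two presence flags (0x02 for prefix, 0x01 for namespace name, mirroring NAME_SURROGATE_PREFIX_FLAG and NAME_SURROGATE_NAME_FLAG in the Java code), and that each index follows as a C.25 integer whose first bit is the '0' padding of C.16.5-7. The class, the Vocabulary holder, and the method names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of reading one C.16 name surrogate. */
final class NameSurrogateSketch {
    static final int PREFIX_FLAG = 0x02;    // assumed: prefix-string-index present
    static final int NAMESPACE_FLAG = 0x01; // assumed: namespace-name-string-index present

    /** Hypothetical vocabulary tables, 0-based like the Java implementation's. */
    static final class Vocabulary {
        final List<String> prefix;
        final List<String> namespaceName;
        final List<String> localName;

        Vocabulary(List<String> prefix, List<String> namespaceName, List<String> localName) {
            this.prefix = prefix;
            this.namespaceName = namespaceName;
            this.localName = localName;
        }
    }

    private final byte[] data;
    private int pos;

    NameSurrogateSketch(byte[] data) {
        this.data = data;
    }

    private int read() {
        return data[pos++] & 0xFF;
    }

    /** C.25 with the first bit ignored; returns the integer minus 1. */
    private int readIndex() {
        final int b = read() & 0x7F;
        if ((b & 0x40) == 0) return b & 0x3F;
        if ((b & 0x20) == 0) return (((b & 0x1F) << 8) | read()) + 64;
        if ((b & 0x10) == 0) return (((b & 0x0F) << 16) | (read() << 8) | read()) + 8256;
        throw new IllegalArgumentException("invalid C.25 prefix");
    }

    /** Returns {prefix, namespaceName, localName}; absent parts are "". */
    String[] readQualifiedName(Vocabulary v) {
        final int flags = read();
        String prefix = "";
        String namespaceName = "";
        if ((flags & PREFIX_FLAG) != 0) {
            prefix = v.prefix.get(readIndex());
        }
        if ((flags & NAMESPACE_FLAG) != 0) {
            // The namespace index selects from the namespace-name table, not the prefix table
            namespaceName = v.namespaceName.get(readIndex());
        }
        final String localName = v.localName.get(readIndex());
        return new String[] {prefix, namespaceName, localName};
    }
}
```

With prefix table [p0, p1], namespace-name table [urn:a, urn:b] and local-name table [x, y, z], the octets 0x03 0x01 0x00 0x02 decode to {p1, urn:a, z}; the original line would have returned p1 as the namespace name.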
>>>
>>>
>>> Cheers,
>>> ~Andrzej
>>>
>>>
>>
>