users@fi.java.net

Re: Fast Info Set and large XML files

From: Oleksiy Stashok <Oleksiy.Stashok_at_Sun.COM>
Date: Tue, 29 Dec 2009 13:57:51 +0100

Hi Swami,

The FastInfoset implementation can filter which XML attribute values
and XML character content chunks get indexed, both by their size and by
the total memory consumed by the indexing table.
By default we index attribute values and character content chunks
whose length is 0 <= length < 32, and the total size of the indexed
entries in a table must stay below Integer.MAX_VALUE.
I think that if you tune these properties to reduce the number of
indexed attribute values and character chunks, it should help you
avoid the issue.
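To make the filtering rule above concrete, here is a hypothetical, simplified model of it. It is not FastInfoset source code: the class name, the use of a character count for "memory", and the exact comparisons are assumptions based on the description in this reply (index a value only if its length is in [min, max) and the table's total size stays below the limit).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the indexing filter described above.
 * NOT the FastInfoset implementation: names and semantics are assumptions.
 */
class IndexingFilterModel {
    private final int minSize;        // assumed inclusive lower bound on value length
    private final int maxSize;        // assumed exclusive upper bound on value length
    private final long memoryLimit;   // assumed limit on total indexed characters
    private final Map<String, Integer> table = new HashMap<String, Integer>();
    private long totalChars = 0;

    IndexingFilterModel(int minSize, int maxSize, long memoryLimit) {
        this.minSize = minSize;
        this.maxSize = maxSize;
        this.memoryLimit = memoryLimit;
    }

    /**
     * Returns the value's 0-based table index if it is already indexed.
     * Otherwise returns -1 (the value is written literally) and adds it to
     * the table only if it passes both the size and the memory checks.
     */
    int encodeValue(String value) {
        Integer index = table.get(value);
        if (index != null) {
            return index;
        }
        int length = value.length();
        boolean sizeOk = length >= minSize && length < maxSize;
        boolean memoryOk = totalChars + length < memoryLimit;
        if (sizeOk && memoryOk) {
            table.put(value, table.size()); // next free index
            totalChars += length;
        }
        return -1;
    }

    int tableSize() {
        return table.size();
    }
}
```

In this model, lowering the maximum size or the memory limit (or raising the minimum size) reduces how many distinct values ever receive an index, which is the lever the reply describes.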

The default values can be changed via the FastInfosetSerializer API
methods [1]. But I'm not sure JiBX exposes the FastInfosetSerializer
(Encoder)...

Thanks.

WBR,
Alexey.


[1]
     /**
      * Gets the minimum size of character content chunks
      * that will be indexed.
      *
      * @return The minimum character content chunk size.
      */
     public int getMinCharacterContentChunkSize();

     /**
      * Sets the minimum size of character content chunks
      * that will be indexed.
      *
      * @param size the minimum character content chunk size.
      */
     public void setMinCharacterContentChunkSize(int size);

     /**
      * Gets the maximum size of character content chunks
      * that will be indexed.
      *
      * @return The maximum character content chunk size.
      */
     public int getMaxCharacterContentChunkSize();

     /**
      * Sets the maximum size of character content chunks
      * that will be indexed.
      *
      * @param size the maximum character content chunk size.
      */
     public void setMaxCharacterContentChunkSize(int size);

     /**
      * Gets the limit on the memory size of Map of character content chunks
      * that will be indexed.
      *
      * @return The character content chunk map memory limit.
      */
     public int getCharacterContentChunkMapMemoryLimit();

     /**
      * Sets the limit on the memory size of Map of character content chunks
      * that will be indexed.
      *
      * @param size The character content chunk map memory limit. Chunks
      * are indexed only while the total size of the Map stays below it.
      */
     public void setCharacterContentChunkMapMemoryLimit(int size);

     /**
      * Gets the minimum size of attribute values
      * that will be indexed.
      *
      * @return The minimum attribute values size.
      */
     public int getMinAttributeValueSize();

     /**
      * Sets the minimum size of attribute values
      * that will be indexed.
      *
      * @param size the minimum attribute values size.
      */
     public void setMinAttributeValueSize(int size);

     /**
      * Gets the maximum size of attribute values
      * that will be indexed.
      *
      * @return The maximum attribute values size.
      */
     public int getMaxAttributeValueSize();

     /**
      * Sets the maximum size of attribute values
      * that will be indexed.
      *
      * @param size the maximum attribute values size.
      */
     public void setMaxAttributeValueSize(int size);

     /**
      * Gets the limit on the memory size of Map of attribute values
      * that will be indexed.
      *
      * @return The attribute value map memory limit.
      */
     public int getAttributeValueMapMemoryLimit();

     /**
      * Sets the limit on the memory size of Map of attribute values
      * that will be indexed.
      *
      * @param size The attribute value map memory limit. Values
      * are indexed only while the total size of the Map stays below it.
      */
     public void setAttributeValueMapMemoryLimit(int size);
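A minimal sketch of applying all six setters at once. Because the FastInfoset jar is not assumed to be on the classpath, it declares a stand-in interface, IndexingLimits (hypothetical), with the same setter signatures as listed above. In real code you would call these methods directly on the FastInfosetSerializer instance (for example a com.sun.xml.fastinfoset.stax.StAXDocumentSerializer, assuming it extends the Encoder mentioned above) before writing the document, and only if your JiBX setup lets you reach that object. The numbers are placeholders, not recommendations.

```java
/**
 * Hypothetical stand-in for the six setters listed above, so the sketch
 * compiles without FastInfoset. Real code would call them on the serializer.
 */
interface IndexingLimits {
    void setMinCharacterContentChunkSize(int size);
    void setMaxCharacterContentChunkSize(int size);
    void setCharacterContentChunkMapMemoryLimit(int size);
    void setMinAttributeValueSize(int size);
    void setMaxAttributeValueSize(int size);
    void setAttributeValueMapMemoryLimit(int size);
}

class IndexingTuning {
    // Placeholder values for illustration only; tune them for your documents.
    static final int MAX_INDEXED_LENGTH = 8;
    static final int MAP_MEMORY_LIMIT = 1 << 20;

    /** Narrows indexing for both tables; intended to run before serialization starts. */
    static void applyConservativeLimits(IndexingLimits s) {
        s.setMinAttributeValueSize(0);
        s.setMaxAttributeValueSize(MAX_INDEXED_LENGTH);
        s.setAttributeValueMapMemoryLimit(MAP_MEMORY_LIMIT);
        s.setMinCharacterContentChunkSize(0);
        s.setMaxCharacterContentChunkSize(MAX_INDEXED_LENGTH);
        s.setCharacterContentChunkMapMemoryLimit(MAP_MEMORY_LIMIT);
    }
}

/** Trivial implementation that records the values, used to exercise the sketch. */
class RecordingLimits implements IndexingLimits {
    int minChunk, maxChunk, chunkMemory, minAttr, maxAttr, attrMemory;

    public void setMinCharacterContentChunkSize(int size) { minChunk = size; }
    public void setMaxCharacterContentChunkSize(int size) { maxChunk = size; }
    public void setCharacterContentChunkMapMemoryLimit(int size) { chunkMemory = size; }
    public void setMinAttributeValueSize(int size) { minAttr = size; }
    public void setMaxAttributeValueSize(int size) { maxAttr = size; }
    public void setAttributeValueMapMemoryLimit(int size) { attrMemory = size; }
}
```

Setting both tables together keeps attribute values and character chunks consistent; you could equally tighten only the table that grows in your documents.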

I think the problem here is that the current limits for attribute values
and character content chunks don't fit your use case well.
On Dec 15, 2009, at 15:06, Swaminathan Gnanaskandan wrote:

> FastInfoset throws the following exception when marshalling large
> XML files. It works fine for smaller XML files.
>
> Error writing marshalled document
> java.io.IOException: Error writing to stream: java.io.IOException: Integer > 1,048,576
>     at org.jibx.runtime.impl.StAXWriter.startTagOpen(StAXWriter.java:161)
>     at org.jibx.runtime.impl.MarshallingContext.startTagAttributes(MarshallingContext.java:541)
>     at com.cisco.avm.corona.concurrent.model.stats.JiBX_MungeAdapter.JiBX_concurrent_model_binding_marshal_1_9()
>     at com.cisco.avm.corona.concurrent.model.stats.TransactionStats.JiBX_concurrent_model_binding_marshal_1_0(TransactionStats.java)
>     at com.cisco.avm.corona.concurrent.model.stats.JiBX_MungeAdapter.JiBX_concurrent_model_binding_marshal_1_10()
>     at com.cisco.avm.corona.concurrent.model.stats.TransactionStatistics.JiBX_concurrent_model_binding_marshal_1_0(TransactionStatistics.java)
>     at com.cisco.avm.corona.concurrent.model.stats.JiBX_MungeAdapter.JiBX_concurrent_model_binding_marshal_1_11()
>     at com.cisco.avm.corona.concurrent.model.stats.TrafficStatistics.JiBX_concurrent_model_binding_marshal_1_0(TrafficStatistics.java)
>     at com.cisco.avm.corona.concurrent.model.ObservedTraffic.JiBX_concurrent_model_binding_marshal_1_0(ObservedTraffic.java)
>     at com.cisco.avm.corona.concurrent.model.JiBX_concurrent_model_bindingObservedTraffic_access.marshal()
>     at com.cisco.avm.corona.concurrent.model.ObservedTraffic.marshal(ObservedTraffic.java)
>     at org.jibx.runtime.impl.MarshallingContext.marshalRoot(MarshallingContext.java:1021)
>     at org.jibx.runtime.impl.MarshallingContext.marshalDocument(MarshallingContext.java:1041)
>     at com.cisco.avm.corona.model.marshaller.XmlMarshaller.marshallBinaryXml(XmlMarshaller.java:93)
>     at com.cisco.avm.corona.model.marshaller.XmlMarshaller.marshall(XmlMarshaller.java:67)
>     at com.cisco.avm.corona.model.marshaller.XmlMarshaller.marshall(XmlMarshaller.java:51)
>     at com.cisco.avm.corona.messaging.TAModelMessageCodec.encode(TAModelMessageCodec.java:94)
>     at com.cisco.avm.corona.messaging.TAModelMessageCodec.encode(TAModelMessageCodec.java:21)
>     at com.cisco.avm.messaging.codec.AbstractAmxMessageCodec.encode(AbstractAmxMessageCodec.java:47)
>     at org.apache.mina.filter.codec.demux.DemuxingProtocolEncoder.encode(DemuxingProtocolEncoder.java:134)
>     at org.apache.mina.filter.codec.ProtocolCodecFilter.filterWrite(ProtocolCodecFilter.java:298)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.callPreviousFilterWrite(DefaultIoFilterChain.java:506)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1400(DefaultIoFilterChain.java:46)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.filterWrite(DefaultIoFilterChain.java:805)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.filterWrite(DefaultIoFilterChain.java:731)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.callPreviousFilterWrite(DefaultIoFilterChain.java:506)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireFilterWrite(DefaultIoFilterChain.java:498)
>     at org.apache.mina.core.session.AbstractIoSession.write(AbstractIoSession.java:428)
>     at org.apache.mina.core.session.AbstractIoSession.write(AbstractIoSession.java:369)
>     at com.cisco.avm.messaging.transport.AmxConnector.sendMessage(AmxConnector.java:123)
>     at com.cisco.avm.corona.applayout.AppServiceImpl.exportModel(AppServiceImpl.java:243)
>     at com.cisco.avm.corona.applayout.AppServiceImpl$ExportTask.run(AppServiceImpl.java:301)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> I think the problem is in the encoder:
>
> protected final void encodeNonZeroIntegerOnSecondBitFirstBitOne(int i) throws IOException {
>     if (i < EncodingConstants.INTEGER_2ND_BIT_SMALL_LIMIT) {
>         // [1, 64] ( [0, 63] ) 6 bits
>         write(0x80 | i);
>     } else if (i < EncodingConstants.INTEGER_2ND_BIT_MEDIUM_LIMIT) {
>         // [65, 8256] ( [64, 8255] ) 13 bits
>         i -= EncodingConstants.INTEGER_2ND_BIT_SMALL_LIMIT;
>         _b = (0x80 | EncodingConstants.INTEGER_2ND_BIT_MEDIUM_FLAG) | (i >> 8); // 010 00000
>         // _b = 0xC0 | (i >> 8); // 010 00000
>         write(_b);
>         write(i & 0xFF);
>     } else if (i < EncodingConstants.INTEGER_2ND_BIT_LARGE_LIMIT) {
>         // [8257, 1048576] ( [8256, 1048575] ) 20 bits
>         i -= EncodingConstants.INTEGER_2ND_BIT_MEDIUM_LIMIT;
>         _b = (0x80 | EncodingConstants.INTEGER_2ND_BIT_LARGE_FLAG) | (i >> 16); // 0110 0000
>         // _b = 0xE0 | (i >> 16); // 0110 0000
>         write(_b);
>         write((i >> 8) & 0xFF);
>         write(i & 0xFF);
>     } else {
>         throw new IOException(
>             CommonResourceBundle.getInstance().getString("message.integerMaxSize",
>                 new Object[] {Integer.valueOf(EncodingConstants.INTEGER_2ND_BIT_LARGE_LIMIT)}));
>     }
> }
>
> How do I control indexing?
>
> Any pointers on this problem are appreciated.
>
> Regards,
> Swami
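The encoder fragment quoted above is where the "Integer > 1,048,576" message comes from: it writes a 0-based table index (presumably, in this trace, the index of an indexed attribute value) in 1, 2 or 3 bytes, and any index at or above INTEGER_2ND_BIT_LARGE_LIMIT cannot be represented. The standalone sketch below re-implements that logic for illustration only. The limits 64, 8256 and 1048576 are read off the ranges in the quoted comments, and the flag values 0x40 and 0x60 are inferred from the commented-out 0xC0/0xE0 lines; treat both as assumptions rather than values copied from EncodingConstants.

```java
import java.io.ByteArrayOutputStream;

/**
 * Illustrative re-implementation of the quoted encodeNonZeroIntegerOnSecondBitFirstBitOne.
 * Constants are inferred from the quoted comments, not copied from FastInfoset.
 */
class SecondBitIntegerSketch {
    static final int SMALL_LIMIT = 64;        // indices [0, 63]          -> 1 byte, 6-bit payload
    static final int MEDIUM_LIMIT = 8256;     // indices [64, 8255]       -> 2 bytes, 13-bit payload
    static final int LARGE_LIMIT = 1048576;   // indices [8256, 1048575]  -> 3 bytes, 20-bit payload
    static final int MEDIUM_FLAG = 0x40;      // assumed: 0x80 | 0x40 == 0xC0
    static final int LARGE_FLAG = 0x60;       // assumed: 0x80 | 0x60 == 0xE0

    /** Encodes 0-based index i (standing for the integer i + 1) with the first bit set. */
    static byte[] encode(int i) {
        if (i < 0) {
            throw new IllegalArgumentException("negative index: " + i);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (i < SMALL_LIMIT) {
            out.write(0x80 | i);
        } else if (i < MEDIUM_LIMIT) {
            i -= SMALL_LIMIT;
            out.write(0x80 | MEDIUM_FLAG | (i >> 8));
            out.write(i & 0xFF);
        } else if (i < LARGE_LIMIT) {
            i -= MEDIUM_LIMIT;
            out.write(0x80 | LARGE_FLAG | (i >> 16));
            out.write((i >> 8) & 0xFF);
            out.write(i & 0xFF);
        } else {
            // The quoted encoder throws java.io.IOException here; an unchecked
            // exception keeps this sketch short.
            throw new IllegalArgumentException("Integer > " + LARGE_LIMIT);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (int i : new int[] {0, 63, 64, 8255, 8256, 1048575}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(i)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(i + " -> " + hex.toString().trim());
        }
    }
}
```

Running main prints 0 -> 80, 63 -> BF, 64 -> C0 00, 8255 -> DF FF, 8256 -> E0 00 00 and 1048575 -> EF DF BF, while encode(1048576) throws. Once a table holds more than 1,048,576 entries, new indices cannot be written, which is why indexing fewer values is the way around the error.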