dev@fi.java.net

Re: FI ME 0.1

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Thu, 07 Jul 2005 18:55:57 +0200

Hi Changshin,

Changshin Lee wrote:
> Hi all,
>
> After I released StAX ME (http://www.iasandcb.pe.kr/stax-me) 0.1 right
> before, I started working on FI ME based on the current FI code. I just
> finished porting it to CLDC while several implementations still remain.
> FI ME is published at http://www.iasandcb.pe.kr/fi-me for your review.
>

Wow! I will check it out.


> I have a couple of items I've found during this work.
>
> 1. Two XMLChar classes
> There are two different XMLChar in FI source. It seems that XMLChar in
> fi.util is not used.
>

Yes. This needs to be resolved. There is one very large class copied
from Xerces, and we should reuse that one and remove the other. Both
should currently be in use: one by the Encoder and the other, as I
recall, by StAX for events.

The large Xerces class should not be used for ME since it creates a very
large table. I wonder what ME parsers do to efficiently check Unicode
ranges when validating characters as specified by XML 1.x?
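
One table-free possibility is to test the code point against the ranges
directly. The following is only a minimal sketch, assuming that just the
Char and S productions of XML 1.0 need checking; the class and method
names are hypothetical and are not part of the FI code. Name characters
have many more ranges and would probably need a small sorted array of
range bounds with a binary search instead.

```java
// Sketch only: XmlCharRanges and its methods are hypothetical names,
// not classes from FI or Xerces. No lookup table is allocated.
public final class XmlCharRanges {
    private XmlCharRanges() {}

    /**
     * True if c is allowed by the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    public static boolean isXmlChar(int c) {
        if (c < 0x20) {
            // Only tab, line feed and carriage return are legal below space.
            return c == 0x9 || c == 0xA || c == 0xD;
        }
        return c <= 0xD7FF
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** True if c is XML white space (S production): space, tab, LF, CR. */
    public static boolean isXmlSpace(int c) {
        return c == 0x20 || c == 0x9 || c == 0xA || c == 0xD;
    }
}
```

The common case (an ordinary character at or below #xD7FF) costs two
comparisons, and the class adds no static data to the binary.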



> 2. matchWhiteSpaceDelimnatedWords
> I'm curious of the word "Delimnated" and what the method tries to do. I'm
> actually slightly confused with the pattern "\\s" because Java 5 SE API
> doc says "\s" is a whitespace character.
>

Ah, this must be for the conversion of strings to binary data.

The '\\' in the Java string literal is just the way to write a single
'\' in the regular expression, so "\\s" reaches the regex engine as \s.

Read here:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

"Backslashes within string literals in Java source code are interpreted
as required by the Java Language Specification as either Unicode escapes
or other character escapes. It is therefore necessary to double
backslashes in string literals that represent regular expressions to
protect them from interpretation by the Java bytecode compiler. The
string literal "\b", for example, matches a single backspace character
when interpreted as a regular expression, while "\\b" matches a word
boundary. The string literal "\(hello\)" is illegal and leads to a
compile-time error; in order to match the string (hello) the string
literal "\\(hello\\)" must be used."
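
To make the escaping concrete, here is a small sketch for Java SE
(java.util.regex is not available on CLDC). The class name and the
sample input are made up for illustration; this is not the FI method
itself, only the same kind of whitespace-delimited matching.

```java
// Sketch only: BackslashDemo is a hypothetical class, not FI code.
public final class BackslashDemo {
    private BackslashDemo() {}

    /**
     * Splits the input on runs of whitespace. The Java literal "\\s+"
     * is the three-character string \s+ by the time the regex engine
     * sees it.
     */
    public static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // The literal holds two characters: a backslash and an 's'.
        System.out.println("\\s".length()); // prints 2

        String[] w = words("SGVsbG8=  d29ybGQ=\tZm9v");
        for (int i = 0; i < w.length; i++) {
            System.out.println(w[i]);
        }
    }
}
```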



> 3. Advice
> I just worked on making FI ME compilable on CLDC, and think I need to
> cut some weight out of it because at my first sight it looks a little
> bit heavy in terms of binary size and number of classes, fields, and
> methods. Please let me get your advice and recommendation on
> streamlining FI ME, particularly, considering that FI ME supports only StAX.
>

I agree. The conversion from strings to binary data is not needed and
will rarely be used, if at all. It is not used in the SE implementation;
it is just there for completeness.

Are you also implementing the event API?


> I'll post more details when I complete FI 0.1.
>


OK.

Perhaps it would be best to have a con-call or IRC chat about all this?

Do you want developer/commit access to the FI workspace? It would
certainly be good if this was part of Java.Net, either as part of FI or
as a related .net project.

I am not sure how feasible it is to share a subset of code for SE and
ME. Since the optimization strategies are different, sharing code and
then modifying it may affect one platform in unintended ways.

If we have separate code then we need to have tests in place for
interoperability. We have a whole bunch of XML files. I can convert them
to FI using the SE serializer and use this as the baseline for tests.
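
A baseline check could be as simple as comparing bytes. This is only a
rough sketch with a hypothetical helper name. It assumes both
serializers make the same encoding choices, which may not hold; if they
differ, the test would have to parse both documents and compare the
resulting events instead.

```java
// Sketch only: BaselineCompare is a hypothetical helper, not FI code.
public final class BaselineCompare {
    private BaselineCompare() {}

    /**
     * Compares a baseline document with the bytes under test.
     * Returns -1 if they are identical, otherwise the offset of the
     * first byte that differs (or the shorter length if one array is
     * a prefix of the other).
     */
    public static int firstMismatch(byte[] expected, byte[] actual) {
        int n = Math.min(expected.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        return expected.length == actual.length ? -1 : n;
    }
}
```

Reporting the offset rather than a plain true/false makes it easier to
find which part of the encoded document diverged.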

Paul.

-- 
| ? + ? = To question
----------------\
   Paul Sandoz
        x38109
+33-4-76188109