users@jaxb.java.net

RE: JAXB escaping apostrophe (Single Quote)

From: Gary Gregory <GGregory_at_seagullsoftware.com>
Date: Tue, 6 Apr 2010 17:17:28 +0000

Let's use some examples to make it clear.

The following are all valid XML attributes for [Joe's "BIG" Crabs]:

1) myAttr="Joe's &quot;BIG&quot; Crabs"
2) myAttr='Joe&apos;s "BIG" Crabs'
3) myAttr="Joe&apos;s &quot;BIG&quot; Crabs"
4) myAttr='Joe&apos;s &quot;BIG&quot; Crabs'

There is no point in using 3 or 4: each one escapes a quote character that does not conflict with the delimiter, so you are just creating fatter XML for no reason. They are valid but wasteful.
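
If you want to check this yourself, here is a minimal Java sketch that feeds each of the four spellings to the JDK's DOM parser and prints the value the parser hands back. The class name AttrForms and the wrapper element "shop" are made up for the demo; only the attribute text comes from the list above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; "shop" is an invented element name.
final class AttrForms {

    // The four attribute spellings from the list above.
    static final String[] FORMS = {
        "myAttr=\"Joe's &quot;BIG&quot; Crabs\"",
        "myAttr='Joe&apos;s \"BIG\" Crabs'",
        "myAttr=\"Joe&apos;s &quot;BIG&quot; Crabs\"",
        "myAttr='Joe&apos;s &quot;BIG&quot; Crabs'"
    };

    // Wrap one attribute in a throwaway element, parse it,
    // and return the attribute value as the parser sees it.
    static String parse(String attribute) {
        String xml = "<shop " + attribute + "/>";
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("myAttr");
        } catch (Exception e) {
            // Any parse failure would mean the spelling is not well-formed XML.
            throw new IllegalStateException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        for (String form : FORMS) {
            System.out.println(form + "  ->  " + parse(form));
        }
    }
}
```

All four spellings should come back as the same string, which is why a consumer that uses a real XML parser cannot tell them apart.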

Gary Gregory
Senior Software Engineer
Seagull Software
email: ggregory_at_seagullsoftware.com
email: ggregory_at_apache.org
www.seagullsoftware.com


> -----Original Message-----
> From: Ely Schoenfeld [mailto:ely.sun.1_at_mitalteli.com]
> Sent: Tuesday, April 06, 2010 07:58
> To: users_at_jaxb.dev.java.net
> Subject: Re: JAXB escaping apostrophe (Single Quote)
>
> Wolfgang Laun wrote:
> > On Tue, Apr 6, 2010 at 2:45 PM, Ely Schoenfeld <ely.sun.1_at_mitalteli.com> wrote:
> >> Well... yes and no.
> >>
> >> Either way, as far as I can tell, the specification encourages
> >> translating single quotes to "&apos;" in certain
> >> circumstances.
> >
> > Basically, yes. The intent is to give programs and humans a free hand
> > for composing XML documents.
> >
> >> <quote>
> >> To allow attribute values to contain both single and double quotes, the
> >> apostrophe or single-quote character (') may be represented as "&apos;",
> >> and the double-quote character (") as "&quot;".
> >> </quote>
> >> Reference: http://www.w3.org/TR/REC-xml/#syntax
> >>
> >> Based on this, if JAXB is to comply with the W3C rules, it should be
> >> possible to "translate" the single quote to "&apos;". Don't you think?
> >
> > No. The quoted paragraph simply says that you can use entities for
> > both apostrophe and quote, which is obviously necessary for the
> > one you use as the surrounding delimiter. If a program serializing XML
> > content sticks to the quote as an attribute value delimiter, it doesn't
> > have to use &apos;. There is no point in using the entity for apostrophe
> > when there is no conflict with the XML delimiters.
> >
> > -W
> >
>
> OK, "it doesn't have to", but it should be possible. I'm not saying that
> everybody has to use "&apos;", but it should be doable.
>
> In fact, I can see a need for using both entities here.
>
> Let's talk about this particular case. The generated XML represents an
> invoice. That invoice must be able to contain any kind of company name
> or person name in the whole country.
>
> I do consider it possible for some company to have an apostrophe in
> its name (e.g. [Somebody's Store]). I could also imagine some other
> company with double quotes in its name (e.g. ["SomeInventedName"
> inventions store]). Or even both (e.g. [Somebody's "GREAT" inventions]).
>
> My point is that, in this case at least, it is not possible to stick to
> either one of these two attribute value delimiters. As far as I can see,
> you must use entities for the "digital" invoice to work in general.
>
> Am I right?
>
> Ely.
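
To make Wolfgang's point concrete for the invoice case: a serializer that always delimits attribute values with double quotes only has to escape &, < and the double quote itself, and that already covers names containing either or both kinds of quote. The sketch below is an illustration only. The names MinimalAttrWriter, escapeForDoubleQuotes and attribute are invented, and it ignores whitespace normalization and characters that are not legal in XML at all.

```java
// Hypothetical helper, not part of JAXB or any library.
final class MinimalAttrWriter {

    // Escape only what conflicts with a double-quoted attribute value.
    static String escapeForDoubleQuotes(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);        // apostrophes pass through untouched
            }
        }
        return sb.toString();
    }

    // Build name="value" using the double quote as the only delimiter.
    static String attribute(String name, String value) {
        return name + "=\"" + escapeForDoubleQuotes(value) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(attribute("name", "Somebody's Store"));
        System.out.println(attribute("name", "\"SomeInventedName\" inventions store"));
        System.out.println(attribute("name", "Somebody's \"GREAT\" inventions"));
    }
}
```

The apostrophe never needs an entity here because it cannot be mistaken for the closing delimiter.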
>
> >> Thank you for all your help.
> >>
> >> Ely.
> >>
> >> Wolfgang Laun wrote:
> >>> I've read through the comprobantes fiscales, and if I'm guessing the
> >>> Spanish correctly, the interesting paragraph is the one preceding what
> >>> you quoted:
> >>>
> >>> In addition to the structural rules set out in this standard, a
> >>> taxpayer who opts for this mechanism for generating receipts must
> >>> comply both with the tax provisions in force and with the technical
> >>> guidelines on form and syntax for generating XML files specified by
> >>> the W3 consortium, established at www.w3.org.
> >>>
> >>> Doesn't this say that you have to follow the rules for XML as defined by
> >>> the W3C?
> >>>
> >>> It would appear that the subsequent paragraphs have been inserted in an
> >>> attempt to guide those who think of hand-crafting their XML output. It
> >>> cannot seriously be meant to supersede the XML definition.
> >>>
> >>> Also, what I can understand of the last paragraph
> >>> ("Adicionalmente,...SAT.") seems to support my assumption. Anybody who
> >>> knows how XML representation works wouldn't need any of this.
> >>>
> >>> Best
> >>> Wolfgang
> >>>
> >>> PS: If you need a supporting statement from someone with W3C, I know just
> >>> the right guy.
> >>>
> >>> On Mon, Apr 5, 2010 at 11:26 PM, Ely Schoenfeld
> >>> <ely.sun.1_at_mitalteli.com> wrote:
> >>>
> >>>> Hello All.
> >>>>
> >>>> At the suggestion of laune at dev.java.net, I'm posting my problem
> >>>> with JAXB character escapes here.
> >>>>
> >>>> I really need help "translating" the single quote character to "&apos;".
> >>>> But if I define this in a CharacterEscapeHandler, I get "&amp;apos;"
> >>>> instead.
> >>>>
> >>>> I opened issue number 741, titled "Characters get escaped twice with
> >>>> Custom CharacterEscapeHandler and encoding=UTF-8", about this
> >>>> problem.
> >>>>
> >>>> It can be found at: https://jaxb.dev.java.net/issues/show_bug.cgi?id=741
> >>>>
> >>>> Any help you can provide will be really (REALLY) appreciated.
> >>>>
> >>>>
> >>>> The reason I need to "translate" the single quote to "&apos;" is that
> >>>> a government agency in Mexico requires it. In case you understand
> >>>> Spanish, the specification appears on page 6 of:
> >>>>
> >>>> ftp://ftp2.sat.gob.mx/asistencia_servicio_ftp/publicaciones/cfd/Anex20_v20.pdf
> >>>>
> >>>> http://www.sat.gob.mx/sitio_internet/e_sat/comprobantes_fiscales/15_6534.html
> >>>>
> >>>> ---------------- BEGIN ----------------
> >>>> In particular, care must be taken with the special cases that arise
> >>>> in values specified within the attributes of the XML file, such as
> >>>> those that use the character & , the character " , the character ' ,
> >>>> the character < and the character > , which require the use of
> >>>> escape sequences.
> >>>>
> >>>> - For & the sequence &amp; must be used
> >>>> - For " the sequence &quot; must be used
> >>>> - For < the sequence &lt; must be used
> >>>> - For > the sequence &gt; must be used
> >>>> - For ' the sequence &apos; must be used
> >>>>
> >>>> Examples:
> >>>> To represent nombre="Juan & José & "Niño"" use nombre="Juan
> >>>> &amp; José &amp; &quot;Niño&quot;"
> >>>>
> >>>> Additionally, it is worth mentioning that although the XML
> >>>> specification allows the use of escape sequences for handling accented
> >>>> characters and the character ñ, such escape sequences are not
> >>>> necessary when the XML document is expressed in the UTF-8 encoding
> >>>> standard, if it was created correctly; UTF-8 is the only encoding
> >>>> standard used by the SAT.
> >>>> ---------------- END ----------------
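
For reference, the five escape rules in the quoted specification amount to a simple per-character mapping. The sketch below writes that mapping as a static method with the same parameter list as the JAXB RI's CharacterEscapeHandler.escape (char[], int, int, boolean, Writer), so it compiles without the RI on the classpath. The class name AposEscaper and the helper escapeAttribute are invented, the choice to leave quotes alone outside attribute values is an assumption, and nothing here claims to fix the double escaping reported in issue 741: it only shows what the handler is meant to emit.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical class; only the escape(...) parameter list mirrors the JAXB RI.
final class AposEscaper {

    // Write ch[start .. start+length) to out, replacing the five special
    // characters with their predefined entities.
    static void escape(char[] ch, int start, int length,
                       boolean isAttVal, Writer out) throws IOException {
        int limit = start + length;
        for (int i = start; i < limit; i++) {
            char c = ch[i];
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;");  break;
                case '>': out.write("&gt;");  break;
                case '"':
                    // Assumption: quotes are only escaped inside attribute values.
                    if (isAttVal) out.write("&quot;"); else out.write(c);
                    break;
                case '\'':
                    if (isAttVal) out.write("&apos;"); else out.write(c);
                    break;
                default:
                    out.write(c);
            }
        }
    }

    // Convenience wrapper for trying the mapping on a whole attribute value.
    static String escapeAttribute(String value) {
        StringWriter out = new StringWriter();
        try {
            escape(value.toCharArray(), 0, value.length(), true, out);
        } catch (IOException e) {
            throw new IllegalStateException(e); // StringWriter does not throw in practice
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttribute("Joe's \"BIG\" Crabs"));
    }
}
```

The output of such a handler is already fully escaped, so whatever writes it to the stream must not escape it a second time; a second pass is what turns "&apos;" into "&amp;apos;".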
> >>>>
> >>>>
> >>>> Thank you very much in advance.
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: users-unsubscribe_at_jaxb.dev.java.net
> >>>> For additional commands, e-mail: users-help_at_jaxb.dev.java.net