dev@glassfish.java.net

Re: encoding in JSP Documents - JDK6 I18N Issue??

From: Kin-man Chung <Kin-Man.Chung_at_Sun.COM>
Date: Wed, 08 Nov 2006 17:19:06 -0800

On Wed, 2006-11-08 at 15:22, Krishnamohan Meduri wrote:
> Kin-man Chung wrote On 11/08/06 13:05,:
> > On Wed, 2006-11-08 at 11:36, Krishnamohan Meduri wrote:
> >
> >>Hi Kin-Man,
> >>
> >>You are right. The SQE test seems to be incorrect. Here is what is
> >>happening:
> >>
> >>The text in JSP document is encoded in gb2312. The xml prolog mentions
> >>encoding as "big5". The golden file is based on JDK5. It has a lot of
> >>unmappable characters(? characters). Apparently the output for
> >>unmappable characters is different in JDK5 and JDK6 (as mentioned by
> >>JDK6 I18N developer)
> >
> >
> > I am glad that this is verified.
> >
> >
> >>The test makes more sense if
> >> - the text in JSP document is encoded in big5
> >> - charset in content-type header field is "gb2312"
> >>
> >
> > Such a test is still questionable. If the original page is
> > encoded in big5, I am still not sure if the response encoding can
> > be gb2312, because there may be cases where a big5 char cannot be
> > mapped to gb2312.
>
> Is it too greedy to expect the JSP engine to decode big5 char to unicode
> first and then encode to gb2312 in such case? (I mean
> big5->unicode->gb2312).
>

The Unicode code points for big5 characters overlap those for GB
characters, but there are lots of big5 characters whose code points
are not the code points of any GB character. Remember: big5 is for
traditional Chinese characters, and GB is for simplified Chinese
characters. Characters that are part of both big5 and GB have the
same code point, and characters that have a simplified version have
one code point for the traditional character (big5) and another for
its simplified version (GB).
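
To make this concrete, here is a minimal sketch of the plain
big5->unicode->gb2312 conversion. It assumes the JDK in use ships the
"Big5" and "GB2312" charsets (they are extended charsets, not
guaranteed on every runtime). The class and method names are made up
for illustration, and the sample characters are my own picks:
U+4E2D is in both sets, while U+5B78 is the traditional form of
"study" and has no GB2312 code.

```java
import java.nio.charset.Charset;

public class TranscodeSketch {
    // Assumption: both charsets are available in this JDK.
    static final Charset BIG5 = Charset.forName("Big5");
    static final Charset GB2312 = Charset.forName("GB2312");

    // big5 -> unicode -> gb2312, with no traditional-to-simplified step.
    // String.getBytes(Charset) substitutes the charset's replacement
    // byte for every character it cannot map.
    static byte[] big5ToGb2312(byte[] big5Bytes) {
        String unicode = new String(big5Bytes, BIG5);
        return unicode.getBytes(GB2312);
    }

    // True if every character of s has a GB2312 code.
    static boolean inGb2312(String s) {
        return GB2312.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String shared = "\u4E2D";      // present in both big5 and GB2312
        String traditional = "\u5B78"; // present in big5, absent from GB2312

        System.out.println("shared in GB2312:      " + inGb2312(shared));
        System.out.println("traditional in GB2312: " + inGb2312(traditional));

        // The traditional character survives the big5 decode, then is
        // lost when encoding to gb2312.
        byte[] out = big5ToGb2312(traditional.getBytes(BIG5));
        System.out.println("bytes after transcoding: " + out.length);
    }
}
```

The unmappable character should come out as a single replacement
byte ('?'), which is the kind of output the golden file is full of.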

If the intent of the test is to translate any text written in
big5 to gb2312, then the steps should be

    big5->unicode1->unicode2->gb2312

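Here unicode1 is the code point of the traditional character and
unicode2 is the code point of its simplified version. The step
unicode1->unicode2 is not one-to-one, and may fail in some cases.
In any case, it's not automatic and will require a table lookup
that is particular to these two encodings.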
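
A rough sketch of what that extra step could look like, again
assuming the "Big5" and "GB2312" charsets are available. The
two-entry table and all names here are hypothetical and only for
illustration; a real table would be far larger, and since the
mapping is not one-to-one it would still get some cases wrong.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class SimplifySketch {
    // Assumption: both charsets are available in this JDK.
    static final Charset BIG5 = Charset.forName("Big5");
    static final Charset GB2312 = Charset.forName("GB2312");

    // Hypothetical, deliberately tiny traditional -> simplified table.
    static final Map<Character, Character> TRAD_TO_SIMP = new HashMap<>();
    static {
        TRAD_TO_SIMP.put('\u5B78', '\u5B66'); // "study"
        TRAD_TO_SIMP.put('\u570B', '\u56FD'); // "country"
    }

    // unicode1 -> unicode2: replace each character that has a table
    // entry and pass everything else through unchanged.
    static String simplify(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            Character mapped = TRAD_TO_SIMP.get(c);
            sb.append(mapped != null ? mapped.charValue() : c);
        }
        return sb.toString();
    }

    // big5 -> unicode1 -> unicode2 -> gb2312
    static byte[] big5ToGb2312(byte[] big5Bytes) {
        String unicode1 = new String(big5Bytes, BIG5);
        String unicode2 = simplify(unicode1);
        return unicode2.getBytes(GB2312);
    }

    public static void main(String[] args) {
        byte[] big5 = "\u5B78".getBytes(BIG5);
        byte[] gb = big5ToGb2312(big5);
        // Decode again to check that the simplified form came through.
        System.out.println(new String(gb, GB2312).equals("\u5B66"));
    }
}
```

Characters with no table entry go through unchanged, so any
traditional character missing from the table is still unmappable
in gb2312.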

The short answer to your question is: yes, it's too greedy
to expect that it'll work. :-)

-Kin-man


> -Krishna.
> >
> > -Kin-man
> >
> >
> >>-Krishna.
> >>
> >>Kin-man Chung wrote On 11/07/06 11:39,:
> >>
> >>>Is this a valid page? The xml header says that the encoding is
> >>>big5, yet the text is encoded in gb2312. No ideas how it works
> >>>with JDK5.
> >>>
> >>>Have you compared the .java files produced with JDK 5 vs JDK6?
> >>>
> >>>-Kin-man
> >>>
> >>>On Mon, 2006-11-06 at 17:23, Krishnamohan Meduri wrote:
> >>>
> >>>
> >>>>Hi,
> >>>>
> >>>>Glassfish with JDK6 produces undesired results (with some characters
> >>>>messed up) for the attached JSP Document.
> >>>>The same produces the desired results with JDK5.
> >>>>
> >>>>Is this any known issue in JDK6? If so, Could somebody pls give me the
> >>>>details?
> >>>>
> >>>>Thanks,
> >>>>-Krishna.
> >>>>
> >>>>______________________________________________________________________
> >>>>
> >>>><?xml version="1.0" encoding="big5"?>
> >>>><!--
> >>>> @author James Cai, for I18N feature in JSP 2.0 in XML
> >>>>
> >>>> Configuration as the following:
> >>>> Syntax pageEncoding contentType XML-prolog
> >>>> XML big5 gb2312 -
> >>>>-->
> >>>>
> >>>><jsp:root xmlns:jsp="http://java.sun.com/JSP/Page"
> >>>>version="2.0">
> >>>><jsp:directive.page contentType="text/html; charset=gb2312" />
> >>>><jsp:text><![CDATA[<HTML>
> >>>><HEAD>
> >>>> <TITLE>I18N TEST JSP2.0</TITLE>
> >>>></HEAD>
> >>>>
> >>>><BODY>
> >>>>
> >>>><FONT SIZE='+3'>
> >>>>
> >>>><CENTER>
> >>>>
> >>>><BR/>
> >>>>
> >>>>
> >>>><H1>Test Name: XML8-gb </H1>
> >>>>
> >>>><BR/>
> >>>><BR/>
> >>>><TABLE BORDER='1'>
> >>>><TR><TH>Language</TH><TH>Charset</TH></TR>
> >>>><TR><TD>Chinese</TD><TD>gb2312</TD></TR>
> >>>></TABLE>
> >>>>
> >>>><BR/>
> >>>>
> >>>>The following is Chinese character with gb2312 charset:
> >>>>
> >>>><BR/><BR/>
> >>>>
> >>>>¾©±¨ÍøѶ£¨¼ÇÕߡ¡¶¡ÕØÎģ©´ӽñÌ쿪ʼ£¬Ã9úÂéʡÀí¹¤ѧԺ˹¡¹ÜÀíѧԺÓëÇ廪´óѧ¾­¹ÜѧԺ¡¢¸´µ©´óѧ¹ÜÀíѧԺ¡¢Ïã¸ÛÁëÄϴóѧ
> >>>> ΪºÏ×÷ÅàÑø¹ú¼ÊMBA£¬ÔÚÏã¸ÛÕٿª¹ËÎÊίԱ»á»áÒ顣¼ÇÕß×òÌì´ÓÇ廪´óѧ»ñϤ£¬×÷Ϊ¸÷×Թú¼Ò×¼âµÄÉÌѧԺ֮һµÄÇ廪´óѧ
> >>>> ¾­¹ÜѧԺÓëÂéʡÀí¹¤ѧԺ˹¡¹ÜÀíѧԺ£¬ÔÚԲÂúÍê³ÉÁ˵Úһ¸öÎåÄêÖÜÆڵĺÏ×÷֮ºó£¬ÒѾ­¾ö¶¨´ӽñÄêÆðÐøǩÎåÄêµĺÏ×÷ÏîĿ£¬˫·½
> >>>> ´Ó1500ÃûÉêÇëÕßÖÐѡ°γöµÄÐÂһÅú120ÃûºÏ×÷ÅàÑøµĹú¼ÊMBAѧԱ£¬¼´½«ÓڽñÄê9Ô½øÇ廪´óѧ¡£
> >>>>
> >>>> Ç廪´óѧ¾­¹ÜѧԺԺ³¤ÕԴ¿¾ù×òÌìÔڽÓÊܼÇÕ߲ɷÃʱ±íʾ£¬Ç廪×Ô1996ÄêÓëÂéʡÀí¹¤ºÏ×÷ÒÔ4£¬Ö}ñÒÑÅàÑøÁËÈýÆڹ²100ÓàÃû¹ú
> >>>> ¼ÊMBAѧԱ£¬ÕâЩ±ÏҵѧԱһ°㶼ȥÁ˹úÄÚÍâµĿç¹úÆóҵºÍ×é֯£¬һֱ¹©²»ӦÇó£¬ÓеÄÔڶ̶Ì}ÈýÄêÄڣ¬¾ÍÒѾ­×øµ½ÁËCEO
> >>>> µÄλÖã»ÔںÏ×÷Æڼ䣬Ç廪ÏȺóÅɳö20ÓàÃû½Ìʦ£¬ÔÚ˹¡¾­¼ùÜÀíѧԺѧϰ¡£ÕԴ¿¾ùԺ³¤»¹͸¶£¬ÔڹýȥµĺÏ×÷ÖУ¬Ç廪·½ÃæµÄÖ÷
> >>>> Ҫ¾«f¼¯ÖÐÔÚÁ˱ØÐ޿εĽ¨É跽Ã棬Ôڵڶþ¸öÎåÄêµĺÏ×÷ÖУ¬·ÅÔÚѡÐ޿η½Ãæµľ«f»á¶àһЩ¡£ͬʱ£¬ÕâÖÖÀàËÆÓڼ¼ÊõתÈõÄģ
> >>>> ʽ£¬Ôھ­ÀúÁËÎåÄêµĺÏ×÷ºóÒѾ­ʵÏÖÁ˽«¹úÍâ³ÉÊìµÄMBAÅàÑøģʽÒÆֲµ½¹úÄڵÄĿ±ꡣ
> >>>>
> >>>> ×òÌìר³̵½±±¾©²μÓÇ廪2001½ì¹ú¼ÊMBA±ÏҵµäÀñµÄ˹¡¹ÜÀíѧԺǰÈÎԺ³¤¡¢ÖøÃû¹ú¼ʾ­¼Ãѧ¼Ò˹Èô½ÌÊÚҲÈÏΪ˫·½ǰÎåÄê
> >>>> µĺÏ×÷·dz£³ɹ¦£¬ËûÔø¾­ͨ¹ýÊÓƵ»áÒéµķ½ʽ¡°Ãæ¶ÔÃ桱µØΪÇ廪¹ú¼ÊMBAѧԱÊڹý¿Σ¬ËûÈÏΪÖйúµĹú¼ÊMBAÏîĿÒԼ°Åà
> >>>> Ñø³öµÄѧԱ¶¼²»±ÈÃ9ú²ΩһµÄȱµãÔÚÓÚѧÉúµijɷÖȱ·¦¹ú¼ÊÐԣ¬ÔÚÃ9ú´óԼ40£¥µÄѧԱ4×ÔÓÚÆäËû²»ͬµĹú¼ң¬¶øÔÚÖйú¼¸ºõ
> >>>> ÊÇÇåһɫµĹúÄÚѧԱ¡£
> >>>>
> >>>></CENTER>
> >>>><BR/>
> >>>><BR/>
> >>>>]]></jsp:text>
> >>>><jsp:text>CharacterEncoding:
> >>>></jsp:text>
> >>>><jsp:expression>response.getCharacterEncoding()</jsp:expression>
> >>>><jsp:text>&lt;BR/&gt;</jsp:text>
> >>>><jsp:text><![CDATA[
> >>>></body>
> >>>></html>
> >>>>]]></jsp:text>
> >>>></jsp:root>
> >>>>
> >>>>______________________________________________________________________
> >>>>
> >>>>---------------------------------------------------------------------
> >>>>To unsubscribe, e-mail: dev-unsubscribe_at_glassfish.dev.java.net
> >>>>For additional commands, e-mail: dev-help_at_glassfish.dev.java.net