users@glassfish.java.net

Re: RE: character encoding

From: <glassfish_at_javadesktop.org>
Date: Thu, 13 May 2010 18:44:14 PDT

Someone on another forum pointed me to curl, so I gave it a try. I ran this in a terminal window and got the output below:

curl -I http://neptunediving.com/neptune/index.jsp

HTTP/1.1 200 OK
Date: Thu, 13 May 2010 19:33:23 GMT
Server: GlassFish v3
X-Powered-By: JSP/2.1
Content-Type: text/html; Charset=UTF-8;charset=ISO-8859-1
Content-Language: en-US
Transfer-Encoding: chunked
Set-Cookie: JSESSIONID=3282bd6a29d9d46bef242a1b513f; Path=/neptune

Clearly something is wrong here: I get Content-Type: text/html; Charset=UTF-8;charset=ISO-8859-1. I have no idea why, but maybe you can help me with this.

So it seems you and I have found the same problem. The question now is what is wrong and what I need to do to fix it. I really have no idea why or how the HTTP header ends up like this. If you have more time to help me, it would be very much appreciated! If not, I understand, since you have other things to do as well. Thanks a lot for all your help so far, Stijn!!!

PS: I will also go through the code snippets you mentioned, so that I have no HTML entities left in my page. You wrote: "Just use the real characters: € , © and é." Do you mean that I should write them as they are in my source code, not in any encoded form?



Hi again, torleif. First of all apologies for the delayed response. I am
doing this in my spare time and sometimes other things come up that prevent
me from being responsive. I have had a look at your site neptunediving.

First of all I see that in some places you are using HTML entities, such as
– ... Don't.
The nice thing about Unicode is that you should be able to use all characters
as is. No funny &something; required... except for:

&amp; = &
&lt; = <
&gt; = >
&quot; = "
&apos; = '

.. because those are special characters in XML. Those 5 are all you will
ever need.
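
A minimal Java sketch of that escaping, in case it helps. The class name
XmlEscape and the method escape are made up for illustration; they do not
come from any library. Only the five special characters are replaced, and
everything else (such as the euro sign or an accented e) passes through
untouched:

```java
final class XmlEscape {

    /** Replaces the five XML special characters with their predefined entities. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c); // all other characters stay as they are
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes are used here only so the sample does not depend
        // on the encoding of this source file: euro sign and e-acute.
        System.out.println(escape("Fish & Chips <\u20AC5> caf\u00E9"));
    }
}
```

Note that the ampersand has to be handled character by character (or
replaced first); otherwise the & in an already produced entity would be
escaped a second time.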

I'll go even further and say that, even though it doesn't really hurt, you
should NEVER use any other HTML entities. No &euro; , no &copy; , no &eacute;
Just use the real characters: € , © and é.

Maybe &nbsp;, only because the actual character is invisible... but you
shouldn't use &nbsp; anyway. It's used for all the wrong reasons on the web
these days. If you consider it wrong to use these entities it will lead to
the best pages in the long run.

Now back to your problem. First of all, I think your page is actually almost
fine. You are nearly there.

However, if I view your page with Firefox, I see that it is trying to show
it as 'Western (ISO-8859-1)'. You can see this by looking in the menu
View -> Character Encoding -> (selected encoding). For your page it has
'Western (ISO-8859-1)' selected. This results in some characters being
mangled. However, if I pick 'UTF-8' from this menu, it works fine.
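
The mangling can be reproduced outside the browser. The sketch below uses a
hypothetical class MojibakeDemo (the name is just for illustration): it
encodes a string as UTF-8 and then decodes those same bytes as ISO-8859-1,
which is presumably what happens when the browser applies the wrong
encoding to a UTF-8 page:

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes the bytes with the wrong charset. */
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with the same charset, so the text survives. */
    static String readCorrectly(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // e with acute accent, written as an escape
        // One character goes in, two come out: the UTF-8 bytes C3 A9 are
        // each read as a separate ISO-8859-1 character.
        System.out.println(misread(eAcute).length());
        System.out.println(readCorrectly(eAcute).equals(eAcute));
    }
}
```

A single accented character turning into two odd-looking ones (or a euro
sign turning into three) is the typical symptom of this mismatch.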

So why does Firefox choose 'Western (ISO-8859-1)'?

Because you are telling it to do so.... To see what I mean, please download
the 'Live HTTP Headers' plugin for Firefox:

https://addons.mozilla.org/nl/firefox/search?q=Live%20HTTP%20Headers

or go to menu Extra -> Add-ons, button 'Acquire Add-ons', type 'Live HTTP
Headers' in the search box, press Enter and once you have found it, install
it. Restart Firefox after installing and it's ready for use.

In menu Extra you should now have the option 'Live HTTP Headers'. Select
that and a window opens. It has tabs Headers, Generator, Configure and
About. Make sure tab Headers is selected and checkbox 'Capture' is checked.
(note I am using Dutch version myself, so my label translations might be
slightly off)

Now visit your page again and you will see the headers your browser is
sending to the server and the response the server is sending:

http://www.neptunediving.com/neptune/

GET /neptune/ HTTP/1.1
Host: www.neptunediving.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; nl; rv:1.9.2)
Gecko/20100115 Firefox/3.6 ( .NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: JSESSIONID=135cda96326513825a1134e17dc3;
__utma=112243995.225387687.1273747211.1273747211.1273747211.1;
__utmb=112243995.6.10.1273747211;
__utmz=112243995.1273747211.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
__utmc=112243995

HTTP/1.1 200 OK
Date: Thu, 13 May 2010 10:51:41 GMT
Server: GlassFish v3
X-Powered-By: JSP/2.1
Content-Type: text/html; Charset=UTF-8;charset=ISO-8859-1
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked

(only top headers of first request and response shown here)

In the above headers, the browser was sending a GET request to
www.neptunediving.com/neptune and the server was sending response HTTP/1.1
200 OK. Notice that it says Server: GlassFish v3 and most importantly:

Content-Type: text/html; Charset=UTF-8;charset=ISO-8859-1

That's wrong. It should have been:

Content-Type: text/html; charset=UTF-8

I don't know yet *why* your server is sending that, but that is the cause, I
am sure. If you fix that, your page will look right.
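
To make the problem with that header concrete, here is a small Java sketch
that pulls the charset parameters out of a Content-Type value and flags the
header when there is more than one. The class ContentTypeCheck and its
methods are hypothetical names for illustration only, and the parsing is
deliberately naive (it splits on ';' and does not handle quoted values):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ContentTypeCheck {

    /** Returns every charset value found in a Content-Type header value. */
    static List<String> charsets(String headerValue) {
        List<String> found = new ArrayList<>();
        String[] parts = headerValue.split(";");
        // parts[0] is the media type (e.g. text/html); parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            // Parameter names are case-insensitive: Charset == charset.
            String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                found.add(part.substring(eq + 1).trim());
            }
        }
        return found;
    }

    /** A header with more than one charset parameter is ambiguous. */
    static boolean isAmbiguous(String headerValue) {
        return charsets(headerValue).size() > 1;
    }

    public static void main(String[] args) {
        String fromServer = "text/html; Charset=UTF-8;charset=ISO-8859-1";
        System.out.println(charsets(fromServer));
        System.out.println(isAmbiguous(fromServer));
    }
}
```

With two conflicting charset parameters the client has to pick one, and
here Firefox ends up with ISO-8859-1.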

Can you paste the top part of the JSP you made for this page? I suspect the
page directive at the top may be wrong. It should have:

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>

(BTW, from your comments I assume you are using JSP. If you are using
something else, such as JSF Facelets, let us know).

Good luck with it,

-Stijn
[Message sent by forum member 'torleif67']

http://forums.java.net/jive/thread.jspa?messageID=469761