RFC-2047 Header Character Set Encoding JK + Tomcat 5


RFC-2047 Header Character Set Encoding JK + Tomcat 5

Guernsey, Byron (GE Consumer & Industrial)

Is there a FAQ on how Tomcat 5 and JK1 implement HTTP header character
sets (i.e., do they support RFC-2047)?

We use some single sign-on plug-ins at the web server (Apache 2) that
set specific headers which may contain international characters.  Tomcat
returns these headers to JSPs/servlets in such a way that the strings
display properly only if the browser is forced to view the page as
UTF-8.

This implies that the values are actually UTF-8 encoded, but improperly
assumed to be ISO-8859-1 at some point.
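
To illustrate the suspected fault, here is a minimal Java sketch, assuming a modern JDK (java.nio.charset.StandardCharsets did not exist in 2005). It simulates UTF-8 bytes being decoded as ISO-8859-1 and then reverses the damage. The class and method names are hypothetical and are not part of Tomcat or mod_jk.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not a Tomcat or mod_jk API.
class HeaderRecode {

    // Recover a string whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
    // ISO-8859-1 maps every byte to exactly one char, so the original bytes
    // can be recovered losslessly and then decoded with the right charset.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "M\u00fcller"; // "Mueller" written with u-umlaut

        // Simulate the fault: UTF-8 bytes on the wire, read as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(misdecoded.length());               // one char per UTF-8 byte
        System.out.println(latin1ToUtf8(misdecoded).equals(original));
    }
}
```

A servlet could apply the same round trip to a value from request.getHeader(...), but only if it is certain the upstream component really sent UTF-8; applying it to genuine ISO-8859-1 data would corrupt it.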

I have not yet tracked down which component in the chain is at fault. It
may very well be that the SSO plug-in is calling the Apache API to set
headers with UTF-8 values when the API accepts only ISO-8859-1 values or
values encoded per RFC-2047.

I'd like to find out what mod_jk expects the header values to be when it
retrieves them from Apache, and whether Tomcat supports RFC-2047
decoding of header values.

If anyone has any experience with this, or can refer me to a discussion
or thread about this very item, I'd greatly appreciate the tip.  I'm not
looking forward to the amount of inspection I'm going to have to do to
find the culprit.

thanks,
Byron


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: RFC-2047 Header Character Set Encoding JK + Tomcat 5

Tim Funk
You may need to add this to your Connector declaration:
URIEncoding="UTF-8"

-Tim



RE: RFC-2047 Header Character Set Encoding JK + Tomcat 5

Guernsey, Byron (GE Consumer & Industrial)

Does URIEncoding affect all HTTP headers or only the URIs?

Thanks,
Byron


-----Original Message-----
From: Tim Funk [mailto:[hidden email]]
Sent: Wednesday, July 13, 2005 6:31 AM
To: Tomcat Users List
Subject: Re: RFC-2047 Header Character Set Encoding JK + Tomcat 5

You may need to add this to your Connector declaration:
URIEncoding="UTF-8"

-Tim



RE: RFC-2047 Header Character Set Encoding JK + Tomcat 5

Guernsey, Byron (GE Consumer & Industrial)

For others who might be interested (and the Tomcat developers should
correct me if I'm wrong, since this goes into the archive): Tomcat 5.5.9
and earlier do not appear to support RFC-2047 for processing MIME headers
that use character encodings other than ISO-8859-1.

Searching through thousands of lines of Tomcat code, as best I could
tell, the code always assumes headers are ISO-8859-1, from the
MimeHeaders class down to the ByteChunk class.  While both appear to
have the ability to specify an encoding, they correctly assume the
default to be ISO-8859-1, and from what I could tell, the code parsing
headers from the request does nothing to change this.

I could find no provisions for processing RFC-2047 compliant headers in
any of the connectors.  RFC-2047 is available here:
http://www.faqs.org/rfcs/rfc2047.html and is referenced from the HTTP 1.1
RFC here: http://www.faqs.org/rfcs/rfc2616.html (see section 2.2 on
basic rules for TEXT, and the definition of headers in section 4.2), as
well as in the JSR-154 Servlet 2.4 spec.  Is Tomcat still considered a
reference implementation?
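
For anyone who needs to handle this in application code, here is a rough Java sketch of what decoding an RFC-2047 encoded-word (=?charset?encoding?encoded-text?=) might look like. It assumes Java 8 or later for java.util.Base64. The class name Rfc2047Sketch is hypothetical and is not part of Tomcat. The sketch is deliberately incomplete: it does not drop whitespace between adjacent encoded-words, does not handle language suffixes on the charset, and throws on unknown charsets or malformed input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not a Tomcat API.
class Rfc2047Sketch {

    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    // Replace every encoded-word in the value with its decoded text;
    // anything that is not an encoded-word is copied through unchanged.
    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1));
            byte[] raw = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(new String(raw, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // "Q" encoding: "_" stands for a space, "=XX" is a hex-encoded byte,
    // every other character represents itself.
    static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(0x20);
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Both inputs encode the same name, "M\u00fcller".
        System.out.println(decode("=?UTF-8?Q?M=C3=BCller?=").equals("M\u00fcller"));
        System.out.println(decode("=?ISO-8859-1?B?TfxsbGVy?=").equals("M\u00fcller"));
    }
}
```

This only helps if the component that sets the header actually emits encoded-words; if it writes raw UTF-8 bytes instead, no RFC-2047 decoder will see anything to decode.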

I hope this helps all who run into similar issues and can find no
information on them.  Now on to the Apache 2 source code to see if it
specifies the format required in the Header module API.

Byron

Keywords: International Headers UTF-8 ISO-8859-1 RFC-2047

-----Original Message-----
From: Guernsey, Byron (GE Consumer & Industrial)
Sent: Tuesday, July 12, 2005 4:16 PM
To: Tomcat Users List
Subject: RFC-2047 Header Character Set Encoding JK + Tomcat 5

