Encoding properties

Luis Arriaga

Hi Apache team,

We are moving our encoding implementation on Tomcat and the ServiceNow platform from ISO-8859-1 to UTF-8 and have run into a concern. There is a URIEncoding property that defaults to UTF-8 if it is not specified. Is there a reason there is no equivalent BodyEncoding property, or is there a workaround you are aware of that does not require modifying the source code? Looking through the Tomcat source code, the default body encoding appears to be ISO-8859-1: in the Parameters, ByteChunk, and Constants classes, two constants, DEFAULT_BODY_CHARSET and DEFAULT_CHARSET, determine the body charset/encoding.
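
To illustrate why the fallback body charset matters, here is a minimal, stdlib-only Java sketch. It is not Tomcat's code. It assumes a form body in which the client percent-encoded the UTF-8 bytes of "José" but sent no charset parameter in Content-Type, so the fallback alone decides the result; that is roughly the role the post attributes to DEFAULT_BODY_CHARSET. The helpers resolveCharset and parseForm are hypothetical and only approximate what a servlet container does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BodyCharsetDemo {

    // Hypothetical helper: returns the charset declared in a Content-Type
    // header value, or the supplied fallback when none is declared.
    static Charset resolveCharset(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        return fallback;
    }

    // Hypothetical helper: decodes an application/x-www-form-urlencoded body,
    // turning each %XX run into bytes and those bytes into text with `charset`.
    static Map<String, String> parseForm(String body, Charset charset)
            throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, charset.name()),
                       URLDecoder.decode(value, charset.name()));
        }
        return params;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "José" percent-encoded as UTF-8 bytes (C3 A9 for the accented e).
        String body = "name=Jos%C3%A9";
        // No charset parameter, so only the fallback decides the decoding.
        String contentType = "application/x-www-form-urlencoded";

        Charset latin1 = resolveCharset(contentType, StandardCharsets.ISO_8859_1);
        Charset utf8 = resolveCharset(contentType, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 fallback: " + parseForm(body, latin1).get("name"));
        System.out.println("UTF-8 fallback:      " + parseForm(body, utf8).get("name"));
    }
}
```

With ISO-8859-1 as the fallback, the two bytes C3 A9 are decoded as two separate characters, giving "JosÃ©"; with UTF-8 they decode to "José". In this sketch, a client that sends charset=UTF-8 in Content-Type sidesteps the fallback entirely.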

We have forked the Tomcat source code and applied the changes below, which fixed the issue. Are you aware of this? It seems strange to have a URIEncoding property but no BodyEncoding property, unless I am missing something. Perhaps this is an enhancement request we could submit, unless there is a valid reason not to have such a property.

diff --git a/java/org/apache/coyote/Constants.java b/java/org/apache/coyote/Constants.java
index 9de194d55..0883904f4 100644
--- a/java/org/apache/coyote/Constants.java
+++ b/java/org/apache/coyote/Constants.java
@@ -33,7 +33,7 @@ public final class Constants {
     public static final String DEFAULT_CHARACTER_ENCODING="ISO-8859-1";
     public static final Charset DEFAULT_URI_CHARSET = StandardCharsets.ISO_8859_1;
-    public static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.ISO_8859_1;
+    public static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.UTF_8;
     public static final int MAX_NOTES = 32;
diff --git a/java/org/apache/tomcat/util/buf/ByteChunk.java b/java/org/apache/tomcat/util/buf/ByteChunk.java
index 555c0f6b8..ed9f6e5ea 100644
--- a/java/org/apache/tomcat/util/buf/ByteChunk.java
+++ b/java/org/apache/tomcat/util/buf/ByteChunk.java
@@ -123,7 +123,7 @@ public final class ByteChunk extends AbstractChunk {
      * standards seem to converge, but the servlet API requires 8859_1, and this
      * object is used mostly for servlets.
      */
-    public static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;
+    public static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;
     private transient Charset charset;
diff --git a/java/org/apache/tomcat/util/http/Parameters.java b/java/org/apache/tomcat/util/http/Parameters.java
index 4d7d6cc1e..f59f75514 100644
--- a/java/org/apache/tomcat/util/http/Parameters.java
+++ b/java/org/apache/tomcat/util/http/Parameters.java
@@ -266,7 +266,7 @@ public final class Parameters {
      */
     @Deprecated
     public static final String DEFAULT_ENCODING = "ISO-8859-1";
-    private static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.ISO_8859_1;
+    private static final Charset DEFAULT_BODY_CHARSET = StandardCharsets.UTF_8;
     private static final Charset DEFAULT_URI_CHARSET = StandardCharsets.UTF_8;

 

_____________________________________________

Luis Arriaga

Software Engineer

M: +17605192599

servicenow.com


Re: Encoding properties

markt
On 14/01/2020 19:33, Luis Arriaga wrote:

> Hi Apache team,
>
> We are updating our encoding implementation on Tomcat and the ServiceNow
> platform from ISO-8859-1 to UTF-8 and ran into some concerns. There is a
> URIEncoding property that defaults to UTF-8 if it is not specified. Is
> there any reason there is no BodyEncoding property or is there a
> workaround you guys are aware of that does not require the source code
> to be modified? Looking through the Tomcat source code the default body
> encoding seems to be ISO-8859-1, looking at the Parameters, ByteChunk,
> and Constants classes there are two variables DEFAULT_BODY_CHARSET and
> DEFAULT_CHARSET that determine the body charset/encoding.
>
> We have forked the Tomcat source code and applied the changes below
> which fixed the issue. Are you guys aware of this? Seems strange to have
> a URIEncoding property but not a BodyEncoding property unless I am
> missing something. Maybe this is an enhancement request we can submit
> unless there is a valid reason to not have such property.

Please read the FAQ.

https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding#CharacterEncoding-Q3

Mark