[Bug 61197] New: Breaking change in Content-Type / Character Encoding handling

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61197

            Bug ID: 61197
           Summary: Breaking change in Content-Type / Character Encoding
                    handling
           Product: Tomcat 8
           Version: 8.5.15
          Hardware: All
                OS: All
            Status: NEW
          Severity: regression
          Priority: P2
         Component: Catalina
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ----

I *believe* this constitutes some level of regression, based on distinct
difference from prior behaviour, but please correct me if I'm wrong :) Also I
couldn't find any clear mention of this change in the change log for 8.5.15.

Prior to 8.5.15 (specifically, this commit:
https://github.com/apache/tomcat/commit/b2bab804b543bfe181fe435efe35628ce0e21b39)
the behaviour of `org.apache.catalina.connector.Response` when setting the
content-type with encoding parameter included, e.g.
`setContentType("application/json;charset=MS932")`, was to simply take the
provided encoding string and set this for the output.

As long as the character set was supported by the JVM (under its canonical name
or as an alias of a supported charset), responses were returned with the
*exact* character set string provided.

Since the above commit / the 8.5.15 release, the string is forcibly rewritten,
with no option to disable the behaviour. For instance, "MS932" or "windows-932"
is now replaced with "windows-31j", "eucjis" with "EUC-JP", "sjis" with
"Shift_JIS", etc.
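
For reference, the rewritten values correspond to the canonical names the JVM reports for those aliases. The following is a minimal sketch to check this on a given JVM; the class and method names are hypothetical, and it assumes the extended charsets are available in the runtime (it reports when an alias is not supported rather than failing).

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Tomcat: prints the JVM's canonical
// charset name for each alias mentioned in this report.
public class CharsetAliasDemo {

    // Resolves an alias to the canonical name the JVM uses for that charset.
    public static String canonicalName(String alias) {
        return Charset.forName(alias).name();
    }

    public static void main(String[] args) {
        String[] aliases = {"MS932", "windows-932", "eucjis", "sjis"};
        for (String alias : aliases) {
            if (Charset.isSupported(alias)) {
                System.out.println(alias + " -> " + canonicalName(alias));
            } else {
                System.out.println(alias + " is not supported by this JVM");
            }
        }
    }
}
```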

This may seem like a reasonable behaviour for modern systems that we would
*hope* support mapping aliased encodings, but with legacy systems unable to
handle this (and any system that, stupidly or otherwise, checks for a specific
encoding string, possibly in a case-sensitive manner), suddenly we have broken
behaviour. The client expects one encoding string and receives something
equivalent but that it just can't handle.
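
To illustrate the kind of client check that breaks, here is a hedged sketch contrasting an exact string comparison with an alias-aware one. Both methods are hypothetical and are not taken from any of the affected systems; the point is only that the first rejects an equivalent label while the second accepts it.

```java
import java.nio.charset.Charset;

// Hypothetical client-side checks on the charset label of a response.
public class EncodingLabelCheck {

    // What a legacy client might do: exact, case-sensitive string match.
    public static boolean strictMatch(String expected, String received) {
        return expected.equals(received);
    }

    // Alias-aware alternative: two labels match if the JVM resolves them
    // to the same charset. Throws if either label is illegal or unsupported.
    public static boolean aliasAwareMatch(String expected, String received) {
        return Charset.forName(expected).equals(Charset.forName(received));
    }
}
```

A client using the strict form fails as soon as the server sends the canonical name instead of the configured alias, even though the bytes on the wire are unchanged.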

Unfortunately I'm now stuck in this situation as a legacy-systems integrations
engineer. We *have* to be able to provide our output with very specific
encoding strings set, or else several dozen systems we (sadly) can't change will
break. Thankfully we caught this in internal testing of the upgrade to 8.5.15
and can put it off temporarily, but we're now stuck with either maintaining our
own patched version of Tomcat to revert this behaviour, no longer updating (not
a real option given security requirements), or possibly reviewing a migration
to an alternative servlet container (please no q_q).

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

--- Comment #1 from Mark Thomas <[hidden email]> ---
The change relates to this entry in the change log:

<quote>
Start to switch to using Charset rather than String to store encoding
configuration settings to reduce the number of places the associated Charset
needs to be looked up. (markt)
</quote>

The primary drivers for the change were performance (the repeated String ->
Charset calls were relatively expensive) and earlier error reporting when an
invalid value was provided.

There might be an alternative way of setting the charset that avoids this
restriction. I'll take a look. If that doesn't work, preserving the user
provided value is another option.
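
As a rough illustration of the "preserve the user provided value" option, the sketch below resolves the Charset once (keeping the performance and early-error benefits described above) while retaining the original label for output. This is an assumption about how such an approach could look, not the actual Tomcat fix; `EncodingSetting` and its methods are hypothetical names.

```java
import java.nio.charset.Charset;

// Hypothetical holder: resolves the Charset once, keeps the caller's label.
public final class EncodingSetting {

    private final String label;     // exactly as provided by the application
    private final Charset charset;  // resolved once, reused for encoding

    public EncodingSetting(String label) {
        // Charset.forName throws an IllegalArgumentException subclass for an
        // illegal or unsupported name, so invalid values fail early.
        this.charset = Charset.forName(label);
        this.label = label;
    }

    public String label() {
        return label;
    }

    public Charset charset() {
        return charset;
    }

    // Builds a header value using the original label, not the canonical name.
    public String contentType(String mediaType) {
        return mediaType + ";charset=" + label;
    }
}
```

With this shape, encoding uses `charset()` without repeated lookups, and the header still carries whatever string the application supplied.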


Mark Thomas <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Mark Thomas <[hidden email]> ---
Fixed in:
- trunk for 9.0.0.M22 onwards
- 8.5.x for 8.5.16 onwards
