The GET request encounters 400 Bad Request from a URL with Chinese words on Tomcat 8.0.43

Bruce Huang
Hi all,

We have placed a file named 檔名.txt into
the \apache-tomcat-8.0.43\webapps\Apps folder, and our client app
retrieves the file with an HTTP GET request to a URL such as
http://192.168.1.1/Apps/檔名.txt (檔名, meaning "file name", is two Chinese characters).

On Tomcat 8.0.23 everything worked fine. However, since we
migrated to 8.0.43, the client app receives an
HTTP 400 Bad Request response. Our client code is below. It looks
like it does not encode the URL path at all; it only translates spaces
to %20.

Is there any way to configure Tomcat 8.0.43 so that this
case works as before (as on Tomcat 8.0.23)? We have lots of client
apps deployed.

        SpaceToTwenty(szServerPath, szBuf, MAXURLSIZE);
        memset(szServerPath, 0, MAXURLSIZE);
        strcpy(szServerPath, szBuf);

        memset(szSendBuf, 0, SEND_BUF_SIZE);
        // the buffer for sending to the socket
        sprintf(szSendBuf, "GET %s HTTP/1.1\r\nHost:%s\r\n"
                           "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
                           "Accept-Language: zh-tw,en-us;q=0.7,en;q=0.3\r\n"
                           "Accept-Encoding: gzip, deflate\r\n"
                           "Connection: keep-alive\r\n\r\n",
                           szServerPath, szServerIP);
        LOG(LOG_ERROR, "[DL_Download] szServerPath: %s, szServerIP: %s,",
                        szServerPath, szServerIP);

        // create a socket for sending request
        SockCtx = SOCKET_Create(szServerIP, iServerPort, bSSLEnable, CERTF_PATH);
        if (SockCtx == NULL)
        {
                LOG(LOG_ERROR, "[DL_Download] Socket Create Error!!!\n");
                iReturn = _ERROR;
                goto FUNC_EXIT;
        }

        SOCKET_Send(SockCtx, szSendBuf, strlen(szSendBuf));
        memset(szRecvBuf, 0, RECV_BUF_SIZE);
        iRecvBytes = SOCKET_Recv(SockCtx, szRecvBuf, sizeof(szRecvBuf));
        if (iRecvBytes <= 0)
        {
                LOG(LOG_ERROR, "[DL_Download] Socket Recv Error!!! iRecvBytes = %d\n", iRecvBytes);
                iReturn = _ERROR;
                goto FUNC_EXIT;
        }

        memset(szHttpStatus, 0, sizeof(szHttpStatus));
        strncpy(szHttpStatus, szRecvBuf, strstr(szRecvBuf, "\r\n") - szRecvBuf);
        // here it will receive the HTTP 400 Bad Request on the tomcat v8.0.43
        // the szHttpStatus is 400
        if (strstr(szHttpStatus, "200 OK") == NULL)
        {
                LOG(LOG_ERROR, "[DL_Download] Http Status != 200, Status = %s\n", szHttpStatus);
                iReturn = _ERROR;
                goto FUNC_EXIT;
        }

int SpaceToTwenty(char* szSrc, char* szDst, int iLen)
{
        int iReturn = _SUCCESS;
        char* c1;
        char* c2;
        char* c;
        int new_string_length = 0;

        for (c = szSrc; *c != '\0'; c++)
        {
                if (*c == ' ')
                        new_string_length += 2;
                new_string_length++;
        }

        if (new_string_length >= iLen)
                func_exit(_ERROR);

        memset(szDst, 0, iLen);
        for (c1 = szSrc, c2 = szDst; *c1 != '\0'; c1++)
        {
                if (*c1 == ' ')
                {
                        c2[0] = '%';
                        c2[1] = '2';
                        c2[2] = '0';
                        c2 += 3;
                }
                else
                {
                        // every other byte, including the non-ASCII bytes of 檔名, is copied unescaped
                        *c2 = *c1;
                        c2++;
                }
        }
        *c2 = '\0';

FUNC_EXIT:
        return iReturn;
}




Thanks,

Bruce

Re: The GET request encounters 400 Bad Request from a URL with Chinese words on Tomcat 8.0.43

markt
On 01/08/17 03:26, Bruce Huang wrote:

> Hi all,
>
> We have placed a file named 檔名.txt into
> the \apache-tomcat-8.0.43\webapps\Apps folder. And our client app can
> retrieve the file by an HTTP GET request from the URL, for example,
> http://192.168.1.1/Apps/檔名.txt (The 檔名 are two Chinese words)
>
> When it was on tomcat v8.0.23, everything works fine. However, after we
> have migrated to the v8.0.43, the client app will receive response with
> HTTP 400 Bad Request. The code that our client app used as below. Looks
> like that it didn't encode the URL path and only translate the whitespace
> to %20.
>
> Is there any solution that we can configure the tomcat 8.0.43 to make this
> case works as usual(On tomcat v8.0.23), since there are lots of client
> app deployed?

Sorry, no. This is part of the fix for CVE-2016-6816.

Options have since been added to allow some illegal characters through,
but they will not be sufficient to allow the full range of UTF-8 bytes.

The fix was added in 8.0.39, so any version up to and including 8.0.38 should work for you.

You might be able to put a more lenient reverse proxy in front of Tomcat
which will accept these characters and then pass the request (correctly
encoded) to Tomcat. However that depends on finding a suitable reverse
proxy.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: The GET request encounters 400 Bad Request from a URL with Chinese words on Tomcat 8.0.43

André Warnier (tomcat)
In reply to this post by Bruce Huang
On 01.08.2017 04:26, Bruce Huang wrote:
> Hi all,
>
> We have placed a file named 檔名.txt into
> the \apache-tomcat-8.0.43\webapps\Apps folder. And our client app can
> retrieve the file by an HTTP GET request from the URL, for example,
> http://192.168.1.1/Apps/檔名.txt (The 檔名 are two Chinese words)

This is one of those cases where things can get very confusing very quickly, because there
are multiple opportunities for things to get encoded/decoded or not, and to be /seen/ as
encoded/decoded or not. (For example: are we really seeing the above URL as you meant to
send it, or are we seeing some other form, as encoded by the email systems in between?)

Strictly speaking, according to the relevant Internet HTTP RFCs (and which ones are
relevant can be yet another confusing matter), you MAY NOT include the above Chinese
characters directly in a URL string. The set of characters/bytes allowed in a URL string
is very restrictive, and in any case includes none of the individual bytes that result
from encoding the above Unicode characters as UTF-8.
(See: https://tools.ietf.org/html/rfc3986#section-2)

Before you send out this URL from the client, you would have to:
- encode the above Chinese characters as a UTF-8 byte sequence. Each of these two characters
takes 3 bytes in UTF-8, so that is 6 bytes in total.
- then, for each of the 6 bytes, check whether it is within the range of bytes allowed in
a URL, and if not, encode/escape /that/ byte as a "%xy" 3-character ASCII sequence, where
xy is the byte value in hexadecimal. (There are many existing functions to do that.) Here
none of the 6 bytes is allowed as-is, since they are all 0x80 or above.
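
A minimal C sketch of those two client-side steps, assuming the path is already held as UTF-8 (if the client keeps file names in another encoding, such as Big5, they would first have to be converted to UTF-8, for example with iconv(3)). PercentEncodePath and IsSafePathByte are hypothetical names; the signature mirrors SpaceToTwenty from the original post so the call could be swapped. It keeps the RFC 3986 unreserved characters and "/" as-is and escapes every other byte, so a space becomes %20 and the UTF-8 bytes E6 AA 94 E5 90 8D of 檔名 become %E6%AA%94%E5%90%8D.

```c
#include <string.h>

/* RFC 3986 "unreserved" characters (ALPHA / DIGIT / "-" / "." / "_" / "~"),
 * plus '/' so that path separators are left alone. */
static int IsSafePathByte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
}

/* Hypothetical replacement for SpaceToTwenty(): percent-encodes every byte
 * of szSrc (assumed to be an unencoded UTF-8 path) that is not safe in a
 * URL path. szDst holds iLen bytes including the terminating NUL.
 * Returns 0 on success, -1 if szDst is too small. */
int PercentEncodePath(const char *szSrc, char *szDst, int iLen)
{
    static const char hex[] = "0123456789ABCDEF";
    int out = 0;

    if (iLen < 1)
        return -1;
    for (const unsigned char *p = (const unsigned char *)szSrc; *p != '\0'; p++) {
        if (IsSafePathByte(*p)) {
            if (out + 1 >= iLen)          /* keep room for the NUL */
                return -1;
            szDst[out++] = (char)*p;
        } else {
            if (out + 3 >= iLen)          /* "%xy" plus the NUL */
                return -1;
            szDst[out++] = '%';
            szDst[out++] = hex[*p >> 4];  /* high nibble */
            szDst[out++] = hex[*p & 0x0F];/* low nibble */
        }
    }
    szDst[out] = '\0';
    return 0;
}
```

Escaping more than strictly necessary (for instance "!" or "@") is a conservative choice for file names. The sketch is meant for the path only: it would also escape "?" and "=" in a query string, and it assumes its input is not already percent-encoded (an existing "%" would become "%25").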

Then on the server side receiving this URL, the opposite transformation should take place:
- the first step would be to "%-decode" the URL string, to restore the original bytes
that the client wanted to send. To my knowledge, all HTTP servers do that.

- then, the server and the application would have to /assume/ that URLs received from your
clients are always Unicode, UTF-8 encoded. That is (still) not the default in HTTP (the
default is still ISO-8859-1). (And there is no mechanism in the current RFCs that allows
either client or server to indicate, in the request itself, what character set the request
URL is really written in, or should be.)
But you can force Tomcat to assume this, see:
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
--> URIEncoding
(there is also "useBodyEncodingForURI", but that does not apply in your particular case)
- the next step would thus be for the application (e.g. the default servlet) to /assume/
that this URL is Unicode/UTF-8, and decode it into a corresponding internal Unicode string.
- and then comes the step of looking for the corresponding file in the filesystem, by the
name you got from the previous step. Depending on the OS and the filesystem, this may or
may not be character-set-agnostic, and may or may not be case-agnostic.
(But your problem is currently not that it does not find the file; it is that the HTTP
request itself gets rejected as invalid. So your request URI contains bytes which the
server considers - rightly or not - as invalid in a URL.)
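
Tomcat performs these server-side steps itself. The following C sketch only illustrates the first and third of them (percent-decoding, then checking that the bytes really are well-formed UTF-8) so the round trip can be seen end to end. PercentDecode, HexValue and IsValidUtf8 are hypothetical helpers, not Tomcat code. PercentDecode returns a byte count because an escape such as %00 would otherwise silently truncate a C string.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int HexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Step 1: "%-decode" src into dst, which holds dstLen bytes including the
 * terminating NUL. Returns the number of decoded bytes, or -1 on a
 * malformed escape or if dst is too small. */
long PercentDecode(const char *src, unsigned char *dst, size_t dstLen)
{
    size_t out = 0;

    if (dstLen == 0)
        return -1;
    for (size_t i = 0; src[i] != '\0'; i++) {
        unsigned char b = (unsigned char)src[i];
        if (src[i] == '%') {
            /* Test src[i + 1] first so we never read past the terminator. */
            int hi = HexValue(src[i + 1]);
            int lo = (hi < 0) ? -1 : HexValue(src[i + 2]);
            if (lo < 0)
                return -1;
            b = (unsigned char)(hi * 16 + lo);
            i += 2;
        }
        if (out + 1 >= dstLen)
            return -1;
        dst[out++] = b;
    }
    dst[out] = '\0';
    return (long)out;
}

/* Step 3: 1 if s[0..len) is well-formed UTF-8 (no overlong forms, no
 * surrogates, nothing above U+10FFFF), otherwise 0. */
int IsValidUtf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;      /* stray continuation byte or invalid lead byte */
        }
        if (i + n >= len)
            return 0;      /* sequence truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;      /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;      /* out of range or surrogate */
        i += n + 1;
    }
    return 1;
}
```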

[rant]
In other words, it is no wonder that developers (of servers as well as of
applications) get confused from time to time, and maybe unwittingly introduce bugs when
trying to handle URLs and/or content in anything other than English.
In that respect, the HTTP protocols are still hopelessly outdated and obnoxious when
handling the vast number of languages in use on today's real-life Internet.

And it is a never-ending wonder to me why whoever is in charge of these things has
apparently not yet made a serious attempt at publishing a new set of coordinated HTTP (and
HTML, and CGI, and JavaScript etc.) versions which would make Unicode/UTF-8 the default
charset/encoding (for URLs as well as for text content), instead of the long-obsolete
ASCII and ISO-8859-1 character sets. I would bet that such a change would save millions
of work-hours worldwide every year.
[end of rant]


>
> When it was on tomcat v8.0.23, everything works fine. However, after we
> have migrated to the v8.0.43, the client app will receive response with
> HTTP 400 Bad Request.

Most probably, that was a correction in Tomcat, which previously did not properly reject
some URLs which are invalid according to the existing (deficient) RFCs.

> The code that our client app used as below. Looks
> like that it didn't encode the URL path and only translate the whitespace
> to %20.

Exactly. Your app has to encode that URL properly before issuing the request.

>
> Is there any solution that we can configure the tomcat 8.0.43 to make this
> case works as usual(On tomcat v8.0.23), since there are lots of client
> app deployed?
>

If "as usual" was wrong and/or could cause security issues, your chances are slim, and you
will have to update your app.


Re: The GET request encounters 400 Bad Request from a URL with Chinese words on Tomcat 8.0.43

Bruce Huang
Mark Thomas <[hidden email]> wrote on Tue, Aug 1, 2017 at 7:37 PM:

Hi Mark,

Thanks for the reply. We will try to stay on version 8.0.38 until we have
migrated all our client apps.

For those who search for this, the system property is
tomcat.util.http.parser.HttpParser.requestTargetAllow
<https://tomcat.apache.org/tomcat-8.5-doc/config/systemprops.html#Other>
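
Why this property cannot rescue the unencoded client, as Mark pointed out: according to the Tomcat documentation linked above, requestTargetAllow only accepts the characters |, { and }. The hypothetical C model below approximates a strict request-target check; it is not Tomcat's Java implementation, and the exact set of always-rejected ASCII characters is an assumption based on RFC 7230 and RFC 3986. Even with all three characters allowed, every byte of a raw UTF-8 sequence is 0x80 or above and is still rejected, while the percent-encoded form passes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a strict request-target check (NOT Tomcat's code).
 * 'allow' stands in for requestTargetAllow: only '|', '{' and '}' can be
 * re-allowed, mirroring the restriction in the Tomcat documentation. */
static bool IsRejectedTargetByte(unsigned char c, const char *allow)
{
    /* Controls, space, DEL and every non-ASCII byte (>= 0x80) are always
     * rejected; the allow-list cannot change this. */
    if (c <= 0x20 || c >= 0x7F)
        return true;
    /* The only characters the allow-list is able to re-enable. */
    if (c == '|' || c == '{' || c == '}')
        return allow == NULL || strchr(allow, c) == NULL;
    /* Other ASCII characters assumed to require percent-encoding. */
    return strchr("\"#<>\\^`", c) != NULL;
}

/* Returns the index of the first byte of 'target' that would be rejected,
 * or -1 if the whole request target would be accepted. */
long FirstRejectedByte(const char *target, const char *allow)
{
    for (size_t i = 0; target[i] != '\0'; i++) {
        if (IsRejectedTargetByte((unsigned char)target[i], allow))
            return (long)i;
    }
    return -1;
}
```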


André Warnier (tomcat) <[hidden email]> wrote on Tue, Aug 1, 2017 at 8:10 PM:

Hi André,

Thanks for the clear and helpful explanation. It seems that, as you said,
our application developers unwittingly introduced the bug this time. :(

