Recent charset breakage

Christopher Schultz-2
All,

I got a report of a user on our development system at $work saying that
special characters were being mangled. We are using Tomcat 8.5 with a
custom web application and MariaDB under the hood. We are expecting to
use UTF-8 everywhere and I can confirm that our testing environment and
production environments do *not* have this problem.

I've written a tiny JSP to demonstrate the problem.

charecho.jsp
==== CUT ====
<%
   response.setContentType("text/html");
   response.setCharacterEncoding("UTF-8");
%><html>
<head>
<meta http-equiv="Content-Type" content="text/html; UTF-8" />
<meta charset="UTF-8" />
</head>
<body>
<form method="post" accept-charset="UTF-8">
<textarea name="text"><%= (null != request.getParameter("text") ?
request.getParameter("text") : "")%></textarea>
<input type="submit" />
</form>
</body>
</html>
==== CUT ====

I tried this on my development and testing environments and it behaves
properly in my testing environment running 8.5.53, but not on my
development environment running 8.5.64.

So I got myself a fresh copy of both 8.5.53 and 8.5.64 and put this JSP
into the ROOT web application and it didn't work as expected.

Just enter either or both of these multi-byte Unicode characters into
the text area and submit the form. You'll get mangled characters showing
up which, if you submit many times, will multiply over and over again.

†
😈
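
For readers who want to see the "multiplying" effect without a server, here is a minimal sketch in plain Java. It assumes the mangling comes from UTF-8 bytes being decoded as ISO-8859-1, which is the servlet default when a request declares no encoding. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates one form round trip: the browser sends the text as UTF-8
    // bytes and the server decodes those bytes as ISO-8859-1.
    static String roundTrip(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u2020"; // the dagger character from the post
        for (int i = 1; i <= 3; i++) {
            text = roundTrip(text);
            // Every mis-decoded character is above U+007F, so it takes two
            // UTF-8 bytes on the next submit and the length doubles.
            System.out.println("submit " + i + ": " + text.length() + " chars");
        }
    }
}
```

Under that assumption the single dagger becomes 3 characters after the first submit, then 6, then 12.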

Our custom application does have a "character encoding filter" in place,
which sets the request character encoding to "UTF-8" if it's null (which
is very common). That is the only thing I can think of that differs from
an out-of-the-box configuration for Tomcat.

I'm in the process of checking *everything*. But I'm hoping someone can
(a) explain why the above JSP doesn't behave as expected on an
out-of-the-box Tomcat and (b) point out what I might be overlooking,
especially since this has been working for us for many years without any
problems until somewhat recently.

Thanks,
-chris


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Recent charset breakage

Christopher Schultz-2
All,

On 3/31/21 17:54, Christopher Schultz wrote:

> All,
>
> I got a report of a user on our development system at $work saying that
> special characters were being mangled. We are using Tomcat 8.5 with a
> custom web application and MariaDB under the hood. We are expecting to
> use UTF-8 everywhere and I can confirm that our testing environment and
> production environments do *not* have this problem.
>
> I've written a tiny JSP to demonstrate the problem.
>
> charecho.jsp
> ==== CUT ====
> <%
>   response.setContentType("text/html");
>   response.setCharacterEncoding("UTF-8");
> %><html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; UTF-8" />
> <meta charset="UTF-8" />
> </head>
> <body>
> <form method="post" accept-charset="UTF-8">
> <textarea name="text"><%= (null != request.getParameter("text") ?
> request.getParameter("text") : "")%></textarea>
> <input type="submit" />
> </form>
> </body>
> </html>
> ==== CUT ====
>
> I tried this on my development and testing environments and it behaves
> properly in my testing environment running 8.5.53, but not on my
> development environment running 8.5.64.
>
> So I got myself a fresh copy of both 8.5.53 and 8.5.64 and put this JSP
> into the ROOT web application and it didn't work as expected.
>
> Just enter either or both of these multi-byte Unicode characters into
> the text area and submit the form. You'll get mangled characters showing
> up which, if you submit many times, will multiple over and over again.
>
> †
> 😈
>
> Our custom application does have a "character encoding filter" in-place
> which sets the request character encoding to "UTF-8" if it's null (which
> is very common) which is the only thing I can think of that's not quite
> similar to an out-of-the-box configuration for Tomcat.
>
> I'm in the process of checking *everything*. But I'm hoping someone can
> (a) explain why the above JSP doesn't behave as expected on an
> out-of-the-box Tomcat and (b) what I might be overlooking, especially
> since this has been working for us for many years without any problems
> until somewhat recently.
>
> Thanks,
> -chris

I knew this had to be a problem in my own environment, but here's the
explanation. First, to answer (a) above:

In order to make charecho.jsp work as expected in a vanilla Tomcat
environment, you have to use a CharacterEncodingFilter. I wasn't able to
get it to work by simply adding
<request-character-encoding>UTF-8</request-character-encoding> to
webapps/ROOT/WEB-INF/web.xml.

Once that was done, it worked as expected.

For my own environment, we recently violated item #6 from this set of
instructions:

https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding#CharacterEncoding-Q8

We (I, actually!) had installed a new <filter> which reads a request
parameter and it was firing *before* the CharacterEncodingFilter was
setting the default character encoding.
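
To illustrate why the filter order matters, here is a rough simulation in plain Java. LazyRequest is a hypothetical stand-in for a servlet request, not Tomcat's implementation. It only models the behavior that parameters are decoded once, on first access, with whatever encoding is set at that moment.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FilterOrderDemo {
    /** Hypothetical stand-in for a servlet request; not Tomcat's code. */
    static final class LazyRequest {
        private final byte[] body;   // raw parameter bytes as sent by the browser
        private Charset encoding;    // null until a filter sets it
        private String parsed;       // cached on first read

        LazyRequest(byte[] body) { this.body = body; }

        void setCharacterEncoding(Charset cs) { encoding = cs; }

        String getParameter() {
            if (parsed == null) {
                // Fall back to ISO-8859-1 when nothing was set in time.
                Charset cs = (encoding != null) ? encoding : StandardCharsets.ISO_8859_1;
                parsed = new String(body, cs);
            }
            return parsed;
        }
    }

    // The encoding filter runs before anything reads a parameter.
    static String encodingFilterFirst(byte[] body) {
        LazyRequest request = new LazyRequest(body);
        request.setCharacterEncoding(StandardCharsets.UTF_8);
        return request.getParameter();
    }

    // Another filter reads a parameter first; setting the encoding
    // afterwards no longer changes the cached value.
    static String parameterReadFirst(byte[] body) {
        LazyRequest request = new LazyRequest(body);
        request.getParameter();
        request.setCharacterEncoding(StandardCharsets.UTF_8);
        return request.getParameter();
    }

    public static void main(String[] args) {
        byte[] body = "\u2020".getBytes(StandardCharsets.UTF_8);
        System.out.println("encoding filter first: intact = "
                + encodingFilterFirst(body).equals("\u2020"));
        System.out.println("parameter read first:  intact = "
                + parameterReadFirst(body).equals("\u2020"));
    }
}
```

In this sketch only the first ordering returns the original character; the second returns three mis-decoded characters.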

So, somewhat "mystery solved" although I'd like to understand why
<request-character-encoding> didn't work.

-chris

Re: Recent charset breakage

Konstantin Kolinko
In reply to this post by Christopher Schultz-2
Thu, 1 Apr 2021 at 00:55, Christopher Schultz <[hidden email]> wrote:

>
> [...]
>
> I've written a tiny JSP to demonstrate the problem.
>
> charecho.jsp
> ==== CUT ====
> <%
>    response.setContentType("text/html");
>    response.setCharacterEncoding("UTF-8");
> %><html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; UTF-8" />

The value above is misspelled. You are missing "charset=" before "UTF-8".
Personally, I usually echo the actual contentType header value when
writing a meta tag. I think that would be
<meta http-equiv="Content-Type" content="<%= response.getContentType() %>">
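
As a rough illustration of why the missing "charset=" matters, here is a simplified lookup in plain Java. It is only a sketch of how a consumer might search a Content-Type value for its charset parameter. The class and method names are invented, and real parsers also handle quoting and other details.

```java
import java.util.Locale;
import java.util.Optional;

class ContentTypeCharsetDemo {
    // Returns the value of the charset parameter, if one is present.
    static Optional<String> charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the rest are name=value parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return Optional.of(param.substring("charset=".length()).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; UTF-8"));         // Optional.empty
        System.out.println(charsetOf("text/html; charset=UTF-8")); // Optional[UTF-8]
    }
}
```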

[...]

>
> So, somewhat "mystery solved" although I'd like to understand why
> <request-character-encoding> didn't work.

Does validating your web.xml file against an xsd schema complete successfully?

request-character-encoding is defined in
(javax|jakarta)/servlet/resources/web-app_4_0.xsd, which means Tomcat
9 or later. You wrote that you are running Tomcat 8.5.

Best regards,
Konstantin Kolinko

Re: Recent charset breakage

Christopher Schultz-2
Konstantin,

On 4/1/21 05:06, Konstantin Kolinko wrote:

> Thu, 1 Apr 2021 at 00:55, Christopher Schultz <[hidden email]> wrote:
>>
>> [...]
>>
>> I've written a tiny JSP to demonstrate the problem.
>>
>> charecho.jsp
>> ==== CUT ====
>> <%
>>     response.setContentType("text/html");
>>     response.setCharacterEncoding("UTF-8");
>> %><html>
>> <head>
>> <meta http-equiv="Content-Type" content="text/html; UTF-8" />
>
> The value above is misspelled. You are missing "charset=" before "UTF-8".
> Personally, I usually echo the actual contentType header value when
> writing a meta tag. I think that would be
> <meta http-equiv="Content-Type" content="<%= response.getContentType() %>">

Thanks for pointing that out. I have modified the charecho.jsp file, so
it is now:

<%@page contentType="text/html; charset=UTF-8" %>
<html>
<head>
<meta http-equiv="Content-Type" content="<%= response.getContentType()
%>" />
<meta charset="<%= response.getCharacterEncoding() %>" />
</head>
<body>
<form method="post" accept-charset="UTF-8">
<textarea name="text"><%= (null != request.getParameter("text") ?
request.getParameter("text") : "")%></textarea>
<input type="submit" />
</form>
</body>
</html>

The behavior is the same.

If I instead insert the following after the @page directive (to act as a
filter, to keep the example completely self-contained), then this works
as desired:

<%
   if(null == request.getCharacterEncoding()) {
     application.log("Character encoding is unset; setting to UTF-8");
     request.setCharacterEncoding("UTF-8");
   }
%>

> [...]
>
>>
>> So, somewhat "mystery solved" although I'd like to understand why
>> <request-character-encoding> didn't work.
>
> Does validating your web.xml file against an xsd schema complete successfully?
>
> request-character-encoding is defined in
> (javax|jakarta)/servlet/resources/web-app_4_0.xsd, which means Tomcat
> 9 or later. You wrote that you are running Tomcat 8.5.

Ooh, that would do it.

Confirmed: Using <request-character-encoding> with Tomcat *9* behaves as
desired, even without the filter/hack to correct a missing charset.

Thanks,
-chris

[OT] programming style or mental process ?

André Warnier (tomcat/perl)
Hi.
I have a question which may be totally off-topic for this list, but this has been puzzling
me for a while and I figure that someone here may be able to provide some clue as to the
answer, or at least some interesting points of view.

In various places (including on this list), I have seen multiple occurrences of a certain
way to write a test, namely :

   if (null == request.getCharacterEncoding()) {

as opposed to

   if (request.getCharacterEncoding() == null) {

Granted, the two are equivalent in the end.
But it would seem to me, maybe naively, that the second form better corresponds to some
"semantic logic", by which one wants to know if a certain a-priori unknown piece of data
(here the value obtained by retrieving the character encoding of the current request) is
defined (not null) or not (null).

Said another way : we don't want to know if "null" is equal to anything; we want to know
if request.getCharacterEncoding() is null or not.

Or in yet another way : the focus (or the "subject" of the test) here is on
"request.getCharacterEncoding()" (which we don't know), and not on "null" (which we know
already).

Or, more literarily, given that the syntax of most (all?) programming languages is based
on English (if, then, else, new, for, while, until, exit, continue, etc.), we (*) do
normally ask "is your coffee cold ?" and not "is cold your coffee ?".


So why do (some) people write it the other way ?
Is it purely a question of individual programming style ?
Is there some (temporary ?) fashion aspect involved ?
Do the people who write this either way really think in a different way ?
Or is there really something "technical" behind this, which makes one or the other way be
slightly more efficient (whether to compile, or optimise, or run) ?

(*) excepting Yoda of course


Re: [OT] programming style or mental process ?

Zorro
On 4/4/21 12:23 PM, André Warnier (tomcat/perl) wrote:

> Hi.
> I have a question which may be totally off-topic for this list, but
> this has been puzzling me for a while and I figure that someone here
> may be able to provide some clue as to the answer, or at least some
> interesting ponts of view.
>
> In various places (including on this list), I have seen multiple
> occurrences of a certain way to write a test, namely :
>
>  if (null == request.getCharacterEncoding()) {
>
> as opposed to
>
>  if (request.getCharacterEncoding() == null) {
>
> Granted, the two are equivalent in the end.
> But it would seem to me, maybe naively, that the second form better
> corresponds to some "semantic logic", by which one wants to know if a
> certain a-priori unknown piece of data (here the value obtained by
> retrieving the character encoding of the current request) is defined
> (not null) or not (null).
>
> Said another way : we don't want to know if "null" is equal to
> anything; we want to know if request.getCharacterEncoding() is null or
> not.
>
> Or in yet another way : the focus (or the "subject" of the test) here
> is on "request.getCharacterEncoding()" (which we don't know), and not
> on "null" (which we know already).
>
> Or, more literarily, given that the syntax of most (all?) programming
> languages is based on English (if, then, else, new, for, while, until,
> exit, continue, etc.), we (*) do normally ask "is your coffee cold ?"
> and not "is cold your coffee ?".
>
>
> So why do (some) people write it the other way ?
> Is it purely a question of individual programming style ?
> Is there some (temporary ?) fashion aspect involved ?
> Do the people who write this either way really think in a different way ?
> Or is there really something "technical" behind this, which makes one
> or the other way be slightly more efficient (whether to compile, or
> optimise, or run) ?
>

I cannot find the reference right now.

But I seem to remember that it came from Scott Meyers in C++ programming.

Maybe there it forces the compiler to use the right method when there is
overloading.


Re: [OT] programming style or mental process ?

Olaf Kock
In reply to this post by André Warnier (tomcat/perl)
Hi André

On 04.04.21 12:23, André Warnier (tomcat/perl) wrote:

>
>   if (null == request.getCharacterEncoding()) {
>
> as opposed to
>
>   if (request.getCharacterEncoding() == null) {
>
>
> So why do (some) people write it the other way ?
> Is it purely a question of individual programming style ?
> Is there some (temporary ?) fashion aspect involved ?
> Do the people who write this either way really think in a different way ?
> Or is there really something "technical" behind this, which makes one
> or the other way be slightly more efficient (whether to compile, or
> optimise, or run) ?
>
> (*) excepting Yoda of course
>
I can't say I'm always writing Yoda style, but if I stretch my memory,
then the rationale behind this style of comparisons is to have a
constant on the left side, so that you get a compiler error in case
you're using = instead of ==.

In your case, with a function call, this wouldn't make a difference:
"if(request.getCharacterEncoding() = null)" would be illegal syntax as
well. But "if(someObject = null)" is perfectly legal, yet doesn't
express the author's intent clearly: Is it a smart person who's taking a
shortcut, or a newbie using the wrong operator?

Of course, the style doesn't really help people new to the language, as
they first need to understand that this is something that they might
want to apply to their code. And today, with so many IDE warnings being
flagged while typing, it might be outdated, though it still clearly
expresses the intent to have a real comparison and not an assignment here.

And I agree with the other answer posted already: It makes a lot more
sense in C++ with all the implicit boolean conversions and habits of
outsmarting the code's maintainers with clever expressions.

Olaf



Re: [OT] programming style or mental process ?

André Warnier (tomcat/perl)
On 04.04.2021 12:57, Olaf Kock wrote:

> Hi André
>
> On 04.04.21 12:23, André Warnier (tomcat/perl) wrote:
>>
>>    if (null == request.getCharacterEncoding()) {
>>
>> as opposed to
>>
>>    if (request.getCharacterEncoding() == null) {
>>
>>
>> So why do (some) people write it the other way ?
>> Is it purely a question of individual programming style ?
>> Is there some (temporary ?) fashion aspect involved ?
>> Do the people who write this either way really think in a different way ?
>> Or is there really something "technical" behind this, which makes one
>> or the other way be slightly more efficient (whether to compile, or
>> optimise, or run) ?
>>
>> (*) excepting Yoda of course
>>
> I can't say I'm always writing Yoda style, but if I stretch my memory,
> then the rationale behind this style of comparisons is to have a
> constant on the left side, so that you get a compiler error in case
> you're using = instead of ==.

I like that explanation, in the sense that it provides a programming rationale for using
the first form (and not only in Java), even if it feels intuitively un-natural.
So it's apparently not only fashion or Yoda fandom.
Thanks.

>
> In your case, with a function call, this wouldn't make a difference
> "if(request.getCharacterEncoding() = null)" would be illegal syntax as
> well, but "if(someObject = null)" is perfectly legal, but doesn't
> express the author's intent clearly: Is it a smart person who's taking a
> shortcut, or a newbie using the wrong operator?
>

Let the seasoned programmer who's never made that same mistake throw the first stone.

> Of course, the style doesn't really help people new to the language, as
> they first need to understand that this is something that they might
> want to apply to their code. And today, with so many IDE warnings being
> flagged while typing, it might be outdated, though it still clearly
> expresses the intent to have a real comparison and not an assignment here.
>
> And I agree with the other answer posted already: It makes a lot more
> sense in C++ with all the implicit boolean conversions and habits of
> outsmarting the code's maintainers with clever expressions.
>

+1 to that too.


Re: [OT] programming style or mental process ?

Zala Pierre GOUPIL
> >
> > In your case, with a function call, this wouldn't make a difference
> > "if(request.getCharacterEncoding() = null)" would be illegal syntax as
> > well, but "if(someObject = null)" is perfectly legal, but doesn't
> > express the author's intent clearly: Is it a smart person who's taking a
> > shortcut, or a newbie using the wrong operator?
> >
>
> Let the seasoned programmer who's never made that same mistake throw the
> first stone.
>


I think I never made that mistake. Or at least, I didn't realize it.

Re: [OT] programming style or mental process ?

André Warnier (tomcat/perl)
On 05.04.2021 00:21, Zala Pierre GOUPIL wrote:

>>>
>>> In your case, with a function call, this wouldn't make a difference
>>> "if(request.getCharacterEncoding() = null)" would be illegal syntax as
>>> well, but "if(someObject = null)" is perfectly legal, but doesn't
>>> express the author's intent clearly: Is it a smart person who's taking a
>>> shortcut, or a newbie using the wrong operator?
>>>
>>
>> Let the seasoned programmer who's never made that same mistake throw the
>> first stone.
>>
>
>
> I think I never did that mistake. Or at least, I didn't realize it.
>

J'ai jamais tué d'chats
Ou alors y'a longtemps
Ou bien j'ai oublié
Ou ils sentaient pas bon
(Jacques Brel - Ces gens-là)

Couldn't resist.


Re: [OT] programming style or mental process ?

Christopher Schultz-2
In reply to this post by André Warnier (tomcat/perl)
André,

On 4/4/21 06:23, André Warnier (tomcat/perl) wrote:

> Hi.
> I have a question which may be totally off-topic for this list, but this
> has been puzzling me for a while and I figure that someone here may be
> able to provide some clue as to the answer, or at least some interesting
> ponts of view.
>
> In various places (including on this list), I have seen multiple
> occurrences of a certain way to write a test, namely :
>
>   if (null == request.getCharacterEncoding()) {
>
> as opposed to
>
>   if (request.getCharacterEncoding() == null) {
>
> Granted, the two are equivalent in the end.
> But it would seem to me, maybe naively, that the second form better
> corresponds to some "semantic logic", by which one wants to know if a
> certain a-priori unknown piece of data (here the value obtained by
> retrieving the character encoding of the current request) is defined
> (not null) or not (null).
>
> Said another way : we don't want to know if "null" is equal to anything;
> we want to know if request.getCharacterEncoding() is null or not.
>
> Or in yet another way : the focus (or the "subject" of the test) here is
> on "request.getCharacterEncoding()" (which we don't know), and not on
> "null" (which we know already).
>
> Or, more literarily, given that the syntax of most (all?) programming
> languages is based on English (if, then, else, new, for, while, until,
> exit, continue, etc.), we (*) do normally ask "is your coffee cold ?"
> and not "is cold your coffee ?".

On the other hand, in English, coffee which is not hot is called "cold
coffee" but in e.g. Spanish, it's "coffee cold".

> So why do (some) people write it the other way ?

I personally put the null first because of my background in C. C
compilers (especially older ones) would happily compile this code
without batting an eyelash:

   char *s;

   s = call_some_function();

   if(s = null) {
     // do some stuff
   }

Guess what? "Do some stuff" is always executed, and s is always null.

If you switch the operands, the compiler will fail because you can't
assign a value to null:

   if(null = s ) {
     // Compiler will refuse to compile
   }

So it's a defensive programming technique for me.

> Is it purely a question of individual programming style ?

Perhaps at this stage in history, it is only "style". But it does have a
practical

> Is there some (temporary ?) fashion aspect involved ?
> Do the people who write this either way really think in a different way ?
> Or is there really something "technical" behind this, which makes one or
> the other way be slightly more efficient (whether to compile, or
> optimise, or run) ?
>
> (*) excepting Yoda of course

-chris


Re: [OT] programming style or mental process ?

logo
All,

> On 05.04.2021 at 14:38, Christopher Schultz <[hidden email]> wrote:
>
> André,
>
>> On 4/4/21 06:23, André Warnier (tomcat/perl) wrote:
>> Hi.
>> I have a question which may be totally off-topic for this list, but this has been puzzling me for a while and I figure that someone here may be able to provide some clue as to the answer, or at least some interesting ponts of view.
>> In various places (including on this list), I have seen multiple occurrences of a certain way to write a test, namely :
>>  if (null == request.getCharacterEncoding()) {
>> as opposed to
>>  if (request.getCharacterEncoding() == null) {
>> Granted, the two are equivalent in the end.
>> But it would seem to me, maybe naively, that the second form better corresponds to some "semantic logic", by which one wants to know if a certain a-priori unknown piece of data (here the value obtained by retrieving the character encoding of the current request) is defined (not null) or not (null).
>> Said another way : we don't want to know if "null" is equal to anything; we want to know if request.getCharacterEncoding() is null or not.
>> Or in yet another way : the focus (or the "subject" of the test) here is on "request.getCharacterEncoding()" (which we don't know), and not on "null" (which we know already).
>> Or, more literarily, given that the syntax of most (all?) programming languages is based on English (if, then, else, new, for, while, until, exit, continue, etc.), we (*) do normally ask "is your coffee cold ?" and not "is cold your coffee ?".
>
> On the other hand, in English, coffee which is not hot is called "cold coffee" but in e.g. Spanish, it's "coffee cold".
>
>> So why do (some) people write it the other way ?
>
> I personally put the null first because of my background in C. C compilers (especially older ones) would happily compile this code without batting an eyelash:
>
>  char *s;
>
>  s = call_some_function();
>
>  if(s = null) {
>    // do some stuff
>  }
>
> Guess what? "Do some stuff" is always executed, and s is always null.
>
> If you switch the operands, the compiler will fail because you can't assign a value to null:
>
>  if(null = s ) {
>    // Compiler will refuse to compile
>  }
>

Isn't it true that only one bit difference would result in false - so result would not have to be completely tested?

Peter


> So it's a defensive programming technique for me.
>
>> Is it purely a question of individual programming style ?
>
> Perhaps at this stage in history, it is only "style". But it does have a practical
>
>> Is there some (temporary ?) fashion aspect involved ?
>> Do the people who write this either way really think in a different way ?
>> Or is there really something "technical" behind this, which makes one or the other way be slightly more efficient (whether to compile, or optimise, or run) ?
>> (*) excepting Yoda of course
>
> -chris
>
>

Re: [OT] programming style or mental process ?

Christopher Schultz-2
Peter,

On 4/5/21 12:35, Peter Kreuser wrote:

> All,
>
>> On 05.04.2021 at 14:38, Christopher Schultz <[hidden email]> wrote:
>>
>> André,
>>
>>> On 4/4/21 06:23, André Warnier (tomcat/perl) wrote:
>>> Hi.
>>> I have a question which may be totally off-topic for this list, but this has been puzzling me for a while and I figure that someone here may be able to provide some clue as to the answer, or at least some interesting ponts of view.
>>> In various places (including on this list), I have seen multiple occurrences of a certain way to write a test, namely :
>>>   if (null == request.getCharacterEncoding()) {
>>> as opposed to
>>>   if (request.getCharacterEncoding() == null) {
>>> Granted, the two are equivalent in the end.
>>> But it would seem to me, maybe naively, that the second form better corresponds to some "semantic logic", by which one wants to know if a certain a-priori unknown piece of data (here the value obtained by retrieving the character encoding of the current request) is defined (not null) or not (null).
>>> Said another way : we don't want to know if "null" is equal to anything; we want to know if request.getCharacterEncoding() is null or not.
>>> Or in yet another way : the focus (or the "subject" of the test) here is on "request.getCharacterEncoding()" (which we don't know), and not on "null" (which we know already).
>>> Or, more literarily, given that the syntax of most (all?) programming languages is based on English (if, then, else, new, for, while, until, exit, continue, etc.), we (*) do normally ask "is your coffee cold ?" and not "is cold your coffee ?".
>>
>> On the other hand, in English, coffee which is not hot is called "cold coffee" but in e.g. Spanish, it's "coffee cold".
>>
>>> So why do (some) people write it the other way ?
>>
>> I personally put the null first because of my background in C. C compilers (especially older ones) would happily compile this code without batting an eyelash:
>>
>>   char *s;
>>
>>   s = call_some_function();
>>
>>   if(s = null) {
>>     // do some stuff
>>   }
>>
>> Guess what? "Do some stuff" is always executed, and s is always null.
>>
>> If you switch the operands, the compiler will fail because you can't assign a value to null:
>>
>>   if(null = s ) {
>>     // Compiler will refuse to compile
>>   }
>>
>
> Isn‘t it true that only one bit difference would result in false - so result would not have to be completely tested?

I'm not sure what you mean, here.

This isn't an issue in Java: conditional predicates (the stuff inside
the "if" statement) must be boolean expressions. C and C++ will both
happily cast a number to what programmers typically consider to be a
boolean (remember: C doesn't actually have a boolean data type), and for
that it uses the truthiness of the number to determine what to do.
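
As a small illustration of the Java side, here is a sketch of the one case I am aware of where the typo still compiles in Java: when the variable itself is a boolean. The class and method names are made up for the example.

```java
class YodaConditionDemo {
    // Typo: '=' instead of '=='. This compiles because the assignment
    // expression has type boolean. It overwrites 'enabled' with false and
    // the condition is then false, so the branch is never taken.
    static boolean isDisabledBuggy(boolean enabled) {
        if (enabled = false) {
            return true;
        }
        return false;
    }

    // Constant on the left: writing "false = enabled" by mistake would be
    // rejected by the compiler, so the typo cannot slip through.
    static boolean isDisabledYoda(boolean enabled) {
        if (false == enabled) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + isDisabledBuggy(false)); // false (wrong answer)
        System.out.println("yoda:  " + isDisabledYoda(false));  // true
        // With a non-boolean type, e.g. String s, both "if (s = null)" and
        // "if (null = s)" are compile errors in Java.
    }
}
```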

In C, NULL (the constant) is typically defined to be (void*)0, and
(surprise!) the only truthy numeric value in C is 0. So,

   if(s = NULL) {
     // Stuff
   }

does two things:

1. Assigns the value of 0 to s (nulling-out any pointer you had)
and
2. Executes the body of the conditional, since 0 is considered true

In C, this can be disastrous for a few reasons, not the least of which
is the simple (lack of) correctness of the behavior relative to the
programmer's likely intent: nulling a pointer can lead to memory leaks.

-chris

Re: [OT] programming style or mental process ?

Christopher Schultz-2
All,

On 4/5/21 16:20, Christopher Schultz wrote:

> Peter,
>
> On 4/5/21 12:35, Peter Kreuser wrote:
>> All,
>>
>>> On 05.04.2021 at 14:38, Christopher Schultz <[hidden email]> wrote:
>>>
>>> André,
>>>
>>>> On 4/4/21 06:23, André Warnier (tomcat/perl) wrote:
>>>> Hi.
>>>> I have a question which may be totally off-topic for this list, but
>>>> this has been puzzling me for a while and I figure that someone here
>>>> may be able to provide some clue as to the answer, or at least some
>>>> interesting ponts of view.
>>>> In various places (including on this list), I have seen multiple
>>>> occurrences of a certain way to write a test, namely :
>>>>   if (null == request.getCharacterEncoding()) {
>>>> as opposed to
>>>>   if (request.getCharacterEncoding() == null) {
>>>> Granted, the two are equivalent in the end.
>>>> But it would seem to me, maybe naively, that the second form better
>>>> corresponds to some "semantic logic", by which one wants to know if
>>>> a certain a-priori unknown piece of data (here the value obtained by
>>>> retrieving the character encoding of the current request) is defined
>>>> (not null) or not (null).
>>>> Said another way : we don't want to know if "null" is equal to
>>>> anything; we want to know if request.getCharacterEncoding() is null
>>>> or not.
>>>> Or in yet another way : the focus (or the "subject" of the test)
>>>> here is on "request.getCharacterEncoding()" (which we don't know),
>>>> and not on "null" (which we know already).
>>>> Or, more literarily, given that the syntax of most (all?)
>>>> programming languages is based on English (if, then, else, new, for,
>>>> while, until, exit, continue, etc.), we (*) do normally ask "is your
>>>> coffee cold ?" and not "is cold your coffee ?".
>>>
>>> On the other hand, in English, coffee which is not hot is called
>>> "cold coffee" but in e.g. Spanish, it's "coffee cold".
>>>
>>>> So why do (some) people write it the other way ?
>>>
>>> I personally put the null first because of my background in C. C
>>> compilers (especially older ones) would happily compile this code
>>> without batting an eyelash:
>>>
>>>   char *s;
>>>
>>>   s = call_some_function();
>>>
>>>   if(s = null) {
>>>     // do some stuff
>>>   }
>>>
>>> Guess what? "Do some stuff" is always executed, and s is always null.
>>>
>>> If you switch the operands, the compiler will fail because you can't
>>> assign a value to null:
>>>
>>>   if(null = s ) {
>>>     // Compiler will refuse to compile
>>>   }
>>>
>>
>> Isn't it true that only one bit difference would result in false - so
>> result would not have to be completely tested?
>
> I'm not sure what you mean, here.
>
> This isn't an issue in Java: conditional predicates (the stuff inside
> the "if" statement) must be boolean expressions. C and C++ will both
> happily cast a number to what programmers typically consider to be a
> boolean (remember: C doesn't actually have a boolean data type), and for
> that it uses the truthiness of the number to determine what to do.
>
> In C, NULL (the constant) is typically defined to be (void*)0, and
> (surprise!) the only truthy numeric value in C is 0. So,
>
>   if(s = NULL) {
>     // Stuff
>   }
>
> does two things:
>
> 1. Assigns the value of 0 to s (nulling-out any pointer you had)
> and
> 2. Executes the body of the conditional, since 0 is considered true

Chuck didn't have the heart to publicly point out that this is 100%
wrong, but it is: in C, 0 is the only *false* numeric value, and
everything non-zero is true. He guessed correctly that I was remembering
that a 0 return value from many functions means "all is well" or similar.

Actually, I was remembering that strcmp returns 0 when the strings are
equal, and so you need to logically-invert that value when checking to
see if two strings are equal:

   if(!strcmp("foo", "bar")) {
     // confusingly, "foo" and "bar" are evidently equal...
   }

> In C, this can be disastrous for a few reasons, not the least of which
> is the simple (lack of) correctness of the behavior relative to the
> programmer's likely intent: nulling a pointer can lead to memory leaks.

So, let's re-do that example again, shall we?

    if(s = NULL) {
      // Stuff
    }

This will null-out your pointer and *not* execute the stuff you should
do when your pointer is NULL.

It gets more fun when you do something like this:

    if(s = NULL) {
      // Stuff
    } else {
      free(s); // s is always NULL here: free(NULL) does nothing, and the old allocation leaks
    }

:(

Anyway, the whole point is that I tend to lead with rvalues as they are
not assignable, and therefore trigger compiler errors for simple typos
which are syntactically valid in C. This is much less of an issue in
Java, and one of the reasons it's a "safer" language than C.
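
To make the Java side concrete, here is a minimal sketch (the class and method names are made up for illustration). In Java, "if (s = null)" with a String s is rejected by the compiler because the condition must be a boolean, so the typo can only survive when the variable is itself a boolean:

```java
// Hypothetical demo class; all names are illustrative only.
class AssignmentTypoDemo {

    // Typo: '=' instead of '=='. This compiles because the assignment
    // expression has type boolean, which is a legal 'if' condition.
    static String withTypo(boolean enabled) {
        if (enabled = false) {   // assigns false, then tests false
            return "disabled";
        }
        return "enabled";        // always reached, whatever was passed in
    }

    // Constant-first form: mistyping this as 'false = enabled' would not
    // compile, because a literal cannot be assigned to.
    static String constantFirst(boolean enabled) {
        if (false == enabled) {
            return "disabled";
        }
        return "enabled";
    }

    public static void main(String[] args) {
        System.out.println(withTypo(false));       // enabled (the bug)
        System.out.println(constantFirst(false));  // disabled
    }
}
```

For reference comparisons such as "null == request.getCharacterEncoding()", the Java compiler already catches the single-"=" typo in either operand order, so there the constant-first form is purely a matter of habit.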

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [OT] programming style or mental process ?

André Warnier (tomcat/perl)
In reply to this post by Christopher Schultz-2
On 05.04.2021 14:37, Christopher Schultz wrote:
>> Or, more literarily, given that the syntax of most (all?) programming languages is based
>> on English (if, then, else, new, for, while, until, exit, continue, etc.), we (*) do
>> normally ask "is your coffee cold ?" and not "is cold your coffee ?".
>
> On the other hand, in English, coffee which is not hot is called "cold coffee" but in e.g.
> Spanish, it's "coffee cold".

To nitpick, in Spanish one would rather say "cafe frio".
But that's a bit beside the point since - as mentioned above - most currently fashionable
programming languages are based on English.
Nevertheless, just for the sake of it, and in some imaginary situation
in which the Java syntax would be based on Spanish, one would probably have this :

   si (nada == requerimiento.obtengaCodificaciónCarácteros()) entonces {

   } sino {

   }

as opposed to

    si (requerimiento.obtengaCodificaciónCarácteros() == nada) entonces {

   } sino {

   }

.. which makes it even more striking that the first form deviates from the human language,
because "nothing" cannot really be equal to anything, and thus the first form should
always evaluate to false. (*)

(Which would also lead to more concise Java programs, because if you already know the
answer, then you don't even need to make the test in the first place.)

On the other hand, this provides an interesting insight into English-speaking people's
thought processes, for example as to the expression "nothing matches a good coffee in the
morning", which is undoubtedly evaluated as true by many, although logically it cannot be.

:-)


(*) actually, this appears to be false : in Java, (null == null) is true.
See here for an in-depth discussion :
https://stackoverflow.com/questions/2707322/what-is-null-in-java

P.S.
If anyone is interested in what it would be like to write programs in a Latin-inspired
programming language, I recommend this :
https://metacpan.org/pod/distribution/Lingua-Romana-Perligata/lib/Lingua/Romana/Perligata.pm
(in which language it would be very difficult to confuse "==" and "=")


Re: [OT] programming style or mental process ?

Konstantin Kolinko
In reply to this post by André Warnier (tomcat/perl)
Sun, 4 Apr 2021 at 13:24, André Warnier (tomcat/perl) <[hidden email]> wrote:

>
> Hi.
> I have a question which may be totally off-topic for this list, but this has been puzzling
> me for a while and I figure that someone here may be able to provide some clue as to the
> answer, or at least some interesting points of view.
>
> In various places (including on this list), I have seen multiple occurrences of a certain
> way to write a test, namely :
>
>    if (null == request.getCharacterEncoding()) {
>
> as opposed to
>
>    if (request.getCharacterEncoding() == null) {
>
> Granted, the two are equivalent in the end.

Some programming languages have rules about the order in which an expression
is evaluated. E.g. the left side is evaluated first, the result is stored
in a register (memory) of a CPU, then the right side is evaluated and
the result is stored, then it is followed by a comparison and a
conditional jump. Thus the two variants are not equivalent.

(Well, as null is a zero and not really a specific value, maybe it
does not need evaluation and a memory register to store it.)

In Java the Java Language Specification dictates the evaluation order,
"15.7.1 Evaluate Left-Hand Operand First". I vaguely remember that in
the C language the evaluation order in such expressions is
unspecified.

https://docs.oracle.com/javase/specs/
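
The left-to-right rule can be observed directly. Below is a small sketch, assuming two hypothetical methods that record when they are called; it is only an illustration of the JLS rule, not code from Tomcat:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; left() and right() exist only to log call order.
class EvaluationOrderDemo {

    static final List<String> CALLS = new ArrayList<>();

    static Object left() {
        CALLS.add("left");
        return null;
    }

    static Object right() {
        CALLS.add("right");
        return null;
    }

    // Evaluates 'left() == right()' and returns the recorded call order,
    // followed by the result of the comparison.
    static List<String> run() {
        CALLS.clear();
        boolean equal = left() == right();  // left operand is evaluated first
        CALLS.add("equal=" + equal);
        return new ArrayList<>(CALLS);
    }

    public static void main(String[] args) {
        System.out.println(run());  // [left, right, equal=true]
    }
}
```

When one operand is the literal null, as in the test discussed in this thread, that operand has no side effects, so swapping the operands does not change what the program does.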

If one side of an expression can have unexpected side effects (like a
function call or a null pointer dereference can have), I prefer them
to be evaluated first. Thus my preference is for
"(request.getCharacterEncoding() == null)".


Otherwise, another point of view to consider is readability of the
code. If the function call is some lengthy expression, " (null ==
request.getCharacterEncoding()) " may be more readable when formatting
the code results in wrapping the lengthy expression, splitting it into
several lines.


I think that I should also mention the well-known construct when a
comparison is done by calling the "equals()" method on some constant
value:

   CONSTANT_VALUE.equals(someFunction())

In this case the "CONSTANT_VALUE" is known to be non-null, and thus
calling its method cannot result in a NullPointerException. (In more
complex cases the static method "Objects.equals()" helps to compare
two values in a null-aware way).
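
A minimal sketch of the three variants, assuming a hypothetical constant named UTF_8 (the class and method names are illustrative as well):

```java
import java.util.Objects;

// Hypothetical demo class comparing three ways to test string equality.
class NullSafeEqualsDemo {

    static final String UTF_8 = "UTF-8";  // known to be non-null

    // Constant first: safe even when 'encoding' is null.
    static boolean constantFirst(String encoding) {
        return UTF_8.equals(encoding);
    }

    // Value first: throws NullPointerException when 'encoding' is null.
    static boolean valueFirst(String encoding) {
        return encoding.equals(UTF_8);
    }

    // Null-aware comparison of two values that may both be null.
    static boolean nullAware(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(constantFirst(null));    // false
        System.out.println(nullAware(null, null));  // true
        try {
            valueFirst(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from value-first form");
        }
    }
}
```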

Best regards,
Konstantin Kolinko


Re: [OT] programming style or mental process ?

Christopher Schultz-2
Konstantin,

On 4/6/21 06:41, Konstantin Kolinko wrote:

> Sun, 4 Apr 2021 at 13:24, André Warnier (tomcat/perl) <[hidden email]> wrote:
>>
>> Hi.
>> I have a question which may be totally off-topic for this list, but this has been puzzling
>> me for a while and I figure that someone here may be able to provide some clue as to the
>> answer, or at least some interesting points of view.
>>
>> In various places (including on this list), I have seen multiple occurrences of a certain
>> way to write a test, namely :
>>
>>     if (null == request.getCharacterEncoding()) {
>>
>> as opposed to
>>
>>     if (request.getCharacterEncoding() == null) {
>>
>> Granted, the two are equivalent in the end.
>
> Some programming languages have rules, in what order an expression is
> evaluated. E.g. the left side is evaluated first, the result is stored
> in a register (memory) of a CPU, then the right side is evaluated and
> the result is stored, then it is followed by a comparison and a
> conditional jump. Thus the two variants are not equivalent.
>
> (Well, as null is a zero and not really a specific value, maybe it
> does not need evaluation and a memory register to store it.)

JVM uses a stack and not registers, but of course many architectures
(like most RISC) do use registers under the hood, so there is a bit of
mapping here and there, at multiple levels. Then x86 is accumulator-based
but also has a few registers, and that number grows with each processor
revision.

Anyhow, Java bytecode has primitives for loading null values onto the
stack, so it both has a definite value (probably 0, I've never bothered
to dig into it too much) and it is definitely loaded into registers
(well, onto the stack).

Further, the JLS says that fields without explicit initializers get
whatever the equivalent of "0" is in their data type. References are
assigned "null", so null is probably == 0, though they could go
old-school and use 0xdeadbeef like some C compilers back in the day.
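
The default values are easy to check from Java itself. This is a small sketch with a made-up class; note that it only shows what the language guarantees (references default to null), not how a particular JVM represents null in memory:

```java
// Hypothetical demo class: none of these fields has an initializer,
// so each one gets the default value for its type.
class DefaultValuesDemo {

    int count;       // defaults to 0
    boolean flag;    // defaults to false
    double ratio;    // defaults to 0.0
    char letter;     // defaults to '\u0000'
    String name;     // defaults to null

    public static void main(String[] args) {
        DefaultValuesDemo d = new DefaultValuesDemo();
        System.out.println(d.count);             // 0
        System.out.println(d.flag);              // false
        System.out.println(d.ratio);             // 0.0
        System.out.println((int) d.letter);      // 0
        System.out.println(null == d.name);      // true
    }
}
```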

> In Java the Java Language Specification dictates the evaluation order,
> "15.7.1 Evaluate Left-Hand Operand First". I vaguely remember that in
> the C language the evaluation order in such expressions is
> unspecified.
>
> https://docs.oracle.com/javase/specs/
>
> If one side of an expression can have unexpected side effects (like a
> function call or a null pointer dereference can have), I prefer them
> to be evaluated first. Thus my preference is for
> "(request.getCharacterEncoding() == null)".
>
>
> Otherwise, another point of view to consider is readability of the
> code. If the function call is some lengthy expression, " (null ==
> request.getCharacterEncoding()) " may be more readable when formatting
> the code results in wrapping the lengthy expression, splitting it into
> several lines.
>
>
> I think that I should also mention the well-known construct when a
> comparison is done by calling the "equals()" method on some constant
> value:
>
>     CONSTANT_VALUE.equals(someFunction())
>
> In this case the "CONSTANT_VALUE" is known to be non-null, and thus
> calling its method cannot result in a NullPointerException. (In more
> complex cases the static method "Objects.equals()" helps to compare
> two values in a null-aware way).

In a way, this makes "null == thing" more consistent, because null is
the constant in this case. You can't call null.equals(), of course, but
it's the same idea... though for the opposite reason: in your case, you
want to avoid both NPE and needless null-avoidance code.

-chris


Re: [OT] programming style or mental process ?

gustavo.avitabile
In reply to this post by André Warnier (tomcat/perl)

Quoting "André Warnier (tomcat/perl)" <[hidden email]>:

> On 05.04.2021 14:37, Christopher Schultz wrote:
>>> Or, more literarily, given that the syntax of most (all?)  
>>> programming languages is based on English (if, then, else, new,  
>>> for, while, until, exit, continue, etc.), we (*) do normally ask  
>>> "is your coffee cold ?" and not "is cold your coffee ?".
>>
>> On the other hand, in English, coffee which is not hot is called  
>> "cold coffee" but in e.g. Spanish, it's "coffee cold".
>
> To nitpick, in Spanish one would rather say "cafe frio".

... and, in Italian, "caffè freddo",
but we Italians love coffee, and we have plenty of imagination, so try also:
"granita di caffè", "caffè gelato", "caffè col ghiaccio", "il caffè  
s'è fatto freddo", ...

> But that's a bit beside the point since - as mentioned above - most  
> currently fashionable programming languages are based on English.
> Nevertheless, just for the sake of it, and in some imaginary situation
> in which the Java syntax would be based on Spanish, one would  
> probably have this :
>
>   si (nada == requerimiento.obtengaCodificaciónCarácteros()) entonces {
>
>   } sino {
>
>   }
>
> as opposed to
>
>    si (requerimiento.obtengaCodificaciónCarácteros() == nada) entonces {
>
>   } sino {
>
>   }
>
> .. which makes it even more striking that the first form deviates  
> from the human language, because "nothing" cannot really be equal to  
> anything, and thus the first form should always evaluate to false. (*)
>
> (Which would also lead to more concise Java programs, because if you  
> already know the answer, then you don't even need to make the test  
> in the first place.)
>
> On the other hand, this provides an interesting insight into  
> English-speaking people's thought processes, for example as to the  
> expression "nothing matches a good coffee in the morning", which is  
> undoubtedly evaluated as true by many, although logically it cannot  
> be.
>
> :-)
>
>
> (*) actually, this appears to be false : in Java, (null == null) is true.
> See here for an in-depth discussion :  
> https://stackoverflow.com/questions/2707322/what-is-null-in-java
>
> P.S.
> If anyone is interested about how it would be to write programs  
> based on a Latin-inspired programming language, I recommend this :
> https://metacpan.org/pod/distribution/Lingua-Romana-Perligata/lib/Lingua/Romana/Perligata.pm
> (in which language it would be very difficult to confuse "==" and "=")
>

Re: [OT] programming style or mental process ?

André Warnier (tomcat/perl)
On 06.04.2021 20:06, [hidden email] wrote:
>> To nitpick, in Spanish one would rather say "cafe frio".
>
> ... and, in Italian, "caffè freddo",
> but we Italians love coffee, and we have plenty of imagination, so try also:
> "granita di caffè", "caffè gelato", "caffè col ghiaccio", "il caffè s'è fatto freddo", ...

Lest you think that Italians are the only ones with imagination when it comes to
coffee, Spanish people also call this "granizado de cafe" (or "cafe granizado") or "cafe
del tiempo". And that's only for the basic cold type, because there are many subtypes, each
with its own name: with and without different types of liquor (flambé or not), short,
medium, large or "americano" (== like water), real coffee or powder, decaffeinated or not,
with or without (hot or cold) milk, in different types of containers.

And lest some people think that this is now all totally [OT], I would remind
everyone of the definite historical and cultural connections between Tomcat, Java,
programming and coffee (and Jakarta). (And Dutch people. Where are they in this discussion,
by the way? (But they have only one type of coffee, I think.))



Re: [OT] programming style or mental process ?

Carsten Klein

> (And Dutch people. Where are they in this discussion by the way ? (but
> they have only one type of coffee I think)).

Dutch people may only have one type of coffee (actually I don't know).
But remember, Dutch people have 'Coffee Shops' offering stuff far beyond
coffee... :) Is there a relation between that and usage of Yoda style?

Germans drank so-called 'filter coffee' for decades, which
today, even if served hot, many people would call 'cold coffee' ('kalter
Kaffee' in German, an idiom that could be translated to English as 'old
hat').

Now, thanks to companies like De'Longhi or Saeco (which is now owned by
Philips and so is actually Dutch), most of us prefer Italian coffee
types like Espresso, Cappuccino or Latte Macchiato.

