Servlet encoding woes in Open Liberty

I have a simple test servlet that should output a non-ASCII character (the right single quotation mark, ’). It works in Tomcat, but in Liberty I get junk. Is this a bug in Liberty, a configuration issue, or am I doing something wrong?

package test;

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;


public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        try (PrintWriter out = response.getWriter()) {
            out.print("’");
            out.close();
        }
    }
}

And the web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="3.1" xmlns="http://xmlns.jcp.org/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd">
    <servlet>
        <servlet-name>TestServlet</servlet-name>
        <servlet-class>test.TestServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>TestServlet</servlet-name>
        <url-pattern>/TestServlet</url-pattern>
    </servlet-mapping>
</web-app>

From Tomcat the response is (courtesy of Fiddler):

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Length: 3
Date: Wed, 23 Jun 2021 23:40:07 GMT

’

The body hex is: E2, 80, 99 (which is correct UTF-8 for ’)
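As a quick cross-check of those three bytes, here is a minimal sketch in plain Java (no servlet container involved). The class name `Utf8Bytes` and the helper `toHex` are made up for illustration; the character is written as the escape `\u2019` so the result does not depend on how the source file is read.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {

    /** Formats bytes as upper-case hex pairs separated by single spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark.
        byte[] utf8 = "\u2019".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8)); // prints "E2 80 99"
    }
}
```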

From Liberty it is:

HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/html;charset=UTF-8
Content-Length: 3
Content-Language: en-CA
Date: Wed, 23 Jun 2021 23:52:49 GMT

â€™

The hex for that content is: C3, A2, E2, 82, AC, E2, 84, A2
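Those eight bytes are exactly what you get if the three correct UTF-8 bytes are misread as windows-1252 characters and then encoded as UTF-8 a second time. The sketch below reproduces that in plain Java. It only demonstrates the byte arithmetic; that something on the Liberty side performs this double encoding is an assumption, not something shown here. `DoubleEncoding`, `doubleEncode` and `toHex` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncoding {

    /** Formats bytes as upper-case hex pairs separated by single spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    /** Encodes as UTF-8, misreads the bytes as windows-1252, then encodes as UTF-8 again. */
    static byte[] doubleEncode(String text) {
        byte[] once = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, Charset.forName("windows-1252"));
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // E2 80 99 read as windows-1252 is U+00E2, U+20AC, U+2122.
        System.out.println(toHex(doubleEncode("\u2019"))); // prints "C3 A2 E2 82 AC E2 84 A2"
    }
}
```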

Dev tools (F12) matches Fiddler.

I’ve tried moving these two calls

        response.setContentType("text/html;charset=UTF-8");
        response.setCharacterEncoding("UTF-8");

before and after the getWriter() call (the docs say they should come before getWriter()). I’ve also tried with and without setCharacterEncoding, with different content types, and so on.

The .java file itself is saved with UTF-8 encoding.
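Saving the file as UTF-8 only helps if the compiler also reads it as UTF-8 (javac's `-encoding` option, or a default that on older JDKs follows the platform charset). One way to rule this out is to dump the code points of the string the servlet actually holds at runtime. This is a sketch with hypothetical names (`SourceEncodingCheck`, `codePoints`). Since the same source works on Tomcat, this may well not be the cause here, but it is cheap to check.

```java
import java.util.stream.Collectors;

public class SourceEncodingCheck {

    /** Returns the code points of s as lower-case hex values separated by spaces. */
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(Integer::toHexString)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A correctly compiled literal is the single code point U+2019.
        System.out.println(codePoints("\u2019")); // prints "2019"

        // If javac had read the UTF-8 file as windows-1252, the literal would
        // instead contain these three characters.
        System.out.println(codePoints("\u00e2\u20ac\u2122")); // prints "e2 20ac 2122"
    }
}
```

In the servlet, logging `codePoints(...)` of the literal that is passed to `out.print` would show which case applies; writing the literal as `"\u2019"` sidesteps the question entirely.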

Curiously, the Content-Length header says 3 bytes with either server, but with Liberty the actual body is 8 bytes, as if the bytes had been re-encoded.

So, what is going on here?

UPDATE: Taking out the out.close() per @pmdinh’s answer had an effect, but didn’t fix it. This is the closest I could get to proper behaviour:

    response.setCharacterEncoding("UTF-8");

    try (PrintWriter out = response.getWriter()) {
        response.setContentType("text/html;charset=UTF-8");

        out.print("’1234");
    }

This encodes the character correctly, but now the Content-Length is 2 bytes short. The response is:

HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/html;charset=UTF-8
Content-Length: 5
Content-Language: en-CA
Date: Thu, 24 Jun 2021 17:50:55 GMT

’1234

but since the Content-Length is 2 bytes short, the browser shows ’12
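The numbers line up with a length counted in characters rather than in UTF-8 bytes: the string has 5 characters but encodes to 7 bytes, and the first 5 of those bytes decode to exactly what the browser shows. The sketch below only checks that arithmetic. Whether Liberty really computes the header this way is an assumption, and `LengthMismatch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class LengthMismatch {

    /** Number of UTF-16 chars in the string, as String.length() reports it. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Decodes only the first limit bytes of the UTF-8 form, as a client honouring Content-Length would. */
    static String truncatedBody(String s, int limit) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, limit, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "\u20191234"; // the quotation mark followed by "1234"
        System.out.println(charCount(body));     // prints 5
        System.out.println(utf8ByteCount(body)); // prints 7
        // The first 5 bytes are the 3-byte quotation mark plus "12".
        System.out.println(truncatedBody(body, 5).equals("\u201912")); // prints true
    }
}
```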

Also note that the placement of setCharacterEncoding and setContentType matters: other combinations make the output even worse (incorrect encoding).

Answer

Remove the

out.close();

that should resolve the issue.

Ref: https://www.ibm.com/support/pages/apar/PM71666
