
Shibboleth SP, Tomcat and Charset oddities

The problem

When passing an attribute to Tomcat via HTTP or AJP, I noticed that Tomcat decodes it as ISO-8859-1.
No matter how I fiddled with Apache’s config settings, nothing changed.
Shibboleth SP’s docs, however, mandate treating the data as UTF-8.

Digging through Tomcat’s source code, I found this in ByteChunk.java:

/** Default encoding used to convert to strings. It should be UTF8,
    as most standards seem to converge, but the servlet API requires
    8859_1, and this object is used mostly for servlets.
*/
public static final Charset DEFAULT_CHARSET = B2CConverter.ISO_8859_1;

Judging from the code that feeds HTTP headers and AJP attributes into the ServletRequest object, no
encoding information about them is ever passed to ByteChunk, so it falls back to ISO-8859-1.
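
To make the effect concrete, here is a minimal sketch in plain Java. It does not use Tomcat’s ByteChunk at all; the class and method names are made up for illustration, and it assumes the client sent the value as UTF-8 bytes. It shows what decoding those bytes as ISO-8859-1 produces, and that the damage can be reversed because ISO-8859-1 maps every byte to exactly one character.

```java
import java.nio.charset.StandardCharsets;

public class HeaderCharsetDemo {
    // What goes over the wire: the UTF-8 bytes of the value (assumption).
    public static byte[] wireBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // What an ISO-8859-1 default does: one char per byte, no UTF-8 decoding.
    public static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong decoding: recover the original bytes, then decode as UTF-8.
    public static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u00e4\u00f6\u00fc"; // the three characters a-umlaut, o-umlaut, u-umlaut
        String seen = decodeAsLatin1(wireBytes(sent));
        System.out.println(seen.length());             // 6: one char per UTF-8 byte
        System.out.println(repair(seen).equals(sent)); // true
    }
}
```

The three umlauts take two bytes each in UTF-8, so the mis-decoded string has six characters instead of three.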

Reproducing the problem

The Tomcat FAQ has an entry on this topic, but as this example shows, the settings it recommends have
no effect on how HTTP headers or AJP attributes are interpreted.

  • Download Tomcat 8.x, unzip it and change to its directory.
  • mkdir -p webapps/foo/WEB-INF
  • create webapps/foo/WEB-INF/web.xml with this content:
<web-app>
  <filter>
    <filter-name>encoding-filter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>encoding-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
</web-app>
  • create webapps/foo/index.jsp:
<%--
  Adapted 2015 by fuero
  original Copyright (c) 2002 by Phil Hanna
  All rights reserved.
  
  You may study, use, modify, and distribute this
  software for any purpose provided that this
  copyright notice appears in all copies.
  
  This software is provided without warranty
  either expressed or implied.
--%>
<%@ page contentType="text/html; charset=UTF-8" %>
<%@ page import="java.util.*" %>
<html>
   <head>
      <title>Echo</title>
   </head>
   <body>
      <h1>HTTP Request Headers Received</h1>
      <table border="1" cellpadding="4" cellspacing="0">
      <%
         Enumeration eNames = request.getHeaderNames();
         while (eNames.hasMoreElements()) {
            String name = (String) eNames.nextElement();
            String value = normalize(request.getHeader(name));
      %>
         <tr><td><%= name %></td><td><%= value %></td></tr>
      <%
         }
      %>
      </table>
      <h1>HTTP Request Parameters Received</h1>
      <table border="1" cellpadding="4" cellspacing="0">
      <%
         eNames = request.getParameterNames();
         while (eNames.hasMoreElements()) {
            String name = (String) eNames.nextElement();
            String value = normalize(request.getParameter(name));
      %>
         <tr><td><%= name %></td><td><%= value %></td></tr>
      <%
         }
      %>
      <h1>HTTP Request Attributes Received</h1>
      <table border="1" cellpadding="4" cellspacing="0">
      <%
         eNames = request.getAttributeNames();
         while (eNames.hasMoreElements()) {
            String name = (String) eNames.nextElement();
            // Attributes can be arbitrary objects, so convert instead of casting.
            String value = normalize(String.valueOf(request.getAttribute(name)));
      %>
         <tr><td><%= name %></td><td><%= value %></td></tr>
      <%
         }
      %>
      </table>
   </body>
</html>
<%!
   private String normalize(String value)
   {
      StringBuffer sb = new StringBuffer();
      for (int i = 0; i < value.length(); i++) {
         char c = value.charAt(i);
         sb.append(c);
         if (c == ';')
            sb.append("<br>");
      }
      return sb.toString();
   }
%>
  • add URIEncoding="UTF-8" to the HTTP Connector in server.xml
  • run curl on a UTF-8 enabled shell:

curl --header "Foo: äöü" http://localhost:8080/foo/

<html>
   <head>
      <title>Echo</title>
   </head>
   <body>
      <h1>HTTP Request Headers Received</h1>
      <table border="1" cellpadding="4" cellspacing="0">
      
         <tr><td>user-agent</td><td>curl/7.38.0</td></tr>
      
         <tr><td>host</td><td>localhost:8080</td></tr>
      
         <tr><td>accept</td><td>*/*</td></tr>
      
         <tr><td>foo</td><td>äöü</td></tr>
      
      </table>
      <h1>HTTP Request Parameters Received</h1>
      <table border="1" cellpadding="4" cellspacing="0">
      
      <h1>HTTP Request Attributes Received</h1>
      <table border="1" cellpadding="4" cellspacing="0">
      
      </table>
   </body>
</html>

Solution

I’m not too familiar with the servlet API, so I’m unsure what to make of the
comment in ByteChunk.
Since I couldn’t modify the third-party application I wanted to “shibbolize”, I had
to get sneaky.
I used a custom servlet Filter to inject a dynamically proxied (using ByteBuddy)
request object. The proxy intercepts getAttribute and getHeader to fix the charset
issue, and getAttributeNames to fix another annoyance: Shibboleth attribute names
are hidden from the enumeration, even though getAttribute returns their values.
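
The interception idea can be sketched without any dependencies. This is not my actual filter: it swaps ByteBuddy for the JDK’s built-in java.lang.reflect.Proxy, and the Request interface below is a hypothetical stand-in for the two servlet request methods of interest, so that the example runs on its own. It also assumes the underlying values really are UTF-8 bytes that were decoded as ISO-8859-1.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.nio.charset.StandardCharsets;

public class RecodingProxyDemo {
    /** Hypothetical stand-in for the relevant part of the servlet request API. */
    public interface Request {
        String getHeader(String name);
        Object getAttribute(String name);
    }

    /** Re-interprets a string that was decoded as ISO-8859-1 as UTF-8. */
    public static String recode(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Returns a proxy that re-decodes String results of getHeader/getAttribute. */
    public static Request wrap(Request target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            boolean intercepted = method.getName().equals("getHeader")
                    || method.getName().equals("getAttribute");
            // Only touch String results; other attribute types pass through unchanged.
            return (intercepted && result instanceof String) ? recode((String) result) : result;
        };
        return (Request) Proxy.newProxyInstance(
                Request.class.getClassLoader(), new Class<?>[] {Request.class}, handler);
    }

    /** Simulates a container that decoded a UTF-8 value as ISO-8859-1. */
    public static Request fakeRequest(String utf8Value) {
        String misdecoded = new String(
                utf8Value.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return new Request() {
            @Override public String getHeader(String name) { return misdecoded; }
            @Override public Object getAttribute(String name) { return misdecoded; }
        };
    }

    public static void main(String[] args) {
        String sent = "\u00e4\u00f6\u00fc";
        Request raw = fakeRequest(sent);
        System.out.println(raw.getHeader("Foo").equals(sent));       // false
        System.out.println(wrap(raw).getHeader("Foo").equals(sent)); // true
    }
}
```

In a real filter the wrapped object would be handed to chain.doFilter in place of the original request, so the application never sees the mis-decoded strings.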