The HttpListener class in .NET lets you create a lightweight HTTP server without having to go through all the rigmarole¹ of installing and managing IIS. It’s incredibly easy to get a simple HTTP server up and working with HttpListener.
But when it comes to handling query parameters, things break in a very strange way. The server I’m working on currently accepts requests to search for strings in video titles. Last night I got a bug report saying that it didn’t work when searching for Kanji (Japanese) characters. David was looking for the string “尺八”², and got no results back even though he knew that there were matching titles in the database.
When a browser sends a query string to the server, it converts the string to UTF-8 bytes and percent-encodes them. So David’s search for “尺八” resulted in this request to my server: “/?q=%E5%B0%BA%E5%85%AB”, which is correct.
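To see where those six escaped bytes come from, here is a minimal sketch that percent-encodes the search term with the standard Uri.EscapeDataString method, which uses UTF-8. The class and method names are made up for illustration and are not part of the server described here.

```csharp
using System;

public static class QueryEncodingDemo
{
    // Percent-encodes a query value from its UTF-8 bytes.
    public static string Encode(string value) => Uri.EscapeDataString(value);

    public static void Main()
    {
        // Each of the two characters in "尺八" takes three bytes in UTF-8,
        // so the encoded form contains six %XX escapes.
        Console.WriteLine("/?q=" + Encode("尺八")); // prints /?q=%E5%B0%BA%E5%85%AB
    }
}
```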
Then things get weird.
The HttpListenerContext.Request object contains all the information about the request that came to the server. If I look at the relevant properties, I see the following:
Request.RawUrl = "/?q=%E5%B0%BA%E5%85%AB"
Request.QueryString["q"] = "尺八"
Request.Url = {http://localhost:8080/?q=尺八}
The problem here is that the Request.QueryString property is apparently interpreting the encoded query string parameter using something other than UTF-8, so the value it returns isn’t the string that was sent. And looking at the code for HttpListenerRequest.QueryString (part of the .NET runtime library) confirms that:
public NameValueCollection QueryString
{
    get
    {
        NameValueCollection nvc = new NameValueCollection();
        Helpers.FillFromString(nvc, this.Url.Query, true, this.ContentEncoding);
        return nvc;
    }
}
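The problem is the this.ContentEncoding argument, which says, “use the Request object’s encoding to interpret this string.” That’s pretty strange, because ContentEncoding describes the data sent with the request, not the URL. It’s hard to be sure, but as I read the current standard (RFC 3986), it recommends UTF-8 for percent-encoded text in URIs rather than strictly requiring it everywhere, and UTF-8 is what the browser sent here. If that’s right, then this is a bug in the HttpListenerRequest implementation.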
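The effect of picking the wrong encoding can be reproduced without a server. This is a rough sketch, not the runtime's own code: it decodes the same percent-encoded value twice with HttpUtility.UrlDecode, once as UTF-8 and once as ISO-8859-1. ISO-8859-1 is only an assumed stand-in for whatever non-UTF-8 encoding ContentEncoding happens to return on a given machine, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Web;

public static class DecodingDemo
{
    // Decodes a percent-encoded value using the given character encoding.
    public static string Decode(string encoded, Encoding encoding) =>
        HttpUtility.UrlDecode(encoded, encoding);

    public static void Main()
    {
        const string encoded = "%E5%B0%BA%E5%85%AB";

        string asUtf8 = Decode(encoded, Encoding.UTF8);

        // Assumption: ISO-8859-1 stands in for a non-UTF-8 ContentEncoding.
        string asLatin1 = Decode(encoded, Encoding.GetEncoding("ISO-8859-1"));

        Console.WriteLine(asUtf8 == "尺八");   // True: six bytes become two characters
        Console.WriteLine(asLatin1 == "尺八"); // False: each byte becomes its own character
        Console.WriteLine(asLatin1.Length);    // 6
    }
}
```

A search for the six-character string will never match a title containing “尺八”, which is consistent with the empty result in the bug report.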
Fortunately, there’s an easy workaround. The Request.Url property is properly formed, so I can use its Query property to construct my own queryString collection and ignore Request.QueryString:
// HttpUtility (in System.Web) decodes percent-escapes as UTF-8 by default.
var queryString = HttpUtility.ParseQueryString(context.Request.Url.Query);
string q = queryString["q"];
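For context, here is one way that workaround might sit inside a small listener loop. Treat it as a sketch under assumptions rather than the actual server: the SearchServer class, the ParseQuery helper, and the echo response are invented for illustration, and the prefix is taken from the Request.Url value shown earlier. The encoding is passed explicitly so the intent is visible in the code.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Web;

public static class SearchServer
{
    // Parses the URL's query string as UTF-8, bypassing Request.QueryString.
    public static NameValueCollection ParseQuery(Uri url) =>
        HttpUtility.ParseQueryString(url.Query, Encoding.UTF8);

    public static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/");
        listener.Start();

        while (true)
        {
            // Blocks until a request arrives.
            HttpListenerContext context = listener.GetContext();
            string q = ParseQuery(context.Request.Url)["q"] ?? "";

            // Echo the term back; a real server would run the title search here.
            byte[] body = Encoding.UTF8.GetBytes("You searched for: " + q);
            context.Response.ContentType = "text/plain; charset=utf-8";
            context.Response.ContentLength64 = body.Length;
            context.Response.OutputStream.Write(body, 0, body.Length);
            context.Response.Close();
        }
    }
}
```

Keeping the parsing in one helper means every handler gets the same UTF-8 behavior, and the helper can be exercised with a plain Uri without starting the listener.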
As far as I know, this is the only way to properly handle encoded query strings in HttpListener. If you know of some way to make Request.QueryString work as expected (or can tell me why the current behavior isn’t a bug), I’d sure like to hear about it.
¹I always pronounced that word “rig-a-ma-role.” But the word is “rig-ma-role”. Learn something new every day.
²WordPress lets you add Unicode characters when adding a new post, but if you pull the post up to edit it afterwards, the Unicode characters get turned into question marks. Also, the “new post” editor accepts Unicode characters directly, but any editing done after that requires you to input the characters using HTML Unicode escapes, like &#23610;&#20843;.