Reading from streams, part 1

This is the first in a short series about reading data from streams in C#. In this initial post, I’m going to discuss reading from a stream that has a defined end. You might not know where the end is, but you can be assured that if you continue reading you will reach the end. This is common with files, for example, and when reading data from Web requests (downloading a Web page). I’ll discuss messaging protocols in a subsequent post.

C# programmers who are accustomed to reading files with FileStream often become confused when working with NetworkStream or other types that derive from Stream or that have an underlying Stream, like SerialPort. The confusion arises from assumptions about how the Read method works.

The code to read 100 bytes from a FileStream is straightforward; you request 100 bytes and that’s what you get. For example:

    byte[] buffer = new byte[100];
    int bytesRead;
    using (FileStream f = File.OpenRead(filename))
    {
        bytesRead = f.Read(buffer, 0, 100); // request 100 bytes
    }
    // do something with the data read

Actually, I lied. I said that when you ask for 100 bytes, that’s what you get. In truth, you get up to 100 bytes. As the documentation for FileStream.Read says:

The returned value is the actual number of bytes read, or zero if the end of the stream is reached.

The Read method returns zero only after reaching the end of the stream. Otherwise, Read always reads at least one byte from the stream before returning. If no data is available from the stream upon a call to Read, the method will block until at least one byte of data can be returned. An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.

(The emphasis on the last sentence is mine.)

In practice, at least in my experience writing .NET applications on Windows, FileStream.Read always returns either the number of bytes requested or the number of bytes remaining in the file. So if I ask for 100 bytes and the file is only 85 bytes long, I'll get 85 bytes. The method returns zero only if the file is already positioned at the end when I call it, and it never returns fewer bytes than I asked for unless it reaches the end of the file. So in the code above, bytesRead would always be either 100 or, if the file contains fewer than 100 bytes, the total number of bytes in the file.
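Here's a small sketch of the 85-byte case. It writes a temporary file and asks FileStream for 100 bytes twice. The ShortFileDemo and ReadTwice names and the use of Path.GetTempFileName are just for illustration. Going by the behavior described above, you'd expect the first Read to return 85 and the second to return 0. The documented contract only promises that the first call returns something between 1 and 85, and that Read returns 0 once the end of the file has been reached.

```csharp
using System;
using System.IO;

static class ShortFileDemo
{
    // Writes fileSize bytes to a temporary file, then calls Read twice,
    // asking for `requested` bytes each time. Returns both results.
    public static (int first, int second) ReadTwice(int fileSize, int requested)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[fileSize]);
            byte[] buffer = new byte[requested];
            using (FileStream f = File.OpenRead(path))
            {
                // Returns at least 1 byte (the file isn't empty), at most fileSize.
                int first = f.Read(buffer, 0, requested);
                // Returns 0 if the first call already consumed the whole file.
                int second = f.Read(buffer, 0, requested);
                return (first, second);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        var (first, second) = ReadTwice(85, 100);
        Console.WriteLine("First read: {0} bytes; second read: {1} bytes.", first, second);
    }
}
```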

It turns out, though, that depending on this behavior is wrong, as you will see below.

The documentation for Stream.Read has wording that’s somewhat similar to that of FileStream.Read:

Implementations of this method read a maximum of count bytes from the current stream and store them in buffer beginning at offset. The current position within the stream is advanced by the number of bytes read; however, if an exception occurs, the current position within the stream remains unchanged. Implementations return the number of bytes read. The implementation will block until at least one byte of data can be read, in the event that no data is available. Read returns 0 only when there is no more data in the stream and no more is expected (such as a closed socket or end of file). An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.

Again, the emphasis is mine.

The last sentence is particularly important. NetworkStream.Read and SerialPort.Read frequently return fewer bytes than requested. As an illustration, consider this code that reads the HTML from my blog.

    HttpWebRequest request =
        (HttpWebRequest)WebRequest.Create("http://blog.mischel.com/");
    using (HttpWebResponse response =
        (HttpWebResponse)request.GetResponse())
    {
        // limit data to 1 megabyte
        const int maxLength = 1024*1024;
        byte[] buffer = new byte[maxLength];
        int totalBytesRead = 0;
        int readCount = 0;  // number of calls to Read
        using (Stream s = response.GetResponseStream())
        {
            while (totalBytesRead < buffer.Length)
            {
                // ask for all of the space remaining in the buffer
                int bytesRequested = buffer.Length - totalBytesRead;
                int bytesRead = s.Read(buffer, totalBytesRead, bytesRequested);
                ++readCount;
                if (bytesRead == 0)
                    break;  // end of stream: no more data is coming
                Console.WriteLine("{0:N0} bytes requested. {1:N0} bytes read.",
                    bytesRequested, bytesRead);
                totalBytesRead += bytesRead;
            }
        }
        Console.WriteLine("Done: {0:N0} total bytes read in {1:N0} requests.",
            totalBytesRead, readCount);
    }

Note that although the response object has a ContentLength property, some Web sites reply with an invalid Content-Length header and others supply 0 or -1. Rather than handle those exceptional cases, I just set an upper limit on the size of the download I will accept, and read up to that many bytes or until I reach the end of the stream.

When I run that program, here’s the output I get:

    1,048,576 bytes requested. 123 bytes read.
    1,048,453 bytes requested. 43 bytes read.
    1,048,410 bytes requested. 12 bytes read.
    ...
    ... Lots of other read requests
    ...
    903,066 bytes requested. 5 bytes read.
    903,061 bytes requested. 15 bytes read.
    903,046 bytes requested. 99 bytes read.
    902,947 bytes requested. 16 bytes read.
    Done: 145,645 total bytes read in 380 requests.

On the first call I asked for a megabyte and got 123 bytes. Over the whole run, the number of bytes returned by individual calls to Read varied from 1 to about 10,000.

As you can see, you can’t depend on a stream to give you all the bytes you requested. You have to keep asking it for pieces until it’s given you what you want, or until it returns 0 to indicate that there isn’t any more data forthcoming.
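That loop generalizes into a helper. Here's a sketch of one that keeps calling Read until it has exactly the number of bytes it wants, and throws EndOfStreamException if the stream ends first. To show short reads without a network connection, it uses TrickleStream, a hypothetical wrapper (not a framework class) that returns at most a few bytes per call, which the Stream.Read contract explicitly allows. The names ReadFully, StreamHelpers, and TrickleStream are made up for this example. If you're on .NET 7 or later, Stream.ReadExactly does the same job as ReadFully.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical wrapper: returns at most maxChunk bytes per Read call.
// That's legal under the Stream.Read contract, and it mimics the short
// reads you get from NetworkStream.
sealed class TrickleStream : Stream
{
    private readonly Stream inner;
    private readonly int maxChunk;

    public TrickleStream(Stream inner, int maxChunk)
    {
        this.inner = inner;
        this.maxChunk = maxChunk;
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        inner.Read(buffer, offset, Math.Min(count, maxChunk));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

static class StreamHelpers
{
    // Keeps calling Read until exactly `count` bytes have been stored in
    // buffer starting at offset. Throws if the stream ends first.
    public static void ReadFully(Stream s, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int n = s.Read(buffer, offset, count);
            if (n == 0)
                throw new EndOfStreamException($"Stream ended with {count} bytes still expected.");
            offset += n;
            count -= n;
        }
    }

    static void Main()
    {
        byte[] source = new byte[1000];
        new Random(42).NextBytes(source);

        // A single Read gets at most one 7-byte chunk.
        var once = new TrickleStream(new MemoryStream(source), 7);
        byte[] partial = new byte[1000];
        Console.WriteLine("Single Read: {0} bytes", once.Read(partial, 0, partial.Length));
        // prints "Single Read: 7 bytes"

        // ReadFully keeps asking until all 1000 bytes have arrived.
        var looped = new TrickleStream(new MemoryStream(source), 7);
        byte[] full = new byte[1000];
        ReadFully(looped, full, 0, full.Length);
        Console.WriteLine("ReadFully: {0} bytes, match = {1}",
            full.Length, Enumerable.SequenceEqual(full, source));
        // prints "ReadFully: 1000 bytes, match = True"
    }
}
```

A single Read on the trickling stream hands back only one chunk, exactly as NetworkStream did in the output above; the helper hides that by looping.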

I pointed out above that FileStream.Read doesn't appear to have this behavior, but the documentation says that it could. Apparently the .NET development team thinks so, too, because File.ReadAllBytes allocates a buffer and then calls Read in a loop until the buffer is full. That is, it does essentially this:

    using (FileStream fs = new FileStream(...))
    {
        int count = (int)fs.Length;
        byte[] bytes = new byte[count];
        int index = 0;
        while (count > 0)
        {
            // Read may return fewer than count bytes, so keep going
            int n = fs.Read(bytes, index, count);
            if (n == 0)
            {
                // the file ended before Length said it would
                throw new EndOfStreamException();
            }
            index += n;
            count -= n;
        }
        return bytes;
    }

I removed some error checking (in particular, the length check to ensure that the file isn't larger than 2 gigabytes), but it's plain that they don't count on FileStream.Read returning the exact number of bytes requested.
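File.ReadAllBytes can allocate its buffer up front because FileStream.Length tells it how big the file is. Many streams can't report a length (NetworkStream.Length, for example, throws NotSupportedException), so reading one to the end means collecting pieces until Read returns 0. Here's a minimal sketch of that, assuming a MemoryStream to accumulate the data; ReadToEnd, UnknownLengthReader, and the 4,096-byte chunk size are illustrative choices, not framework names. Copying into a MemoryStream with the framework's Stream.CopyTo method does essentially the same work.

```csharp
using System;
using System.IO;
using System.Linq;

static class UnknownLengthReader
{
    // Reads from s until Read returns 0. Never touches s.Length, so it works
    // on streams that can't report their size. Each call to Read may return
    // fewer bytes than the chunk size; only the bytes actually read are kept.
    public static byte[] ReadToEnd(Stream s)
    {
        byte[] chunk = new byte[4096];
        using (var result = new MemoryStream())
        {
            int n;
            while ((n = s.Read(chunk, 0, chunk.Length)) > 0)
            {
                result.Write(chunk, 0, n);
            }
            return result.ToArray();
        }
    }

    static void Main()
    {
        // 10,000 bytes of test data: 0, 1, ..., 255, 0, 1, ...
        byte[] data = Enumerable.Range(0, 10000).Select(i => (byte)i).ToArray();
        byte[] copy = ReadToEnd(new MemoryStream(data));
        Console.WriteLine("{0} bytes, match = {1}",
            copy.Length, Enumerable.SequenceEqual(copy, data));
        // prints "10000 bytes, match = True"
    }
}
```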

Next time I’ll explain a bit about why streams work this way.