Copying large files on Windows

Warning:  copying very large files (larger than available memory) on Windows will bring your computer to a screeching halt.

Let’s say you have a 60 gigabyte file on Computer A that you wish to copy to Computer B.  Both Computer A and Computer B have 16 gigabytes of memory.  Assuming that you have the network and file sharing permissions set correctly, you can issue this command on Computer B:

copy /v \\ComputerA\Data\Filename.bin C:\Data\Filename.bin

As you would expect, that command reaches across the network and begins copying the file from Computer A to the local drive on Computer B.

What you don’t expect is for the command to bring Computer A and possibly Computer B to a screeching halt.  It takes a while, but after 20 or 30 gigabytes of the file have been copied, Computer A stops responding.  It doesn’t gradually get slower.  No, at some point it just stops responding to keyboard and mouse input.  Every program starts running as though you’re emulating a Pentium on a 2 MHz 6502, using a cassette tape as virtual memory.

Why does this happen?  I’m so glad you asked.  It happens because Windows is caching the reads.  It’s reading ahead, copying data from the disk into memory as fast as it can, and then dribbling it out across the network as needed.  When the cache has consumed all unused memory, it starts chewing on memory that’s used by other programs, somehow forcing the operating system to page executing code and active data out to virtual memory in favor of the cache.  Then, the system starts thrashing:  swapping things in and out of virtual memory.

It’s a well-known problem with Windows.  As I understand it, it comes from the way that the COPY and XCOPY commands (as well as the copy operation in Windows Explorer) are implemented.  Those commands use the CopyFile or CopyFileEx API functions, which “take advantage” of disk caching.  The suggested workaround is to use a program that creates an empty file and then calls the ReadFile and WriteFile functions to read and write the file in smaller blocks.
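
For illustration, a minimal sketch of that workaround in C against the Win32 API might look something like this.  The 1 MB buffer size and the FILE_FLAG_SEQUENTIAL_SCAN hint are my own choices rather than anything the workaround prescribes, and error handling is trimmed to the essentials.  (To truly bypass the cache you’d open the handles with FILE_FLAG_NO_BUFFERING, which adds sector-alignment requirements I’ve left out of the sketch.)

#include <windows.h>

/* Sketch of the suggested workaround: create the destination file, then
   stream the source through a fixed-size buffer with ReadFile/WriteFile.
   Returns nonzero on success. */
static int CopyFileInChunks(const wchar_t *src, const wchar_t *dst)
{
    static char buffer[1 << 20];        /* 1 MB per read; size is arbitrary */
    DWORD bytesRead, bytesWritten;
    HANDLE hSrc, hDst;
    int ok = 1;

    hSrc = CreateFileW(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (hSrc == INVALID_HANDLE_VALUE)
        return 0;

    hDst = CreateFileW(dst, GENERIC_WRITE, 0, NULL,
                       CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hDst == INVALID_HANDLE_VALUE) {
        CloseHandle(hSrc);
        return 0;
    }

    while (ReadFile(hSrc, buffer, sizeof(buffer), &bytesRead, NULL) && bytesRead > 0) {
        if (!WriteFile(hDst, buffer, bytesRead, &bytesWritten, NULL) ||
            bytesWritten != bytesRead) {
            ok = 0;
            break;
        }
    }

    CloseHandle(hSrc);
    CloseHandle(hDst);
    return ok;
}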

That workaround is idiotic.  There may be very good reasons to use CopyFileEx in favor of ReadFile/WriteFile, but whatever advantages that function has are completely negated if using it causes Windows to cache stupidly and prevent other programs from running.  It seems to me that either CopyFileEx should be made a little smarter about caching, or COPY, XCOPY, and whatever other parts of Windows use it should be rewritten.  There is no excuse for a file copy to consume all memory in the system.
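
For what it’s worth, newer versions of Windows (Vista and Server 2008, as I understand it) do expose a COPY_FILE_NO_BUFFERING flag for CopyFileEx, which performs the copy with unbuffered I/O and is aimed at exactly this kind of very large transfer.  The caller has to ask for it, though.  A quick sketch, assuming you’re on a version that has the flag:

/* COPY_FILE_NO_BUFFERING requires the Vista / Server 2008 SDK headers. */
#define _WIN32_WINNT 0x0600
#include <windows.h>
#include <stdio.h>

int main(void)
{
    BOOL cancel = FALSE;

    /* Same copy as the COPY command above, but bypassing the system cache. */
    if (!CopyFileExW(L"\\\\ComputerA\\Data\\Filename.bin",
                     L"C:\\Data\\Filename.bin",
                     NULL,            /* no progress callback */
                     NULL,            /* no callback data */
                     &cancel,
                     COPY_FILE_NO_BUFFERING))
    {
        fprintf(stderr, "Copy failed, error %lu\n", GetLastError());
        return 1;
    }
    return 0;
}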

I find it interesting that the TechNet article I linked above recommends using a different program (ESEUTIL, which apparently is part of Exchange) to copy large files.

This problem has been known for a very long time. Can anybody give me a good reason why it hasn’t been addressed? Is there some benefit to having the system commands implemented in this way?

Update, October 16
It might be that Microsoft doesn’t consider this a high priority.  In my opinion it should be given the highest possible priority because it enables what is in effect a denial of service attack.  Copying a large file across the network will cause the source machine to become unresponsive.  As bad as that is for desktop machines, it’s much worse for servers.  Imagine finding that your Web site is unresponsive because you decided to copy a large transaction file from the server.

5 comments to Copying large files on Windows

  • Darrin Chandler

    The problem hasn’t been addressed because it’s such a very low priority. These kinds of problems aren’t cropping up that much for ordinary users, and it’s not hurting sales figures at all. MS has other problems that are hurting sales, and they will be focused on those.

    CopyFileEx probably works fine for typical daily use. The silly caching stuff may actually perform well in those common situations (files up to a certain size, copied to local/removable media). It’s a shame they didn’t put some sensible limits on it to keep it sane.

    I’m afraid you’re going to keep hitting these sorts of problems using Windows in such unorthodox ways. Your app and its data are bound to be outside the limits (in several areas) for which Windows is optimized.

  • Roy Harvey

    Windows is not just a desktop OS. Windows Server is supposed to handle a diverse concurrent workload gracefully. I suspect that Jim is having the problem with one of the Server versions.

  • Darrin’s point really highlights the strength of open source solutions. Limiting the size of files able to be copied to the amount of physical and virtual memory shouldn’t be considered a “low priority” bug.

    Recommended memory for Windows Server 2008 is 2 GB. You’d better not be copying any Oracle datafiles on that machine.

    Dumb arbitrary limits implemented by dumb programming.

  • Jim

    As Roy Harvey pointed out, I’m using Windows Server 2008. This is supposed to be a server operating system, and one would expect it to handle large files without trouble.

    Windows doesn’t refuse to copy the file. In fact, it copies the file just fine. The problem is that the operating system somehow decides that the file operation needs all of the physical memory, which pages out every other process’s working set.

    These types of problems aren’t limited to Windows or other closed-source solutions. I’ve run into plenty of similar limitations in open source tools. And it’s not like I can’t write or obtain a different copy utility that tells CopyFileEx not to buffer the file. It’s just that I shouldn’t have to.

  • “Every program starts running as though you’re emulating a Pentium on a 2 MHz 6502, using a cassette tape as virtual memory.”

    Made me LOL.
