A new file system primitive?

A problem I was working on recently got me wishing that I could lop off the front of a file. Kind of like a “truncate at front,” if you will. Truncating a file at the back end is a common operation, something we do without even thinking much about it. But lopping off the front of a file? It sounds ridiculous at first, but only because we’ve been trained to think that it’s impossible. A lop operation could be useful in some situations.

A simple example (certainly not the only or necessarily the best example) is a queue. You’re adding new items to the end of the file and pulling items off the front. The file grows over time, leaving an ever-larger dead space at the front. With current file systems, there are several ways around this problem:

  • As each item is removed, copy the remaining items up to replace it, and truncate the file. Although it works, this solution is very expensive: every removal rewrites everything that remains in the file.
  • Monitor the size of the empty space at the front, and when it reaches a particular size or percentage of the entire file size, move everything up and truncate the file. This is much more efficient than the previous solution, but moving the items still costs time; a sketch of this approach appears after the list.
  • Implement a circular queue in the file, adding new items to the hole at the front of the file as items are removed. This can be quite efficient, especially if you don’t mind the possibility of things getting out of order in the queue. If you do care about order, you may occasionally have to move items around. But in general, a circular queue is pretty easy to implement and manages disk space well.
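
For concreteness, here’s a minimal sketch of that second approach in POSIX C. Everything about it is an assumption for illustration: the function name, the buffer size, and the idea that the caller tracks the offset of the first live byte.

```c
#include <sys/types.h>
#include <unistd.h>

/* Slide the live bytes [head, end-of-file) down to offset 0 and chop
   off the now-dead tail. Returns 0 on success, -1 on error. */
int queue_compact(int fd, off_t head)
{
    char buf[65536];
    off_t src = head, dst = 0;
    ssize_t n;

    while ((n = pread(fd, buf, sizeof buf, src)) > 0) {
        if (pwrite(fd, buf, (size_t)n, dst) != n)
            return -1;
        src += n;
        dst += n;
    }
    if (n < 0)
        return -1;                 /* read failed */
    return ftruncate(fd, dst);     /* everything past dst is dead space */
}
```

After a successful compact, the caller resets its head offset to zero. Note that the whole live region still gets rewritten; the threshold only controls how often you pay that cost.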

But if there were a lop operation, removing an item from the queue would be as easy as adding one. As easy, in fact, as truncating a file. Why, then, is there no such operation?

Disk files are typically allocated in fixed-size blocks: four kilobytes, for example. If you create a file and only put one character in it, the file system will allocate an entire block for that file. The file allocation table (or equivalent structure) associates the file name with that particular block, and also stores either the length of the file or the offset within the block of the last used byte.

If a file exceeds the block allocation size, the file system allocates another block, links it to the previous block, and updates the file size or last byte pointer. In this way, files of arbitrary size can be allocated, and disk fragmentation is kept, if not to a minimum, then at least manageable.

Truncating a file is a very simple and quick operation. If the truncation point is within the last allocated block, all the file system has to do is adjust the file size. If truncating removes entire blocks, the file system adjusts the last-byte pointer into the final retained block and then frees the blocks that followed.
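
To make that concrete, here’s a toy, in-memory model of the bookkeeping described above, with truncation built on top. The FAT-style next[] chain, the field names, and the 4K block size are all assumptions for illustration; real file systems differ in the details.

```c
#include <stdint.h>

#define BLOCK_SIZE   4096u
#define NUM_BLOCKS   1024u
#define END_OF_CHAIN UINT32_MAX

/* One directory entry: where the file starts and how long it is. */
struct dir_entry {
    uint32_t first_block;   /* index of the file's first block */
    uint64_t size;          /* file length in bytes */
};

/* FAT-style chain: next[b] is the block after b, or END_OF_CHAIN. */
uint32_t next[NUM_BLOCKS];

void free_block(uint32_t b) { (void)b; /* return b to the free list */ }

void toy_truncate(struct dir_entry *e, uint64_t new_size)
{
    uint32_t b = e->first_block;
    uint64_t covered = BLOCK_SIZE;  /* bytes covered through block b */

    /* Walk to the block that contains the new last byte. */
    while (covered < new_size) {
        b = next[b];
        covered += BLOCK_SIZE;
    }

    /* Cut the chain there and free everything that followed. */
    uint32_t doomed = next[b];
    next[b] = END_OF_CHAIN;
    while (doomed != END_OF_CHAIN) {
        uint32_t nb = next[doomed];
        free_block(doomed);
        doomed = nb;
    }
    e->size = new_size;     /* in the common case, the only write needed */
}
```

When the truncation point is inside the last block, the second loop never runs and the only real change is the size field, which is exactly why truncation is cheap.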

A lop operation (chopping off the front of the file) would be similarly easy to implement if the file allocation table maintained an “offset to first byte” pointer analogous to the file size pointer. If you wanted to lop the file off at position 10, for example, the file system would just have to update that pointer. If the lop point fell beyond the first block of the file, the file system would have to adjust the pointer into the block that contains it, make that block the first block of the file, and then free the preceding blocks. Simple and neat.
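
Here’s what that might look like over the same toy model (definitions repeated so the fragment stands alone), with the directory entry grown by the proposed first-byte offset. Again, every name here is an assumption for illustration.

```c
#include <stdint.h>

#define BLOCK_SIZE   4096u
#define NUM_BLOCKS   1024u
#define END_OF_CHAIN UINT32_MAX

struct dir_entry {
    uint32_t first_block;   /* index of the file's first block */
    uint32_t first_byte;    /* the proposed field: offset of the file's
                               first byte within first_block */
    uint64_t size;          /* file length in bytes */
};

uint32_t next[NUM_BLOCKS];
void free_block(uint32_t b) { (void)b; /* return b to the free list */ }

/* Lop off the first n bytes of the file. */
void toy_lop(struct dir_entry *e, uint64_t n)
{
    uint64_t cut = e->first_byte + n;   /* new first byte, relative to
                                           the current first block */

    /* Free whole blocks that now precede the first live byte. */
    while (cut >= BLOCK_SIZE) {
        uint32_t old = e->first_block;
        e->first_block = next[old];
        free_block(old);
        cut -= BLOCK_SIZE;
    }
    e->first_byte = (uint32_t)cut;  /* inside the first block: just a
                                       pointer update */
    e->size -= n;
}
```

The symmetry is the whole argument: a lop inside the first block is a single field update, and a deeper lop frees whole blocks off the front of the chain exactly the way truncation frees them off the back.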

The only real cost is a few extra bytes for each directory entry in the file allocation table. In the past, when disk space was expensive and precious, the idea of even one extra byte per directory entry was unthinkable. But with terabyte hard drives appearing on the market for $400 (yeah, 40 cents per gigabyte), the idea of saving a few bytes per directory entry is just silly. Even if we allocated four bytes per directory entry (which is overkill, considering that most file systems use block sizes smaller than 64K, which would require only two bytes), we’re only talking one megabyte for every 250,000 files. On my laptop, where I have about 140,000 files occupying 30 gigabytes, that works out to a little less than one megabyte per 50 gigabytes, or 0.002% of disk space. Seems like a win to me.
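
If you want to check that, the arithmetic fits in a few lines; the file count and disk usage below are just the figures quoted above.

```c
#include <stdio.h>

int main(void)
{
    double files  = 140000.0;   /* files on the laptop, per the text */
    double bytes  = 4.0;        /* proposed extra bytes per entry */
    double in_use = 30e9;       /* roughly 30 GB occupied */

    double overhead = files * bytes;
    printf("overhead: %.0f KB, %.4f%% of disk in use\n",
           overhead / 1024.0, 100.0 * overhead / in_use);
    return 0;
}
```

That prints an overhead of about 547 KB, or 0.0019% of the space in use, which rounds to the 0.002% above.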

Are there other reasons why file systems don’t implement a lop operation? What other operations could file systems implement if designers started worrying less about using a few extra megabytes here and there?