Today I tried to use the Windows FINDSTR command to find occurrences of a particular string in a large (33 gigabyte) text file. Simple enough, right?
findstr /L "xyzzy" bigfile.xml
FINDSTR immediately started giving me errors:
FINDSTR: Line 408555 is too long.
FINDSTR: Line 432128 is too long.
FINDSTR: Line 801201 is too long.
FINDSTR: Line 927897 is too long.
FINDSTR: Line 939189 is too long.
FINDSTR: Line 939189 is too long.
FINDSTR: Line 939189 is too long.
FINDSTR: Line 1006538 is too long
FINDSTR: Line 1579088 is too long.
I couldn’t imagine why it would tell me that the lines are too long. Unfortunately, I can’t view the file in place because, after all this time, there still isn’t a decent text file viewer that can handle a file that large. I can get kind of close with less, although it has problems displaying non-US character sets in a Windows console.
In any case, I extracted those lines to a file (by writing a program that scans the big file and pulls out the lines in question) and determined that the shortest of the lines listed above is about 3,500 characters long and the longest is about 25,000 characters. And here’s the kicker: running the same FINDSTR command on the extracted file results in no errors.
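The extraction program itself isn’t shown here. A minimal sketch of the idea in Python might look like the following. It assumes that FINDSTR counts lines from 1 and that lines end with a newline byte; the function names extract_lines and extract_to_file are made up for illustration. It reads in binary mode, so the reported lengths are in bytes rather than characters, and it streams the file, so the 33 gigabytes never have to fit in memory.

```python
def extract_lines(src, wanted):
    """Yield (line_number, raw_bytes) for each 1-based line number in `wanted`.

    `src` is any binary file-like object; iterating it splits on b"\n".
    Assumption: the line numbers being looked up are 1-based.
    """
    remaining = set(wanted)
    if not remaining:
        return
    last = max(remaining)
    for number, line in enumerate(src, start=1):
        if number in remaining:
            yield number, line
        if number >= last:
            break  # nothing left to find; skip the rest of the file


def extract_to_file(in_path, out_path, wanted):
    """Copy the wanted lines to `out_path` and return {line_number: length}.

    Lengths are byte counts with the trailing CR/LF stripped.
    """
    lengths = {}
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for number, line in extract_lines(src, wanted):
            dst.write(line)
            lengths[number] = len(line.rstrip(b"\r\n"))
    return lengths
```

Calling extract_to_file("bigfile.xml", "long_lines.txt", [408555, 432128, 801201, 927897, 939189, 1006538, 1579088]) would write the reported lines to long_lines.txt (a hypothetical output name) and return their lengths.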
I also noticed that FINDSTR told me three different times that line number 939,189 is too long.
Obviously, “line is too long” is a catch-all message for a number of different errors. FINDSTR has some issues. Some time ago, I said that FINDSTR was marginally useful. After today, I’d say it’s even less useful than I thought it was then.
GNU grep for Windows, by the way, has no problems with the file. The only reason I used FINDSTR is that I don’t have GNU grep installed on the server where the file lives.
Oh, and Microsoft still hasn’t fixed that idiotic file caching bug.