Removing Duplicate Lines From a File
I came across a post on the Cloudflare Blog in which the author described his experience tackling the problem of removing duplicate lines from a large file. The following text is a comment sharing my own experience and an alternative approach to the task, one that yields performance similar to the author’s but without going down the route of coding in C. Funnily enough, the comment was rejected by the Cloudflare Blog moderators. So, rather than lose a couple of hours’ worth of work, I am posting it here.
Source: Removing Duplicate Lines From a File, an article by Ivan Pesin.