codemonks.org | bitheap


bitheap is a small program for managing an archive of files.

You put your archive in a directory (known as the root of the archive) and then build a database that records the MD5 hash of every file under it.
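To make the indexing step concrete, here is a minimal sketch of the idea, not bitheap's actual code. It keeps the index in an in-memory dict rather than in PostgreSQL, and the names md5_of_file and build_index are made up for illustration.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Map each MD5 digest to the paths (relative to `root`) that have that content.

    Hypothetical stand-in for the database: a plain dict, not a PostgreSQL table.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(md5_of_file(full), []).append(rel)
    return index
```

Two files with identical contents end up under the same digest, which is what lets the archive recognise a file regardless of its name.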

Then you can examine another tree of files somewhere else and discover which files the archive already knows (perhaps under another name) and which are novel.
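The comparison can be sketched in the same spirit. This is an illustration under assumptions, not bitheap's implementation: the archive's hashes are passed in as a plain set of hex digests instead of being looked up in the database, and file_md5 and classify_tree are hypothetical names.

```python
import hashlib
import os


def file_md5(path):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_tree(tree, known_digests):
    """Split the files under `tree` into (known, novel) lists of relative paths.

    `known_digests` is assumed to be a set of MD5 hex digests taken from the archive.
    A file counts as known if its content hash is in that set, whatever its name.
    """
    known, novel = [], []
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, tree)
            if file_md5(full) in known_digests:
                known.append(rel)
            else:
                novel.append(rel)
    return sorted(known), sorted(novel)
```

Because the match is on content only, a file that was renamed or moved after being archived still shows up as known.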

Here's the code (it's in Python)

Note: you must have PostgreSQL and PyGreSQL installed, and have access to a database with the same name as your user name (unless you tweak the source code).