Tuesday, September 13, 2011

File entropy calculator

A long time ago, a colleague asked me: given a bunch of files, how could we pick out the ones that might be encrypted?

I remembered that encrypted files tend to have higher entropy than typical files, so I wrote a quick Python script (I was learning the language at the time) and ran it on our large dataset. It turned out to be really useful: we were able to quickly find all the encrypted files.
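For illustration, here is a minimal sketch of the idea. It is not the script from the repository, just one way to compute the Shannon entropy of a file's bytes, in bits per byte, where 8.0 is the maximum. The function names and the 7.5 threshold are assumptions made for this example; a real cutoff would need tuning against your own data.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_entropy(path: str) -> float:
    """Read a whole file in binary mode and return its entropy."""
    with open(path, "rb") as handle:
        return shannon_entropy(handle.read())


def high_entropy_files(paths, threshold=7.5):
    """Return (path, entropy) pairs scoring above threshold.

    The default threshold is an assumed value for this sketch,
    not one taken from the original script.
    """
    scored = [(path, file_entropy(path)) for path in paths]
    return [(path, score) for path, score in scored if score > threshold]
```

Calling `high_entropy_files(list_of_paths)` returns the candidates worth a closer look. Reading each file fully into memory keeps the sketch short; for very large files you would probably want to count bytes chunk by chunk instead.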

There were a few false positives, mostly compressed files, which also have high entropy. Feel free to drop me a line if you find this useful or if you find any bugs.

Git repository