Permanently deleting large files from a git repository

Gurus on Stack Overflow have already answered this question. I have written a Python script to automate the process. The script takes a file size in bytes after the -s switch and a regular expression after the -e switch, which is matched against the name of the file. For example, if I want to delete files bigger than 20000 bytes whose names end in pdf, then I'll have to use the script as follows:

 python -s 20000 -e '.*pdf$'
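For the curious, here is a minimal sketch of the kind of command such a script could end up running. It is not the script itself: it assumes the history rewrite is done with git filter-branch and an index filter, which is the approach usually suggested for this job, and the function name build_filter_branch_command is made up for illustration.

```python
import shlex


def build_filter_branch_command(paths):
    """Build a `git filter-branch` command (as an argv list) that removes
    the given paths from every commit on every ref.

    Assumption: the rewrite uses an index filter running
    `git rm --cached --ignore-unmatch`; the original script may differ.
    """
    if not paths:
        raise ValueError("at least one path is required")
    # The index filter is evaluated by a shell, so quote each path.
    quoted = " ".join(shlex.quote(p) for p in paths)
    index_filter = "git rm --cached --ignore-unmatch -- " + quoted
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty",              # drop commits that become empty
        "--tag-name-filter", "cat",   # keep tag names, point them at rewritten commits
        "--", "--all",                # rewrite all refs
    ]
```

The list can be handed to subprocess.run(cmd, check=True) from inside the repository. Try it on a throwaway clone first.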

It might take a lot of time to complete the job. It rewrites the full branch tree as many times as there are commits. I believe you know the danger of doing this on a shared repository.

There is another script which only searches for files bigger than a given size whose names match a regular expression. The regular expression is optional. If it is not given, all files bigger than the given size are printed on the console. This script is available here. Using it is safe. It does not change the state of the repository in any way. You can dump its output to a file and then execute your evil plans accordingly. On GitHub, there is an article on 'removing sensitive data from github' or something like that. Do read that article. Happy gitting!
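A read-only search like this can be sketched in a few lines. What follows is an illustration under assumptions, not the script linked above: it lists every blob reachable from any ref with git rev-list and git cat-file, and the helper names list_blobs and filter_large are hypothetical.

```python
import re
import subprocess


def list_blobs(repo="."):
    """Yield (size_in_bytes, path) for every blob reachable from any ref.

    Only reads from the repository; nothing is modified.
    """
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    info = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    for line in info.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "blob":
            yield int(parts[1]), parts[2]


def filter_large(entries, min_size, pattern=None):
    """Keep entries bigger than min_size whose path matches pattern.

    The pattern is optional; without it only the size is checked.
    """
    regex = re.compile(pattern) if pattern else None
    return [
        (size, path)
        for size, path in entries
        if size > min_size and (regex is None or regex.search(path))
    ]
```

For example, filter_large(list_blobs("."), 20000, r".*pdf$") returns the large PDF files found anywhere in the history of the repository in the current directory.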

