Deleting Subversion repository files (for real)

Keeping every committed version of your files and directories is one of the key principles of Subversion, so once you’ve committed something, it’s there forever. You can delete files, but they still exist in the repository history, so you can always go back in time.

But there is always that time when you’ve (accidentally) committed a password file, a directory full of hi-res images, or some other content you don’t want other people to see, and you want to get rid of it. That’s where the hard part starts…

After searching the internet and checking the Subversion FAQ, it looks quite hard, but with some guidance you’ll find out it’s not.

Finding the problems

First you have to do a (complete) checkout of the repository you want to clean:

svn co http://svn.apache.org/repos/asf/ asf

Now change into the working copy, locate the problems, and delete the files/directories with a plain rm (not svn delete!):

rm -Rf subversion/trunk/tools/buildbot;
rm -Rf subversion/trunk/README;
rm -Rf subversion/trunk/build;

When you’re done deleting files and directories, you can generate a list of ‘missing’ files.

Checking your files:

svn status
!      subversion/trunk/tools/buildbot
!      subversion/trunk/README
!      subversion/trunk/build

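Generating that list (saved outside the working copy; only the ‘!’ lines are kept, however many spaces your svn version prints after the marker):

svn status | sed -n 's/^! *//p' > ../filter.txt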
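
If you want to reuse that filter, here is a minimal sketch as a shell function. The name missing_paths is made up for this example. It only assumes that svn status marks missing items with a leading ‘!’, as in the output above.

```shell
# Hypothetical helper: reads `svn status` output on stdin and prints
# only the paths of items flagged as missing ("!"), one per line.
missing_paths() {
  sed -n 's/^![[:space:]]*//p'
}

# Intended use, run from the root of the working copy:
#   svn status | missing_paths > ../filter.txt
```

Lines for modified or unversioned files (M, ?, ...) are dropped, so an unrelated local change does not end up in the filter list.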

Fixing the problems

Now that you have a nice list of paths to delete (make sure each entry is the full path, parent directories included, right from the repository root), log in to the server hosting the repository.

We first want to make sure there is a backup:

svnadmin dump file:///var/svn/asf > ~/backup_svn/asf.dump

Now we can use that backup file as the input for the svndumpfilter command. Combined with the filter list we generated on the client (copy filter.txt to the server first), we can create a filtered version of the dump:

svndumpfilter exclude `cat filter.txt` < ~/backup_svn/asf.dump > asf_filtered.dump
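
Before loading anything, it can’t hurt to check that the excluded paths are really gone. What follows is a rough sketch, not part of Subversion: leftover_paths is a hypothetical helper that scans the Node-path headers of a dump and reports those that match a line of filter.txt. It assumes one path per line without a leading slash, and it can give a false alarm if a file’s content happens to contain a line starting with ‘Node-path: ’.

```shell
# Hypothetical helper: $1 is the filter list, the dump is read from stdin.
# Prints every Node-path that equals an excluded path or lies below it,
# and returns non-zero when at least one such path is found.
leftover_paths() {
  awk -v list="$1" '
    BEGIN {
      # Load the excluded paths, skipping blank lines.
      while ((getline p < list) > 0) if (p != "") prefix[p] = 1
    }
    /^Node-path: / {
      path = substr($0, 12)   # strip the 11-character "Node-path: " header
      for (p in prefix)
        if (path == p || index(path, p "/") == 1) { print path; found = 1 }
    }
    END { exit (found ? 1 : 0) }
  '
}

# Intended use:
#   leftover_paths filter.txt < asf_filtered.dump && echo "clean"
```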

To load that file back into the repository, we first have to move the original repository out of the way. (The httpd commands just make sure no one commits while the changes are being processed.)

/etc/init.d/httpd stop;
mv /var/svn/asf ~/backup_svn/asf;
svnadmin create --fs-type fsfs /var/svn/asf;
svnadmin load /var/svn/asf < asf_filtered.dump;
/etc/init.d/httpd start;

Please note that directories and command line options can be different on your system, but the outcome should be the same.

Now we have the same repository, without the (accidentally) committed files/directories!

New problems

After the filtering, some revisions may be completely empty. svndumpfilter can skip those (the --drop-empty-revs option), but then all later revisions are renumbered once the dump is loaded, and that could be problematic for other software that refers to revision numbers (e.g. Trac).
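
To see how many revisions are affected before deciding, you could list the empty ones. This is a small sketch that assumes the usual dump headers (Revision-number and Node-path). The name empty_revisions is hypothetical, and revision 0 is skipped because it normally contains no nodes anyway.

```shell
# Hypothetical helper: reads a dump on stdin and prints the number of
# every revision (except 0) that contains no Node-path record.
empty_revisions() {
  awk '
    /^Revision-number: / {
      # A new revision starts: report the previous one if it had no nodes.
      if (seen && nodes == 0 && rev != 0) print rev
      rev = $2; nodes = 0; seen = 1
    }
    /^Node-path: / { nodes++ }
    END { if (seen && nodes == 0 && rev != 0) print rev }
  '
}

# Intended use:
#   empty_revisions < asf_filtered.dump
```

If the list is short, leaving the empty revisions in place keeps the revision numbers that Trac and friends already refer to.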