Keeping every file and directory in the repository is one of the key principles of Subversion, so once you’ve committed something, it’s there forever. You can delete files, but they still exist in the repository history, so you can always go back in time.
But there is always that moment when you’ve (accidentally) committed a password file, a directory full of hi-res images, or some other content you don’t want other people to see, and you want to get rid of it. That’s where the hard part starts…
Finding the problems
First you have to do a (complete) checkout of the repository you want to clean:
svn co http://svn.apache.org/repos/asf/ asf
Now you can start to locate the problems and delete the files/directories (not
rm -Rf subversion/trunk/tools/buildbot; rm -Rf subversion/trunk/README; rm -Rf subversion/trunk/build;
When you’re done deleting files and directories, you can generate a list of ‘missing’ files.
Checking your files:
svn status
!       subversion/trunk/tools/buildbot
!       subversion/trunk/README
!       subversion/trunk/build
Generating that list (outside the working copy):
svn status | sed -n 's/^! *//p' > ../filter.txt
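Because the list is later expanded with an unquoted `cat`, it is worth checking it before use. The following is only a sketch: check_filter_list is a hypothetical helper name, and it assumes one path per line. It rejects an empty list and any path that contains whitespace, since such a path would be split into several arguments.

```shell
# Hypothetical helper: sanity-check the filter list before it is used.
check_filter_list() {
    list=$1
    # An empty (or missing) list means there is nothing to filter.
    if [ ! -s "$list" ]; then
        echo "filter list is empty: $list" >&2
        return 1
    fi
    # Ignore leading/trailing blanks, then look for whitespace inside a path.
    # Unquoted `cat` expansion would split such a path into several words.
    if sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$list" | grep -n '[[:space:]]' >&2; then
        echo "paths containing whitespace found in $list" >&2
        return 1
    fi
    return 0
}
```

Running check_filter_list ../filter.txt from the working copy returns a non-zero status when the list needs attention.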
Fixing the problems
Now that you have a nice list of files to delete (make sure it includes the parent directories, right to the root), log in to the server hosting the repository.
We first want to make sure there is a backup:
svnadmin dump /var/svn/asf > ~/backup_svn/asf.dump
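Since the original repository is moved aside later on, a quick check that the backup is usable costs little. This is a rough sketch, assuming the dump starts with the usual SVN-fs-dump-format-version header line; looks_like_dump is a hypothetical helper name, and passing this check does not prove the dump is complete.

```shell
# Hypothetical helper: succeed only if FILE is non-empty and begins with
# the header line that Subversion dump files start with.
looks_like_dump() {
    [ -s "$1" ] && head -n 1 "$1" | grep -q '^SVN-fs-dump-format-version: '
}
```

For example, looks_like_dump ~/backup_svn/asf.dump || echo "backup looks wrong" would flag an empty or truncated file.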
Now we can use that backup file as the input for the svndumpfilter command. In combination with the filter list we’ve generated on the client, we can create a filtered version of the dump:
svndumpfilter exclude `cat filter.txt` < ~/backup_svn/asf.dump > asf_filtered.dump
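Before loading the filtered dump, you may want to confirm that the excluded paths are really gone. The sketch below assumes each changed path appears in the dump as a Node-path: header; leftover_paths is a hypothetical helper name. It uses a plain substring match, so it can over-report (a prefix such as subversion/trunk/build also matches subversion/trunk/buildbot, and file contents could match by coincidence).

```shell
# Hypothetical helper: print every Node-path header in DUMP that still
# contains a prefix listed in FILTER. No output means nothing was found.
leftover_paths() {
    filter=$1
    dump=$2
    while IFS= read -r prefix || [ -n "$prefix" ]; do
        # Trim surrounding whitespace and skip blank lines.
        prefix=$(printf '%s' "$prefix" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        [ -n "$prefix" ] || continue
        # -F: match the prefix literally; -a: treat the dump as text even
        # though it may contain binary file contents.
        grep -a -F "Node-path: $prefix" "$dump"
    done < "$filter"
    return 0
}
```

Calling leftover_paths filter.txt asf_filtered.dump should print nothing if the filtering worked as intended.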
To load that file back into the repository, we first have to move the original repository out of the way and create a fresh, empty one. (The httpd commands are just there to make sure no one commits while we are processing the changes.)
/etc/init.d/httpd stop; mv /var/svn/asf ~/backup_svn/asf; svnadmin create --fs-type fsfs /var/svn/asf; svnadmin load /var/svn/asf < asf_filtered.dump; /etc/init.d/httpd start;
Please note that directories and command line options can be different, but the outcome should be the same.
Now we have the same repository, without the (accidentally) committed files/directories!
After the filtering, it is possible that some revisions are completely empty. svndumpfilter can drop those (the --drop-empty-revs option), but then the remaining revisions end up renumbered, and that could be problematic for other software that refers to revision numbers (e.g. Trac).
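To judge whether this matters for your repository, it can help to see which revisions ended up empty. This is a minimal sketch, assuming the dump marks each revision with a Revision-number: header and each changed path with a Node-path: header; empty_revisions is a hypothetical helper name. Revision 0 never contains any paths, so it is always listed.

```shell
# Hypothetical helper: print the number of every revision in DUMP that
# has no Node-path headers, i.e. that changes no files or directories.
empty_revisions() {
    awk '
        /^Revision-number: / {
            # A new revision starts: report the previous one if it was empty.
            if (rev != "" && nodes == 0) print rev
            rev = $2
            nodes = 0
        }
        /^Node-path: / { nodes++ }
        END { if (rev != "" && nodes == 0) print rev }
    ' "$1"
}
```

Running empty_revisions asf_filtered.dump lists the candidates, one revision number per line.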