Deleting Subversion repository files (for real)

Keeping files and directories in the repository is one of the key principles of Subversion, so once you’ve committed something, it’s there forever. You can delete files, but they still exist in the repository history, so you can always go back in time.

But there is always that moment when you’ve (accidentally) committed a password file, a directory full of hi-res images, or some other content you don’t want other people to see, and you want to get rid of it. That’s where the hard part starts…

After searching the internet and checking the Subversion FAQ it looks quite hard, but with some guidance, you’ll find out it’s not.

Finding the problems

First you have to do a (complete) checkout of the repository you want to clean:

svn co http://svn.apache.org/repos/asf/ asf

Now you can start to locate the problems and delete the files/directories (not svn delete!):

rm -Rf subversion/trunk/tools/buildbot;
rm -Rf subversion/trunk/README;
rm -Rf subversion/trunk/build;

When you’re done deleting files and directories, you can generate a list of ‘missing’ files.

Checking your files:

svn status
!      subversion/trunk/tools/buildbot
!      subversion/trunk/README
!      subversion/trunk/build

Generating that list (outside the working copy):

svn status | sed s/"!      "// > ../filter.txt
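A plain substitution like this passes through any line that does not start with "!" (modified or unversioned files, for example), so those paths would land in filter.txt too. A minimal sketch of a stricter filter that keeps only the missing entries; the function name missing_paths is made up for this example, and it assumes the path follows the "!" after a run of spaces, as in the status output shown earlier:

```shell
# missing_paths: read `svn status` output on stdin and print only the paths
# flagged as missing ("!" in the first column), with the status prefix removed.
missing_paths() {
    sed -n 's/^! *//p'
}

# Intended use inside the working copy (not run here):
#   svn status | missing_paths > ../filter.txt
```

It is still worth reading through filter.txt by hand before using it, since everything listed there will be removed from history.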

Fixing the problems

Now that you have a nice list of files to delete (make sure each path includes its parent directories, all the way up to the repository root), log in to the server hosting the repository.

We first want to make sure there is a backup:

svnadmin dump file:///var/svn/asf > ~/backup_svn/asf.dump

Now we can use that backup file as the input file for the svndumpfilter command. In combination with the filter list we’ve generated on the client, we can create a filtered version of the dump:

svndumpfilter exclude `cat filter.txt` < ~/backup_svn/asf.dump > asf_filtered.dump
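One thing to watch: the backtick expansion splits filter.txt on whitespace, so a path containing a space would reach svndumpfilter as two separate prefixes. A quick pre-flight check, as a sketch only; check_filter_list is a hypothetical helper name:

```shell
# check_filter_list FILE: print any line of FILE that contains whitespace
# (with its line number) and return non-zero if one was found, because
# `cat FILE` inside backticks would split such a path into several arguments.
check_filter_list() {
    if grep -n '[[:space:]]' "$1"; then
        echo "filter list contains paths with whitespace" >&2
        return 1
    fi
    return 0
}

# Intended use on the server (not run here):
#   check_filter_list filter.txt && svndumpfilter exclude `cat filter.txt` < in.dump > out.dump
```

If the check fails, newer Subversion releases should offer a --targets option on svndumpfilter that reads one prefix per line from a file; that is an assumption about your installed version, so confirm it with svndumpfilter help exclude first.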

To load that file back in the repository, we should ‘delete’ the original repository. (The httpd commands are just to make sure no one commits while processing the changes).

/etc/init.d/httpd stop;
mv /var/svn/asf ~/backup_svn/asf;
svnadmin create --fs-type fsfs /var/svn/asf;
svnadmin load /var/svn/asf < asf_filtered.dump;
/etc/init.d/httpd start;

Please note that directories and command line options can be different, but the outcome should be the same.

Now we have the same repository, without the (accidentally) committed files/directories!

New problems

After the filtering, some revisions may end up completely empty. svndumpfilter can drop those empty revisions (the --drop-empty-revs option), but then all later revisions are renumbered, and that could be problematic for other software that refers to revision numbers (e.g. Trac).
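To see which revisions are affected before deciding, you can scan the filtered dump for revision records that are not followed by any node records. This is a rough sketch: empty_revisions is a made-up name, and it only looks at the Revision-number: and Node-path: header lines, so file contents that happen to start with those strings would confuse it.

```shell
# empty_revisions DUMPFILE: print the number of every revision in a Subversion
# dump that has no Node-path records. Revision 0 is skipped because it never
# contains nodes.
empty_revisions() {
    awk '
        function report() {
            if (rev != "" && rev != "0" && nodes == 0) print rev
        }
        /^Revision-number: / { report(); rev = $2; nodes = 0; next }
        /^Node-path: /       { nodes++ }
        END                  { report() }
    ' "$@"
}

# Intended use on the server (not run here):
#   empty_revisions asf_filtered.dump
```

If the list is short and nothing external refers to those revisions, dropping them may be fine; otherwise keeping the empty revisions preserves the original numbering.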

Hostnames in Logwatch reports

Where I work, we have a lot of servers to maintain, and only 2 server admins (me and my colleague). We use Nagios to keep us informed about the server status and Logwatch to analyze the server logs on a daily basis.

Each server hosts a lot of subdomains/vhosts, and these virtual hosts all write to their own logs (blog.jachim.be_acces_log, www.jachim.be_error_log, etc…).

The log entries look like this:

192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /a/i/red_cube.png HTTP/1.0" 200 190
192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /a/i/search/search_icon.gif HTTP/1.0" 200 428
192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /index.php HTTP/1.0" 200 6541

When Logwatch merges all the httpd log files, the host information (in the log filename) is lost, resulting in Logwatch reports like this:

Requests with error response codes
    401 Unauthorized
       /: 4 Time(s)
       /a/i/blue_cube.png: 1 Time(s)
       /favicon.ico: 2 Time(s)
       /wp/login: 2 Time(s)
...

We actually want reports like this:

Requests with error response codes
    401 Unauthorized
       www.jachim.be/: 4 Time(s)
       jachim.be/a/i/blue_cube.png: 1 Time(s)
       blog.jachim.be/favicon.ico: 2 Time(s)
       blog.jachim.be/wp/login: 2 Time(s)
...

Now we have all the information we want and are able to fix possible problems much more easily.

Because this is not possible in Logwatch (see mailing list), I’ve added it in the Apache logs.

I’ve added a new log format named logwatch to httpd.conf:

LogFormat "%h %l %u %t \"%m %{Host}i%U%q %H\" %>s %b" logwatch

Now the new format is available and can be used in the Virtual Host:

CustomLog logs/www.jachim.be-access_log logwatch
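With this format the request field in each log line carries the host in front of the path, so a per-host tally can also be checked by hand. A small sketch, assuming whitespace-separated fields as produced by the logwatch format above (field 7 is host plus path, field 9 is the status code); the function name error_requests is invented for this example:

```shell
# error_requests [FILE...]: count 4xx/5xx responses per "status host/path"
# in logs written with the logwatch LogFormat. Reads stdin if no file is given.
error_requests() {
    awk '$9 ~ /^[45][0-9][0-9]$/ { count[$9 " " $7]++ }
         END { for (k in count) printf "%s: %d Time(s)\n", k, count[k] }' "$@" | sort
}

# Intended use on the server (not run here):
#   error_requests logs/www.jachim.be-access_log
```

This is only a sanity check on the new log lines, not a replacement for the Logwatch report itself.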

My personal home server – part 1

A month ago I moved to my new house (yay) and I’d promised Joggink I would set up a home server we could use to play with.

Several weeks later, I’ve managed (as in: finally had time, rather than: it was complex) to do a complete install of CentOS on our ‘server’.

Now I’m waiting for my router (some D-Link, I forgot the model) so I can set up the internet access and make sure it’s secure!

Bear with me 😉