Using reStructuredText to generate an ERD

Recently I had to create an ERD for a project, and I didn’t wanted to use a binary (and mostly a proprietary) format. The reasons for this is that I don’t want to bind myself to a specific program, that I want to be able to use the data in different places/formats, and most importantly that I can version control it (and diff it as text).

Next to Visio, Google Draw, Omnigraffle, etc… there are new players like Skipper, that can be used to easily generate Doctrine models, but these are still a bit to limited for me.

Because I like to be verbose in decisions I make when designing models and how they fit together, I usually start with a basic text file listing entities and there properties. Those text files evolved to MarkDown files, and recently I started using reStuctured text (.rst) files, because it is a very powerfull format (and it has a lot of parsers, which will help me achieve my goals I stated in the beginning of this blog post).

A simple file could start like this:

Entity-Relationship Diagram
===========================

This file is used to describe all entities, their properties and how they relate.

User
----

A user entity is used to...

Fields
++++++

:username: The username a user will use to login to his/her account
:email: ...

Group
-----

A group of users can have specific roles in the application.

Fields
++++++

:name: The group name.
       This should be...
:roles: An list of roles

Using this format, we can add a lot of background information and annotations, while keeping a structured format we can use later.

A lot of viewers (GitHub included) can present the reStructuredText quite good already, including references in the document. If your document gets quite large, the link can be very handy:

...

Group
-----

A group of User_ entities can have specific roles in the application.

...

Now that we use references (the User_ format), we can take it one step further and start listing the relations of the entities:

...

User
----

A user entity is used to...

Fields
++++++

:username: The username a user will use to login to his/her account
:email: ...

Relations
+++++++++

:`Group`_: A user can be part of one group.

...

Because we are using the reference notation again, we can already click link to navigate to the different relations.

Now that we have a base document, we can try to generate diagrams of it.

DocUtils

To easily read the information from the RST files, I recommend using DocUtils to generate an XML version of the source file.

rst2xml.py source.rst > ERD.xml

Graphviz DOT files

The Graphviz DOT language is an powerfull format to generate graphs from a text (dot) file.

digraph graphname {
     a -> b -> c;
     b -> d;
 }

The code above will generate the following graph:

A directed graph (source: wikipedia.org)

Putting it all together

Now we can write some code to parse the XML file and generate a dot file out of it.

When we pipe the output to the Graphviz library, we have our final result!

Win!

This image can get quite large, but Graphviz will find a way to position all the entities.

Some notes/thoughts/improvements:

  • Use `Field Lists`_ for fields.
  • Always include “one” or “multiple” in the relation descriptions.
  • Use {timestamp}, {string} notations in the field descriptions to indicate the type.

Hostnames in Logwatch reports

Where I work, we have a lot of servers to maintain, and only 2 server admins (me and my colleague). We use Nagios to keep us informed about the server status and Logwatch to analyze to server logs on a daily basis.

We have per server a lot of subdomains/vhosts and these virtual hosts all write into their own log (blog.jachim.be_acces_log, www.jachim.be_error_log, etc…).

The log entries look like this:

192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /a/i/red_cube.png HTTP/1.0" 200 190
192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /a/i/search/search_icon.gif HTTP/1.0" 200 428
192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /index.php HTTP/1.0" 200 6541

When Logwatch merges all the httpd log files, the host information (in the log filename) is lost, resulting in Logwatch reports like this:

Requests with error response codes
    401 Unauthorized
       /: 4 Time(s)
       /a/i/blue_cube.png: 1 Time(s)
       /favicon.ico: 2 Time(s)
       /wp/login: 2 Time(s)
...

We actually want reports like this:

Requests with error response codes
    401 Unauthorized
       www.jachim.be/: 4 Time(s)
       jachim.be/a/i/blue_cube.png: 1 Time(s)
       blog.jachim.be/favicon.ico: 2 Time(s)
       blog.jachim.be/wp/login: 2 Time(s)
...

Now we have all the information we want and are able to fix the possible problems much easier.

Because this is not possible in Logwatch (see mailinglist), I’ve added it in the Apache logs.

I’ve added a new logformat named logwatch in httpd.conf:

LogFormat "%h %l %u %t \"%m %{Host}i%U%q %H\" %>s %b" logwatch

Now the new format is available and can be used in the Virtual Host:

CustomLog logs/www.jachim.be-access_log logwatch

Resources:

Random PHP function

To explore new things, I sometimes use the ‘Random article‘ link on Wikipedia. It is a simple and basic principle, but it gets you to articles you would never get by just browsing the site.

I was looking for something that did the same job for the php.net site, but could not find anything. As a programmer, you’re most likely to end up in the same chapters of the manual over and over again, and a ‘Random function’ link would be nice to find out new stuff.

So I wrote on myself…

The script parses the http://php.net/quickref.php page, and picks a random link to redirect you to. Anyone heard of HaruDoc? 🙂

Try it yourself: Random PHP Function. Good luck!

Stuff to read from Google

Google

Google has some really nice stuff to read, written in an easy to read way, about all sorts of things. It’s a lot about front-end things, but as a back-end developer I’m responsible to build pages & sites with an infrastructure which leverages these things.

The things I found recently: