Using reStructuredText to generate an ERD

Recently I had to create an ERD for a project, and I didn’t want to use a binary (and mostly proprietary) format. The reasons for this are that I don’t want to bind myself to a specific program, that I want to be able to use the data in different places/formats, and most importantly that I can version control it (and diff it as text).

Next to Visio, Google Draw, OmniGraffle, etc… there are new players like Skipper, which can be used to easily generate Doctrine models, but these are still a bit too limited for me.

Because I like to be verbose about the decisions I make when designing models and how they fit together, I usually start with a basic text file listing entities and their properties. Those text files evolved into Markdown files, and recently I started using reStructuredText (.rst) files, because it is a very powerful format (and it has a lot of parsers, which will help me achieve the goals I stated at the beginning of this blog post).

A simple file could start like this:

Entity-Relationship Diagram
===========================

This file is used to describe all entities, their properties and how they relate.

User
----

A user entity is used to...

Fields
++++++

:username: The username a user will use to login to his/her account
:email: ...

Group
-----

A group of users can have specific roles in the application.

Fields
++++++

:name: The group name.
       This should be...
:roles: A list of roles

Using this format, we can add a lot of background information and annotations, while keeping a structured format we can use later.

A lot of viewers (GitHub included) can already render reStructuredText quite well, including references in the document. If your document gets quite large, the links can be very handy:

...

Group
-----

A group of User_ entities can have specific roles in the application.

...

Now that we use references (the User_ format), we can take it one step further and start listing the relations of the entities:

...

User
----

A user entity is used to...

Fields
++++++

:username: The username a user will use to login to his/her account
:email: ...

Relations
+++++++++

:`Group`_: A user can be part of one group.

...

Because we are using the reference notation again, we can click the links to navigate to the different relations.

Now that we have a base document, we can try to generate diagrams of it.

Docutils

To easily read the information from the .rst files, I recommend using Docutils to generate an XML version of the source file.

rst2xml.py source.rst > ERD.xml

Graphviz DOT files

The Graphviz DOT language is a powerful format to generate graphs from a text (dot) file.

digraph graphname {
    a -> b -> c;
    b -> d;
}

The code above will generate the following graph:

A directed graph (source: wikipedia.org)

Putting it all together

Now we can write some code to parse the XML file and generate a dot file out of it.
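
As a rough sketch (assuming the XML structure rst2xml.py produces for the example document above; the script name generate_dot.php, the ERD.xml filename and the “Relations” convention are my own, not necessarily the original script), the conversion could look like this:

<?php
// generate_dot.php -- rough sketch, not the original script: read the docutils
// XML generated by rst2xml.py and print a Graphviz dot graph on stdout.
$xml = simplexml_load_file('ERD.xml');

$nodes = array();
$edges = array();

// Every top-level section is an entity (User, Group, ...).
foreach ($xml->xpath('/document/section') as $section) {
    $entity = (string) $section->title;
    $nodes[] = $entity;

    // Every reference inside the "Relations" subsection becomes an edge.
    foreach ($section->xpath('section[title="Relations"]//reference') as $reference) {
        $edges[] = array($entity, (string) $reference);
    }
}

echo "digraph ERD {\n";
foreach ($nodes as $node) {
    printf("    \"%s\";\n", $node);
}
foreach ($edges as $edge) {
    printf("    \"%s\" -> \"%s\";\n", $edge[0], $edge[1]);
}
echo "}\n";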

When we pipe the output to the Graphviz library, we have our final result!
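
For example (using the script name assumed above):

php generate_dot.php | dot -Tpng -o ERD.png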

Win!

This image can get quite large, but Graphviz will find a way to position all the entities.

Some notes/thoughts/improvements:

  • Use `Field Lists`_ for fields.
  • Always include “one” or “multiple” in the relation descriptions.
  • Use {timestamp}, {string} notations in the field descriptions to indicate the type (see the example below).
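
For example (a hypothetical field entry, not taken from the original document):

:created_at: {timestamp} The moment the user registered.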

Finding memory leaks in PHP objects

I recently spent quite some time figuring out why a (cli) php script was eating all the memory. In PHP, memory leaks mostly show up in long running scripts. In my case, it was doing calculations on (a lot of) database records.

The script crashed all the time because it was running into the memory_limit. After trying to figure out what was going wrong (using Xdebug’s function traces, PHP’s garbage collection, the unset trick of Paul M. Jones, etc…), I turned to a simple but effective way of inspecting the (problem) object.

Write your object to a file from time to time, and diff it.

Dumping your object:

<?php
file_put_contents(
    '/tmp/myobject_' . time() . '.txt',
    print_r($object, true)
);

// do other logic

file_put_contents(
    '/tmp/myobject_' . time() . '.txt',
    print_r($object, true)
);

Now you can see how big your object is getting (just by watching the filesize):

$ ls -al /tmp/myobject_*

And now you can pick two files to actually see what is causing the problem:

$ diff /tmp/myobject_1357140320.txt /tmp/myobject_1357140450.txt
1236a1237,1247
>             [931277dc8ecbbb394b7f5454f64c5d0c] => Array
>                 (
>                     [hash] => 4143484432
>                     [mtime] => 1357137505
>                     [expire] => 1357141105
>                     [tags] => Array
>                         (
>                         )
>
>                 )

In my case Zend_Db_Profiler was enabled and was keeping track of all the queries in memory.
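
One way to fix that (assuming $db is your Zend_Db adapter instance) is to simply disable the profiler:

<?php
// Stop Zend_Db_Profiler from accumulating every executed query in memory.
$db->getProfiler()->setEnabled(false);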

Using multiple databases in phpunit/dbunit with composer

I’m using multiple databases in most of my projects, so having access to multiple databases in my test suite is a must.

phpunit/dbunit is excellent, but you are stuck with one database. The guys at Etsy created very good extensions to fix this problem (MultipleDatabase), but it took me a while to figure out how to use them.

Because PHPUnit is now available via Composer, you can fetch all dependencies with a single command.

The composer.json file to include all dependencies looks like this:
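
(A minimal sketch, not the original file: the version constraints are assumptions, and the Etsy extensions were not on Packagist at the time, as the notes below mention, so they still need to be added separately.)

{
    "require-dev": {
        "phpunit/phpunit": "3.7.*",
        "phpunit/dbunit": ">=1.2"
    }
}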

Install the dependencies by (installing and) running composer:
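
For example (assuming Composer is not installed globally yet):

curl -sS https://getcomposer.org/installer | php
php composer.phar install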

Now you can create a “parent” class to register the databases (mine is called DatabaseTest). Make sure you create a getDatabaseConfigs method (which is required and should return an array of PHPUnit_Extensions_MultipleDatabase_Database objects). For the fixtures, I use XML datasets, which look like this.
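
A minimal sketch of such an XML dataset in dbunit’s XmlDataSet format (the table and column names are assumptions):

<?xml version="1.0" ?>
<dataset>
    <table name="users">
        <column>id</column>
        <column>username</column>
        <row>
            <value>1</value>
            <value>jachim</value>
        </row>
    </table>
</dataset>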

I’ve added a getConnection method, so I can use the same assertions as the normal dbunit testcase (see Database Assertions API):
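
A hedged sketch of how that method could look inside the DatabaseTest parent class (the DSN and credentials are placeholders):

// Lives inside the DatabaseTest parent class; returns a dbunit connection so
// the regular database assertions keep working.
public function getConnection()
{
    $pdo = new PDO('mysql:host=localhost;dbname=first_db', 'username', 'password');

    return new PHPUnit_Extensions_Database_DB_DefaultDatabaseConnection($pdo, 'first_db');
}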

The magic that is about to happen is quite cool. PHPUnit will read these database configs and use them to make sure all databases and tables are in a known state before every test. It does this in the following order:

  1. Connect to all databases
  2. TRUNCATE all tables supplied in the database fixture file
  3. Insert all rows supplied in the fixture file
  4. Execute the test
  5. TRUNCATE all tables supplied in the database fixture file

Now you’ll be able to run tests for code making changes in your database without affecting other tests.

Some improvements:

  • Add composer support to Hamcrest (once Hamcrest PHP has moved to GitHub).
  • Add composer support to the Etsy extensions (I’ve submitted a PR, but it won’t work until Hamcrest PHP is on GitHub).
  • See if the getConnection() method can be done in a better way.

make test every PHP version and send feedback

Since David Coallier’s talk during PHPBenelux, I realized the importance of running make test on all PHP releases and sending feedback to PHP.

It is so easy that I will run it from now on every time a new release is announced.

If you’re running on a Mac, you might want to install Xcode, so you can run “make” on the command line. If you’re on Linux, you’re all set to go.

How to do it:

  • Open a shell
  • Create a directory (e.g. mkdir ~/src/php)
  • Download the latest version into the directory (e.g. wget http://downloads.php.net/stas/php-5.4.0RC6.tar.gz)
  • Untar the file (e.g. tar -xzf php-5.4.0RC6.tar.gz)
  • Go into the directory and run ./configure, you should get output like:
    checking for grep that handles long lines and -e... /usr/bin/grep
    checking for egrep... /usr/bin/grep -E
    checking for a sed that does not truncate output... /usr/bin/sed
    checking build system type... i386-apple-darwin11.2.0
    checking host system type... i386-apple-darwin11.2.0
    checking target system type... i386-apple-darwin11.2.0
    ...
    Thank you for using PHP. 
  • After that, you can run make and should get a lot of output ending in:
    Build complete.
    Don't forget to run 'make test'.
  • After that, you should do as the previous command suggests and run make test. You should see all the tests scroll by.
  • After all tests are run, you will get a summary. In my case, I get the message: “You may have found a problem in PHP.”
    Bug #55509 (segfault on x86_64 using more than 2G memory) [Zend/tests/bug55509.phpt]
    Sort with SORT_LOCALE_STRING [ext/standard/tests/array/locale_sort.phpt]
  • Whether you get an error or not, you should always send the report back to PHP. You can do that by just answering Y to the question: “Do you want to send this report now?”
  • Enter your email address and PHP will thank you.

How hard was that?

SOAP-ERROR: Parsing WSDL

If you’re writing or consuming web services with PHP SOAP, it’s possible you’ll run into the SOAP-ERROR: Parsing WSDL problem at some point.

The complete error string is:
SOAP-ERROR: Parsing WSDL: Couldn't load from 'http://host/service?wsdl' : <specific error>

If you copy and paste the URL (including the ?wsdl parameter) into the browser and you see the WSDL file, the problem lies in the fact that PHP cannot reach the host.

When you connect to http://host/service, PHP fetches the XML from the WSDL page via fopen('http://host/service?wsdl') so it can use it to handle the request. In some cases, that request is not routed correctly, resulting in the SOAP-ERROR.
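
For example, the error typically pops up the moment the client is constructed (the URL is a placeholder):

<?php
// PHP fetches and parses the WSDL here, before any actual SOAP call is made.
$client = new SoapClient('http://host/service?wsdl');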

Some solutions:

  • Add the hostname to the hosts file of the server (127.0.0.1 hostname).
  • Add the hostname or IP address to the correct VirtualHost (ServerAlias hostname).

You can test the code by adding a file on the server:

<?php
echo htmlentities(file_get_contents('http://host/service?wsdl'));

That way, you know if the server can reach and read the XML file.

Keeping track of your config files

I have known and used Subversion properties (and keyword substitution) for quite a while now, but never used all of them and mostly stuck with the Id keyword.

This results in a substituted string like this:

$Id: ProductController.php 227 2010-04-28 08:25:32Z jachim $

Because my colleague Arno and I do a lot of server maintenance and configuration, we ended up maintaining a lot of configuration files in a dedicated repository. The big problem here is the fact that you need discipline to check the changes into Subversion and export them into place (manually or with a hook).

In order to help us point out which files are in the repository and where, we’ve added 2 keywords to every file:

// $HeadURL$
// $Id$

Which gets nicely transformed into useful information on the server’s filesystem:

// $HeadURL: http://server/trunk/dnsserver/var/named/chroot/etc/zones/dmz.zones $
// $Id: dmz.zones 1889 2010-05-31 12:26:20Z jachim $
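
Note that keyword substitution only happens for files that have the svn:keywords property set, for example (the path is just an illustration):

svn propset svn:keywords "HeadURL Id" etc/zones/dmz.zones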

Custom URLs in Zend Framework

I have been searching for a URL solution in Zend Framework for quite a while now, and I hope the solution I came up with helps other people trying to do the same thing.

More importantly, I hope more experienced (Zend Framework) developers can point out problems or suggest better solutions for this problem.

The CMS of my company makes it possible for users to create a tree structure of products/sections/chapters/etc… with an unlimited depth. This makes the URLs they ‘generate’ very variable.

I’ve searched the internet for solutions, and for quite a while I used Jani’s solution. Because that completely overrides the Router object, I continued my search via the Zend MVC mailing list and had an extensive look at the Zend_Controller_Router manual page.

During that search Ivo Jansch tweeted about ZF URLs and pointed me towards chaining, but the solution for my specific problem lies in adding routes to the route stack.

I wanted to override the default route so that a language is required in the URL. To accomplish this I added a new method to the Bootstrap.php file:

protected function _initLanguageRoute()
{
    $this->bootstrap('frontController');
    $fc = $this->frontController;
    $router = $fc->getRouter();
    $route = new Zend_Controller_Router_Route(
        ':language/:controller/:action/*',
        array(
            'module' => 'default',
            'language' => 'nl',
            'controller' => 'index',
            'action' => 'index'
        ),
        array(
            'language' => '[a-z]{2}'
        ),
        $translator // an optional Zend_Translate instance, bootstrapped elsewhere
    );
    $router->addRoute('default', $route);
}

To add a variable route (calculated by the CMS), I extended the Zend_Controller_Router_Route_Abstract class so the route could also be used by the Url View helper.

The database I use below already has the columns precalculated, but a custom mapping can be calculated in the class as well, as long as the request object ends up pointing to some controller and action.
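
For reference, a minimal sketch of the mapping table the route works with (the exact schema is an assumption, based on the queries below):

CREATE TABLE mapping (
    url        VARCHAR(255) NOT NULL PRIMARY KEY, -- the full CMS path
    lang       CHAR(2)      NOT NULL,
    controller VARCHAR(64)  NOT NULL,
    action     VARCHAR(64)  NOT NULL,
    id         INT          NOT NULL
);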

class Coudenysj_Controller_Router_Route_Mapping extends Zend_Controller_Router_Route_Abstract
{

    private $_db;


    public function __construct($db)
    {
        $this->_db = $db;
    }

    /**
     * A method to publish the way the route operates.
     *
     * @see Zend/Controller/Router/Rewrite.php:392
     *
     * @return int
     */
    public function getVersion()
    {
        return 1;
    }

    /**
     * The required functions (required by Interface).
     *
     * @param Zend_Config $config The config object with defaults.
     *
     * @return void
     */
    public static function getInstance(Zend_Config $config)
    {
        return;
    }

    /**
     * Matches a user submitted path with a previously defined route.
     * Assigns and returns an array of defaults on a successful match.
     *
     * @param string $path Path used to match against this routing map
     *
     * @return array|false An array of assigned values or a false on a mismatch
     */
    public function match($path)
    {
        $path = trim($path, '/');

        return $this->_db->fetchRow(
            'SELECT lang, controller, action, id
            FROM mapping
            WHERE url = ?;',
            array($path)
        );
    }

    /**
     * Assembles a URL path defined by this route
     *
     * @param array   $data    An array of variable and value pairs used as parameters
     * @param boolean $reset   Not used (required by interface)
     * @param boolean $encode  Not used (required by interface)
     * @param boolean $partial Not used (required by interface)
     *
     * @return string|false Route path with user submitted parameters
     */
    public function assemble($data = array(), $reset = false, $encode = false, $partial = false)
    {
        if (!isset($data['language'])) {
            if (Zend_Registry::isRegistered('Zend_Locale')) {
                $locale = Zend_Registry::get('Zend_Locale');
                $data['language'] = $locale->getLanguage();
            } else {
                return false;
            }
        }
        if (   !isset($data['controller'])
            || !isset($data['action'])
            || !isset($data['id'])
        ) {
            return false;
        }

        $result = $this->_db->fetchRow(
            'SELECT url
            FROM mapping
            WHERE lang = ? AND controller = ? AND action = ? AND id = ?;',
            array(
                $data['language'],
                $data['controller'],
                $data['action'],
                $data['id'],
            )
        );
        return $result['url'];
    }
}

Now I can add a new ‘cms’ route to the Router object in the Bootstrap.php file:

protected function _initMappingRoute()
{
    $db = Zend_Db::factory(...);
    $this->bootstrap('frontController');
    $fc = $this->frontController;
    $router = $fc->getRouter();
    $router->addRoute(
        'cms', new Coudenysj_Controller_Router_Route_Mapping($db)
    );
}

Remember that the route stack is processed in reverse order, so the ‘cms’ route is the first one to try to route the URL.

This route can now also be used in the view:

echo $this->url(
    array(
        'id' => $id,
        'controller' => $controller,
        'action' => $action
    ),
    'cms' // the name of our route
);

This solution works for me (for now) and suggestions to improve it are very welcome!

3 years ago in Cambridge

It has been around three years now since I started my internship at Studio 24.

Studio 24 is a Cambridge (UK) based web agency with over 10 years of experience, happy to share their knowledge with students willing to learn.

In the last year of my MCT training at Howest (University College West Flanders), I decided to do my internship abroad, and so I ended up in beautiful Cambridge for over 3 months.

With the support of my house mates Leena (SE), Chris (AU), Matthew (UK) & Alex (UK) and my bosses/colleagues Simon, Jonathan, Richard, Edward, David, Dave & Gaëlle, I found my way through Cambridge and the professional web world.

A lot of what I learned in those 3 months I still use and advocate/implement at my current company. Some examples of technologies, procedures, software, etc… I discovered or worked with:

Using all these things, and the variety of customers Studio 24 has, resulted in an astonishing list of projects I worked on. So many that I could only use a third of them in my presentation for the jury back home.

Here’s an extract:

I had a great time, and I encourage every student to have a look at the Lifelong Learning Programme of the EU, which helped me pay for my accommodation.

Looking back, I’m glad everyone around me ‘pushed’ me into the Cambridge adventure, and I would do it again if I had the chance!

Deleting Subversion repository files (for real)

Keeping files and directories in the repository is one of the key principles of Subversion, so once you’ve committed something, it’s there forever. You can delete files, but they still exist somewhere in the repository, so you can go back in time.

But there is always that time when you’ve (accidentally) committed a password file, a directory full of hi-res images, or some other content you don’t want other people to see and want to get rid of. That’s where the hard part starts…

After searching the internet and checking the Subversion FAQ it looks quite hard, but with some guidance, you’ll find out it’s not.

Finding the problems

First you have to do a (complete) checkout of the repository you want to clean:

svn co http://svn.apache.org/repos/asf/ asf

Now you can start locating the problems and deleting the files/directories (not svn delete!):

rm -Rf subversion/trunk/tools/buildbot;
rm -Rf subversion/trunk/README;
rm -Rf subversion/trunk/build;

When you’re done deleting files and directories, you can generate a list of ‘missing’ files.

Checking your files:

svn status
!      subversion/trunk/tools/buildbot
!      subversion/trunk/README
!      subversion/trunk/build

Generating that list (outside the working copy):

svn status | sed s/"!      "// > ../filter.txt

Fixing the problems

Now that you have a nice list of files to delete (make sure it includes the parent directories, right up to the root), you should log in on the server hosting the repository.

We first want to make sure there is a backup:

svnadmin dump file:///var/svn/asf > ~/backup_svn/asf.dump

Now we can use that backup file as the input for the svndumpfilter command. In combination with the filter list we’ve generated on the client, we can create a filtered dump:

svndumpfilter exclude `cat filter.txt` < ~/backup_svn/asf.dump > asf_filtered.dump

To load that file back into the repository, we should ‘delete’ the original repository. (The httpd commands are just to make sure no one commits while we are processing the changes.)

/etc/init.d/httpd stop;
mv /var/svn/asf ~/backup_svn/asf;
svnadmin create --fs-type fsfs /var/svn/asf;
svnadmin load /var/svn/asf < asf_filtered.dump;
/etc/init.d/httpd start;

Please note that directories and command line options can be different, but the outcome should be the same.

Now we have the same repository, without the (accidentally) committed files/directories!

New problems

After the filtering, it is possible that complete revisions are empty. You can skip those empty revisions, but then all revisions are renumbered, and that could be problematic for other software (e.g. Trac).

Hostnames in Logwatch reports

Where I work, we have a lot of servers to maintain and only 2 server admins (me and my colleague). We use Nagios to keep us informed about the server status and Logwatch to analyze the server logs on a daily basis.

Per server we have a lot of subdomains/vhosts, and these virtual hosts all write to their own log (blog.jachim.be_access_log, www.jachim.be_error_log, etc…).

The log entries look like this:

192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /a/i/red_cube.png HTTP/1.0" 200 190
192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /a/i/search/search_icon.gif HTTP/1.0" 200 428
192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET /index.php HTTP/1.0" 200 6541

When Logwatch merges all the httpd log files, the host information (in the log filename) is lost, resulting in Logwatch reports like this:

Requests with error response codes
    401 Unauthorized
       /: 4 Time(s)
       /a/i/blue_cube.png: 1 Time(s)
       /favicon.ico: 2 Time(s)
       /wp/login: 2 Time(s)
...

We actually want reports like this:

Requests with error response codes
    401 Unauthorized
       www.jachim.be/: 4 Time(s)
       jachim.be/a/i/blue_cube.png: 1 Time(s)
       blog.jachim.be/favicon.ico: 2 Time(s)
       blog.jachim.be/wp/login: 2 Time(s)
...

Now we have all the information we want and are able to fix possible problems much more easily.

Because this is not possible in Logwatch (see the mailing list), I’ve added it in the Apache logs.

I’ve added a new LogFormat named logwatch in httpd.conf:

LogFormat "%h %l %u %t \"%m %{Host}i%U%q %H\" %>s %b" logwatch

Now the new format is available and can be used in the Virtual Host:

CustomLog logs/www.jachim.be-access_log logwatch
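
With that format, the access log lines will look something like this (combining the new format with the earlier example), so the vhost becomes part of the request line:

192.168.200.6 - - [10/Nov/2009:09:55:41 +0100] "GET www.jachim.be/index.php HTTP/1.0" 200 6541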
