Motivation

The problem

As a systems integrator I found that the increasing presence of object-oriented systems requiring well-structured data meant that I was constantly writing integration code to transform raw output into a suitable format for these systems, which quite often was XML.

I also found that seasoned application developers tended to dislike writing code for handling this kind of integration issue as it often detracted from their primary focus of pure applications development.

The immediate solution

To transform raw data into XML format I found myself writing specific and custom shell scripts, awk scripts, Java stream readers, etc. All of these are easy to write, but each one introduces new issues: the script must be packaged within the application, it must be maintained and version controlled in the source base, it may have its own set of dependencies, and time must be set aside to write and test it.

This immediate solution would often start off as just a small amount of code and quickly grow rather hairy, turning into several hundred lines of hack-style code whose implementation was limited by design to specific customer requirements.

The immediate solution is fine as a one-off. However, because I found that the frequency of such integration work was steadily increasing, I now consider the immediate solution a bit of a concern.

Other alternatives

Searching through the web in 2008 I found three significant efforts to address this problem:

  • The libexpat and libxml APIs

    These are sophisticated and API-rich software libraries that cater for most if not all XML transformations. My experiences with libexpat and libxml so far have been:

    • They are powerful and can sometimes get a little complex.
    • They require a sound working knowledge of XML concepts.
    • They require you to write a C program and link into it.
    • They may not be installed or available on the target host.
    • They have allowed the development of interesting tools like xmlgawk and Perl's XML parser.

  • Java XML processors

    A myriad of mostly heavyweight choices are available, but going down this path requires you to lug and load a JVM, or at least make it a dependency, which may not be practical.

  • The Un*x XML concept

    This concept is also motivated by the problem I mentioned above. The idea is that all Un*x commands are XML aware and can pipe XML formatted data in and out.

    This is a very ambitious task. Sun Microsystems have sort of done something similar with its new Java managed operating system, and Apple have some Mac OS X shell commands that can handle XML data very well. However, the challenges are many and remain quite difficult; for example, a standards body would be required to define the big Un*x schema.

Unfortunately, these alternatives were not going to solve the above problem the way I wanted.

The xmlfy idea is born!

I wanted something lightweight and flexible that would have as few dependencies as possible and would also work on as many platforms as possible without the need for customisations.

I wanted something to put an end to writing those shell scripts, awk scripts, etc for this purpose, but that would still have the flexibility to adapt to differing requirements.

I wanted something that mainly worked one way only (unformatted text to XML format). I wanted to keep the code base portable, and I wanted something that was easy to adopt and use and that produced consistent outcomes across new, old, and different platforms.

Essentially, I just wanted to XML-fy my data quickly, easily and reliably.