Known limitations

Size limits

  • xmlfy reads standard input (stdin) and writes to standard output (stdout). Because it streams its data, xmlfy imposes no limit on total input or output size.
  • xmlfy processes one record at a time and dynamically allocates memory for the input record and its output fields. The size of a single input record is therefore limited only by the available memory on your computer.

Performance limits

  • xmlfy uses block IO for faster performance. This is most noticeable with very large input/output streams.
  • xmlfy has been written with fast execution in mind.
  • When using a schema file with complex schema record structures, each xpath traversal combination requires an allocation of memory (the helper). The total helper memory required is directly proportional to the schema tree complexity and can become significant. You can use the debug option to ascertain how much memory xmlfy allocated to this resource for any given schema.
  • xmlfy has been successfully tested on average hardware with input records containing over 10,000,000 fields whilst using a complex schema tree structure and multi-level delimiters.

Invocation limits

  • xmlfy requires that you specify your command line options as described in the manual and that you combine your arguments in a sensible way. This is because xmlfy deliberately doesn't use the getopt library, in order to keep the code base as portable as possible.
  • You can specify as many XML depth levels and delimiters as you desire (limited to the available memory on your computer).

Character encoding limits

  • xmlfy supports ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE input and output including the null character.
  • xmlfy also provides BOM (Byte-Order-Mark) handling to determine the encoding in use.
  • Code points composed of multiple bytes are supported and are each treated as one column position.
  • Only the following input characters that are reserved for XML are re-represented in their escaped form. This default behaviour can be modified by specifying options.
    • Character & (ampersand) becomes string &amp;
    • Character < (less-than) becomes string &lt;
    • Character > (greater-than) becomes string &gt;
    • Character " (quote) becomes string &quot;
    • Character ' (apostrophe) becomes string &apos;
    • Character ¦ (broken vertical bar) becomes string &brvbar;
  • xmlfy does not do code-point error checking.
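
The default escaping rules listed above can be approximated outside xmlfy with a short sed filter. This is only a sketch for checking expected output by hand: escape_xml is a hypothetical helper name, it is not part of xmlfy, and it assumes ASCII or UTF-8 input. The ampersand is replaced first so that the entities produced by the later rules are not escaped a second time.

```shell
#!/bin/sh
# escape_xml: hypothetical helper, not part of xmlfy.
# Reads text on stdin and writes it to stdout with the characters from
# the list above replaced by their escaped forms.
# Assumes ASCII or UTF-8 input; other encodings are not handled here.
escape_xml() {
  # The ampersand rule must run first, otherwise the '&' introduced by
  # the later rules would itself be escaped.
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\&apos;/g" \
      -e 's/¦/\&brvbar;/g'
}
```

For example, piping the line Tom & "Jerry" <cat> through escape_xml yields Tom &amp; &quot;Jerry&quot; &lt;cat&gt;.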

Schema file limits

DTD schema

  • Only recognises the <!ELEMENT> directive and ignores all others.
  • The first valid <!ELEMENT> definition becomes the root element.
  • Element fields that don't have an element definition default to being (#PCDATA).
  • Elements defined as (#PCDATA) or (#CDATA) are ignored, causing the referring field to default to (#PCDATA). However, it is good practice to include these elements in order to furnish a complete DTD schema.
  • Only honours the +, ? and * wildcard tokens.
  • At this stage does not honour field group sets () and or-ing ¦ syntax tokens.
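
As an illustration of the points above, the following sketch writes a small DTD that stays within the subset described: only <!ELEMENT> declarations, the +, ? and * tokens, and no nested groups or or-ing. The file name and the element names (league, team, name, coach, player) are hypothetical, and the file has not been verified against xmlfy itself. The first declaration, league, would be taken as the root element.

```shell
#!/bin/sh
# Hypothetical file and element names, for illustration only.
DTD_FILE="${TMPDIR:-/tmp}/league_example.dtd"

# Write a DTD that uses only <!ELEMENT> declarations and the +, ? and *
# tokens. The first declaration (league) is the one that would become
# the root element.
cat > "$DTD_FILE" <<'EOF'
<!ELEMENT league (team+)>
<!ELEMENT team (name, coach?, player*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT coach (#PCDATA)>
<!ELEMENT player (#PCDATA)>
EOF

# Rough check: count declarations that are not <!ELEMENT ...>, since
# those are the ones that would be ignored. grep -c exits non-zero when
# the count is 0, hence the "|| true".
ignored=$(grep '<!' "$DTD_FILE" | grep -c -v '<!ELEMENT' || true)
echo "declarations that would be ignored: $ignored"

# The file could then be passed to xmlfy in the usual way, e.g.
# (input.raw and output.xml are hypothetical names):
#   xmlfy --schemadtd "$DTD_FILE" < input.raw > output.xml
```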

RNC schema

  • Only recognises named directives and ignores all others.
  • The element named "start" becomes the root element.
  • Element fields that don't have an element definition default to being { text }.
  • Elements defined as { text } are ignored, causing the referring field to default to { text }. However, it is good practice to include these elements in order to furnish a complete RNC schema.
  • Only honours the +, ? and * wildcard tokens.
  • At this stage does not honour field group sets () and or-ing ¦ syntax tokens.

XSD schema

  • Only recognises the <schema>, <element>, <complexType>, <ref>, <sequence>, and <choice> directives and ignores all others.
  • The recognised directives are not fully implemented and their use should be kept straightforward.
  • The first valid <element> definition becomes the root element.
  • Element types that are not of matchable complexType are treated as "xsi:string" regardless of what type is specified.
  • Only honours the minOccurs="0", maxOccurs="0" and maxOccurs="unbounded" wildcard attributes.
  • At this stage does not honour group sets, but does offer limited support for choices.

All schema types

  • xmlfy only understands ASCII schema files (although it can handle most UTF-8 characters inside a schema file).
  • xmlfy can easily be confused by schema file syntax that it is not familiar with. This is the area that I hope gets improved upon the most. In the meantime please stick to the schema file conventions as depicted in the documentation.
  • You cannot specify a URI with the --schema options. This is deliberate, so that the xmlfy source code remains simple and portable. Use purpose-built tools like wget, curl, scp, ftp, etc. to fetch a copy of the schema file from the network and then operate on it locally.
    • E.g.
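      #!/bin/sh
      # Fetch the DTD to a local temporary file, then run xmlfy on the local copy.
      DTD_URI="http://xmlfy.sourceforge.net/schemata/examples/football.dtd"
      DTD_TEMPFILE="/tmp/dtd_temp$$"
      wget --output-document="$DTD_TEMPFILE" "$DTD_URI" || exit 1
      xmlfy --schemadtd "$DTD_TEMPFILE" -I "$DTD_URI" < round19.raw > round19.xml
      rm -f "$DTD_TEMPFILE"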

Platform limits

  • Downloads are provided for 32-bit platforms only. However, xmlfy ports successfully to 64-bit platforms with no modifications to the source code.
  • Some ports are incomplete due to lack of access to suitable hardware.