An XML-based language supports structured regular expressions for parsing semi-structured documents.
Example:
...input...
MOLECULAR POINT GROUP : C2
MOLECULAR POINT GROUP : D3h
...template...
<primitive name="pointgroup.p"
regexp=" MOLECULAR POINT GROUP :\s+(.*\S)">
<scalar dictRef="cml:pointgroup">{$1}</scalar>
</primitive>
...output...
<scalar dictRef="cml:pointgroup">C2</scalar>
<scalar dictRef="cml:pointgroup">D3h</scalar>
Tested on 750,000 MOPAC jobs.
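The template-to-output step in the example can be imitated with a few lines of Python. This is only a sketch of the idea, not the actual parser: the function name apply_primitive is hypothetical, and the sample input assumes each line begins with a space, as the leading space in the regexp suggests. The regular expression and the output element are copied from the "pointgroup.p" primitive.

```python
import re
from xml.sax.saxutils import escape

# Regular expression copied from the "pointgroup.p" primitive in the example.
POINTGROUP_RE = re.compile(r" MOLECULAR POINT GROUP :\s+(.*\S)")

def apply_primitive(lines):
    """Hypothetical helper: emit one <scalar> element per matching line.

    The captured group plays the role of {$1} in the template.
    """
    scalars = []
    for line in lines:
        match = POINTGROUP_RE.search(line)
        if match:
            value = escape(match.group(1))  # keep the output well-formed XML
            scalars.append('<scalar dictRef="cml:pointgroup">%s</scalar>' % value)
    return scalars

# Assumed sample input: leading space on each line, one line that does not match.
sample = [
    " MOLECULAR POINT GROUP : C2",
    " MOLECULAR POINT GROUP : D3h  ",
    " SOME OTHER LINE",
]
for element in apply_primitive(sample):
    print(element)
```

Because the capture group ends in \S, trailing whitespace on the input line is dropped from the value, and lines that do not match the pattern produce no output.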