AustenX (sometimes just called "Austen") is a parser generator that uses Parsing Expression Grammars (PEGs) and an algorithm derived from Packrat parsing. Unlike other PEG parsers, Austen currently uses an initial tokenisation step to convert the input into tokens, which are then handled by the grammar parser. This tokenisation can be done as part of the Austen package, and it allows a particular token to be a member of more than one token class.
In essence, Austen is a tool for generating program code that parses text files, based on a specialised language describing the syntax and grammar of the text to be read. Currently, only Java code can be generated.
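The idea of a token belonging to more than one token class can be illustrated with a small hand-written tokeniser. This is only a sketch under stated assumptions: the class names, the `TokenClass` values, and the keyword set are all hypothetical and are not taken from Austen's specification language or its generated code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class TokenClassSketch {
    // Hypothetical token classes, chosen only for illustration.
    enum TokenClass { IDENTIFIER, KEYWORD, NUMBER, SYMBOL }

    // A token keeps its text together with every class it belongs to.
    static final class Token {
        final String text;
        final Set<TokenClass> classes;

        Token(String text, Set<TokenClass> classes) {
            this.text = text;
            this.classes = classes;
        }

        boolean is(TokenClass c) {
            return classes.contains(c);
        }

        @Override
        public String toString() {
            return text + classes;
        }
    }

    // Assumed keyword set; a real grammar would define its own.
    private static final Set<String> KEYWORDS = Set.of("if", "else", "while");

    static List<Token> tokenise(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetter(ch)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String text = input.substring(start, i);
                // Every word is an IDENTIFIER; reserved words are also KEYWORDs.
                EnumSet<TokenClass> classes = EnumSet.of(TokenClass.IDENTIFIER);
                if (KEYWORDS.contains(text)) classes.add(TokenClass.KEYWORD);
                tokens.add(new Token(text, classes));
            } else if (Character.isDigit(ch)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(input.substring(start, i), EnumSet.of(TokenClass.NUMBER)));
            } else {
                i++;
                tokens.add(new Token(input.substring(start, i), EnumSet.of(TokenClass.SYMBOL)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenise("if x1 42 +")) {
            System.out.println(t);
        }
    }
}
```

In this sketch a grammar rule could match `if` either as a keyword or as a general identifier, because the token carries both classes; the parser then works on a short list of tokens rather than on individual characters.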
AustenX has a number of significant features. These can be summarised as follows:
- Tokenisation, allowing possible gains in memory efficiency.
- The ability to exclude particular PEG rules from memoisation, and even to avoid full memoisation completely.
- Robust support for rules with direct and indirect left-recursion, which works with or without memoisation.
- A left-recursion implementation that makes sense of the order of options, and allows complex parse trees with precedence to be specified and created easily.
- The specification language is not linked to any specific target language (though only Java is supported as a target currently).
- Extensions to the PEG framework that allow some semantic parsing, and potentially parsing of a greater class of input.
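To make the memoisation and left-recursion points above more concrete, here is a minimal hand-written sketch of a Packrat-style rule that tolerates direct left recursion. It uses the well-known "seed growing" technique from the Packrat literature, not Austen's own algorithm, which is not described here. It also works on characters rather than tokens, and the grammar `expr <- expr '-' num / num` and all class and method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class LeftRecursionSketch {
    // A successful match: the computed value and the position after the match.
    static final class Result {
        final int value;
        final int next;

        Result(int value, int next) {
            this.value = value;
            this.next = next;
        }
    }

    private final String input;
    // Memo table for the expr rule: start position -> result (null means failure).
    private final Map<Integer, Result> exprMemo = new HashMap<>();

    LeftRecursionSketch(String input) {
        this.input = input;
    }

    // expr <- expr '-' num / num
    Result expr(int pos) {
        if (exprMemo.containsKey(pos)) return exprMemo.get(pos);
        // Plant a failing seed so the left-recursive call below terminates.
        exprMemo.put(pos, null);
        Result best = null;
        while (true) {
            Result attempt = exprBody(pos);
            // Stop once re-evaluating the rule no longer consumes more input.
            if (attempt == null || (best != null && attempt.next <= best.next)) break;
            best = attempt;
            exprMemo.put(pos, best); // grow the seed and try again
        }
        return best;
    }

    private Result exprBody(int pos) {
        // First option: expr '-' num (the recursive call reads the current seed).
        Result left = expr(pos);
        if (left != null && left.next < input.length() && input.charAt(left.next) == '-') {
            Result right = num(left.next + 1);
            if (right != null) return new Result(left.value - right.value, right.next);
        }
        // Second option: num
        return num(pos);
    }

    // num <- [0-9]+
    private Result num(int pos) {
        int i = pos;
        while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
        if (i == pos) return null;
        return new Result(Integer.parseInt(input.substring(pos, i)), i);
    }

    static int evaluate(String input) {
        Result r = new LeftRecursionSketch(input).expr(0);
        if (r == null || r.next != input.length()) {
            throw new IllegalArgumentException("parse error");
        }
        return r.value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1-2-3")); // prints -4, i.e. (1-2)-3
    }
}
```

Because the left-recursive option is listed first, the result is left-associative: `1-2-3` is read as `(1-2)-3`. A right-recursive rewrite of the same grammar would instead yield `1-(2-3)`, which is why direct support for left recursion matters when specifying operator precedence and associativity.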
The following packages are available:
- Austen binary (including the runtime library and an example)
- Runtime library binary (including Solar)
- Runtime library binary (minus Solar)
- Austen source only
- Austen runtime source only
Please see also the common libraries.