abnfgen parses ABNF definitions from source files and generates
a state machine definition for Ragel. The aim is to simplify the
implementation of parsers for grammars defined in ABNF, a notation used in
many RFCs.
ABNF
abnfgen takes one or more ABNF definitions specified on the command line. Unless a name is recognized as the name of an internal rule list, it is treated as the name of a file containing rule declarations in ABNF format (RFC2234). If no name is provided, or the name "-" is used, standard input is read.

Input files are processed in the order given. Because several inputs may be processed, a conflict can occur when a rule name being read already exists in the rule list. In that case the later rule overrides the earlier one, but if "=/" is used the rules are merged together. An internal rule list cannot override any other rule.

Syntax declarations in RFCs are usually embedded in plain text, so they must be separated out manually and checked to make sure that every rule definition starts at the beginning of a line. Although ABNF defines mandatory CRLF line ends, abnfgen also accepts a plain LF.

Unfortunately, even when an RFC references the ABNF RFC for its own syntax, that syntax often deviates from it and will not pass a strict parser: for example, the alternation delimiter is written "|" instead of "/", or numbers are written "0xFF" instead of "%xFF".

Because many RFCs reference the ABNF core definitions, these rules are built into abnfgen as the internal "core" rule list.

If a problem is detected, a message is written to stderr and no other output is generated unless the "-F" option is specified. This often happens when an unknown rule name is referenced.

abnfgen can produce several output formats:
- ABNF: normalized ABNF format
- Ragel state machine definition: the main product, discussed below
- abnfgen structures, which may be used for compiling an internal rule list into abnfgen itself
Ragel
Ragel (http://www.cs.queensu.ca/~thurston/ragel/) is a fast GNU state machine compiler that allows custom actions to be called in any state of processing. abnfgen generates a state machine definition in a format Ragel understands, but the developer must understand the logic and check that the definition meets the requirements.

The main problem is ambiguity. It is often a mind-bending problem to recognize which state is ambiguous and causes the machine to produce false results or even enter a never-ending loop. Ragel's scanner construct is a very good instrument that helps to overcome many ambiguities. Priorities also help, but they are a kind of magic.

Ragel does not support circular references (these are easy to detect because Ragel raises an error message). Such a case must be corrected manually, probably by using a separate state machine. The circularly dependent machines then call each other using the fcall/fret commands.

The rule that abnfgen takes to be the main rule is instantiated as main:=. It is simply the last rule that is not referenced by any other rule.
Examples
# pipe from stdin to stdout
abnfgen -f abnf
# load my.txt
abnfgen my.txt -f ragel -o my.rl
# load built-in rules core + abnf + my.txt
abnfgen core abnf my.txt -f ragel -o my.rl
# read RFC2234 core rules and RFC3261 and print to stdout
abnfgen core rfc3261.txt -f ragel
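Because syntax copied from RFCs often uses "|" for alternation or "0xFF" for numbers, a small preprocessing step may help before the text is handed to abnfgen. The following is a minimal Python sketch and not part of abnfgen; the function name normalize_rfc_abnf is made up for illustration. It only handles the two deviations mentioned above, leaves quoted strings untouched, and does not treat ";" comments or "<...>" prose values specially.

```python
import re


def normalize_rfc_abnf(text: str) -> str:
    """Rewrite "|" to "/" and 0xNN to %xNN outside quoted strings."""
    # re.split with a capturing group keeps the quoted strings in the
    # result: even indexes are plain text, odd indexes are quoted strings.
    parts = re.split(r'("[^"]*")', text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Alternation written with "|" instead of "/"
            part = part.replace("|", "/")
            # Numbers written as 0xFF instead of %xFF
            part = re.sub(r"\b0x([0-9A-Fa-f]+)\b", r"%x\1", part)
        out.append(part)
    return "".join(out)


if __name__ == "__main__":
    import sys
    sys.stdout.write(normalize_rfc_abnf(sys.stdin.read()))
```

Since abnfgen reads standard input when no file name is given, the output of such a script could be piped straight into a command like abnfgen -f ragel.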
Input
list = 1*(item CRLF)
item = name ":" *SP body
name = ALPHA *( ALPHA / DIGIT / "_")
body = *( %x20-7e )
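To get a feel for what this small grammar accepts, it can be translated by hand into a regular expression. The Python sketch below is such a hand translation, with the core rules ALPHA, DIGIT, SP and CRLF written inline; it is not output of abnfgen, and the names NAME, BODY, ITEM, LIST and is_list are chosen here for illustration only.

```python
import re

# Each pattern mirrors one rule of the grammar above.
NAME = r"[A-Za-z][A-Za-z0-9_]*"                 # name = ALPHA *( ALPHA / DIGIT / "_" )
BODY = r"[\x20-\x7e]*"                          # body = *( %x20-7e )
ITEM = NAME + r": *" + BODY                     # item = name ":" *SP body
LIST = re.compile(r"(?:" + ITEM + r"\r\n)+")    # list = 1*( item CRLF )


def is_list(text: str) -> bool:
    """Return True if the whole text matches the 'list' rule."""
    return LIST.fullmatch(text) is not None


print(is_list("host: example\r\nx_1:\r\n"))  # well-formed list of two items
print(is_list("1bad: x\r\n"))                # name must start with ALPHA
print(is_list("host: example\n"))            # strict CRLF is required here
```

Note that *SP and body can both match a space. A regular expression engine resolves this silently, but in a Ragel machine with actions attached this kind of overlap is likely to need attention.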
Ragel
abnfgen core test.txt -f ragel

# Generated by abnfgen at Sun Aug 12 15:03:04 2007
# Sources:
#   core
#   test.txt
%%{
    # write your name
    machine generated_from_abnf;

    # generated rules, define required actions
    ALPHA = 0x41..0x5a | 0x61..0x7a;
    BIT = "0" | "1";
    CHAR = 0x01..0x7f;
    CR = "\r";
    LF = "\n";
    CRLF = CR LF;
    CTL = 0x00..0x1f | 0x7f;
    DIGIT = 0x30..0x39;
    DQUOTE = "\"";
    HEXDIG = DIGIT | "A"i | "B"i | "C"i | "D"i | "E"i | "F"i;
    HTAB = "\t";
    SP = " ";
    WSP = SP | HTAB;
    LWSP = ( WSP | ( CRLF WSP ) )*;
    OCTET = 0x00..0xff;
    VCHAR = 0x21..0x7e;
    name = ALPHA ( ALPHA | DIGIT | "_" )*;
    body = 0x20..0x7e*;
    item = name ":" SP* body;
    list = ( item CRLF )+;

    # instantiate machine rules
    main:= list;
}%%
ABNF output
abnfgen core test.txt -f abnf

; Generated by abnfgen at Sun Aug 12 14:58:43 2007
; Sources:
;   core
;   test.txt
ALPHA = %x41-5a / %x61-7a
BIT = "0" / "1"
CHAR = %x01-7f
CR = %x0d
CRLF = CR LF
CTL = %x00-1f / %x7f
DIGIT = %x30-39
DQUOTE = %x22
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
HTAB = %x09
LF = %x0a
LWSP = *( WSP / ( CRLF WSP ) )
OCTET = %x00-ff
SP = " "
VCHAR = %x21-7e
WSP = SP / HTAB
list = 1*( item CRLF )
item = name ":" *SP body
name = ALPHA *( ALPHA / DIGIT / "_" )
body = *%x20-7e
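The way several sources such as core and test.txt end up in one rule list follows the conflict handling described in the ABNF section: a later "=" definition overrides an earlier one, "=/" merges the alternatives, and an internal rule list never overrides an existing rule. The Python sketch below only models that description. It is not abnfgen's actual code; the function add_rule and its parameters are hypothetical, and how abnfgen treats "=/" inside an internal rule list is not stated, so merging in that case is an assumption here.

```python
def add_rule(rules, name, operator, definition, internal=False):
    """Add one rule to 'rules', a dict mapping rule name to definition text.

    operator is "=" for a plain definition or "=/" for an incremental
    alternative. internal marks rules coming from a built-in rule list.
    """
    if name not in rules:
        # First definition of this name: always accepted.
        rules[name] = definition
    elif operator == "=/":
        # Incremental alternative: merge with the existing definition.
        # (Assumption: this also applies to internal rule lists.)
        rules[name] = f"{rules[name]} / {definition}"
    elif not internal:
        # A later rule from a file overrides the earlier one.
        rules[name] = definition
    # Otherwise: an internal rule never overrides an existing rule.
    return rules


rules = {}
add_rule(rules, "item", "=", "name")
add_rule(rules, "item", "=", "body")                  # overrides "name"
add_rule(rules, "item", "=/", "name")                 # merged as alternative
add_rule(rules, "item", "=", "ALPHA", internal=True)  # ignored
print(rules["item"])
```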