EBNF Parser Library

Installation

NPM package

The library can be installed using the following command:

npm install @liquescens/ebnf

This installation allows you to use the library with type checking and IntelliSense support.

Runtime Configuration

To ensure runtime code support on a web page, you need to configure JavaScript module import mapping.

You can use the version available via CDN for this purpose:

<html>
<head>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@liquescens/ebnf/default.css">
    <script type="importmap">
        {
            "imports":
            {
                "@liquescens/ebnf": "https://cdn.jsdelivr.net/npm/@liquescens/ebnf/index.js",
            }
        }
    </script>
</head>
</html>

Usage Examples

Using the Default Configuration

import * as EBNF from '@liquescens/ebnf';

const grammar_text = 'rule = "a" | "b";';
const grammar = EBNF.parse(grammar_text);

Choosing a Predefined Configuration

import * as EBNF from '@liquescens/ebnf';

const grammar_text = 'rule = "a" | "b";';
const lexer_configuration = EBNF.LexerConfiguration.iso_14977;
const lexer = new EBNF.Lexer(grammar_text, lexer_configuration);
const grammar = new EBNF.Parser(lexer).parse();

Generating DOM and string

import * as EBNF from '@liquescens/ebnf';

const grammar_text = `(* example *)
digit = "0" | "1" | ? another digit symbol ?;
number = digit, { digit };
sum = number, "+", number;
mul = sum, "*", sum;`;
const grammar = EBNF.parse(grammar_text);

const grammar_as_dom = EBNF.toDom(grammar);
const grammar_as_text = grammar.toString();

// This code is responsible for rendering 
// the example in real time on the documentation page.
const current_section = window.page.section(import.meta);
current_section?.append(grammar_as_dom);
current_section?.appendPreformattedText(grammar_as_text);

Removing Whitespaces

import * as EBNF from '@liquescens/ebnf';

const grammar_url = 'grammars/iso-iec-14977-1996/ebnf-from-wikipedia-corrected.ebnf.txt';
const grammar_text = await (await fetch(grammar_url)).text();
const grammar = EBNF.parse(grammar_text);

EBNF.Utilities.removeGap(grammar, { comments: true });

const grammar_as_dom = EBNF.toDom(grammar);
const grammar_as_text = grammar.toString();
const current_section = window.page.section(import.meta);
current_section?.append(grammar_as_dom);
current_section?.appendPreformattedText(grammar_as_text);

Formatting

import * as EBNF from '@liquescens/ebnf';

const grammar_url = 'grammars/iso-iec-14977-1996/ebnf-from-wikipedia-corrected.ebnf.txt';
const grammar_text = await (await fetch(grammar_url)).text();
const grammar = EBNF.parse(grammar_text);

EBNF.Utilities.removeGap(grammar);
EBNF.Utilities.format(grammar);

const grammar_as_dom = EBNF.toDom(grammar);
const grammar_as_text = grammar.toString();
const current_section = window.page.section(import.meta);
current_section?.append(grammar_as_dom);
current_section?.appendPreformattedText(grammar_as_text);

Normalization

Notation Variations

The following examples demonstrate how to adapt the parser to recognize different syntax styles. This can be useful in simple cases. For more complex scenarios, using a parser generator like @liquescens/parser-generator might be a better option.

Pascal-like (Wikipedia)

Wikipedia provides an example grammar for a language inspired by Pascal.

(* a simple program syntax in EBNF - Wikipedia *)
program = 'PROGRAM', white space, identifier, white space, 
        'BEGIN', white space, 
        { assignment, ";", white space },
        'END.' ;
identifier = alphabetic character, { alphabetic character | digit } ;
number = [ "-" ], digit, { digit } ;
string = '"' , { all characters - '"' }, '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                    | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                    | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                    | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;

This grammar does not comply with the standard as it allows whitespace characters in identifier names. The example below modifies the set of allowed characters in identifiers based on the ISO standard, enabling proper parsing.

import * as EBNF from '@liquescens/ebnf';

const grammar_url = 'grammars/wikipedia/pascal-like.ebnf.txt';
const grammar_text = await (await fetch(grammar_url)).text();
const identifier_pattern = `[a-zA-Z][^'"=,|()\\[\\]{}\\-.;]*`;
const patterns = EBNF.Variants.ISO_14977.createLexerPatterns();
patterns.identifier = { pattern: identifier_pattern };
const lexer_configuration = new EBNF.LexerConfiguration('Pascal', patterns);
const lexer = new EBNF.Lexer(grammar_text, lexer_configuration);
const grammar = new EBNF.Parser(lexer).parse();

const current_section = window.page.section(import.meta);
current_section?.append(EBNF.toDom(grammar));

Postal address (Wikipedia)

Wikipedia also includes an example of a postal address grammar written in BNF format.

 <postal-address> ::= <name-part> <street-address> <zip-part>
      <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part>
  <personal-part> ::= <first-name> | <initial> "."
 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
    <opt-apt-num> ::= "Apt" <apt-num> | ""

The BNF format differs from the EBNF format in the following ways:

Identifier names are enclosed in angle brackets.
The definition symbol is ::=.
There is no terminator symbol.

Due to these differences, adjustments are required not only in the lexer configuration but also in the parser's logic. The example below overrides the _parseDefinition and _parseTerminal methods.

import * as EBNF from '@liquescens/ebnf';

class BNFParser extends EBNF.Parser
{
    static lexerConfiguration()
    {
        const patterns = EBNF.Variants.ISO_14977.createLexerPatterns();
        patterns.identifier = { pattern: `<[^>]+>` };
        patterns.identifier_beginning = { pattern: `<` };
        patterns.defining = { pattern: '::=' };
        const characters_1 = '[a-zA-Z0-9,=|/!)\\]}\\-\"*?(\\[{;. :+_%@&#\\$<>\\\\\\^`~]|\\*\\)|/\\)|:\\)|\\(/|\\(:';
        const characters_2 = '[a-zA-Z0-9,=|/!)\\]}\\-\'*?(\\[{;. :+_%@&#\\$<>\\\\\\^`~]|\\*\\)|/\\)|:\\)|\\(/|\\(:';
        const pattern_1 = `(?\')(?(${characters_1})*)(?\')`;
        const pattern_2 = `(?\")(?(${characters_2})*)(?\")`;
        patterns.terminal_string = { pattern: `${pattern_1}|${pattern_2}` };
        return new EBNF.LexerConfiguration('BNF', patterns);
    }

    /** @override */
    _parseDefinition()
    {
        const { items: definitions } = this._parseList(() => this._parseTerm());
        return new EBNF.Tree.Definition(definitions);
    }

    /**
     * @override
     * @template {EBNF.KeysMatchingType} T
     * @param {T} symbol_name
     * @returns {EBNF.LexerTokenTypeMap[T]}
     */
    _parseTerminal(symbol_name)
    {
        if (symbol_name === 'terminator')
        {
            const gap = this._parseGap();
            const terminator = new EBNF.Tree.Token('', 'terminator');
            terminator.gap.push(...gap.map(item => item.value));
            // @ts-ignore
            return terminator;
        }
        return super._parseTerminal(symbol_name);
    }
    
    /**
     * @template {EBNF.Tree.Factor | EBNF.Tree.Identifier | EBNF.Tree.BracketedSequence | EBNF.Tree.SpecialSequence | EBNF.Tree.InfixTerm} T
     * @param {() => T} parseItem 
     * @returns {{ items: T[] }}
     */
    _parseList(parseItem)
    {
        const item = parseItem();
        const items = [item];
        while (true)
        {
            if (this._isListTerminator()) break;
            const item = parseItem();
            items.push(item);
        }
        return { items };
    }

    _isListTerminator()
    {
        const token_1_gap = this._parseGap();
        const token_1 = this.lexer.pop();
        if (!token_1) return true;
        const result = token_1.name === 'terminator' || token_1.name === 'definition_separator' ? true 
            : token_1.name !== 'identifier' ? false : undefined;
        if (result !== undefined)
        {
            this.lexer.undo(...token_1_gap, token_1);
            return result;
        }
        const token_2_gap = this._parseGap();
        const token_2 = this.lexer.top();
        this.lexer.undo(...token_1_gap, token_1, ...token_2_gap);
        return token_2?.name === 'defining';
    }
}

const grammar_text = await (await fetch('grammars/wikipedia/postal-address.bnf.txt')).text();
const lexer = new EBNF.Lexer(grammar_text, BNFParser.lexerConfiguration());
const grammar = new BNFParser(lexer).parse();

const current_section = window.page.section(import.meta);
current_section?.append(EBNF.toDom(grammar));