Top-down recursive descent backtracking PEG scanner-less JIT parser combinator generator.
A high-performance parser library that compiles grammar definitions into efficient JavaScript parsing functions at runtime. It generates both Concrete Syntax Trees (CST) and Abstract Syntax Trees (AST) from textual input.
import {CodegenGrammar, ParseContext} from 'jit-parser';

// Define a simple grammar
const grammar = {
  start: 'Value',
  cst: {
    Value: 'hello'
  }
};

// Compile the grammar to JavaScript
const parser = CodegenGrammar.compile(grammar);

// Parse input
const ctx = new ParseContext('hello', false);
const cst = parser(ctx, 0);
console.log(cst); // CST node representing the parse result
JIT Parser supports five main grammar node types for defining parsing rules. Grammar rules can be fully defined in JSON, making them language-agnostic and easy to serialize.
1. RefNode (Reference Node)
References a named node defined elsewhere in the grammar.
Interface:
type RefNode<Name extends string = string> = {r: Name};
Syntax:
Example:
const grammar = {
  start: 'Program',
  cst: {
    Program: {r: 'Statement'},
    Statement: 'return;'
  }
};

2. TerminalNode (Terminal Node)
Matches literal strings, regular expressions, or arrays of strings. Terminal nodes are leaf nodes in the parse tree.
Interface:
interface TerminalNode {
  type?: string; // Type name (default: "Text")
  t: RegExp | string | '' | string[]; // Pattern(s) to match
  repeat?: '*' | '+'; // Repetition (only for string arrays)
  sample?: string; // Sample text for generation
  ast?: AstNodeExpression; // AST transformation
}

// Shorthand: string, RegExp, or empty string
type TerminalNodeShorthand = RegExp | string | '';
Syntax:
// String literal
'hello'

// Regular expression
/[a-z]+/

// Array of alternatives
{t: ['true', 'false']}

// With repetition
{t: [' ', '\t', '\n'], repeat: '*'}

// Full terminal node
{
  t: /\d+/,
  type: 'Number',
  sample: '123'
}
Examples:
// Simple string terminal
Value: 'null'

// RegExp terminal
Number: /\-?\d+(\.\d+)?/

// Alternative strings
Boolean: {t: ['true', 'false']}

// Repeating whitespace
WS: {t: [' ', '\t', '\n'], repeat: '*'}

3. ProductionNode (Production Node)
Matches a sequence of grammar nodes in order. All nodes in the sequence must match for the production to succeed.
Interface:
interface ProductionNode {
  p: GrammarNode[]; // Sequence of nodes to match
  type?: string; // Type name (default: "Production")
  children?: Record<number, string>; // Child index to property mapping
  ast?: AstNodeExpression; // AST transformation
}

// Shorthand: array of grammar nodes
type ProductionNodeShorthand = GrammarNode[];
Syntax:
// Shorthand array
['{', {r: 'Content'}, '}']

// Full production node
{
  p: ['{', {r: 'Content'}, '}'],
  type: 'Block',
  children: {
    1: 'content' // Maps index 1 to 'content' property
  }
}
Examples:
// Function call: func()
FunctionCall: ['func', '(', ')']

// Object with named children
Object: {
  p: ['{', {r: 'Members'}, '}'],
  children: {
    1: 'members'
  }
}

4. UnionNode (Union Node)
Matches one of several alternative patterns. The first matching alternative is selected (ordered choice).
Interface:
interface UnionNode {
  u: GrammarNode[]; // Array of alternative nodes
  type?: string; // Type name (default: "Union")
  ast?: AstNodeExpression; // AST transformation
}
Syntax:
{
  u: [pattern1, pattern2, pattern3]
}
Examples:
// Literal values
Literal: {
  u: ['null', 'true', 'false', {r: 'Number'}, {r: 'String'}]
}

// Statement types
Statement: {
  u: [
    {r: 'IfStatement'},
    {r: 'ReturnStatement'},
    {r: 'ExpressionStatement'}
  ]
}
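
Because the choice is ordered, the position of each alternative matters: an earlier alternative that matches a prefix of the input shadows a later, longer one. The snippet below is a standalone sketch of that behavior in plain JavaScript. `matchUnion` is a hypothetical helper that handles only string alternatives; it is not part of jit-parser and does not reflect how the library generates code.

```javascript
// Hypothetical helper (not part of jit-parser): tries string alternatives
// in order and returns the end offset of the first match, or -1.
function matchUnion(alternatives, input, pos) {
  for (const alt of alternatives) {
    if (input.startsWith(alt, pos)) return pos + alt.length;
  }
  return -1;
}

// '<' is listed first, so it wins even though '<=' would consume more input.
const shortFirst = matchUnion(['<', '<='], 'a <= b', 2);

// Listing the longer alternative first gives the intended match.
const longFirst = matchUnion(['<=', '<'], 'a <= b', 2);

console.log(shortFirst); // 3
console.log(longFirst);  // 4
```

When two alternatives share a prefix, placing the longer or more specific one first is usually the safer ordering.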

5. ListNode (List Node)
Matches zero or more repetitions of a pattern.
Interface:
interface ListNode {
  l: GrammarNode; // Node to repeat
  type?: string; // Type name (default: "List")
  ast?: AstNodeExpression; // AST transformation
}
Syntax:
Examples:
// Zero or more statements
Statements: {
  l: {r: 'Statement'}
}

// Comma-separated list
Arguments: {
  l: {
    p: [',', {r: 'Expression'}],
    ast: ['$', '/children/1'] // Extract the expression, ignore comma
  }
}
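
Since all five node types are plain JSON-like values, a grammar can be inspected with an ordinary recursive walk before it is compiled. The sketch below collects references that point at rules missing from the grammar. `findUnresolvedRefs` is a hypothetical helper, not a jit-parser API; it assumes only the grammar shape shown in this document (`start`, `cst`, nodes keyed by `r`, `t`, `p`, `u`, `l`, plus the string, RegExp, and array shorthands).

```javascript
// Hypothetical helper (not part of jit-parser): returns the names of
// referenced rules that are not defined in grammar.cst.
function findUnresolvedRefs(grammar) {
  const defined = new Set(Object.keys(grammar.cst));
  const missing = new Set();

  const visit = (node) => {
    if (typeof node === 'string' || node instanceof RegExp) return; // terminal shorthand
    if (Array.isArray(node)) {                                      // production shorthand
      node.forEach((child) => visit(child));
      return;
    }
    if (!node || typeof node !== 'object') return;
    if ('r' in node) {                                              // RefNode
      if (!defined.has(node.r)) missing.add(node.r);
    } else if ('p' in node) {                                       // ProductionNode
      node.p.forEach((child) => visit(child));
    } else if ('u' in node) {                                       // UnionNode
      node.u.forEach((child) => visit(child));
    } else if ('l' in node) {                                       // ListNode
      visit(node.l);
    }
    // A TerminalNode ({t: ...}) contains no nested grammar nodes.
  };

  if (!defined.has(grammar.start)) missing.add(grammar.start);
  Object.values(grammar.cst).forEach((rule) => visit(rule));
  return [...missing];
}

const sampleGrammar = {
  start: 'Program',
  cst: {
    Program: {l: {r: 'Statement'}},
    Statement: {u: ['return;', {r: 'Expression'}]},
  },
};

console.log(findUnresolvedRefs(sampleGrammar)); // [ 'Expression' ]
```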
JIT Parser works with four types of tree structures:
The grammar definition that describes the parsing rules. These are the node types described above that define how to parse input text.
2. CST (Concrete Syntax Tree)
The parse tree that contains every matched token and maintains the complete structure of the parsed input.
Interface:
interface CstNode {
  ptr: Pattern; // Reference to grammar pattern
  pos: number; // Start position in input
  end: number; // End position in input
  children?: CstNode[]; // Child nodes
}
Example CST:
// For input: '{"foo": 123}' (12 characters)
{
  ptr: ObjectPattern,
  pos: 0,
  end: 12,
  children: [
    {ptr: TextPattern, pos: 0, end: 1}, // '{'
    {ptr: MembersPattern, pos: 1, end: 11, // '"foo": 123'
      children: [...]
    },
    {ptr: TextPattern, pos: 11, end: 12} // '}'
  ]
}

3. AST (Abstract Syntax Tree)
A simplified tree structure derived from the CST, typically containing only semantically meaningful nodes.
Default AST Interface:
interface CanonicalAstNode {
  type: string; // Node type
  pos: number; // Start position
  end: number; // End position
  raw?: string; // Raw matched text
  children?: (CanonicalAstNode | unknown)[]; // Child nodes
}
Example AST:
// For input: '{"foo": 123}' (12 characters)
{
  type: 'Object',
  pos: 0,
  end: 12,
  children: [
    {
      type: 'Entry',
      key: {type: 'String', value: 'foo'},
      value: {type: 'Number', value: 123}
    }
  ]
}

CST to AST Conversion Rules
Default Conversion: Each CST node becomes an AST node with type, pos, end, and children properties.

AST Expressions: Use the ast property in grammar nodes to customize AST generation:
ast: null - Skip this node in the AST
ast: ['$', '/children/0'] - Use the first child's AST
ast: {...} - Custom JSON expression for transformation

Children Mapping: Use the children property to map CST child indices to AST properties:
{
  children: {
    0: 'key', // CST child 0 -> AST property 'key'
    2: 'value' // CST child 2 -> AST property 'value'
  }
}

Type Override: Specify a custom type property instead of the default node type names.
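
The conversion rules can be sketched in plain JavaScript as follows. This is an illustration under stated assumptions, not the library's implementation: `toAstSketch` is a hypothetical function, and `ptr` is modeled as a plain object with `type`, `skip`, and `children` fields rather than the library's `Pattern`. The `skip` flag stands in for `ast: null`. Whether the real library keeps a `children` array alongside mapped properties is not stated here, so the sketch simply omits it.

```javascript
// Hypothetical sketch of the conversion rules (not jit-parser's code).
function toAstSketch(cst, src) {
  const {type, skip, children: mapping} = cst.ptr;
  if (skip) return undefined;                     // models `ast: null`
  const ast = {type, pos: cst.pos, end: cst.end};
  if (!cst.children) {
    ast.raw = src.slice(cst.pos, cst.end);        // leaf: keep the matched text
    return ast;
  }
  if (mapping) {
    // Children mapping: CST child index -> named AST property.
    for (const [index, name] of Object.entries(mapping)) {
      ast[name] = toAstSketch(cst.children[Number(index)], src);
    }
    return ast;
  }
  // Default conversion: convert every child and drop the skipped ones.
  ast.children = cst.children
    .map((child) => toAstSketch(child, src))
    .filter((child) => child !== undefined);
  return ast;
}

// Hand-built CST for the input 'a=1' (illustrative data only).
const src = 'a=1';
const entryPattern = {type: 'Entry', children: {0: 'key', 2: 'value'}};
const cstSample = {
  ptr: entryPattern, pos: 0, end: 3,
  children: [
    {ptr: {type: 'Key'}, pos: 0, end: 1},
    {ptr: {type: 'Text', skip: true}, pos: 1, end: 2},
    {ptr: {type: 'Value'}, pos: 2, end: 3},
  ],
};

const astSample = toAstSketch(cstSample, src);
console.log(astSample.key.raw, astSample.value.raw); // a 1
```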
If debug mode is enabled during compilation, the parser captures all grammar node tree paths that were attempted during parsing. This debug trace tree is useful for debugging parser behavior and improving parser performance by understanding which rules were tried and failed.
Interface:
interface TraceNode {
  type: string; // Grammar rule name that was attempted
  pos: number; // Start position where rule was tried
  end?: number; // End position if rule succeeded
  children?: TraceNode[]; // Nested rule attempts
  success: boolean; // Whether the rule matched successfully
}
The debug trace captures the complete parsing process, including failed attempts, making it invaluable for understanding complex parsing scenarios and optimizing grammar rules.
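
One way to use such a trace for optimization is to count how often each rule was tried and failed. The sketch below assumes only the `TraceNode` shape shown above; `countFailures` is a hypothetical helper, and the sample trace is hand-written for illustration rather than produced by the parser.

```javascript
// Hypothetical helper (not part of jit-parser): walks a TraceNode-shaped
// tree and counts failed attempts per rule name.
function countFailures(trace, counts = new Map()) {
  if (trace.success === false) {
    counts.set(trace.type, (counts.get(trace.type) || 0) + 1);
  }
  for (const child of trace.children || []) countFailures(child, counts);
  return counts;
}

// Hand-written sample trace (illustrative data only).
const sampleTrace = {
  type: 'TValue', pos: 1, end: 22, success: true,
  children: [
    {type: 'Null', pos: 1, success: false},
    {type: 'Boolean', pos: 1, success: false},
    {type: 'String', pos: 1, success: false},
    {type: 'Object', pos: 1, end: 22, success: true},
  ],
};

const failures = countFailures(sampleTrace);
console.log([...failures.entries()]);
```

Rules that fail often at the same position are candidates for reordering within a union, so that the most likely alternative is tried first.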
Grammars are compiled to efficient JavaScript functions that can parse input strings rapidly.
import {CodegenGrammar} from 'jit-parser';

const grammar = {
  start: 'Value',
  cst: {
    Value: {r: 'Number'},
    Number: /\d+/
  }
};

// Compile to parser function
const parser = CodegenGrammar.compile(grammar);
import {CodegenContext, CodegenGrammar} from 'jit-parser';

const ctx = new CodegenContext(
  true, // positions: Include pos/end in AST
  true, // astExpressions: Process AST transformations
  false // debug: Generate debug trace code
);

// `grammar` is a grammar object such as the one defined above
const parser = CodegenGrammar.compile(grammar, ctx);
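
To give a feel for what compiling a grammar to JavaScript at runtime can mean, here is a toy code generator. It is an illustration only and does not reflect jit-parser's actual codegen: `compileLiteral` is a hypothetical function that emits source code for matching a single literal string with unrolled character-code comparisons and compiles it with the standard `Function` constructor.

```javascript
// Toy code generator (illustration only, not jit-parser's codegen).
// Returns a function (str, pos) => end offset of the match, or -1.
function compileLiteral(text) {
  // One comparison per UTF-16 code unit, unrolled at generation time.
  const checks = Array.from(
    {length: text.length},
    (_, i) => `str.charCodeAt(pos + ${i}) === ${text.charCodeAt(i)}`
  );
  const condition = checks.join(' && ') || 'true'; // empty literal always matches
  const body = `return ${condition} ? pos + ${text.length} : -1;`;
  return new Function('str', 'pos', body);
}

const matchHello = compileLiteral('hello');

console.log(matchHello('hello world', 0)); // 5
console.log(matchHello('say hello', 4));   // 9
console.log(matchHello('help', 0));        // -1
```

The generated function contains no loop and no reference to the original literal, which is the kind of specialization that runtime code generation makes possible.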
You can print the grammar structure by converting it to a string:
import {GrammarPrinter} from 'jit-parser';

const grammarString = GrammarPrinter.print(grammar);
console.log(grammarString);
Example output:
Value (reference)
└─ Number (terminal): /\d+/
const jsonGrammar = {
  start: 'Value',
  cst: {
    WOpt: {t: [' ', '\n', '\t', '\r'], repeat: '*', ast: null},
    Value: [{r: 'WOpt'}, {r: 'TValue'}, {r: 'WOpt'}],
    TValue: {
      u: ['null', {r: 'Boolean'}, {r: 'Number'}, {r: 'String'}, {r: 'Object'}, {r: 'Array'}]
    },
    Boolean: {t: ['true', 'false']},
    Number: /\-?\d+(\.\d+)?([eE][\+\-]?\d+)?/,
    String: /"[^"\\]*(?:\\.[^"\\]*)*"/,
    Object: ['{', {r: 'Members'}, '}'],
    Members: {
      u: [
        {
          p: [{r: 'Entry'}, {l: {p: [',', {r: 'Entry'}], ast: ['$', '/children/1']}}],
          ast: ['concat', ['push', [[]], ['$', '/children/0']], ['$', '/children/1']]
        },
        {r: 'WOpt'}
      ]
    },
    Entry: {
      p: [{r: 'String'}, ':', {r: 'Value'}],
      children: {0: 'key', 2: 'value'}
    },
    Array: ['[', {r: 'Elements'}, ']']
    // ... more rules
  },
  ast: {
    Value: ['$', '/children/1'], // Extract middle child (TValue)
    Boolean: ['==', ['$', '/raw'], 'true'], // Convert to boolean
    Number: ['num', ['$', '/raw']] // Convert to number
  }
};

const parser = CodegenGrammar.compile(jsonGrammar);
console.log(GrammarPrinter.print(jsonGrammar));
Debug mode captures a trace of the parsing process, showing which grammar rules were attempted at each position.
import {CodegenContext, CodegenGrammar, ParseContext, printTraceNode} from 'jit-parser';

// Enable debug during compilation
const debugCtx = new CodegenContext(true, true, true); // debug = true
const parser = CodegenGrammar.compile(grammar, debugCtx);

// Create trace collection
const rootTrace = {pos: 0, children: []};
const parseCtx = new ParseContext('input text', false, [rootTrace]);

// Parse with debug trace
const cst = parser(parseCtx, 0);

// Print debug trace
console.log(printTraceNode(rootTrace, '', 'input text'));
The debug trace shows:
Example trace output:
Root
└─ Value 0:22 → ' {"foo": ["bar", 123]}'
   ├─ WOpt 0:1 → " "
   ├─ TValue 1:22 → '{"foo": ["bar", 123]}'
   │  ├─ Null
   │  ├─ Boolean
   │  ├─ String
   │  └─ Object 1:22 → '{"foo": ["bar", 123]}'
   │     ├─ Text 1:2 → "{"
   │     ├─ Members 2:21 → '"foo": ["bar", 123]'
   │     │  └─ Production 2:21 → '"foo": ["bar", 123]'
   │     │     ├─ Entry 2:21 → '"foo": ["bar", 123]'
   │     │     │  ├─ String 2:7 → '"foo"'
   │     │     │  ├─ Text 7:8 → ":"
   │     │     │  └─ Value 8:21 → ' ["bar", 123]'
   │     │     │     └─ ...
   │     │     └─ List 21:21 → ""
   │     └─ Text 21:22 → "}"
   └─ WOpt 22:22 → ""
1. Simple Expression Parser
const exprGrammar = {
  start: 'Expression',
  cst: {
    Expression: {r: 'Number'},
    Number: {
      t: /\d+/,
      type: 'Number'
    }
  }
};

const parser = CodegenGrammar.compile(exprGrammar);
const ctx = new ParseContext('42', true);
const cst = parser(ctx, 0);
const ast = cst.ptr.toAst(cst, '42');
console.log(ast); // {type: 'Number', pos: 0, end: 2, raw: '42'}
import {grammar as jsonGrammar} from 'jit-parser/lib/grammars/json';

const parser = CodegenGrammar.compile(jsonGrammar);
const json = '{"name": "John", "age": 30}';
const ctx = new ParseContext(json, true);
const cst = parser(ctx, 0);
const ast = cst.ptr.toAst(cst, json);
console.log(ast);

3. Custom AST Transformation
const grammar = {
  start: 'KeyValue',
  cst: {
    KeyValue: {
      p: [{r: 'Key'}, '=', {r: 'Value'}],
      children: {0: 'key', 2: 'value'},
      type: 'Assignment'
    },
    Key: /[a-zA-Z]+/,
    Value: /\d+/
  },
  ast: {
    KeyValue: {
      type: 'Assignment',
      key: ['$', '/children/0/raw'],
      value: ['num', ['$', '/children/2/raw']]
    }
  }
};
const listGrammar = {
  start: 'List',
  cst: {
    List: ['[', {r: 'Items'}, ']'],
    Items: {
      u: [
        {
          p: [{r: 'Item'}, {l: {p: [',', {r: 'Item'}], ast: ['$', '/children/1']}}],
          ast: ['concat', ['push', [[]], ['$', '/children/0']], ['$', '/children/1']]
        },
        '' // Empty list
      ]
    },
    Item: /\w+/
  }
};
static compile(grammar: Grammar, ctx?: CodegenContext): Parser
compileRule(ruleName: string): Pattern
constructor(str: string, ast: boolean, trace?: RootTraceNode[])
constructor(positions: boolean, astExpressions: boolean, debug: boolean)
static print(grammar: Grammar, tab?: string): string
printCst(cst: CstNode, tab: string, src: string): string
Print a formatted CST tree
printTraceNode(trace: RootTraceNode | ParseTraceNode, tab: string, src: string): string
Print a formatted debug trace
See the Grammar Node Types section for complete interface definitions.
This parser generator provides a powerful and efficient way to build custom parsers with minimal code while maintaining high performance through JIT compilation.