A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/SonarSource/sonar-java/wiki/Regex-Checks below:

Regex Checks · SonarSource/sonar-java Wiki · GitHub

Regular Expression Checks

Automata-based analyses for regular expressions

In order to analyze control flow within regular expressions, they should be represented as an "AST extended with continuation pointers" model as is used by some reDoS analyzers and similar to the representation used in Java's regex engine. This model provides the features of a prioritized (order of alternatives matters) NFA while still providing access to the AST structure. Since this kind of automaton is just an AST with extra pointers, we should extend the current AST classes to support it and then generate it directly in the parser rather than having separate regex -> AST -> automaton phases.

For example:

To represent this, each AST class should have a member continuation that points to whichever state should be matched after this one is matched successfully. To be able to easily and intuitively use this as an NFA, we should also add a method successors which returns the states that can directly follow a given state. With these methods, we get the structure of an NFA, while requiring no changes to existing rules that can simply continue using the existing structure without changes.

For easy construction in the parser, the continuation member should be null when constructed and then set using a setter method. However, after parsing the states should be considered immutable and the setter should not be called anymore.

To represent some aspects of Java regexen, we need synthetic states that don't correspond directly to parts of the AST. Therefore we should introduce an interface AutomatonState such that RegexTree implements AutomatonState, all AutomatonState s have a continuation and a list of successors (which are also AutomatonState s) and RegexTree s have children (which are RegexTree s). The children member can also be used to simplify the implementation BaseRegexTreeVisitor (which should still only visit RegexTree s - the synthetic states should not be visible when iterating a regex's syntactic structure only when iterating the states of the automaton via successors and/or continuation). The following synthetic AutomatonState s should be created:

Each AutomatonState should have an transitionType to represent the type of its incoming transitions (all incoming transitions of a state will be of the same type (and with the same label) in this representation). The following transition types should exist:

The continuations and successors of states should be set as follows by the parser:


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4