How to add a new language to PMD using ANTLR grammar.
Before you start…
This is a big contribution and can't be done as a drive-by contribution. It requires dedication and a long-term commitment to implement support for a new language.
This step-by-step guide is just a small intro to get the basics started, and it's also not necessarily up-to-date or complete. You have to be able to fill in the blanks.
Currently, the Antlr integration has some basic limitations compared to JavaCC: The output of the Antlr parser generator is not an abstract syntax tree (AST) but a parse tree (also known as CST, concrete syntax tree). As such, a parse tree is much more fine-grained than what a typical JavaCC grammar will produce. This means that the parse tree is much deeper and contains nodes down to the different token types.
The Antlr nodes are context objects and serve a different abstraction than nodes in an AST. These context objects don't have any attributes because they themselves represent the attributes (as nodes or leaves in the parse tree). As they don't have attributes, there are no attributes that can be used in XPath based rules.
The current implementations of the languages using ANTLR use these context objects as nodes in PMD's AST representation.
In order to overcome these limitations, one would need to implement a post-processing step that transforms the parse tree into an abstract syntax tree and introduces real nodes on a higher abstraction level. These real nodes can then have attributes which are available in XPath based rules. The transformation can happen with a visitor, but the implementation of the AST is a manual step. This step is not described in this guide.
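To make the idea of such a post-processing step more concrete, here is a minimal, self-contained sketch. It does not use PMD or ANTLR classes: ParseNode, AstNode, AstBuilder and the rule names ("functionDeclaration", "Identifier") are all hypothetical stand-ins chosen for illustration. It only shows the general shape, namely walking a fine-grained parse tree and emitting fewer, higher-level nodes that carry a real attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified parse tree node (stands in for an ANTLR context object).
final class ParseNode {
    final String rule;              // grammar rule name or token type name
    final String text;              // token text, null for inner nodes
    final List<ParseNode> children;

    ParseNode(String rule, String text, ParseNode... children) {
        this.rule = rule;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical AST node on a higher abstraction level, carrying a real attribute.
final class AstNode {
    final String kind;
    final String name;              // an attribute a rule could query
    final List<AstNode> children = new ArrayList<>();

    AstNode(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }
}

final class AstBuilder {
    // Walks the parse tree and keeps only the rules that matter for the AST.
    static AstNode build(ParseNode root) {
        AstNode ast = new AstNode("CompilationUnit", null);
        collect(root, ast);
        return ast;
    }

    private static void collect(ParseNode node, AstNode parent) {
        AstNode target = parent;
        if (node.rule.equals("functionDeclaration")) {
            // Lift the identifier token into an attribute of the new AST node.
            target = new AstNode("FunctionDeclaration", firstIdentifier(node));
            parent.children.add(target);
        }
        for (ParseNode child : node.children) {
            collect(child, target);
        }
    }

    private static String firstIdentifier(ParseNode node) {
        for (ParseNode child : node.children) {
            if (child.rule.equals("Identifier")) {
                return child.text;
            }
        }
        return null;
    }
}

final class ParseTreeToAstDemo {
    public static void main(String[] args) {
        ParseNode tree = new ParseNode("topLevel", null,
            new ParseNode("statement", null,
                new ParseNode("functionDeclaration", null,
                    new ParseNode("FUNC", "func"),
                    new ParseNode("Identifier", "foo"),
                    new ParseNode("LPAREN", "("),
                    new ParseNode("RPAREN", ")"))));
        AstNode fn = AstBuilder.build(tree).children.get(0);
        System.out.println(fn.kind + " name=" + fn.name);
    }
}
```

In this sketch the intermediate "statement" node and the punctuation tokens disappear from the result, and the function name becomes an attribute instead of a leaf node. A real implementation would need one such mapping per language construct, which is why the guide calls it a manual step.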
After the basic support for a language is there, there are lots of missing features left. Typical features that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.
The symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), which allows querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath. Call and data flow analysis keep track of the data as it moves through the different execution paths of a program.
These features are out of scope of this guide. Type resolution and data flow are features that definitely don't come for free. They take a lot of effort and perseverance to implement.
1. Start with a new sub-module
Add your new module to the parent pom as a <module> entry, so that it is built alongside the other languages.
Place the grammar file in the folder src/main/antlr4 in the appropriate sub package ast of the language. E.g. for Swift, the grammar file is Swift.g4 and is placed in the package net.sourceforge.pmd.lang.swift.ast.
The supertype for the nodes of the language is based on AntlrNode. See SwiftNode as an example.
The language specific inner node uses the base class BaseAntlrInnerNode. An example is SwiftInnerNode. Note that this language specific inner node is package-private, as it is only the base class for the concrete nodes generated by ANTLR.
The language specific root node: see SwiftRootNode. Note that this language specific root node is package-private, as it is only the base class for the concrete node generated by ANTLR.
The language specific terminal node: see SwiftTerminalNode.
The language specific error node: see SwiftErrorNode.
In the grammar, add additional members via @parser::members. Define a field DICO which creates a new instance of your language name dictionary using the vocabulary from the generated parser (VOCABULARY).
Two methods are declared in AntlrGeneratedParserBase and need to be implemented here for the concrete language: createPmdTerminal() and createPmdError().
The parser generation is handled by the ant script antlr4-wrapper.xml, which does not need to be adjusted: it has plenty of parameters that can be configured. The ant script is added in the language module's pom.xml where the parameters are set (e.g. name of root name class). Have a look at Swift's example: pmd-swift/pom.xml.
You can add methods to your inner node (e.g. SwiftInnerNode) that are available on all nodes. But in most cases you won't need to do anything.
Set the property <antlr4.visitor>true</antlr4.visitor> in your pom.xml file.
The parser is generated in the Maven phase generate-sources. So you can just call e.g. ./mvnw generate-sources -pl pmd-swift to have the parser generated.
The generated code is placed under target/generated-sources/antlr4 and will not be committed to source control.
For the complete configuration, see pmd-swift/pom.xml.
For CPD, a default token manager implementation is available: AntlrTokenManager.
See SwiftTokenizer as an example of a language specific tokenizer.
If you wish to filter specific tokens (e.g. comments to support CPD suppression via "CPD-OFF" and "CPD-ON") you can create your own implementation of AntlrTokenFilter. You'll then need to override the protected method getTokenFilter(AntlrTokenManager) and return your custom filter. See the tokenizer for C# as an example: CsTokenizer.
If you don't need a custom token filter, you don't need to override the method. It returns the default AntlrTokenFilter which doesn't filter anything.
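The effect of a token filter for "CPD-OFF" and "CPD-ON" can be illustrated with a small standalone sketch. It deliberately does not use PMD's AntlrTokenFilter or AntlrTokenManager: SimpleToken, CpdSuppressionFilter and the token type names are hypothetical, and dropping all comment tokens is an assumption made for this example. It only shows the filtering logic a custom filter might implement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal token; real token types carry positions and more.
final class SimpleToken {
    final String type;   // e.g. "COMMENT" or "IDENTIFIER"
    final String text;

    SimpleToken(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class CpdSuppressionFilter {
    // Drops comment tokens and every token between a CPD-OFF and a CPD-ON comment.
    static List<SimpleToken> filter(List<SimpleToken> tokens) {
        List<SimpleToken> kept = new ArrayList<>();
        boolean suppressed = false;
        for (SimpleToken token : tokens) {
            if (token.type.equals("COMMENT")) {
                if (token.text.contains("CPD-OFF")) {
                    suppressed = true;
                } else if (token.text.contains("CPD-ON")) {
                    suppressed = false;
                }
                continue; // comments themselves are never passed on in this sketch
            }
            if (!suppressed) {
                kept.add(token);
            }
        }
        return kept;
    }
}

final class CpdSuppressionDemo {
    public static void main(String[] args) {
        List<SimpleToken> tokens = Arrays.asList(
            new SimpleToken("IDENTIFIER", "a"),
            new SimpleToken("COMMENT", "// CPD-OFF"),
            new SimpleToken("IDENTIFIER", "b"),
            new SimpleToken("COMMENT", "// CPD-ON"),
            new SimpleToken("IDENTIFIER", "c"));
        for (SimpleToken token : CpdSuppressionFilter.filter(tokens)) {
            System.out.println(token.text);
        }
    }
}
```

In a real language module this logic would live in your AntlrTokenFilter implementation and be returned from getTokenFilter(AntlrTokenManager); CsTokenizer is the reference to follow for the actual API.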
There is an AntlrBaseParser implementation that you need to extend to create your own adapter, as we do with PmdSwiftParser.
For the language handler, see SwiftHandler.
The base interface for a visitor is AstVisitor.
The generated visitor for Swift is SwiftVisitor.
See SwiftVisitorBase as an example of a base visitor class.
Create your own subclass of net.sourceforge.pmd.lang.impl.SimpleLanguageModuleBase, see Swift as an example: SwiftLanguageModule.
Add the supported language versions with addVersion in your language module's constructor. Use addDefaultVersion for defining the default version.
Create the service registration file src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language. Add your fully qualified class name as a single line into it.
Create a base class for the rules of your language. See AbstractSwiftRule as an example.
It extends AbstractVisitorRule and only redefines the abstract buildVisitor() method to return our own type of visitor. In this case our SwiftVisitor is used. While there is no real functionality added, every language should have its own base class for rules. This helps to organize the code.
In a rule, implement buildVisitor() to provide the visitor for analyzing the AST. The provided visitor only implements the visit methods for specific AST nodes. The other node types use the default behavior, and you don't need to care about them.
Rule classes should be named with the suffix Rule and should be placed in a package that corresponds to their category.
Build properties can be used in the externalInfoUrl attribute of a rule. E.g. we use ${pmd.website.baseurl} to point to the correct webpage (depending on the PMD version). In order for this to work, you need to add a resource filtering configuration in the language module's pom.xml. Under <build> add the following lines:
<resources>
    <resource>
        <directory>${project.basedir}/src/main/resources</directory>
        <filtering>true</filtering>
    </resource>
</resources>
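Coming back to visitor-based rules: the relationship between a language specific base rule class, its buildVisitor() method, and a visitor with default behavior can be sketched as follows. This is a standalone illustration and not PMD's API: DemoNode, DemoVisitor, AbstractDemoRule and NoPrintCallRule are hypothetical names, and the way violations are collected is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node types standing in for generated AST nodes.
abstract class DemoNode {
    final List<DemoNode> children = new ArrayList<>();

    abstract void accept(DemoVisitor visitor, List<String> violations);
}

final class BlockNode extends DemoNode {
    @Override
    void accept(DemoVisitor visitor, List<String> violations) {
        visitor.visitBlock(this, violations);
    }
}

final class FunctionCallNode extends DemoNode {
    final String name;

    FunctionCallNode(String name) {
        this.name = name;
    }

    @Override
    void accept(DemoVisitor visitor, List<String> violations) {
        visitor.visitFunctionCall(this, violations);
    }
}

// Base visitor: the default behavior of every visit method is to descend into the children.
class DemoVisitor {
    void visitChildren(DemoNode node, List<String> violations) {
        for (DemoNode child : node.children) {
            child.accept(this, violations);
        }
    }

    void visitBlock(BlockNode node, List<String> violations) {
        visitChildren(node, violations);
    }

    void visitFunctionCall(FunctionCallNode node, List<String> violations) {
        visitChildren(node, violations);
    }
}

// Language specific base class for rules: it only fixes the visitor type.
abstract class AbstractDemoRule {
    abstract DemoVisitor buildVisitor();

    List<String> apply(DemoNode root) {
        List<String> violations = new ArrayList<>();
        root.accept(buildVisitor(), violations);
        return violations;
    }
}

// A concrete rule overrides only the visit method it cares about.
final class NoPrintCallRule extends AbstractDemoRule {
    @Override
    DemoVisitor buildVisitor() {
        return new DemoVisitor() {
            @Override
            void visitFunctionCall(FunctionCallNode node, List<String> violations) {
                if (node.name.equals("print")) {
                    violations.add("Avoid calling print");
                }
                visitChildren(node, violations);
            }
        };
    }
}
```

The rule never mentions BlockNode: the default visit method takes care of descending through it, which is the "you don't need to care about them" behavior described for the other node types.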
Create a test class for each rule extending PmdRuleTst (see UnavailableFunctionTest for example).
Create a category ruleset file for your language (see pmd-swift/src/main/resources/bestpractices.xml for example).
Place the test XML file with the test cases (see UnavailableFunction.xml for example).
To verify the validity of all the created rulesets, create a subclass of AbstractRuleSetFactoryTest (see RuleSetFactoryTest in pmd-swift for example). This will load all rulesets and verify that all required attributes are provided.
Note: You'll need to add your ruleset to categories.properties, so that it can be found.