How to add a new language to PMD using JavaCC grammar.
Before you start…
This is really a big contribution and can't be done as a drive-by contribution. It requires dedicated passion and a long commitment to implement support for a new language.
This step-by-step guide is just a small intro to get the basics started, and it's also not necessarily up-to-date or complete. You have to be able to fill in the blanks.
Once basic support for a language is in place, many features are still missing. Typical features that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.
A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), and allows querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath. Call and data flow analysis keep track of the data as it moves through the different execution paths of a program.
These features are out of scope of this guide. Type resolution and data flow are features that definitely don't come for free. They take considerable effort and perseverance to implement.
Steps
1. Start with a new sub-module
<module> entry, so that it is built alongside the other languages.
int id.
javacc-wrapper.xml file in the top-level pmd sources.
maven-antrun-plugin. Add this plugin to your pom.xml file and configure it with the language name. You can use pmd-java/pom.xml as an example.
generate-sources whenever the whole project is built. But you can call ./mvnw generate-sources directly for your module if you want your parser to be generated.
JjtreeParserAdapter.
tokenBehavior method should return a new instance of TokenDocumentBehavior constructed with the list of tokens in your language. The compile step #4 will generate a class $langTokenKinds which has all the available tokens in the field TOKEN_NAMES.
parseImpl method should return the root node of the AST tree obtained by parsing the CharStream source
VmParser class as an example
AbstractPmdLanguageVersionHandler (see VmHandler for example)
ViolationDecorators, to add additional language specific information to the created violations. The Java language module uses this to provide the method name or class name, where the violation occurred.
VmHandler class as an example
AstVisitor.
VmVisitor.
VmVisitorBase as an example.
net.sourceforge.pmd.lang.impl.SimpleLanguageModuleBase. (see VmLanguageModule or JavaLanguageModule as an example)
addVersion in your language module's constructor. Use addDefaultVersion for defining the default version.
src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language. Add your fully qualified class name as a single line into it.
For languages that use an external library for parsing, the AST can easily change when upgrading the library. For languages where we have the grammar under our control, it is also useful to have such tests.
The tests parse one or more source files and generate a textual representation of the AST. This text is compared against a previously recorded version. If there are differences, the test fails.
This helps to detect anything in the AST structure that changed, maybe unexpectedly.
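The record-and-compare idea behind these tests can be sketched in plain Java. This is only an illustration of the mechanism, not the implementation of PMD's BaseTreeDumpTest: the `Node` class, the indentation-based dump format, and the `matchesBaseline` helper are all hypothetical names invented for this sketch, and reporting `false` on the very first run is a choice made here, not documented PMD behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class TreeDumpSketch {

    /** Hypothetical stand-in for an AST node; not a PMD class. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Renders the tree as indented text, one node per line. */
    static String dump(Node root) {
        StringBuilder sb = new StringBuilder();
        append(root, 0, sb);
        return sb.toString();
    }

    private static void append(Node node, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth)).append(node.name).append('\n');
        for (Node child : node.children) {
            append(child, depth + 1, sb);
        }
    }

    /**
     * Returns true if the dump equals the recorded baseline. When no baseline
     * exists yet, the dump is recorded and false is returned (a choice made
     * for this sketch), so that a later run has something to compare against.
     */
    static boolean matchesBaseline(Path baseline, String actualDump) {
        try {
            if (Files.notExists(baseline)) {
                Files.writeString(baseline, actualDump);
                return false;
            }
            return Files.readString(baseline).equals(actualDump);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Node ast = new Node("CompilationUnit", new Node("IfStatement", new Node("Expression")));
        Path baseline = Paths.get(System.getProperty("java.io.tmpdir"),
                "treedump-sketch-" + System.nanoTime() + ".txt");
        System.out.println(matchesBaseline(baseline, dump(ast))); // first run records the baseline
        System.out.println(matchesBaseline(baseline, dump(ast))); // second run compares against it
    }
}
```

Because the baseline is plain text, any change in node names or nesting shows up as an ordinary diff in version control, which is why the recorded file needs to be committed.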
net.sourceforge.pmd.lang.$lang.ast with the name $langTreeDumpTest.
net.sourceforge.pmd.lang.test.ast.BaseTreeDumpTest. Note: This class is written in Kotlin and is available in the module "lang-test".
Add a default constructor that calls the super constructor like so:
public $langTreeDumpTest() {
    super(NodePrintersKt.getSimpleNodePrinter(), ".$extension");
}
Replace "$lang" and "$extension" accordingly.
getParser(). It must return a subclass of net.sourceforge.pmd.lang.test.ast.BaseParsingHelper. See net.sourceforge.pmd.lang.ecmascript.ast.JsParsingHelper for an example. With this parser helper you can also specify where the test files are searched by using the method withResourceContext(Class<?>, String).
Add one or more test methods. Each test method parses one file and compares the result. The base class has a helper method doTest(String) that does all the work. This method just needs to be called:
@Test
public void myFirstAstTest() {
    doTest("filename-without-extension");
}
.txt) is created, that records the current AST. On the next run, the text file is used for comparison and the test should pass. Don't forget to commit the generated text file.
A complete example can be seen in the JavaScript module: net.sourceforge.pmd.lang.ecmascript.ast.JsTreeDumpTest. The test resources are in the subpackage "testdata": pmd-javascript/src/test/resources/net/sourceforge/pmd/lang/ecmascript/ast/testdata/.
The Scala module also has a test, written in Kotlin instead of Java: net.sourceforge.pmd.lang.scala.ast.ScalaParserTests.
AbstractRule and implement the parser visitor interface for your language (see AbstractVmRule for example)
EmptyForeachStmtRule for example)
PmdRuleTst (see AvoidReassigningParametersTest in pmd-vm for example)
AvoidReassigningParameters.xml for example)
To verify the validity of the created ruleset, create a subclass of AbstractRuleSetFactoryTest (see RuleSetFactoryTest in pmd-vm for example). This will load all rulesets and verify that all required attributes are provided.
Note: You'll need to add your category ruleset to categories.properties, so that it can be found.
Finish up your new language module by adding a page in the documentation. Create a new markdown file <langId>.md in docs/pages/pmd/languages/. This file should have the following frontmatter:
---
title: <Language Name>
permalink: pmd_languages_<langId>.html
last_updated: <Month> <Year> (<PMD Version>)
tags: [languages, PmdCapableLanguage, CpdCapableLanguage]
---
On this page, language specifics can be documented, e.g. when the language was first supported by PMD. There is also the following Jekyll include, which creates a summary box for the language:
{% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true supports_pmd=true %}
XPath integration
PMD exposes the AST nodes for use by XPath based rules (see DOM representation of ASTs). Most Java getters in the AST classes are made available by default. These getters constitute the API of the language. If a getter method is renamed, then every XPath rule that uses this getter also needs to be adjusted. In order to have more control over this, there are two annotations that can be used for AST classes and their methods:
DeprecatedAttribute: Getters can be annotated with this to indicate that the getter method should not be used in XPath rules. When an XPath rule uses such a method, a warning is issued. If the method additionally has the standard Java @Deprecated annotation, then the getter is also deprecated for Java usage. Otherwise, the getter is only deprecated for usage in XPath rules.
When a getter is deprecated and there is a different getter to be used instead, then the attribute replaceWith should be used.
NoAttribute: This annotation can be used on an AST node type or on individual methods in order to filter out which methods are available for XPath rules. When used on a type, either all methods can be filtered or only inherited methods (see attribute scope). When used directly on an individual method, then only this method will be filtered out. That way, methods that should only be used in Java rules, e.g. auxiliary methods, can be added to AST nodes.
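To make the filtering concrete, here is a self-contained sketch of how getters could be turned into XPath attributes by reflection. It does not use PMD's classes: the two annotations are local stand-ins that only borrow the names NoAttribute and DeprecatedAttribute (the type-level use and the scope attribute are left out), `ASTIfStatement` is a hypothetical node, and deriving the attribute name by stripping the `get` or `is` prefix is an assumption of this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class XPathAttributeSketch {

    /** Local stand-in for PMD's NoAttribute (method-level use only). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface NoAttribute { }

    /** Local stand-in for PMD's DeprecatedAttribute. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DeprecatedAttribute {
        String replaceWith() default "";
    }

    /** Hypothetical AST node used only for this illustration. */
    static final class ASTIfStatement {
        public boolean isElse() { return false; }

        public String getImage() { return "if"; }

        @DeprecatedAttribute(replaceWith = "@Image")
        public String getName() { return "if"; }

        @NoAttribute
        public int getInternalId() { return 42; } // helper for Java rules only
    }

    /** Maps each exposed attribute name to "ok" or to a deprecation hint. */
    static Map<String, String> attributesOf(Class<?> nodeType) {
        Map<String, String> attributes = new TreeMap<>();
        for (Method m : nodeType.getDeclaredMethods()) {
            if (m.isSynthetic() || !Modifier.isPublic(m.getModifiers()) || m.getParameterCount() != 0) {
                continue; // only public, parameterless methods can be getters
            }
            if (m.isAnnotationPresent(NoAttribute.class)) {
                continue; // filtered out: not visible to XPath rules
            }
            String name = attributeName(m.getName());
            if (name == null) {
                continue; // not a getter
            }
            DeprecatedAttribute deprecated = m.getAnnotation(DeprecatedAttribute.class);
            attributes.put(name, deprecated == null ? "ok" : "deprecated, use " + deprecated.replaceWith());
        }
        return attributes;
    }

    private static String attributeName(String methodName) {
        if (methodName.startsWith("get") && methodName.length() > 3) {
            return methodName.substring(3);
        }
        if (methodName.startsWith("is") && methodName.length() > 2) {
            return methodName.substring(2);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(attributesOf(ASTIfStatement.class));
    }
}
```

In this sketch `getInternalId` never reaches the attribute map, while `getName` stays usable but carries a hint pointing at its replacement, which mirrors the difference between filtering and deprecating described above.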
Note: Not all getters are available for XPath rules. Whether a getter is exposed depends on its result type. In particular, Lists and Collections in general are not supported.
Only the following Java result types are supported:
When implementing your grammar it may be very useful to see how PMD parses your example files. This can be achieved with Rule Designer:
getXPathNodeName in your AST nodes for Designer to show node names.
jjtOpen and jjtClose in your AST node base class so that they set both start and end line and column for proper node bound highlighting.
net.sourceforge.pmd.util.fxdesigner.util.codearea.syntaxhighlighting (you could use Java as an example).
AvailableSyntaxHighlighters enumeration.
target/pmd-designer-<version>-SNAPSHOT.jar to the lib directory inside your pmd-bin-... distribution (you have to delete old pmd-designer-*.jar from there).
If you want to add support for computing metrics:
lang.<langname>.metrics
<langname>Metrics
getLanguageMetricsProvider, to make the metrics available in the designer.
See JavaMetrics for an example.
A symbol table keeps track of variables and their usages. It is part of semantic analysis and would be executed in your parser adapter as an additional pass after you got the initial AST.
There is no general language independent API in PMD core. For now, each language will need to implement its own solution. The symbol information that has been resolved in the additional parser pass can be made available on the AST nodes via extra methods, e.g. getSymbolTable(), getSymbol(), or getUsages().
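As a rough illustration of such an additional pass, the following sketch resolves declarations and references over a toy node list. Everything in it is hypothetical: the `Node`, `Symbol`, and `Kind` types are invented here, only a single flat scope is modeled, and the accessor names merely echo the getSymbol() and getUsages() examples mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SymbolTableSketch {

    enum Kind { DECLARATION, REFERENCE }

    /** Toy AST node: a variable declaration or a reference to a variable. */
    static final class Node {
        final Kind kind;
        final String name;
        private Symbol symbol; // set by the symbol pass; stays null if unresolved

        Node(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        Symbol getSymbol() {
            return symbol;
        }
    }

    /** One declared variable together with all references that resolved to it. */
    static final class Symbol {
        final String name;
        final Node declaration;
        private final List<Node> usages = new ArrayList<>();

        Symbol(String name, Node declaration) {
            this.name = name;
            this.declaration = declaration;
        }

        List<Node> getUsages() {
            return usages;
        }
    }

    /** Extra pass over the already parsed nodes, in source order, single flat scope. */
    static Map<String, Symbol> resolve(List<Node> nodesInSourceOrder) {
        Map<String, Symbol> table = new LinkedHashMap<>();
        for (Node node : nodesInSourceOrder) {
            if (node.kind == Kind.DECLARATION) {
                Symbol symbol = new Symbol(node.name, node);
                table.put(node.name, symbol);
                node.symbol = symbol;
            } else {
                Symbol symbol = table.get(node.name);
                if (symbol != null) {
                    symbol.usages.add(node);
                    node.symbol = symbol;
                }
                // references to undeclared names are left unresolved
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Node declX = new Node(Kind.DECLARATION, "x");
        Node useX = new Node(Kind.REFERENCE, "x");
        Node useY = new Node(Kind.REFERENCE, "y"); // never declared
        Map<String, Symbol> table = resolve(Arrays.asList(declX, useX, useY));
        System.out.println("usages of x: " + table.get("x").getUsages().size());
        System.out.println("y resolved: " + (useY.getSymbol() != null));
    }
}
```

A real language would need nested scopes, shadowing, and forward references, none of which this sketch attempts.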
Currently only Java provides an implementation for symbol table, see Java-specific features and guidance.
Note: With PMD 7.0.0 the symbol table and type resolution implementation has been rewritten from scratch. There is still an old API for symbol table support, that is used by PLSQL, see net.sourceforge.pmd.lang.symboltable. This will be deprecated and should not be used.
Type resolution
For typed languages like Java, type information can be useful for writing rules that trigger only on specific types. Resolving the types of expressions and variables would be done in your parser adapter as yet another additional pass, potentially after resolving the symbol table.
Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), and allows querying subtypes and the type hierarchy. This might require additional configuration for the language, e.g. in Java you need to configure an auxiliary classpath.
There is no general language independent API in PMD core. For now, each language will need to implement its own solution. The type information can be made available on the AST nodes via extra methods, e.g. getType().
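The shape of such a pass can be hinted at with a toy expression tree. This is a sketch under heavy simplifying assumptions, not how PMD's Java module resolves types: the `Expr` hierarchy, the three-valued `Type` enum, and the typing rule for `+` are invented for illustration, and the `declaredTypes` map stands in for whatever a preceding symbol pass would provide.

```java
import java.util.Collections;
import java.util.Map;

final class TypeResolutionSketch {

    enum Type { INT, STRING, UNKNOWN }

    /** Toy expression node; the resolved type is stored on the node itself. */
    abstract static class Expr {
        private Type type = Type.UNKNOWN;

        Type getType() {
            return type;
        }
    }

    static final class Literal extends Expr {
        final Object value;

        Literal(Object value) {
            this.value = value;
        }
    }

    static final class VarRef extends Expr {
        final String name;

        VarRef(String name) {
            this.name = name;
        }
    }

    static final class Plus extends Expr {
        final Expr left;
        final Expr right;

        Plus(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Bottom-up pass: resolves children first, then derives the parent's type. */
    static Type resolve(Expr expr, Map<String, Type> declaredTypes) {
        Type result = Type.UNKNOWN;
        if (expr instanceof Literal) {
            Object value = ((Literal) expr).value;
            if (value instanceof Integer) {
                result = Type.INT;
            } else if (value instanceof String) {
                result = Type.STRING;
            }
        } else if (expr instanceof VarRef) {
            result = declaredTypes.getOrDefault(((VarRef) expr).name, Type.UNKNOWN);
        } else if (expr instanceof Plus) {
            Type left = resolve(((Plus) expr).left, declaredTypes);
            Type right = resolve(((Plus) expr).right, declaredTypes);
            if (left == Type.STRING || right == Type.STRING) {
                result = Type.STRING; // string concatenation wins in this toy rule
            } else if (left == Type.INT && right == Type.INT) {
                result = Type.INT;
            }
        }
        expr.type = result;
        return result;
    }

    public static void main(String[] args) {
        Map<String, Type> declared = Collections.singletonMap("count", Type.INT);
        Expr sum = new Plus(new VarRef("count"), new Literal(1));
        resolve(sum, declared);
        System.out.println(sum.getType());
    }
}
```

A rule could then call getType() on a node and trigger only for the types it cares about, which is the use case described above.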
Currently only Java provides an implementation for type resolution, see Java-specific features and guidance.
Call and data flow analysis
Call and data flow analysis keep track of the data as it moves through the different execution paths of a program. This would be yet another analysis pass.
There is no general language independent API in PMD core. For now, each language will need to implement its own solution.
Currently Java has some limited support for data flow analysis, see Java-specific features and guidance.