
Showing content from https://docs.pmd-code.org/pmd-doc-7.3.0/pmd_devdocs_major_adding_new_language_antlr.html below:

Adding PMD support for a new ANTLR grammar based language

How to add a new language to PMD using an ANTLR grammar.

Before you start…

This is a really big contribution and can’t be done as a drive-by contribution. It requires dedication and a long-term commitment to implement support for a new language.

This step-by-step guide is just a small intro to get the basics started, and it’s also not necessarily up-to-date or complete. You have to be able to fill in the blanks.

Currently, the ANTLR integration has some basic limitations compared to JavaCC: The output of the ANTLR parser generator is not an abstract syntax tree (AST) but a parse tree (also known as a CST, concrete syntax tree). A parse tree is much more fine-grained than what a typical JavaCC grammar produces: it is much deeper and contains nodes all the way down to the individual token types.

The ANTLR nodes are context objects and serve a different abstraction than nodes in an AST. These context objects don’t have any attributes themselves because they represent the attributes (as nodes or leaves in the parse tree). As a result, there are no attributes that can be used in XPath-based rules.

The current implementations of the languages using ANTLR use these context objects as nodes in PMD’s AST representation.

To overcome these limitations, one would need to implement a post-processing step that transforms the parse tree into an abstract syntax tree and introduces real nodes at a higher abstraction level. These real nodes can then have attributes that are available in XPath-based rules. The transformation can be done with a visitor, but implementing the AST itself is a manual step. This step is not described in this guide.
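To make the idea of such a post-processing step more concrete, here is a minimal, self-contained sketch in plain Java. It does not use ANTLR or PMD classes: `ParseNode` is a hypothetical stand-in for an ANTLR context object, `AstNode` is a hypothetical higher-level node, and the rule names `file`, `varDecl`, `FINAL` and `IDENTIFIER` are invented for illustration. It only shows how a fine-grained subtree could be collapsed into one node that carries attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CstToAstSketch {

    /** Hypothetical stand-in for an ANTLR context object: rule name, token text, children. */
    static final class ParseNode {
        final String rule;
        final String text; // non-null only for leaves (tokens)
        final List<ParseNode> children = new ArrayList<>();

        ParseNode(String rule, String text, ParseNode... kids) {
            this.rule = rule;
            this.text = text;
            for (ParseNode kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Hypothetical AST node: coarser than a parse node and able to carry attributes. */
    static final class AstNode {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<AstNode> children = new ArrayList<>();

        AstNode(String name) {
            this.name = name;
        }
    }

    /** Collapses each "varDecl" subtree into a single node; token leaves become attributes. */
    static AstNode transform(ParseNode node) {
        if (node.rule.equals("varDecl")) {
            AstNode decl = new AstNode("VariableDeclaration");
            for (ParseNode child : node.children) {
                if (child.rule.equals("IDENTIFIER")) {
                    decl.attributes.put("Name", child.text);
                } else if (child.rule.equals("FINAL")) {
                    decl.attributes.put("Final", "true");
                }
            }
            decl.attributes.putIfAbsent("Final", "false");
            return decl;
        }
        AstNode result = new AstNode(node.rule);
        for (ParseNode child : node.children) {
            if (child.text == null) { // bare tokens are dropped at this level
                result.children.add(transform(child));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ParseNode tree = new ParseNode("file", null,
                new ParseNode("varDecl", null,
                        new ParseNode("FINAL", "final"),
                        new ParseNode("IDENTIFIER", "x")));
        AstNode decl = transform(tree).children.get(0);
        System.out.println(decl.name + " " + decl.attributes);
    }
}
```

In a real language module the traversal would more likely be written as a visitor over the generated context classes, and the attribute names would be chosen so that they are useful in XPath rules. Both are design decisions this sketch leaves open.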

Once basic support for a language is in place, many features are still missing. Typical features that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.

A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), and allows querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath. Call and data flow analysis keeps track of data as it moves through the different execution paths of a program.
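As a rough illustration of what "keeping track of variables and their usages" can mean, the following plain-Java sketch models a scoped symbol table. It is not PMD's symbol table API: the class `SymbolTableSketch`, its methods, and the use of line numbers to identify usages are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SymbolTableSketch {

    /** One declared variable plus the lines on which it is used. */
    static final class Symbol {
        final String name;
        final int declaredAtLine;
        final List<Integer> usageLines = new ArrayList<>();

        Symbol(String name, int declaredAtLine) {
            this.name = name;
            this.declaredAtLine = declaredAtLine;
        }
    }

    // The head of the deque is the innermost scope.
    private final Deque<Map<String, Symbol>> scopes = new ArrayDeque<>();

    SymbolTableSketch() {
        pushScope(); // outermost scope
    }

    void pushScope() {
        scopes.push(new HashMap<>());
    }

    void popScope() {
        scopes.pop();
    }

    /** Declares a variable in the innermost scope, shadowing any outer one. */
    Symbol declare(String name, int line) {
        Symbol symbol = new Symbol(name, line);
        scopes.peek().put(name, symbol);
        return symbol;
    }

    /** Looks the name up from the innermost scope outwards; null if undeclared. */
    Symbol resolve(String name) {
        for (Map<String, Symbol> scope : scopes) {
            Symbol symbol = scope.get(name);
            if (symbol != null) {
                return symbol;
            }
        }
        return null;
    }

    /** Records a usage on the visible declaration; returns false if none exists. */
    boolean recordUsage(String name, int line) {
        Symbol symbol = resolve(name);
        if (symbol == null) {
            return false;
        }
        symbol.usageLines.add(line);
        return true;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        Symbol outer = table.declare("count", 1);
        table.pushScope();
        Symbol inner = table.declare("count", 3); // shadows the outer declaration
        table.recordUsage("count", 4);
        table.popScope();
        table.recordUsage("count", 6);
        System.out.println("outer usages: " + outer.usageLines);
        System.out.println("inner usages: " + inner.usageLines);
    }
}
```

A rule such as "unused variable" could then be expressed as a check for symbols whose usage list is empty. A real implementation would attach symbols to AST nodes rather than to line numbers.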

These features are out of scope for this guide. Type resolution and data flow analysis definitely don’t come for free: they take a lot of effort and perseverance to implement.

Steps

1. Start with a new sub-module
2. Implement an AST parser for your language
3. Create AST node classes
4. Generate your parser (using ANTLR)
5. Create a TokenManager
6. Create a PMD parser “adapter”
7. Create a language version handler
8. Create a base visitor
9. Make PMD recognize your language
10. Create an abstract rule class for the language
11. Create rules
12. Test the rules
13. Create documentation page

Finish up your new language module by adding a page to the documentation. Create a new Markdown file <langId>.md in docs/pages/pmd/languages/. This file should have the following frontmatter:

---
title: <Language Name>
permalink: pmd_languages_<langId>.html
last_updated: <Month> <Year> (<PMD Version>)
tags: [languages, PmdCapableLanguage, CpdCapableLanguage]
---

On this page, language specifics can be documented, e.g. when the language was first supported by PMD. There is also the following Jekyll include, which creates a summary box for the language:


{% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true supports_pmd=true %}

Optional features

See Optional features in JavaCC based languages.

To implement these, an AST most likely needs to be developed first. The parse tree (CST, concrete syntax tree) is not suitable for adding methods such as getSymbol() to the node classes.

