We currently parse XMLs in a greedy and not very efficient way.
This means that, a ruleset such as:
<?xml version="1.0"?> <ruleset name="Coupling" xmlns="http://pmd.sourceforge.net/ruleset/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 http://pmd.sourceforge.net/ruleset_2_0_0.xsd"> <description> Rules which find instances of high or inappropriate coupling between objects and packages. </description> <!-- Rules, that have been moved into a category --> <rule ref="category/java/bestpractices.xml/LooseCoupling" deprecated="true" /> <rule ref="category/java/design.xml/CouplingBetweenObjects" deprecated="true" /> <rule ref="category/java/design.xml/ExcessiveImports" deprecated="true" /> <rule ref="category/java/design.xml/LawOfDemeter" deprecated="true" /> <rule ref="category/java/design.xml/LoosePackageCoupling" deprecated="true" /> </ruleset>Will parse completely once looking for category/java/bestpractices.xml
, and then will proceed to completely parse category/java/design.xml
4 times, to get the other rules.
Having multiple such references is common, and will be far more with the ruleset reorganization, since categories are fewer, larger, and intended to be broken up and referenced rule by rule from rulesets.
This also means, it's possible to produce cycles in the ruleset references and produce infinite loops (which end in Stackoverflows / OutOfMemory errors).
I believe we should rework the rule parsing in order to:
One possible approach to this would be:
RuleSetReferenceId
containing all rulesets indicated by user input, with the "include all" flag set (more on sorting later on)RuleSetReferenceId
from closed, put it into open, and get a parser for it
RuleSetReferenceId
and if present in open, an exception is thrown (cycle!) otherwise check if closed has another reference to this xml, and merge it or add it directly (more on merging below)RuleSetReferenceId
. Rule definitions are loaded, rule / ruleset references, are simply translated into RuleSetReferenceId
and if present in open, an exception is thrown (cycle!) otherwise check if closed has another reference to this xml, and merge it or add it directly (more on merging below)
RuleSetReferenceId
are loaded, go back to step 2.ii, otherwise just stop parsing.Notice that the 2 scenarios (include all, described in 2.i and not, described in 2.ii) can be unified into a single one if the haveAllRequiredRulesBeenLoaded()
check for RuleSetReferenceId
returns always false in the include all
scenario.
On sorting:
On merging:
RuleSetReferenceId
should be reworked to be able to reference, 1, many or all rules within a ruleset; and if not including all, keep track of which ones have been loaded. This way we can merge 2 references to different rules inside the same xml to be parsed together.The one scenario that wouldn't be supported, would be a rule within a ruleset that references another rule within the same ruleset. This would be detected as a cycle, which is technically correct since it's a loop in the ruleset graph. We could support it if it was later in the file, but such distinction seems broken to me, I'd rather just throw for cycle.
This refactor would pose a good chance to split rule parsing from ruleset generation, as it has proved to be troublesome as show on https://github.com/pmd/pmd/pull/731/files#r152080633
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4