Affects PMD Version: 6.4.0 (similar behavior also seen in 5.0.2; other versions not tested)
Description: I use PMD CPD to check for C++ code duplication in a code base of more than 500K lines, inside a Docker container, as part of a Jenkins CI test. The output of PMD CPD is not consistent across runs, even though the input code base is the same.
I first ran PMD CPD from the command line as follows:
run.sh cpd --language cpp --minimum-tokens 100 --format xml --files aaa/src \
--files bbb/src --files ccc/src --exclude ccc/src/ddd --skip-duplicate-files
On the Jenkins machine, the number of reported duplicates varied by about ±5 between runs.
Because Jenkins copies the code under test afresh for every run, I suspected that the files end up in a different order on disk each time, and that this affects the result.
So I generated a sorted list of all C++ files and passed it with the --filelist argument instead of --files. This gives much more consistent results, but the number of reported duplicates still varies by ±1 between runs of the test on the exact same code base.
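For anyone trying to reproduce this, the sorted file list can be built roughly as in the sketch below. It is only an illustration, not the script I actually used: the function names, the extension set and the output separator are my own assumptions. In particular, I assume the --filelist file is a comma-delimited list of paths, so please check the CPD documentation for your PMD version before relying on it.

```python
import os

# Assumed set of C/C++ source and header extensions; adjust for your project.
CPP_EXTENSIONS = (".c", ".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp", ".hxx")


def collect_sources(roots, excluded=(), extensions=CPP_EXTENSIONS):
    """Return a sorted list of source files found under the given roots.

    Directories listed in `excluded` are skipped entirely.
    """
    excluded = tuple(os.path.normpath(e) for e in excluded)
    found = []
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directories in place so os.walk skips them.
            dirnames[:] = [
                d for d in dirnames
                if os.path.normpath(os.path.join(dirpath, d)) not in excluded
            ]
            for name in filenames:
                if name.lower().endswith(extensions):
                    found.append(os.path.normpath(os.path.join(dirpath, name)))
    # sorted() on str compares code points, so the order does not depend on
    # the locale or on the order in which the file system returns entries.
    return sorted(found)


def write_filelist(paths, target, separator=","):
    """Write the paths to `target`, joined by `separator` (assumed format)."""
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(separator.join(paths))
```

Calling collect_sources(["aaa/src", "bbb/src", "ccc/src"], excluded=["ccc/src/ddd"]) and passing the result to write_filelist would produce a list equivalent to the --files and --exclude arguments of the command above, in an order that is the same on every run.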
Comparing some of the .xml outputs reveals another thing. Suppose two places in the code contain the exact same block "if (a==b) { duplicated code }", each followed by an else branch whose body differs.
One run may report the duplicate as "if (a==b) { duplicated code } else", while another run reports only "if (a==b) { duplicated code }", even though the else token is obviously duplicated as well. This shows that the output is inconsistent and that the tool is not deterministic.
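To make such differences easier to spot than by reading the XML by hand, two reports can be compared with a small script along the following lines. This is a sketch under assumptions: it expects each duplication element to carry lines and tokens attributes and to contain file elements with path and line attributes, which is what I see in my reports. Other PMD versions may add attributes or an XML namespace; the namespace case is handled by stripping the prefix. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Return the tag name without an XML namespace prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def load_duplications(xml_text):
    """Parse a CPD XML report into a set of comparable duplication records.

    Each record is (tokens, lines, frozenset of (path, start line)).
    """
    root = ET.fromstring(xml_text)
    records = set()
    for element in root.iter():
        if _local(element.tag) != "duplication":
            continue
        locations = frozenset(
            (child.get("path"), int(child.get("line")))
            for child in element
            if _local(child.tag) == "file"
        )
        records.add((int(element.get("tokens")), int(element.get("lines")), locations))
    return records


def diff_reports(first_xml, second_xml):
    """Return the duplications found only in the first and only in the second report."""
    first = load_duplications(first_xml)
    second = load_duplications(second_xml)

    def order(record):
        # Sort by token count, line count, then locations for stable output.
        return (record[0], record[1], sorted(record[2]))

    return sorted(first - second, key=order), sorted(second - first, key=order)
```

With the case described above, the same pair of locations would show up once in each result list, with token counts that differ by the trailing else token. Reading each report in binary mode and passing the bytes to diff_reports lets the parser honor the encoding declared in the file.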
Running PMD through: CLI