Showing content from https://github.com/pmd/pmd/issues/4396 below:

[core] CPD is always case sensitive · Issue #4396 · pmd/pmd · GitHub

Affects PMD Version: 6.x

Description:

Some languages like PL/SQL or the new T-SQL (#4390) are case-insensitive. When tokenizing, this is working correctly, i.e. the lexers are agnostic to casing. JavaCC has a grammar option for this, and ANTLR has had one since 4.10 as well.

However, when we convert the original tokens into CPD TokenEntries, we don't seem to use the token kind and instead use the original token text, which contains the original casing. It's therefore very easy to work around duplication detection for these languages by just changing the casing:

echo 'select a, b, c, d, e, f from table where x = 1 and y = 2;' > file1.plsql
cp file1.plsql file2.plsql
echo 'sEleCt a, b, c, d, e, f frOm table where x = 1 and y = 2;' > file3.plsql

run.sh cpd --minimum-tokens 20 --language plsql --dir file1.plsql file2.plsql

results correctly in:

Found a 1 line (23 tokens) duplication in the following files: 
Starting at line 1 of /home/andreas/temp/plsql/file1.plsql
Starting at line 1 of /home/andreas/temp/plsql/file2.plsql

select a, b, c, d, e, f from table where x = 1 and y = 2;

since file1.plsql and file2.plsql are identical.

However, comparing file1.plsql and file3.plsql, which differ only in casing, shows no duplications:

run.sh cpd --minimum-tokens 20 --language plsql --dir file1.plsql file3.plsql

I think this problem affects both JavaCC- and ANTLR-based languages.
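
To illustrate the suspected mechanism, here is a minimal sketch in Python. It is not PMD code: `TOKEN_PATTERN` and `tokenize` are hypothetical stand-ins for a lexer, and the `ignore_case` flag is an assumed option, not an existing CPD setting. It only shows that comparing raw token text treats the two statements as different, while comparing case-normalized text treats them as equal.

```python
import re

# Hypothetical toy tokenizer: NOT PMD's PL/SQL lexer, just enough to split
# the example statements into identifiers, numbers and punctuation.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[^\sA-Za-z_0-9]")


def tokenize(source, ignore_case=False):
    """Return the token texts of `source`, optionally lower-cased."""
    tokens = TOKEN_PATTERN.findall(source)
    if ignore_case:
        # Assumed normalization step for case-insensitive languages.
        tokens = [token.lower() for token in tokens]
    return tokens


file1 = "select a, b, c, d, e, f from table where x = 1 and y = 2;"
file3 = "sEleCt a, b, c, d, e, f frOm table where x = 1 and y = 2;"

# Comparing the original token text: the statements look different.
print(tokenize(file1) == tokenize(file3))

# Comparing case-normalized token text: the statements match.
print(tokenize(file1, ignore_case=True) == tokenize(file3, ignore_case=True))
```

A real fix would probably need more care than this sketch, for example leaving string literals untouched, since their casing is significant even in case-insensitive languages.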