chore: pmd.cpd.min set to 100000 effectively disables CPD #1339

@joaodinissf

Observation

ddk-parent/pom.xml sets:

<pmd.cpd.min>100000</pmd.cpd.min>

This is the minimumTokens threshold for PMD's Copy/Paste Detector (CPD). At this value, CPD is effectively disabled: a duplicated block would need to span roughly 100,000 tokens to be reported, so pmd:cpd-check will pass on virtually any duplication.

Empirical evidence

Modified com.avaloq.tools.ddk.test.core/src/.../PostconditionViolation.java to add two identical 8-line methods (~65 tokens each by CPD's count):

public int compute1(int x) {
  int a = x + 1;
  int b = a * 2;
  int c = b - 3;
  int d = c + 4;
  int e = d * 5;
  int f = e - 6;
  return f + a + b + c + d + e;
}

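public int compute2(int x) {
  // body identical to compute1, so CPD should see the two methods as duplicates
  int a = x + 1;
  int b = a * 2;
  int c = b - 3;
  int d = c + 4;
  int e = d * 5;
  int f = e - 6;
  return f + a + b + c + d + e;
}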

Running pmd:cpd-check at the project's configured threshold:

| Run | Threshold | Reactor duplications detected |
| --- | --- | --- |
| Current pmd.cpd.min=100000 | 100 000 tokens | 0 (synthetic 65-token duplication missed) |
| -Dpmd.cpd.min=20 (override) | 20 tokens | 21, including the synthetic one and real ones in the codebase |

The largest pre-existing duplication detected at the lower threshold is 90 lines / 267 tokens — meaningful duplication that the project currently doesn't see.
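To make the effect of the threshold concrete, here is a minimal sketch that models CPD's decision as a plain "token count >= minimumTokens" comparison. This is an illustration, not PMD code: the class name CpdThresholdSketch and the isReported helper are hypothetical, the ">=" rule is an assumption about how the minimum is applied, and the 65-token figure is the approximate count quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only, not PMD code: models the CPD threshold as a
// simple comparison between a duplication's size and minimumTokens.
final class CpdThresholdSketch {

    // Assumption: a duplicated block is reported once it is at least
    // minimumTokens long.
    static boolean isReported(int duplicatedTokens, int minimumTokens) {
        return duplicatedTokens >= minimumTokens;
    }

    public static void main(String[] args) {
        // Token counts taken from the observations in this issue.
        Map<String, Integer> duplications = new LinkedHashMap<>();
        duplications.put("synthetic compute1/compute2", 65);
        duplications.put("largest pre-existing", 267);

        // Current setting, a conventional value of 100, and the override used in the test run.
        int[] thresholds = {100_000, 100, 20};
        for (int threshold : thresholds) {
            for (Map.Entry<String, Integer> d : duplications.entrySet()) {
                System.out.printf("pmd.cpd.min=%d: %s (%d tokens) -> %s%n",
                        threshold, d.getKey(), d.getValue(),
                        isReported(d.getValue(), threshold) ? "reported" : "missed");
            }
        }
    }
}
```

Under this model, a threshold of 100 would surface the 267-token duplication but would still miss the roughly 65-token synthetic one, which only shows up at the 20-token override.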

Suggested fix

The maven-pmd-plugin default for CPD's minimumTokens is 100. Common production values for Java sit between 50 and 100. A reasonable starting point:

<pmd.cpd.min>100</pmd.cpd.min>

Then run pmd:cpd-check against the reactor, triage the surfaced duplications (some may be legitimate — e.g., generated code, parser tables — and warrant suppression rather than fixing), and tune from there.
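For duplications that are legitimate, one option is CPD's comment-based suppression, sketched below. The class GeneratedParserTables and its contents are hypothetical and not taken from the DDK codebase. The sketch assumes the PMD version in use honours the CPD-OFF / CPD-ON comment markers for Java.

```java
// Hypothetical class for illustration; not part of the DDK codebase.
final class GeneratedParserTables {

    // CPD-OFF
    // Assumption: CPD ignores everything between the CPD-OFF and CPD-ON
    // markers, so a table repeated across generated files is not flagged.
    static final int[] TRANSITIONS = {0, 1, 2, 1, 0, 2, 2, 0, 1};
    // CPD-ON

    // Looks up the next state in a 3x3 transition table stored row by row.
    static int nextState(int state, int symbol) {
        return TRANSITIONS[state * 3 + symbol];
    }

    public static void main(String[] args) {
        System.out.println(nextState(1, 2));
    }
}
```

Keeping the suppressed region as narrow as possible means that hand-written code in the same file is still checked.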

Why this matters now

  • Contributors and reviewers may believe CPD enforcement is active when it's actually a no-op.
  • CI redesign work (e.g. SARIF + GitHub Code Scanning experimentation) treats pmd:cpd-check as a gate, but it never fires today.
  • The real duplications already in the reactor (21 were found at the 20-token threshold, including the synthetic one) are silently accumulating.

Filed by Claude at João's request.
