Skip to content

Official GPL-2.0 license text not recognized #245

@sdheh

Description

@sdheh

For the license text https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt I get the following:

System.out.println(Arrays.toString(LicenseCompareHelper.matchingStandardLicenseIds(licenseText)));
System.out.println(LicenseCompareHelper.matchingStandardLicenseIdsWithinText(licenseText));

outputs

[]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]

The two outputs should be the same since the GPL-2.0 license spans the whole file.
Tested with version 1.1.11
This problem is similar to 217

Activity

pmonks

pmonks commented on Jul 22, 2024

@pmonks
Collaborator

PR #236 includes unit tests that reproduce this problem, albeit with other license texts - the issue is not limited to just GPL-2.0, or indeed even just GPL family licenses.

See also #234.

sdheh

sdheh commented on Jul 23, 2024

@sdheh
Author

I figured out a problem that could explain this case. I think the tokenization does not work properly.
Example:

String license1 = "<one";
String template1 = "<<beginOptional>><<<endOptional>>one";
String license2 = "< one";
System.out.println("template1, license1: " + LicenseCompareHelper.isTextMatchingTemplate(template1, license1).getDifferenceMessage());
System.out.println("template1, license2: " + LicenseCompareHelper.isTextMatchingTemplate(template1, license2).getDifferenceMessage());

Returns

template1, license1: Normal text of license does not match at end of text when comparing to template text "one
".  Last optional text was not found due to the optional difference: 
	Normal text of license does not match at end of text when comparing to template text "<"
template1, license2: No difference found

When I debug I see that for the first case in org.spdx.utility.compare.CompareTemplateOutputHandler.compareText the matchTokens parameter is ["<one"]. I think it should instead be ["<", "one"] like in the second case.

Also if I remove all < and > from the https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt text (
gpl-2.0-removed-angle-brackets.txt
) or if I add a space before and after every < and > (
gpl-2.0-spaces-between-angle-brackets-and-text.txt
) I get the following result for the code in the issue description:

[GPL-2.0, GPL-2.0-only]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]
goneall

goneall commented on Jul 23, 2024

@goneall
Member

Thanks @sdheh for the analysis! I agree, the tokenization is the issue. I'm still working on the 3.0 update, so I won't have much time over the next week or so to look for a fix, but if you want to create a pull request I can review / merge.

douglasclarke

douglasclarke commented on Oct 2, 2024

@douglasclarke
Contributor

I ran into this issue as well and believe it affects several templates where the optional text would not be tokenized separately. For the above case I did experiment with the following change but am still learning the code base and am unclear on the total impact:

+++ b/src/main/java/org/spdx/utility/compare/CompareTemplateOutputHandler.java
@@ -247,7 +247,13 @@ public class CompareTemplateOutputHandler implements
                                                nextToken = sub.match(matchTokens, nextToken, endToken, originalText, 
                                                                optionalDifference, tokenToLocation);
                                                if (nextToken < 0) {
-                                                       if (!ignoreOptionalDifferences) {
+                                                       // Handle optional text tokenized with license text to handle single char optional text
+                                                       String optionalText = sub.getText();
+                                                       String matchText = LicenseTextHelper.getTokenAt(matchTokens, startToken);
+                                                       if (matchText != null && optionalText != null && matchText.length() > optionalText.length() && matchText.startsWith(optionalText)) {
+                                                               // remove optional text from start of match token
+                                                               matchTokens[startToken] = matchText.substring(optionalText.length());
+                                                       } else if (!ignoreOptionalDifferences) {
                                                                setLastOptionalDifference(optionalDifference);
                                                        }                                       
                                                        return startToken;      // the optional text didn't match, just return the start token

Modifying the parsed tokens feels wrong but could not sort out an easy way to adjust the tokenization of the license text being matched to the template.

pmonks

pmonks commented on Oct 2, 2024

@pmonks
Collaborator

@douglasclarke this was fixed in PR #249, which is merged but awaiting release.

douglasclarke

douglasclarke commented on Oct 2, 2024

@douglasclarke
Contributor

@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:

		String template = "<<beginOptional>><<<endOptional>>one";
		String sample = "<one";
		
		DifferenceDescription diff = LicenseCompareHelper.isTextMatchingTemplate(template, sample);
goneall

goneall commented on Oct 2, 2024

@goneall
Member

@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:

		String template = "<<beginOptional>><<<endOptional>>one";
		String sample = "<one";
		
		DifferenceDescription diff = LicenseCompareHelper.isTextMatchingTemplate(template, sample);

Thanks @douglasclarke for reporting this along with the specific example.

This may be a separate issue - let's open a new issue and close this one as resolved since the prior examples seem to be fixed with the latest.

pmonks

pmonks commented on Oct 7, 2024

@pmonks
Collaborator

Just a head's up that this license list XML issue will also cause matching of (recent) GPL-2.0 texts to fail, but not because of anything wrong in Spdx-Java-Library. I only mention it as sometimes it can be time-consuming to discern whether a matching bug is in the code, the templates, or (as in this case) a breaking change to the official license text by the license publisher.

goneall

goneall commented on Apr 16, 2025

@goneall
Member

This will be resolved once spdx/license-list-XML#2632 is merged. Merging the PR depends on generating a new version of the license list publisher - which I'm hoping to get out in the next week or so.

added
bugSomething isn't working
matchingLicense matching and recognition
on Apr 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingmatchingLicense matching and recognition

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @pmonks@bact@goneall@douglasclarke@sdheh

        Issue actions

          Official GPL-2.0 license text not recognized · Issue #245 · spdx/Spdx-Java-Library