Open
Labels
bug (Something isn't working), matching (License matching and recognition)
Description
For the license text https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt I get the following:
System.out.println(Arrays.toString(LicenseCompareHelper.matchingStandardLicenseIds(licenseText)));
System.out.println(LicenseCompareHelper.matchingStandardLicenseIdsWithinText(licenseText));
outputs
[]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]
The two outputs should be the same since the GPL-2.0 license spans the whole file.
Tested with version 1.1.11
This problem is similar to issue #217.
Activity
pmonks commented on Jul 22, 2024
PR #236 includes unit tests that reproduce this problem, albeit with other license texts. The issue is not limited to just GPL-2.0, or indeed even just GPL family licenses. See also #234.
sdheh commented on Jul 23, 2024
I figured out a problem that could explain this case. I think the tokenization does not work properly.
Example:
Returns
When I debug, I see that for the first case in org.spdx.utility.compare.CompareTemplateOutputHandler.compareText the matchTokens parameter is ["<one"]. I think it should instead be ["<", "one"] like in the second case.
Also, if I remove all < and > from the https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt text (gpl-2.0-removed-angle-brackets.txt), or if I add a space before and after every < and > (gpl-2.0-spaces-between-angle-brackets-and-text.txt), I get the following result for the code in the issue description:
goneall commented on Jul 23, 2024
Thanks @sdheh for the analysis! I agree, the tokenization is the issue. I'm still working on the 3.0 update, so I won't have much time over the next week or so to look for a fix, but if you want to create a pull request I can review / merge.
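To make the suspected behaviour concrete, here is a minimal sketch of a tokenizer that emits angle brackets as tokens of their own, so that "<one" and "< one" produce the same token list. The class and method names (AngleBracketTokenizerSketch, tokenize, flush) are hypothetical and this is not the tokenizer used by Spdx-Java-Library; it only illustrates the expected ["<", "one"] split discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from Spdx-Java-Library.
class AngleBracketTokenizerSketch {

    // Splits on whitespace and treats '<' and '>' as standalone tokens,
    // even when they are directly attached to a word.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                flush(current, tokens);
            } else if (c == '<' || c == '>') {
                flush(current, tokens);
                tokens.add(String.valueOf(c));
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Moves the pending word (if any) into the token list.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<one"));   // prints [<, one]
        System.out.println(tokenize("< one"));  // prints [<, one]
    }
}
```

With a split like this, a placeholder such as "<name of author>" would tokenize the same way whether or not spaces surround the brackets, which matches the workaround of adding spaces around every < and >.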
douglasclarke commented on Oct 2, 2024
I ran into this issue as well and believe it affects several templates where the optional text would not be tokenized separately. For the above case I did experiment with the following change but am still learning the code base and am unclear on the total impact:
Modifying the parsed tokens feels wrong, but I could not sort out an easy way to adjust the tokenization of the license text being matched to the template.
pmonks commented on Oct 2, 2024
@douglasclarke this was fixed in PR #249, which is merged but awaiting release.
douglasclarke commented on Oct 2, 2024
@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:
goneall commented on Oct 2, 2024
Thanks @douglasclarke for reporting this along with the specific example.
This may be a separate issue - let's open a new issue and close this one as resolved since the prior examples seem to be fixed with the latest.
pmonks commented on Oct 7, 2024
Just a heads-up that this license list XML issue will also cause matching of (recent) GPL-2.0 texts to fail, but not because of anything wrong in Spdx-Java-Library. I only mention it as sometimes it can be time-consuming to discern whether a matching bug is in the code, the templates, or (as in this case) a breaking change to the official license text by the license publisher.
goneall commented on Apr 16, 2025
This will be resolved once spdx/license-list-XML#2632 is merged. Merging the PR depends on generating a new version of the license list publisher - which I'm hoping to get out in the next week or so.