When org.spdx.utility.compare.LicenseCompareHelper.matchingStandardLicenseIdsWithinText() is run on the official Apache-1.1 license text, it fails to find any matches. I believe I've narrowed the problem down to the Clause5 alternative text tag in the template: if I remove the example header from the license text and run org.spdx.utility.compare.LicenseCompareHelper.isTextStandardLicense().getDifferenceMessage() on it, I get:
Variable text rule combined-bullet-Clause5 did not match the compare text starting at line #31 column #1 "5" while processing rule var: combined-bullet-Clause5
When I manually convert that <alt> tag into a Java regex and manually cleanse bullet 5 of the Apache 1.1 license text of comment characters and newlines, I do get a match, so I'm pretty confident the problem is in the library rather than the template. Beyond that I'm not really sure what the root cause might be - whether it has to do with comment character handling, regexification of that particular <alt> tag, or something else entirely.
This was reproduced with Spdx-Java-Library v1.11 and SPDX license list v3.23.
This issue is due to line breaks in the original text not being taken into account in the regex in the license XML variable text tag.
Here is the regex:
(.{0,20})\s*((Products derived from this software may not be called\s.+nor may\s.+appear in their name, without prior written permission of\s.+.|Products may not include\s.+in their name, without prior written permission of\s.+.))
And the text it is attempting to match against (with the leading asterisks removed):
5. Products derived from this software may not be called "Apache",
nor may "Apache" appear in their name, without prior written
permission of the Apache Software Foundation.
If the line breaks are removed:
5. Products derived from this software may not be called "Apache", nor may "Apache" appear in their name, without prior written permission of the Apache Software Foundation.
the license text matches.
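The behaviour described above can be reproduced outside the library with plain java.util.regex. The following is only a minimal sketch, not the library's actual matching code: the class name and the collapseWhitespace helper are hypothetical, the regex is the one quoted above (with Java string escaping added), and the text is bullet 5 with the leading asterisks removed.

```java
import java.util.regex.Pattern;

class Clause5LineBreakDemo {
    // Regex quoted in the comment above, copied verbatim (only Java string escaping added).
    static final Pattern CLAUSE_5 = Pattern.compile(
        "(.{0,20})\\s*((Products derived from this software may not be called\\s.+nor may\\s.+"
        + "appear in their name, without prior written permission of\\s.+."
        + "|Products may not include\\s.+in their name, without prior written permission of\\s.+.))");

    // Bullet 5 of the Apache-1.1 text with its original line breaks.
    static final String WRAPPED =
        "5. Products derived from this software may not be called \"Apache\",\n"
        + "nor may \"Apache\" appear in their name, without prior written\n"
        + "permission of the Apache Software Foundation.";

    // Hypothetical helper (not part of Spdx-Java-Library): collapse each
    // run of whitespace, including line breaks, into a single space.
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // True only if the whole input matches the quoted regex.
    static boolean matchesClause5(String text) {
        return CLAUSE_5.matcher(text).matches();
    }

    public static void main(String[] args) {
        // With default flags, '.' does not match '\n' and a literal space
        // in the pattern does not match a line break, so this fails.
        System.out.println(matchesClause5(WRAPPED));                      // false
        // Once the line breaks are collapsed, the same regex matches.
        System.out.println(matchesClause5(collapseWhitespace(WRAPPED)));  // true
    }
}
```

This only illustrates why the wrapped text and the single-line text behave differently against this particular regex; it says nothing about how the library preprocesses text before matching.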
This can be fixed in the template by replacing the spaces with [\r\n\s]+.
Official Apache-1.1 license text is not being matched correctly by LicenseCompareHelper.matchingStandardLicenseIdsWithinText() · Issue #230 · spdx/Spdx-Java-Library
Activity
pmonks commented on Mar 14, 2024
If it's helpful, I'm also seeing similar failures with the official Apache-1.0 license text too, though I haven't troubleshot that to the same level of detail as I did with Apache-1.1.
goneall commented on Feb 26, 2025
This issue is due to line breaks in the original text not being taken into account in the regex in the license XML variable text tag. If the line breaks are removed, the license text matches. This can be fixed in the template by replacing the spaces with [\r\n\s]+.
pmonks commented on Feb 27, 2025
\s should match \r and \n, at least according to the JavaDocs, which implies the character class [\s\r\n] is redundant. [edit] unless Unix newlines mode is enabled, it seems like.
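A quick way to check the first part of that claim is to compare the two character classes directly. This is a minimal sketch with a hypothetical class name, using only default Pattern flags; it does not exercise any other matching mode.

```java
import java.util.regex.Pattern;

class WhitespaceClassDemo {
    // Plain \s+ versus the explicit class suggested earlier in the thread.
    static final Pattern PLAIN = Pattern.compile("\\s+");
    static final Pattern EXPLICIT = Pattern.compile("[\\r\\n\\s]+");

    // True if both patterns match the whole input.
    static boolean bothMatch(String input) {
        return PLAIN.matcher(input).matches() && EXPLICIT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Each line prints "true true": with default flags, \s already
        // covers carriage return and line feed.
        for (String ws : new String[] {" ", "\t", "\r", "\n", "\r\n"}) {
            System.out.println(PLAIN.matcher(ws).matches() + " " + EXPLICIT.matcher(ws).matches());
        }
    }
}
```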
goneall commented on Feb 28, 2025
Good point - I just tested using \s+ and it worked. It looks like this can be fixed with a change to the license list XML source. I'll add a PR.
Match official Apache-1.1 text
pmonks commented on Feb 28, 2025
Or might this be covered by the Whitespace section of the matching guidelines? It reads:
goneall commented on Feb 28, 2025
See the discussion in spdx/license-list-XML#2669
I think we have to be careful where we draw the line on overwriting the regexes to implement the matching guidelines - it could get very complicated if we include all the matching guidelines. Overwriting the whitespace may make practical sense, however.
pmonks commented on Feb 28, 2025
Yeah I asked about the potential need for precedence rules (both within the templates and across the matching guidelines) in spdx/license-list-XML#1617, but nobody responded at that time. FWIW from my initial attempts at implementing matching (before I switched to Spdx-Java-Library) I was already starting to run into this kind of problem - how to interpret a given template and its regex fragments, while also ensuring that the general matching guidelines were being correctly implemented.
goneall commented on Mar 6, 2025
Fixed with spdx/license-list-XML#2669
pmonks commented on Apr 30, 2025
I just retested this with version 2.0.0 of Spdx-Java-Library, and it seems to still be happening. Interestingly, org.spdx.utility.compare.LicenseCompareHelper.isTextStandardLicense() is working correctly, but org.spdx.utility.compare.LicenseCompareHelper.matchingStandardLicenseIdsWithinText() is not. This is occurring with both the Apache-1.0 and Apache-1.1 official license texts.