Small spamsum similarity is zero #3

@paz-csis

Hi,

We've noticed, when using spamsum (via pyspamsum), that comparing two spamsum strings that are identical but short (or that get converted to short ones) yields a score of 0 instead of 100:

printf("%d\n", spamsum_match("96:q66666666666666666666666666666666666666666666666666666666666666I:N",
                             "96:q66666666666666666666666666666666666666666666666666666666666666I:N"));
0
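
On signatures being shortened before comparison: as far as we can tell, spamsum collapses runs of more than 3 identical characters before matching, so the long run of 6s above shrinks to something like q666I. The snippet below is a minimal standalone sketch of that step, not the code from spamsum.c; the name collapse_runs is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, keeping at most 3 consecutive copies of any character.
   dst must have room for strlen(src) + 1 bytes. Returns the new length.
   Hypothetical helper, written only to illustrate the behaviour. */
static size_t collapse_runs(const char *src, char *dst)
{
    size_t i, j = 0;
    size_t len = strlen(src);

    for (i = 0; i < len; i++) {
        /* Drop this character if the previous three are identical to it. */
        if (i >= 3 &&
            src[i] == src[i - 1] &&
            src[i] == src[i - 2] &&
            src[i] == src[i - 3]) {
            continue;
        }
        dst[j++] = src[i];
    }
    dst[j] = '\0';
    return j;
}
```

Calling collapse_runs("q66666666I", buf) leaves "q666I" in buf, which is only 5 characters long.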

The problem seems to lie in has_common_substring, here: https://github.com/tridge/junkcode/blob/master/spamsum/spamsum.c#L260

Basically, running has_common_substring on identical strings shorter than 7 characters returns 0, even though it should return 1.

I know we probably don't want to match on common substrings shorter than 7 characters, but maybe we could hook a strstr(s1, s2) and strstr(s2, s1) check in somewhere, to at least cover the case where the strings are identical but short?
I can submit a PR if you wish :)
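
To make the strstr fallback suggested above concrete, here is a rough sketch. It is not a patch against spamsum.c: has_common_substring_naive is a simplified stand-in (plain string comparison instead of the rolling hash), MIN_COMMON is an assumed value of 7, and has_common_substring_or_equal is a hypothetical name. It reads the suggestion as "both strstr calls must succeed", which for non-empty strings means the two strings are identical.

```c
#include <string.h>

#define MIN_COMMON 7  /* assumed minimum common-substring length */

/* Simplified stand-in for has_common_substring: returns 1 if s1 and s2
   share a substring of at least MIN_COMMON characters, 0 otherwise. */
static int has_common_substring_naive(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2), i, j;

    if (n1 < MIN_COMMON || n2 < MIN_COMMON)
        return 0;  /* too short to contain a MIN_COMMON-length window */

    for (i = 0; i + MIN_COMMON <= n1; i++)
        for (j = 0; j + MIN_COMMON <= n2; j++)
            if (strncmp(s1 + i, s2 + j, MIN_COMMON) == 0)
                return 1;
    return 0;
}

/* Sketch of the proposed fallback: also accept short strings when they
   are exactly equal (hypothetical wrapper, not part of spamsum). */
static int has_common_substring_or_equal(const char *s1, const char *s2)
{
    if (has_common_substring_naive(s1, s2))
        return 1;
    /* Each string containing the other implies they are identical;
       empty strings are rejected explicitly. */
    return s1[0] != '\0' &&
           strstr(s1, s2) != NULL &&
           strstr(s2, s1) != NULL;
}
```

With this, has_common_substring_naive("q666I", "q666I") returns 0, while has_common_substring_or_equal("q666I", "q666I") returns 1. Two different short strings such as "q666I" and "q666J" are still rejected.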
