Reduced some allocations in QRCodeGenerator (NETCORE_APP only) #595

Open
wants to merge 4 commits into
base: master

@gfoidl gfoidl commented Jun 16, 2025

Profiling showed that there are some allocations in QRCodeGenerator that can be quite easily avoided.

Simple console app for profiling:
using System.Diagnostics;
using QRCoder;

string payload = $"""
    Time    : {DateTimeOffset.Now:O}
    Machine : {Environment.MachineName}
    Activity: {Activity.Current?.RootId}
    """;

int len = 0;

for (int i = 0; i < 1_000; ++i)
{
    string imgSrc = GetQRCode(payload);
    len += imgSrc.Length;
}

Console.WriteLine(len);

static string GetQRCode(string payload)
{
    using QRCodeGenerator qrCodeGenerator = new();
    using QRCodeData qrCodeData = qrCodeGenerator.CreateQrCode(payload, QRCodeGenerator.ECCLevel.Q);
    using Base64QRCode qrCodeBase64 = new(qrCodeData);
    string base64QRCode = qrCodeBase64.GetGraphic(20);
    return $"data:image/png;base64,{base64QRCode}";
}

Allocations are removed / avoided for:

  • Dictionary<,>.Entry[]
  • QRCodeGenerator.PolynomItem[]
  • Dictionary<,>

The change is done only for .NET (Core) targets, as Span<T> is used.
By adding a reference to System.Memory package this change could also be done for .NET Desktop.

Profile

Before

[Screenshot: profile before the change]

After

[Screenshot: profile after the change]

Benchmarks

Before

| Method              | Mean        | Error     | StdDev    | Gen0   | Allocated |
|-------------------- |------------:|----------:|----------:|-------:|----------:|
| CreateQRCode        |    235.2 μs |   4.50 μs |   5.00 μs | 2.1973 |    7.2 KB |
| CreateQRCodeLong    |  2,684.9 μs |  48.00 μs |  44.89 μs | 7.8125 |  33.25 KB |
| CreateQRCodeLongest | 16,264.4 μs | 317.90 μs | 485.47 μs |      - |  79.69 KB |

After

| Method              | Mean        | Error     | StdDev    | Gen0   | Allocated |
|-------------------- |------------:|----------:|----------:|-------:|----------:|
| CreateQRCode        |    232.0 μs |   4.63 μs |   6.64 μs | 0.9766 |   4.38 KB |
| CreateQRCodeLong    |  2,574.1 μs |  41.97 μs |  39.26 μs | 3.9063 |     12 KB |
| CreateQRCodeLongest | 15,573.7 μs | 292.90 μs | 259.65 μs |      - |  48.24 KB |
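
To put the Allocated columns into relative terms, here is a small sketch that computes the reduction per benchmark. The numbers are copied from the two tables above; the helper name ReductionPercent is made up for this illustration.

```csharp
using System;
using System.Globalization;

// Allocated KB per benchmark, copied from the Before/After tables above.
(string Method, double Before, double After)[] rows =
{
    ("CreateQRCode", 7.2, 4.38),
    ("CreateQRCodeLong", 33.25, 12.0),
    ("CreateQRCodeLongest", 79.69, 48.24),
};

foreach (var (method, before, after) in rows)
{
    string percent = ReductionPercent(before, after).ToString("F1", CultureInfo.InvariantCulture);
    Console.WriteLine($"{method}: {percent} % less allocated");
}

// Relative reduction in percent: (before - after) / before * 100.
static double ReductionPercent(double before, double after) =>
    (before - after) / before * 100.0;
```

That works out to roughly 39 %, 64 % and 39 % fewer allocated bytes for the three benchmarks.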

@@ -1017,8 +1018,15 @@ private static Polynom MultiplyAlphaPolynoms(Polynom polynomBase, Polynom polyno
}

// Identify and merge terms with the same exponent.
#if NETCOREAPP
var toGlue = GetNotUniqueExponents(resultPolynom, resultPolynom.Count <= 128 ? stackalloc int[128].Slice(0, resultPolynom.Count) : new int[resultPolynom.Count]);

gfoidl (Contributor Author) commented:

Unfortunately I know too little about the QR spec, but the resultPolynom.Count <= 128 check could be dropped if the spec guarantees that the count can never get that high. Alternatively, we could raise the threshold.

If the count can't get that high, then the fallback to the array allocation could also be removed.

Shane32 (Contributor) commented:

When running the tests, it doesn't go over 64. Seems fine to me.

gfoidl (Contributor Author) commented:

My tests showed the same (what a wonder 😉).
But I'm a bit paranoid in that regard: if the fallback is removed, what happens when a bigger buffer is needed (maybe now for some special inputs, or at some point in the future when new features are added)? That would result in an exception. With the fallback there's a safety net.

Unless it can be proven from the QR spec that the count will never exceed a certain threshold (but will that hold in the future too?).

If the non-stackalloc path were taken frequently, renting from the array pool would be an option, but here it's assumed to be a rarely or never taken path.

Would you still remove the fallback, or shall we just leave it as is?

Shane32 (Contributor) commented Jun 17, 2025:

Oh I’d definitely leave the fallback. There’s just no reason to remove it. The JIT will realize it’s not a common pathway and optimize it accordingly. But as you say, neither of us knows enough about how these polynomials work to know if it will always be < 128.
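
For readers unfamiliar with the pattern discussed in this thread, here is a minimal sketch of "stackalloc for small sizes, heap array as a safety net". It is not the QRCoder code: SumOfSquares and StackLimit are hypothetical names, and the threshold of 128 is only borrowed from the diff.

```csharp
using System;

// Hypothetical helper that needs `count` ints of scratch space.
static long SumOfSquares(int count)
{
    // Assumed threshold, taken from the diff: 128 ints = 512 bytes of stack.
    const int StackLimit = 128;

    // Small requests use the stack; larger ones fall back to a heap array,
    // so an unexpectedly large count costs an allocation instead of an exception.
    Span<int> scratch = count <= StackLimit
        ? stackalloc int[StackLimit]
        : new int[count];
    scratch = scratch.Slice(0, count);

    for (int i = 0; i < scratch.Length; i++)
    {
        scratch[i] = i * i;
    }

    long sum = 0;
    foreach (int value in scratch)
    {
        sum += value;
    }
    return sum;
}

Console.WriteLine(SumOfSquares(10));    // stack path, prints 285
Console.WriteLine(SumOfSquares(1000));  // heap fallback, prints 332833500
```

Both calls go through the same code after the buffer is obtained, which is why keeping the fallback adds almost no complexity.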

if (toGlue.Contains(resultPolynom[i].Exponent))
#else
if (Array.IndexOf(toGlue, resultPolynom[i].Exponent) >= 0)

gfoidl (Contributor Author) commented:

Previously, LINQ was used with its generic Contains method.

  • for AOT, LINQ requires way more code
  • it's an interface dispatch that isn't needed

Note: for .NET (Core) the span-based Contains is used.
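
A small sketch of the two membership checks mentioned here, side by side. The array contents are invented for the example; Array.IndexOf and the span-based Contains extension (MemoryExtensions.Contains) are the framework APIs referred to in the diff.

```csharp
using System;

int[] toGlue = { 3, 5 };

// Array-based check, as used in the #else branch: no LINQ involved.
bool viaIndexOf = Array.IndexOf(toGlue, 5) >= 0;

// Span-based check, as used for .NET (Core) targets.
ReadOnlySpan<int> span = toGlue;
bool viaSpan = span.Contains(5);

Console.WriteLine($"{viaIndexOf} {viaSpan}"); // prints "True True"
```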

@@ -1046,20 +1058,55 @@ private static Polynom MultiplyAlphaPolynoms(Polynom polynomBase, Polynom polyno
return resultPolynom;

// Auxiliary function to identify exponents that appear more than once in the polynomial.
int[] GetNotUniqueExponents(Polynom list)
#if NETCOREAPP
static ReadOnlySpan<int> GetNotUniqueExponents(Polynom list, Span<int> buffer)

gfoidl (Contributor Author) commented:

This implementation doesn't need a Dictionary<,> to determine the non unique exponents.

It works as follows:

  1. a scratch buffer of the same size as the list is passed in
  2. exponents are written / copied to that scratch buffer
  3. scratch buffer is sorted, thus the exponents are in order
  4. each item in the scratch buffer (= ordered exponents) is compared w/ the previous one
     • if equal, increment a counter
     • otherwise, check whether the counter is > 0 and, if so, write the exponent to the result

The result is written into the same scratch buffer: the write index is always <= the iteration index, so a write can never overwrite an exponent that has not been read yet.

That way we avoid the need for a second scratch buffer.
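
The steps above can be sketched as follows. This is an illustration of the described idea, not the code from the PR: it takes a plain span of exponents instead of a Polynom, and the sample values are made up.

```csharp
using System;

// Sketch only: `exponents` stands in for the exponents of the Polynom items.
static ReadOnlySpan<int> GetNotUniqueExponents(ReadOnlySpan<int> exponents, Span<int> buffer)
{
    if (exponents.Length != buffer.Length)
    {
        throw new ArgumentException("buffer must have the same length as exponents");
    }

    // Steps 1-3: copy the exponents into the scratch buffer and sort them.
    exponents.CopyTo(buffer);
    buffer.Sort();

    // Step 4: walk the sorted values and compare each with its predecessor.
    int written = 0;  // next write position for the result (never ahead of i)
    int repeats = 0;  // how often the current value has been repeated so far
    for (int i = 1; i < buffer.Length; i++)
    {
        if (buffer[i] == buffer[i - 1])
        {
            repeats++;
        }
        else
        {
            if (repeats > 0)
            {
                buffer[written++] = buffer[i - 1];
            }
            repeats = 0;
        }
    }

    // The last run has no following element, so flush it after the loop.
    if (repeats > 0)
    {
        buffer[written++] = buffer[^1];
    }

    return buffer.Slice(0, written);
}

int[] input = { 5, 3, 5, 7, 3, 3, 9 };
Span<int> scratch = stackalloc int[input.Length];
ReadOnlySpan<int> duplicates = GetNotUniqueExponents(input, scratch);
Console.WriteLine(string.Join(", ", duplicates.ToArray())); // prints "3, 5"
```

Sorting costs O(n log n) compared with the expected O(n) of the dictionary approach, but for the small counts seen here it avoids the dictionary allocations.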


Should something like this be added as a comment, or is it clear enough how it works?

Shane32 (Contributor) commented:

The function isn't long; so long as the method comment is descriptive enough, I think it's fine.

{
var dic = new Dictionary<int, bool>(list.Count);
Debug.Assert(list.Count == buffer.Length);

Shane32 (Contributor) commented:

Do you plan to leave the Debug.Assert code in here? Just wondering; it will get removed from release builds anyway.

gfoidl (Contributor Author) commented:

I like to have some Debug.Asserts

  • they check invariants in debug builds and while running tests
  • they document the intent (so why write a comment when an assert can do the same?)

Especially here the buffer is passed in as argument and must have the correct size.
If the size doesn't match, then the tests (under debug) will fail and one knows why. Otherwise it may be hard to track down the bug.

So I'd leave them in the code.

As you said: for !DEBUG these asserts won't have any effect.
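
As a minimal sketch of that style: the helper below is hypothetical (CopyExponents is not part of QRCoder) and only shows an assert guarding a caller-supplied buffer. Debug.Assert is marked [Conditional("DEBUG")], so the call is compiled out when DEBUG is not defined.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: the caller owns the buffer and must size it correctly.
static void CopyExponents(ReadOnlySpan<int> exponents, Span<int> buffer)
{
    // States the invariant and checks it in debug builds only.
    Debug.Assert(exponents.Length == buffer.Length, "buffer must match the exponent count");
    exponents.CopyTo(buffer);
}

int[] input = { 4, 1, 4 };
Span<int> scratch = stackalloc int[input.Length];
CopyExponents(input, scratch);
Console.WriteLine(scratch.Length); // prints 3
```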

Shane32 commented Jun 16, 2025

> By adding a reference to System.Memory package this change could also be done for .NET Desktop.

I wouldn't. There have been so many performance enhancements in .NET Core since .NET Framework that if anyone wants better performance, they should use .NET Core. And I'm sure the fallback performance is plenty good enough anyway.

Shane32 commented Jun 16, 2025

The reduction in allocations is very impressive!

Shane32 commented Jun 16, 2025

I fixed the CI scripts in #592 btw

@Shane32 Shane32 left a comment

Looks good to me! (subject to tests passing, of course)

gfoidl commented Jun 17, 2025

> I fixed the CI scripts in #592 btw

Should that change be separated into its own PR to fix CI?

Otherwise every PR has to re-do the same until your PR gets merged.


@Shane32 thanks for your review and comments 👍🏻

Shane32 commented Jun 17, 2025

> Should that change be separated into its own PR to fix CI?

Honestly yes…

2 participants