Improve custom iterator performance #119080

Open
aurpine wants to merge 1 commit into godotengine:master from aurpine:improve-custom-iterator-performance

@aurpine aurpine commented Apr 29, 2026

Overview

By removing an Array allocation on every iteration for custom objects, we see a 40% reduction of overhead time in the more common typed case and a 30% reduction in the generic case.

Fixes #42053.

Background

Custom iterators are implemented by overriding the _iter_init, _iter_next and _iter_get methods. To let the iterator update its state, the first two methods take an Array parameter that acts as a pseudo-pointer. Internally, however, only the state (the single element in the array) is persisted between calls. The array passed to _iter_init and _iter_next is created anew each time those methods are called, and these allocations make up a significant part of the iteration overhead.

There are two code-paths in the GDScript VM where custom iterators are used. One is in gdscript_vm.cpp, with the specialized opcodes OPCODE_ITERATE_BEGIN_OBJECT and OPCODE_ITERATE_OBJECT. The other is the generic iterator opcodes OPCODE_ITERATE_BEGIN and OPCODE_ITERATE, which call the Variant::iter_* methods that reside in variant_setget.cpp. I presume the specialized opcodes are used when the compiler knows that the expression used in the for-loop is an object.

The two code-paths are similar and both have the array allocation inefficiency.

Solution

Instead of storing only the single element, we now store the entire array. This way we only need to allocate once per loop. All three of the _iter_* methods have to be called differently in both code-paths to accommodate this change.
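To make the difference concrete, here is a minimal Python model of the two strategies. It is only a sketch and not the engine code: the function names loop_old and loop_new, the allocation counter, and the Python port of CustomIterable are all hypothetical stand-ins, and a Python list plays the role of the GDScript Array.

```python
class CustomIterable:
    """Python stand-in for the GDScript CustomIterable benchmark class."""

    def __init__(self, n):
        self.n = n

    def iter_init(self, it):
        it[0] = 0
        return it[0] < self.n

    def iter_next(self, it):
        it[0] += 1
        return it[0] < self.n

    def iter_get(self, state):
        return state


def loop_old(obj):
    """Model of the old behaviour: only the state survives between calls."""
    values, allocations = [], 1
    box = [None]                  # array created for iter_init
    ok = obj.iter_init(box)
    state = box[0]                # only the single element is kept
    while ok:
        values.append(obj.iter_get(state))
        box = [state]             # a new array for every iter_next call
        allocations += 1
        ok = obj.iter_next(box)
        state = box[0]
    return values, allocations


def loop_new(obj):
    """Model of the new behaviour: the whole array is kept and reused."""
    values, allocations = [], 1
    box = [None]                  # the only array created for this loop
    ok = obj.iter_init(box)
    while ok:
        values.append(obj.iter_get(box[0]))
        ok = obj.iter_next(box)   # same array instance every time
    return values, allocations
```

In this model both loops yield the same values, but the old one creates N + 1 arrays for N iterations while the new one creates a single array.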

There are some slight differences due to the implementation, though they may be considered edge cases.

  • If debug is not enabled, the user may be able to append items to the array and have them persist across calls. Previously they were discarded. This only affects the object opcode case, as the variant case has a size check.
  • The parameter passed to _iter_init and _iter_next within a loop is now the same Array instance, whereas before it was always a newly created one.
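The first edge case can be illustrated with a small Python model. This is a sketch under assumptions, not engine code: AppendingIterable, final_array_old and final_array_new are hypothetical names, a Python list stands in for the Array, and the "new" loop models the object opcode path without the debug size check.

```python
class AppendingIterable:
    """Hypothetical iterator that appends to the array in every _iter_next call."""

    def __init__(self, n):
        self.n = n

    def iter_init(self, it):
        it[0] = 0
        return it[0] < self.n

    def iter_next(self, it):
        it.append("extra")        # extra element beyond the state slot
        it[0] += 1
        return it[0] < self.n


def final_array_old(obj):
    """Old model: a fresh one-element array per call, so earlier appends are dropped."""
    box = [None]
    ok = obj.iter_init(box)
    while ok:
        box = [box[0]]            # rebuild the array from the state only
        ok = obj.iter_next(box)
    return box


def final_array_new(obj):
    """New model: the same array is reused, so appends accumulate."""
    box = [None]
    ok = obj.iter_init(box)
    while ok:
        ok = obj.iter_next(box)
    return box
```

With n = 3, the array seen after the last call has two elements in the old model and four in the new one, which is the persistence described above.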

Reviewer hints

I recommend viewing the diff in split layout as opposed to unified.

variant_setget.cpp

  • Each operation is separated nicely in this file: iter_init, iter_next and iter_get

gdscript_vm.cpp

  • Changes are in two blocks
    1. OPCODE_ITERATE_BEGIN_OBJECT: calls _iter_init and _iter_get
    2. OPCODE_ITERATE_OBJECT: calls _iter_next and _iter_get

Tests:

  • Added a new test for custom iterator correctness. This covers the success case for both code-paths.

Performance

I tested a local benchmark project using release export templates.

Old project: iterator-test.zip
New project for batch testing: custom-iterator-benchmark.zip.

Note: Output is delimited with tabs for easy spreadsheet calculations. Also, the batching of test runs seems to have made scenarios A and B slower.

CustomIterable class implementation:

class CustomIterable:
	var n: int

	func _init(n: int) -> void:
		self.n = n

	func _iter_init(iter: Array) -> bool:
		iter[0] = 0
		return iter[0] < n

	func _iter_next(iter: Array) -> bool:
		iter[0] += 1
		return iter[0] < n

	func _iter_get(iter: Variant) -> Variant:
		return iter

The four testing scenarios are A, B, C and D:

A (int w/ int opcode)

Regular range for loop

for i in N:
    pass

B (int w/ generic opcode)

func iterate(iterator: Variant):
    for i in iterator:
        pass
iterate(N)

C (object w/ object opcode)

Object typed with custom iterator

for i in CustomIterable.new(N):
    pass

D (object w/ generic opcode)

iterate(CustomIterable.new(N))

Results

Scenarios A and B are unaffected but are shown for comparison. Each test is repeated T times with N iterations and then averaged.
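The repeat-and-average procedure can be sketched as follows. This is a hypothetical Python harness for illustration only; the name average_seconds is an assumption, and the actual benchmark project is written in GDScript and may measure time differently.

```python
import time


def average_seconds(run, repeats):
    """Call run() `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        total += time.perf_counter() - start
    return total / repeats
```

Here run would be one full loop of N iterations for a given scenario, and repeats corresponds to T.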


All tests are run on Windows 11 x86_64. Old commit: 320e818. New commit: e53d192.

T = 20, N = 10,000,000, MSVC

Scenario Old (s) New (s) Diff
A 0.1282 0.1305 -
B 0.1279 0.1290 -
C 2.173 1.282 -41%
D 1.961 1.282 -35%

T = 10, N = 100,000,000, MSVC

Scenario Old (s) New (s) Diff
A 1.2714 1.3122 -
B 1.2725 1.2931 -
C 21.928 12.874 -41%
D 19.783 12.931 -35%

T = 20, N = 10,000,000, MinGW

Scenario Old (s) New (s) Diff
A 0.0997 0.1005 -
B 0.0997 0.1005 -
C 2.0533 1.2376 -40%
D 1.7763 1.2619 -29%

T = 10, N = 100,000,000, MinGW

Scenario Old (s) New (s) Diff
A 1.0092 0.9990 -
B 1.0076 0.9985 -
C 20.722 12.422 -40%
D 17.834 12.321 -31%
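The Diff column is consistent with the relative change in average time, (new - old) / old, rounded to a whole percent. The short Python check below assumes that formula; the helper names percent_change and ns_per_iteration are hypothetical, and the inputs are taken from the tables above.

```python
def percent_change(old_seconds, new_seconds):
    """Relative change from old to new, rounded to a whole percent."""
    return round((new_seconds - old_seconds) / old_seconds * 100)


def ns_per_iteration(seconds, iterations):
    """Average time per loop iteration in nanoseconds."""
    return seconds / iterations * 1e9


# T = 20, N = 10,000,000, MSVC
msvc_c = percent_change(2.173, 1.282)      # scenario C
msvc_d = percent_change(1.961, 1.282)      # scenario D

# T = 20, N = 10,000,000, MinGW
mingw_c = percent_change(2.0533, 1.2376)   # scenario C
mingw_d = percent_change(1.7763, 1.2619)   # scenario D
```

For the first MSVC table, ns_per_iteration puts scenario C at roughly 217 ns per iteration before the change and roughly 128 ns after.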

A similar improvement is seen across both MSVC and MinGW builds. Scenario C (OPCODE_ITERATE_BEGIN_OBJECT/OPCODE_ITERATE_OBJECT) takes ~40% less time, while scenario D (OPCODE_ITERATE_BEGIN/OPCODE_ITERATE) takes ~30-35% less time. Feel free to share your own results 🙂

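Interestingly, scenario D used to be noticeably faster than scenario C (by roughly 10-14%), but now the two are within about 2% of each other, with D marginally slower in most runs. This new outcome is not surprising, since we expect the specialized opcode to be at least as performant as the generic one.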

Profiling

I used Tracy to profile a single run of the typed opcode loop with N=1,000,000 on MSVC. Only the _iter_* methods show up because the engine opcodes are not annotated. Zooming into the zones, we can see that the gaps before each _iter_next call are much smaller with the change.

[Image] Top: before the change, bottom: with the change

Looking at the cumulative data, the _iter_* method call times are unchanged within the margin of error.

[Image] Left: before the change, right: with the change

@AThousandShips AThousandShips added this to the 4.x milestone Apr 29, 2026
@aurpine aurpine marked this pull request as ready for review April 29, 2026 18:45
@aurpine aurpine requested review from a team as code owners April 29, 2026 18:45
@aurpine aurpine force-pushed the improve-custom-iterator-performance branch 2 times, most recently from e53d192 to 30da9c1, April 30, 2026 14:30
@aurpine aurpine force-pushed the improve-custom-iterator-performance branch from 30da9c1 to b0e277e, April 30, 2026 14:32
Development

Successfully merging this pull request may close these issues.

Custom iterator performance is very poor compared to allocating an array and traversing it