Improve custom iterator performance by aurpine · Pull Request #119080 · godotengine/godot

aurpine · 2026-04-29T08:09:39Z

Overview

By removing an Array allocation on every iteration for custom objects, we see about a 40% reduction in iteration overhead in the more common typed case and about a 30-35% reduction in the generic case.

Fixes #42053.

Background

Custom iterators are implemented by overriding the _iter_init, _iter_next and _iter_get methods. To allow updating the iterator state, the first two methods take an array parameter that acts as a pseudo-pointer. Internally, however, only the state (the single element in the array) is persisted. The array passed to _iter_init and _iter_next is created anew each time the methods are called, and these allocations make up a significant part of the iteration overhead.
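The pattern described above can be modelled with a small toy. The following C++ sketch does not use Godot's types or the real VM code: `Array` is a `std::vector<int>` stand-in, `CountTo` mirrors a simple counting iterator, and `LoopStats` and `run_loop_fresh_array` are hypothetical names chosen for illustration. It only shows that persisting just the state forces one wrapper array per `_iter_init`/`_iter_next` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the engine's Array; NOT the Godot type.
using Array = std::vector<int>;

// Hypothetical counting iterator, shaped like the _iter_* protocol.
struct CountTo {
    int n;
    bool iter_init(Array &iter) const {
        assert(!iter.empty());
        iter[0] = 0;
        return iter[0] < n;
    }
    bool iter_next(Array &iter) const {
        assert(!iter.empty());
        iter[0] += 1;
        return iter[0] < n;
    }
    int iter_get(int state) const { return state; }
};

struct LoopStats {
    std::size_t allocations = 0; // wrapper arrays built by the loop driver
    long long sum = 0;           // sum of the values the loop body saw
};

// Assumed model of the behavior before the change: only `state` survives
// between calls, so each call gets a freshly built one-element array.
LoopStats run_loop_fresh_array(const CountTo &obj) {
    LoopStats stats;
    int state = 0;

    Array init_arg{state}; // wrapper for the _iter_init call
    stats.allocations++;
    bool valid = obj.iter_init(init_arg);
    state = init_arg[0]; // copy the state back out, drop the array

    while (valid) {
        stats.sum += obj.iter_get(state);

        Array next_arg{state}; // a new wrapper on every step
        stats.allocations++;
        valid = obj.iter_next(next_arg);
        state = next_arg[0];
    }
    return stats;
}
```

Under these assumptions, looping over `CountTo{5}` builds six wrapper arrays: one for the init call and one for each of the five `_iter_next` calls.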

There are two code-paths in the GDScript VM where custom iterators are used. One is in gdscript_vm.cpp, with the specialized opcodes OPCODE_ITERATE_BEGIN_OBJECT and OPCODE_ITERATE_OBJECT. The other is in the generic iterator opcodes OPCODE_ITERATE_BEGIN and OPCODE_ITERATE, which call the Variant::iter_* methods that reside in variant_setget.cpp. I presume the specialized opcodes are used when the compiler knows that the expression used in the for-loop is an object.

The two code-paths are similar and both have the array allocation inefficiency.

Solution

Instead of storing only the single element, we now store the entire array. This way we only need to allocate once per loop. The call sites of all three _iter_* methods change in both code-paths to accommodate this.
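As a rough illustration of the idea, the following C++ sketch keeps one array alive for the whole loop. It is a toy model, not the actual VM code: `Array` is a `std::vector<int>` stand-in for the engine type, and `CountTo`, `LoopStats` and `run_loop_persistent_array` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the engine's Array; NOT the Godot type.
using Array = std::vector<int>;

// Hypothetical counting iterator, shaped like the _iter_* protocol.
struct CountTo {
    int n;
    bool iter_init(Array &iter) const {
        assert(!iter.empty());
        iter[0] = 0;
        return iter[0] < n;
    }
    bool iter_next(Array &iter) const {
        assert(!iter.empty());
        iter[0] += 1;
        return iter[0] < n;
    }
    int iter_get(int state) const { return state; }
};

struct LoopStats {
    std::size_t allocations = 0; // wrapper arrays built by the loop driver
    long long sum = 0;           // sum of the values the loop body saw
};

// Assumed model of the behavior after the change: the whole array is the
// persisted iterator, so it is built once and reused by every call.
LoopStats run_loop_persistent_array(const CountTo &obj) {
    LoopStats stats;

    Array iter{0}; // the single allocation for this loop
    stats.allocations++;

    bool valid = obj.iter_init(iter);
    while (valid) {
        // _iter_get still receives the element, not the array itself.
        stats.sum += obj.iter_get(iter[0]);
        valid = obj.iter_next(iter);
    }
    return stats;
}
```

In this model the allocation count is 1 regardless of `n`, while the values seen by the loop body are unchanged.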

The new implementation introduces some slight behavioral differences, though they may be considered edge cases.

If debug is not enabled, the user may be able to append items to the array and have them persist. Previously, they were discarded. This only affects the object opcode case, as the variant case has a size check.
The parameter passed to _iter_init and _iter_next within a loop is now the same Array instance, whereas before it was always a newly created one.
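The first difference can be illustrated with a toy model. In the C++ sketch below, `Array` is a `std::vector<int>` stand-in rather than the engine type, and `Appender` and `largest_array_seen` are hypothetical names. The debug-mode behavior and the size check in the variant path are not modelled; the sketch only shows why appended items become visible once the same array is reused.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for the engine's Array; NOT the Godot type.
using Array = std::vector<int>;

// Hypothetical iterator that misuses the array by appending a scratch value
// on every step, and records the largest array it was ever handed.
struct Appender {
    int n;
    std::size_t max_seen_size = 0;

    explicit Appender(int count) : n(count) {}

    void observe(const Array &iter) {
        if (iter.size() > max_seen_size) {
            max_seen_size = iter.size();
        }
    }
    bool iter_init(Array &iter) {
        observe(iter);
        iter[0] = 0;
        return iter[0] < n;
    }
    bool iter_next(Array &iter) {
        observe(iter);
        iter[0] += 1;
        iter.push_back(-1); // extra item beyond the state slot
        return iter[0] < n;
    }
};

// Drives the loop with either a reused array (persistent == true) or a
// fresh one-element array per step, and reports the largest size observed.
std::size_t largest_array_seen(int n, bool persistent) {
    Appender obj(n);
    Array shared{0};
    bool valid = obj.iter_init(shared);
    int state = shared[0];

    while (valid) {
        if (persistent) {
            valid = obj.iter_next(shared); // appended items accumulate
        } else {
            Array fresh{state}; // appended items die with this array
            valid = obj.iter_next(fresh);
            state = fresh[0];
        }
    }
    return obj.max_seen_size;
}
```

With `n = 3`, the fresh-array driver never hands the iterator more than one element, whereas the persistent driver hands it an array that has grown to three elements by the last `_iter_next` call.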

Reviewer hints

I recommend viewing the diff in split layout as opposed to unified.

variant_setget.cpp

Each operation is separated nicely in this file: iter_init, iter_next and iter_get

gdscript_vm.cpp

Changes are in two blocks
1. OPCODE_ITERATE_BEGIN_OBJECT: calls _iter_init and _iter_get
2. OPCODE_ITERATE_OBJECT: calls _iter_next and _iter_get

Tests

Added a new test for custom iterator correctness. This covers the success case for both code-paths.

Performance

I tested a local benchmark project using release export templates.

Old project: iterator-test.zip
New project for batch testing: custom-iterator-benchmark.zip.

Note: Output is delimited with tabs for easy spreadsheet calculations. Also, the batching of test runs seems to have made scenarios A and B slower.

CustomIterable class implementation

class CustomIterable:
	var n: int

	func _init(n: int) -> void:
		self.n = n

	func _iter_init(iter: Array) -> bool:
		iter[0] = 0
		return iter[0] < n

	func _iter_next(iter: Array) -> bool:
		iter[0] += 1
		return iter[0] < n

	func _iter_get(iter: Variant) -> Variant:
		return iter

See the four testing scenarios A, B, C, D

A (int w/ int opcode)

Regular range for loop

for i in N:
    pass

B (int w/ generic opcode)

func iterate(iterator: Variant):
    for i in iterator:
        pass

iterate(N)

C (object w/ object opcode)

Object typed with custom iterator

for i in CustomIterable.new(N):
    pass

D (object w/ generic opcode)

iterate(CustomIterable.new(N))

Results

Scenarios A and B are unaffected but are shown for comparison. Each test is repeated T times with N iterations and then averaged.

All tests were run on Windows 11 x86_64. Old commit: 320e818; new commit: e53d192.

`T` = 20, `N` = 10,000,000, MSVC

Scenario	Old (s)	New (s)	Diff
A	0.1282	0.1305	-
B	0.1279	0.1290	-
C	2.173	1.282	-41%
D	1.961	1.282	-35%

`T` = 10, `N` = 100,000,000, MSVC

Scenario	Old (s)	New (s)	Diff
A	1.2714	1.3122	-
B	1.2725	1.2931	-
C	21.928	12.874	-41%
D	19.783	12.931	-35%

`T` = 20, `N` = 10,000,000, MinGW

Scenario	Old (s)	New (s)	Diff
A	0.0997	0.1005	-
B	0.0997	0.1005	-
C	2.0533	1.2376	-40%
D	1.7763	1.2619	-29%

`T` = 10, `N` = 100,000,000, MinGW

Scenario	Old (s)	New (s)	Diff
A	1.0092	0.9990	-
B	1.0076	0.9985	-
C	20.722	12.422	-40%
D	17.834	12.321	-31%

A similar improvement is seen across both MSVC and MinGW builds. Scenario C (OPCODE_ITERATE_BEGIN_OBJECT/OPCODE_ITERATE_OBJECT) takes ~40% less time, while scenario D (OPCODE_ITERATE_BEGIN/OPCODE_ITERATE) takes ~30-35% less time. Feel free to share your own results 🙂
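For anyone reproducing the Diff column, the arithmetic is the relative change in wall time. The C++ sketch below shows one way to compute it; `percent_change` and `speedup` are hypothetical helper names, not part of the benchmark project, and rounding to the nearest integer is an assumption about how the table was produced.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent, rounded to the nearest integer.
// Negative values mean the new build takes less time.
int percent_change(double old_seconds, double new_seconds) {
    assert(old_seconds > 0.0);
    return static_cast<int>(
            std::lround((new_seconds - old_seconds) / old_seconds * 100.0));
}

// The same comparison expressed as a throughput ratio (old time / new time).
double speedup(double old_seconds, double new_seconds) {
    assert(new_seconds > 0.0);
    return old_seconds / new_seconds;
}
```

For example, `percent_change(2.173, 1.282)` gives -41 for scenario C on MSVC, which corresponds to a speedup of roughly 1.7x. A 41% reduction in time is therefore a larger gain than "41% faster" would suggest.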

Interestingly, scenario D used to be faster than scenario C, but the two are now roughly on par, with D marginally slower in most runs. This new outcome is not surprising, since we expect the specialized opcode to be more performant.

Profiling

I used Tracy to profile a single iteration of the typed opcode with N=1,000,000 on MSVC. Only the _iter_* methods show up because the engine opcodes are not annotated. Zooming into the zones, we can see that the gaps before each _iter_next call are greatly reduced by the change.

Top: before the change, bottom: with the change

Looking at the cumulative data, the _iter_* method call times remain within a margin of error.

Left: before the change, right: with the change

AThousandShips added the enhancement, topic:core, and performance labels Apr 29, 2026

AThousandShips added this to the 4.x milestone Apr 29, 2026

aurpine marked this pull request as ready for review April 29, 2026 18:45

aurpine requested review from a team as code owners April 29, 2026 18:45

aurpine force-pushed the improve-custom-iterator-performance branch 2 times, most recently from e53d192 to 30da9c1 April 30, 2026 14:30

Improve object custom iterator performance by removing Array allocations

b0e277e

aurpine force-pushed the improve-custom-iterator-performance branch from 30da9c1 to b0e277e April 30, 2026 14:32

aurpine wants to merge 1 commit into godotengine:master from aurpine:improve-custom-iterator-performance