Introduce generic dynamic array with unit tests #212

AW-AlanWu · 2025-06-02T14:46:39Z

Motivation

Fixed-size buffers such as TYPES and PH2_IR_FLATTEN are hard to extend.
strbuf_t stores bytes only; it cannot handle pointers or large structs (type_t, etc.).
Using a contiguous dynamic array for hash-map buckets improves cache locality and therefore runtime performance.

What’s in this PR

Add generic dynamic array
- dynarr_t stores size, capacity, elem_size, and an arena pointer.
- Helper APIs: init, reserve, resize, push_*, extend, get_*, set_raw.
- Bounds checks give safety with negligible cost.
Add unit tests (tests/test_dynarr.c)
- Cover init, growth, bulk append, random access, mutation and error paths.
- Act as living documentation.
Replace strbuf_t *SOURCE in src/globals.c with dynarr_t *
- Demonstrates real use and unblocks future clean-ups.

Advantages

Uniform memory model via the existing arena allocator.
Built-in range checks catch out-of-bounds (OOB) bugs early (the check in dynarr_get_byte is temporarily commented out until the SOURCE OOB issue is fixed).
API names follow common dynamic-array conventions, easing onboarding.

Known issues

test_dynarr.c currently #include "src/globals.c", so any change in globals.c requires make update-snapshots to pass snapshot check
shecc currently has OOB access on SOURCE bytes.

Next steps

Migrate all strbuf_t uses to dynarr_t.
Migrate remaining fixed-size arrays to dynarr_t.
Re-enable the commented-out bounds check once the OOB bug is resolved.
Clean up the codebase and remove redundant functions and structures.

Performance impact

benchmark

Benchmark Machine

            .-/+oossssoo+/-.               alanhacker@alanhacker-ASUS-TUF-Gaming-A15-FA507NV-FA507NV 
        `:+ssssssssssssssssss+:`           --------------------------------------------------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 24.04.2 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: ASUS TUF Gaming A15 FA507NV_FA507NV 1.0 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.11.0-26-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 9 hours, 38 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 2345 (dpkg), 10 (flatpak), 14 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.2.21 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Resolution: 1920x1080 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   DE: GNOME 46.0 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   WM: Mutter 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   WM Theme: Adwaita 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Theme: Orchis-Dark-Compact [GTK2/3] 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/    Icons: Yaru [GTK2/3] 
  +sssssssssdmydMMMMMMMMddddyssssssss+     Terminal: gnome-terminal 
   /ssssssssssshdmNNNNmyNMMMMhssssss/      CPU: AMD Ryzen 5 7535HS with Radeon Graphics (12) @ 4.603GHz 
    .ossssssssssssssssssdMMMNysssso.       GPU: NVIDIA GeForce RTX 4060 Max-Q / Mobile 
      -+sssssssssssssssssyyyssss+-         GPU: AMD ATI Radeon 680M 
        `:+ssssssssssssssssss+:`           Memory: 7766MiB / 31328MiB 
            .-/+oossssoo+/-.

Benchmark Script

#!/bin/bash
export LC_ALL=C

n=15
warmup=5

total_user=0
total_sys=0
total_elapsed=0
total_rss=0

tmp_file="$(mktemp)"

echo "Warming up $warmup times..."
for i in $(seq 1 $warmup); do
    ./out/shecc ./src/main.c >/dev/null 2>&1
done

echo "Running $n benchmarks..."
for i in $(seq 1 $n); do
    /usr/bin/time -f "%U %S %e %M" -o "$tmp_file" ./out/shecc ./src/main.c >/dev/null 2>&1

    read user sys elapsed rss < "$tmp_file"

    echo "Run $i: user=${user}s sys=${sys}s elapsed=${elapsed}s maxrss=${rss}KB"

    total_user=$(awk "BEGIN {printf \"%.6f\", $total_user + $user}")
    total_sys=$(awk "BEGIN {printf \"%.6f\", $total_sys + $sys}")
    total_elapsed=$(awk "BEGIN {printf \"%.6f\", $total_elapsed + $elapsed}")
    total_rss=$(awk "BEGIN {print $total_rss + $rss}")
done

rm -f "$tmp_file"

avg_user=$(awk "BEGIN {printf \"%.6f\", $total_user / $n}")
avg_sys=$(awk "BEGIN {printf \"%.6f\", $total_sys / $n}")
avg_elapsed=$(awk "BEGIN {printf \"%.6f\", $total_elapsed / $n}")
avg_rss=$(awk "BEGIN {printf \"%.2f\", $total_rss / $n}")

echo "----------------------------------"
echo "Average user time:    ${avg_user}s"
echo "Average system time:  ${avg_sys}s"
echo "Average elapsed time: ${avg_elapsed}s"
echo "Average max RSS:      ${avg_rss} KB"

Before

Result

Warming up 5 times...
Running 15 benchmarks...
Run 1: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 2: user=0.06s sys=0.15s elapsed=0.21s maxrss=294408KB
Run 3: user=0.06s sys=0.15s elapsed=0.22s maxrss=294216KB
Run 4: user=0.06s sys=0.15s elapsed=0.22s maxrss=294472KB
Run 5: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 6: user=0.05s sys=0.15s elapsed=0.21s maxrss=294472KB
Run 7: user=0.05s sys=0.16s elapsed=0.22s maxrss=294408KB
Run 8: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 9: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 10: user=0.06s sys=0.15s elapsed=0.22s maxrss=294344KB
Run 11: user=0.05s sys=0.16s elapsed=0.22s maxrss=294408KB
Run 12: user=0.06s sys=0.15s elapsed=0.22s maxrss=294344KB
Run 13: user=0.06s sys=0.16s elapsed=0.22s maxrss=294408KB
Run 14: user=0.06s sys=0.15s elapsed=0.22s maxrss=294216KB
Run 15: user=0.06s sys=0.15s elapsed=0.22s maxrss=294280KB
----------------------------------
Average user time:    0.058000s
Average system time:  0.152000s
Average elapsed time: 0.218667s
Average max RSS:      294373.87 KB

After

Result

Warming up 5 times...
Running 15 benchmarks...
Run 1: user=0.06s sys=0.15s elapsed=0.22s maxrss=298248KB
Run 2: user=0.06s sys=0.15s elapsed=0.22s maxrss=298244KB
Run 3: user=0.06s sys=0.15s elapsed=0.22s maxrss=298120KB
Run 4: user=0.07s sys=0.14s elapsed=0.22s maxrss=298120KB
Run 5: user=0.06s sys=0.15s elapsed=0.22s maxrss=297992KB
Run 6: user=0.07s sys=0.15s elapsed=0.22s maxrss=298184KB
Run 7: user=0.07s sys=0.15s elapsed=0.23s maxrss=298120KB
Run 8: user=0.06s sys=0.15s elapsed=0.22s maxrss=298056KB
Run 9: user=0.06s sys=0.15s elapsed=0.22s maxrss=297612KB
Run 10: user=0.07s sys=0.15s elapsed=0.23s maxrss=298056KB
Run 11: user=0.06s sys=0.16s elapsed=0.22s maxrss=298120KB
Run 12: user=0.06s sys=0.15s elapsed=0.22s maxrss=298308KB
Run 13: user=0.06s sys=0.16s elapsed=0.23s maxrss=298052KB
Run 14: user=0.06s sys=0.16s elapsed=0.22s maxrss=297992KB
Run 15: user=0.06s sys=0.17s elapsed=0.23s maxrss=298184KB
----------------------------------
Average user time:    0.062667s
Average system time:  0.152667s
Average elapsed time: 0.222667s
Average max RSS:      298093.87 KB

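Across these runs, the average elapsed time for stage-0 compilation rose from 0.218667 s to 0.222667 s, a difference of about 4 ms (roughly 1.8%), and the average max RSS grew by about 3720 KB (294373.87 KB to 298093.87 KB).
Considering the total build time and the scope of changes, this delta seems reasonable and within acceptable limits.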

Summary by Bito

This pull request introduces a generic dynamic array implementation, replacing fixed-size buffers to enhance flexibility, performance, and memory management. It includes comprehensive unit tests covering various functionalities and updates to global state management, lexer, and parser, improving overall code quality and maintainability.

src/globals.c

AW-AlanWu · 2025-06-03T00:44:16Z

Known issues

test_dynarr.c currently #include "src/globals.c", so any change in globals.c requires make update-snapshots to pass snapshot check

I’ve come up with a possible solution for this issue. Maybe we can create a new unittest/ directory under tests/, and put all the unit test related files there.
Then we can add a script like tests/unittest/driver.sh to handle running the unit tests consistently. We could also update the Makefile to add a target for compiling the tests/unittest/*.c files and call tests/unittest/driver.sh.

This structure would also make it easy to add unit tests for other components like the arena_t and hashmap_t in the future, to ensure they are correctly implemented.

Would this approach be acceptable?

If so, I’ll include this change in the Add unit tests for dynarr_t commit later.

jserv · 2025-06-03T01:15:00Z

I’ve come up with a possible solution for this issue. Maybe we can create a new unittest/ directory under tests/, and put all the unit test related files there. Then we can add a script like tests/unittest/driver.sh to handle running the unit tests consistently. We could also update the Makefile to add a target for compiling the tests/unittest/*.c files and call tests/unittest/driver.sh.

It is not necessary to create a unittest directory inside the tests directory. You can simply put the test cases in the tests directory. tests/driver.sh should trigger all available tests.

AW-AlanWu · 2025-06-03T01:36:32Z

It is not necessary to create a unittest directory inside the tests directory. You can simply put the test cases in the tests directory. tests/driver.sh should trigger all available tests.

Thank you for the guidance!
However, I’d like to ask for some clarification regarding the concrete approach:

Do you mean writing the unit tests directly in driver.sh, or should I write .c files under the tests directory?

If it’s writing them in driver.sh, how should I go about including ../src/globals.c (e.g., using #include "../src/globals.c")?

If it’s the latter (writing .c files), I’m currently facing two main issues:

Every time I modify globals.c, I have to run update-snapshots, which can be a bit inconvenient.
I sometimes want to write unit tests for behaviors that are expected to fail at runtime — for example, intentionally passing NULL to functions that require non-NULL arguments, or accessing invalid indices. In such cases, writing only .c files is not enough, and maybe I need to coordinate with the shell script to verify these runtime failures.

Is there a recommended workaround or a more flexible way to handle these kinds of tests?

jserv · 2025-06-03T01:40:22Z

Every time I modify globals.c, I have to run update-snapshots, which can be a bit inconvenient.

You can improve the Makefile to automate this process.

I sometimes want to write unit tests for behaviors that are expected to fail at runtime — for example, intentionally passing NULL to functions that require non-NULL arguments, or accessing invalid indices. In such cases, writing only .c files is not enough, and maybe I need to coordinate with the shell script to verify these runtime failures.

You can create another shell script like tests/run.sh to specify the item(s) you want to examine. See https://github.com/jserv/amacc/blob/master/scripts/runtest.py

ChAoSUnItY · 2025-06-03T09:49:33Z

Bounds checks give safety with negligible cost.

Although, as you stated previously, shecc currently has an out-of-bounds access on SOURCE, which causes the boundary check logic to appear unused, this is not entirely accurate. As far as I know:

Your implementation does not have actual boundary check logic in dynarr_get_byte(dynarr_t*, int).
Related to the above, your benchmark only shows that GCC with -O1 likely inlines dynarr_get_byte(dynarr_t*, int) at the call site. This results in a minor performance difference—but not due to the boundary check itself.
Based on my experience and this post, a boundary check will have a nonzero impact, though it may be insignificant.

Therefore, I suggest removing this statement unless you've confirmed this behavior through further testing.

ChAoSUnItY · 2025-06-03T10:33:29Z

test_dynarr.c currently #include "src/globals.c", so any change in globals.c requires make update-snapshots to pass snapshot check

I'm afraid the inclusion of src/globals.c is not the main cause; the real cause is the nature of the update/check snapshot script. In this case, I would suggest following V Lang's approach to unit testing (Rust does something similar): put the snapshot tests (each with an input and an expected output) in tests/snapshots, and the unit tests in a separate folder (name it unittest), so that:

Only files in the snapshots folder would need snapshot JSON files.
By integrating a new testing script, we can still test C source files under snapshots and unittest with runtime output validation, and later migrate the unit tests in tests/driver.sh to unittest, which would improve test reporting and make the tests easier to extend.

What do you think? @jserv

jserv · 2025-06-03T10:42:17Z

Only files in the snapshots folder would need snapshot JSON files.

By integrating a new testing script, we can still test C source files under snapshots and unittest with runtime output validation, and later migrate the unit tests in tests/driver.sh to unittest, which would improve test reporting and make the tests easier to extend.

It sounds fine. However, the snapshot was a hack from when we lacked analysis tools for IR. Maybe we can drop it later.

Introduce a reusable, arena-allocated dynamic array implementation.

The new 'dynarr_t' type encapsulates element size, current size, capacity, and a pointer to the arena allocator. Core operations include:
- 'dynarr_init': initialize array with specified capacity.
- 'dynarr_reserve': reserve given capacity.
- 'dynarr_resize': adjust array size, growing if needed.
- 'dynarr_push_raw/byte/word': append elements of arbitrary types.
- 'dynarr_extend': bulk append buffer of elements.
- 'dynarr_get_raw/byte/word': retrieve elements by index with checks.
- 'dynarr_set_raw': overwrite an element at a given index.

Relying on the existing arena allocator ensures proper byte alignment and eliminates failure checks. This implementation can replace the current 'strbuf_t' and most fixed-size arrays.

In addition to improving consistency in memory management, the built-in boundary checks enhance safety, while the impact on performance remains within an acceptable margin.

Co-authored-by: Jim Hsu <[email protected]>

Replaces the "strbuf_t *SOURCE" in "src/global.c" with "dynarr_t *". Co-authored-by: Jim Hsu <[email protected]>

AW-AlanWu · 2025-06-23T18:16:29Z

Your implementation does not have actual boundary check logic in dynarr_get_byte(dynarr_t*, int).
Therefore, I suggest removing this statement unless you've confirmed this behavior through further testing.

I’ve fixed the OOB issue in SOURCE access—it was caused by read_char and skip_whitespace in lexer.c not checking if source_idx exceeded the boundary.

And thanks to @ChAoSUnItY’s suggestion, I removed the unit tests for dynarr_t, and replaced the abort() call in the OOB check. Now, out-of-bounds get and set operations in dynarr_t no longer cause an immediate crash, but instead return a default value.

If this approach is acceptable, I will update the PR title and description accordingly.

jserv · 2025-07-06T16:44:56Z

tools/inliner.c

@@ -167,7 +167,7 @@ int main(int argc, char *argv[])
     *   __c("}\n");
     */
    write_str("void __c(char *src) {\n");
-    write_str("    strbuf_puts(SOURCE, src);\n");
+    write_str("    dynarr_extend(SOURCE, src, strlen(src));\n");


It is weird to invoke dynarr_extend here. Conceptually, it is used for code generation, implying put/generate/output.

jserv

Rebase onto the latest master branch.

AW-AlanWu force-pushed the feat/dynarr branch from 72e5979 to abddb46 Compare June 2, 2025 15:00

jserv requested review from ChAoSUnItY, DrXiao, fennecJ and vacantron June 2, 2025 15:13

DrXiao reviewed Jun 2, 2025

AW-AlanWu force-pushed the feat/dynarr branch from abddb46 to 250ffeb Compare June 2, 2025 15:57

AW-AlanWu and others added 3 commits June 24, 2025 01:52

Replace SOURCE with dynarr_t

4073625

Replaces the "strbuf_t *SOURCE" in "src/global.c" with "dynarr_t *". Co-authored-by: Jim Hsu <[email protected]>

Fix SOURCE Out-Of-Bound issue

e31ad36

AW-AlanWu force-pushed the feat/dynarr branch from 250ffeb to e31ad36 Compare June 23, 2025 18:13

jserv reviewed Jul 6, 2025

jserv requested changes Aug 2, 2025
