Skip to content

Introduce generic dynamic array with unit tests #212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

AW-AlanWu
Copy link
Contributor

@AW-AlanWu AW-AlanWu commented Jun 2, 2025

Motivation

  • Fixed-size buffers such as TYPES and PH2_IR_FLATTEN are hard to extend.
  • strbuf_t stores bytes only; it cannot handle pointers or large structs (type_t, etc.).
  • Using a contiguous dynamic array for hash-map buckets improves cache locality and therefore runtime performance.

What’s in this PR

  1. Add generic dynamic array

    • dynarr_t stores size, capacity, elem_size, and an arena pointer.
    • Helper APIs: init, reserve, resize, push_*, extend, get_*, set_raw.
    • Bounds checks give safety with negligible cost.
  2. Add unit tests (tests/test_dynarr.c)

    • Cover init, growth, bulk append, random access, mutation and error paths.
    • Act as living documentation.
  3. Replace strbuf_t *SOURCE in src/global.c with dynarr_t *

    • Demonstrates real use and unblocks future clean-ups.

Advantages

  • Uniform memory model via the existing arena allocator.
  • Built-in range checks catch OOB(Out-Of-Bound) bugs early (temporarily commented in dynarr_get_byte until SOURCE OOB issue is fixed).
  • API names follow common dynamic-array conventions, easing onboarding.

Known issues

  1. test_dynarr.c currently #include "src/globals.c", so any change in globals.c requires make update-snapshots to pass snapshot check

  2. shecc currently has OOB access on SOURCE bytes.


Next steps

  • Migrate all strbuf_t useses to dynarr_t.
  • Migrate remaining fixed-size arrays to dynarr_t.
  • Re-enable the commented bounds check once the OOB bug is resolved.
  • Clean-up codebase, remove redundant function and structures.

Performance impact

benchmark

Benchmark Machine

Benchmark Machine
            .-/+oossssoo+/-.               alanhacker@alanhacker-ASUS-TUF-Gaming-A15-FA507NV-FA507NV 
        `:+ssssssssssssssssss+:`           --------------------------------------------------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 24.04.2 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: ASUS TUF Gaming A15 FA507NV_FA507NV 1.0 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.11.0-26-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 9 hours, 38 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 2345 (dpkg), 10 (flatpak), 14 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.2.21 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Resolution: 1920x1080 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   DE: GNOME 46.0 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   WM: Mutter 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   WM Theme: Adwaita 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Theme: Orchis-Dark-Compact [GTK2/3] 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/    Icons: Yaru [GTK2/3] 
  +sssssssssdmydMMMMMMMMddddyssssssss+     Terminal: gnome-terminal 
   /ssssssssssshdmNNNNmyNMMMMhssssss/      CPU: AMD Ryzen 5 7535HS with Radeon Graphics (12) @ 4.603GHz 
    .ossssssssssssssssssdMMMNysssso.       GPU: NVIDIA GeForce RTX 4060 Max-Q / Mobile 
      -+sssssssssssssssssyyyssss+-         GPU: AMD ATI Radeon 680M 
        `:+ssssssssssssssssss+:`           Memory: 7766MiB / 31328MiB 
            .-/+oossssoo+/-.

Benchmark Script

Benchmark Script
#!/bin/bash
export LC_ALL=C

n=15
warmup=5

total_user=0
total_sys=0
total_elapsed=0
total_rss=0

tmp_file="$(mktemp)"

echo "Warming up $warmup times..."
for i in $(seq 1 $warmup); do
    ./out/shecc ./src/main.c >/dev/null 2>&1
done

echo "Running $n benchmarks..."
for i in $(seq 1 $n); do
    /usr/bin/time -f "%U %S %e %M" -o "$tmp_file" ./out/shecc ./src/main.c >/dev/null 2>&1

    read user sys elapsed rss < "$tmp_file"

    echo "Run $i: user=${user}s sys=${sys}s elapsed=${elapsed}s maxrss=${rss}KB"

    total_user=$(awk "BEGIN {printf \"%.6f\", $total_user + $user}")
    total_sys=$(awk "BEGIN {printf \"%.6f\", $total_sys + $sys}")
    total_elapsed=$(awk "BEGIN {printf \"%.6f\", $total_elapsed + $elapsed}")
    total_rss=$(awk "BEGIN {print $total_rss + $rss}")
done

rm -f "$tmp_file"

avg_user=$(awk "BEGIN {printf \"%.6f\", $total_user / $n}")
avg_sys=$(awk "BEGIN {printf \"%.6f\", $total_sys / $n}")
avg_elapsed=$(awk "BEGIN {printf \"%.6f\", $total_elapsed / $n}")
avg_rss=$(awk "BEGIN {printf \"%.2f\", $total_rss / $n}")

echo "----------------------------------"
echo "Average user time:    ${avg_user}s"
echo "Average system time:  ${avg_sys}s"
echo "Average elapsed time: ${avg_elapsed}s"
echo "Average max RSS:      ${avg_rss} KB"

Before

Result
Warming up 5 times...
Running 15 benchmarks...
Run 1: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 2: user=0.06s sys=0.15s elapsed=0.21s maxrss=294408KB
Run 3: user=0.06s sys=0.15s elapsed=0.22s maxrss=294216KB
Run 4: user=0.06s sys=0.15s elapsed=0.22s maxrss=294472KB
Run 5: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 6: user=0.05s sys=0.15s elapsed=0.21s maxrss=294472KB
Run 7: user=0.05s sys=0.16s elapsed=0.22s maxrss=294408KB
Run 8: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 9: user=0.06s sys=0.15s elapsed=0.22s maxrss=294408KB
Run 10: user=0.06s sys=0.15s elapsed=0.22s maxrss=294344KB
Run 11: user=0.05s sys=0.16s elapsed=0.22s maxrss=294408KB
Run 12: user=0.06s sys=0.15s elapsed=0.22s maxrss=294344KB
Run 13: user=0.06s sys=0.16s elapsed=0.22s maxrss=294408KB
Run 14: user=0.06s sys=0.15s elapsed=0.22s maxrss=294216KB
Run 15: user=0.06s sys=0.15s elapsed=0.22s maxrss=294280KB
----------------------------------
Average user time:    0.058000s
Average system time:  0.152000s
Average elapsed time: 0.218667s
Average max RSS:      294373.87 KB

After

Result
Warming up 5 times...
Running 15 benchmarks...
Run 1: user=0.06s sys=0.15s elapsed=0.22s maxrss=298248KB
Run 2: user=0.06s sys=0.15s elapsed=0.22s maxrss=298244KB
Run 3: user=0.06s sys=0.15s elapsed=0.22s maxrss=298120KB
Run 4: user=0.07s sys=0.14s elapsed=0.22s maxrss=298120KB
Run 5: user=0.06s sys=0.15s elapsed=0.22s maxrss=297992KB
Run 6: user=0.07s sys=0.15s elapsed=0.22s maxrss=298184KB
Run 7: user=0.07s sys=0.15s elapsed=0.23s maxrss=298120KB
Run 8: user=0.06s sys=0.15s elapsed=0.22s maxrss=298056KB
Run 9: user=0.06s sys=0.15s elapsed=0.22s maxrss=297612KB
Run 10: user=0.07s sys=0.15s elapsed=0.23s maxrss=298056KB
Run 11: user=0.06s sys=0.16s elapsed=0.22s maxrss=298120KB
Run 12: user=0.06s sys=0.15s elapsed=0.22s maxrss=298308KB
Run 13: user=0.06s sys=0.16s elapsed=0.23s maxrss=298052KB
Run 14: user=0.06s sys=0.16s elapsed=0.22s maxrss=297992KB
Run 15: user=0.06s sys=0.17s elapsed=0.23s maxrss=298184KB
----------------------------------
Average user time:    0.062667s
Average system time:  0.152667s
Average elapsed time: 0.222667s
Average max RSS:      298093.87 KB

After benchmark, we observed an average runtime difference of around 15ms in stage-0 compilation.
Considering the total build time and the scope of changes, this delta seems reasonable and within acceptable limits.

Summary by Bito

This pull request introduces a generic dynamic array implementation, replacing fixed-size buffers to enhance flexibility, performance, and memory management. It includes comprehensive unit tests covering various functionalities and updates to global state management, lexer, and parser to utilize the new structure, improving overall code quality.

AW-AlanWu and others added 3 commits June 2, 2025 23:51
Introduce a reusable, arena-allocated dynamic array implementation.
The new 'dynarr_t' type encapsulates element size, current size,
capacity, and a pointer to the arena allocator.

Core operations include:
  - 'dynarr_init': initialize array with specified capacity.
  - 'dynarr_reserve': reserve given capacity.
  - 'dynarr_resize': adjust array size, growing if needed.
  - 'dynarr_push_raw/byte/word': append elements of arbitrary types.
  - 'dynarr_extend': bulk append buffer of elements.
  - 'dynarr_get_raw/byte/word': retrieve elements by index with checks.
  - 'dynarr_set_raw': overwrite an element at a given index.

Relying on the existing arena allocator ensures proper byte alignment
and eliminates failure checks.

This implementation can replace the current 'strbuf_t' and most
fixed-size arrays.
In addition to improving consistency in memory management, the built-in
boundary checks enhance safety, while the impact on performance remains
within an acceptable margin.

Co-authored-by: Jim Hsu <[email protected]>
Add a set of unit tests for 'dynarr_t' to verify correctness and serve as
usage examples. These tests cover initialization, resizing, appending,
retrieval, and error conditions, helping contributors understand how to
use the API safely and correctly.
Replaces the "strbuf_t *SOURCE" in "src/global.c" with "dynarr_t *".

Co-authored-by: Jim Hsu <[email protected]>
@AW-AlanWu
Copy link
Contributor Author

Known issues

  1. test_dynarr.c currently #include "src/globals.c", so any change in globals.c requires make update-snapshots to pass snapshot check

I’ve come up with a possible solution for this issue. Maybe we can create a new unittest/ directory under tests/, and put all the unit test related files there.
Then we can add a script like tests/unittest/driver.sh to handle running the unit tests consistently. We could also update the Makefile to add a target for compiling the tests/unittest/*.c files and call tests/unittest/driver.sh.

This structure would also make it easy to add unit tests for other components like the arena_t and hashmap_t in the future, to ensure they are correctly implemented.

Would this approach be acceptable?

If so, I’ll include this change in the Add unit tests for dynarr_t commit later.

@jserv
Copy link
Collaborator

jserv commented Jun 3, 2025

I’ve come up with a possible solution for this issue. Maybe we can create a new unittest/ directory under tests/, and put all the unit test related files there. Then we can add a script like tests/unittest/driver.sh to handle running the unit tests consistently. We could also update the Makefile to add a target for compiling the tests/unittest/*.c files and call tests/unittest/driver.sh.

It is not necessary to create unittest directory inside tests directory. You can simply put the test cases in tests directory. The tests/driver.sh should trigger all available tests.

@AW-AlanWu
Copy link
Contributor Author

It is not necessary to create unittest directory inside tests directory. You can simply put the test cases in tests directory. The tests/driver.sh should trigger all available tests.

Thank you for the guidance!
However, I’d like to ask for some clarification regarding the concrete approach:

Do you mean writing the unit tests directly in driver.sh, or should I write .c files under the tests directory?

If it’s writing them in driver.sh, how should I go about including ../src/globals.c (e.g., using #include "../src/globals.c")?

If it’s the latter (writing .c files), I’m currently facing two main issues:

  1. Every time I modify globals.c, I have to run update-snapshots, which can be a bit inconvenient.
  2. I sometimes want to write unit tests for behaviors that are expected to fail at runtime — for example, intentionally passing NULL to functions that require non-NULL arguments, or accessing invalid indices. In such cases, writing only .c files is not enough, and maybe I need to coordinate with the shell script to verify these runtime failures.

Is there a recommended workaround or a more flexible way to handle these kinds of tests?

@jserv
Copy link
Collaborator

jserv commented Jun 3, 2025

  1. Every time I modify globals.c, I have to run update-snapshots, which can be a bit inconvenient.

You can improve Makefile to automate this process.

  1. I sometimes want to write unit tests for behaviors that are expected to fail at runtime — for example, intentionally passing NULL to functions that require non-NULL arguments, or accessing invalid indices. In such cases, writing only .c files is not enough, and maybe I need to coordinate with the shell script to verify these runtime failures.

You can create another shell script like tests/run.sh to specify the item(s) you want to examine. See https://github.com/jserv/amacc/blob/master/scripts/runtest.py

@ChAoSUnItY
Copy link
Collaborator

Bounds checks give safety with negligible cost.

Although, as you stated previously, shecc currently has an out-of-bounds access on SOURCE, which causes the boundary check logic to appear unused, this is not entirely accurate. As far as I know:

  1. Your implementation does not have actual boundary check logic in dynarr_get_byte(dynarr_t*, int).
  2. Related to the above, your benchmark only shows that GCC with -O1 likely inlines dynarr_get_byte(dynarr_t*, int) at the call site. This results in a minor performance difference—but not due to the boundary check itself.
  3. Based on my experience and this post, boundary check will have nonzero impact, but may be insignificant.

Therefore, I suggest removing this statement unless you've confirmed this behavior through further testing.

@ChAoSUnItY
Copy link
Collaborator

ChAoSUnItY commented Jun 3, 2025

test_dynarr.c currently #include "src/globals.c", so any change in globals.c requires make update-snapshots to pass snapshot check

I'm afraid that the inclusion of src/globals.c is not the major cause but the nature of update/check snapshot script. In this case, I would suggest by referring V Lang's approach on unit testing (in fact, rust does similarly), put snapshot test (input one and output one) in tests/snapshots, and unit tests in other file (name it unittest), so that:

  1. Only files in snapshots folder would need to have snapshot jsons.
  2. By integrating new testing script, we can still test C source files under snapshots and unittest with runtime output validation, and later migrate unit tests in tests/driver.sh to unittest in near future so it could enhance the test report and expandability of testing.

What do you think? @jserv

@jserv
Copy link
Collaborator

jserv commented Jun 3, 2025

  1. Only files in snapshots folder would need to have snapshot jsons.
  2. By integrating new testing script, we can still test C source files under snapshots and unittest with runtime output validation, and later migrate unit tests in tests/driver.sh to unittest in near future so it could enhance the test report and expandability of testing.

It sounds fine. However the snapshot was a hack when we lack of analysis tools for IR. Maybe we can drop it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants