Skip to content

fix and feature: clean MVMextractmode#381

Draft
DasVinch wants to merge 17 commits into
framework-devfrom
vd/clean_mvm
Draft

fix and feature: clean MVMextractmode#381
DasVinch wants to merge 17 commits into
framework-devfrom
vd/clean_mvm

Conversation

@DasVinch

@DasVinch DasVinch commented Jun 2, 2026

Copy link
Copy Markdown
Member

THIS IS A DEMO PR WITH NO INTENTION TO MERGE AS AS

This PR is a re-write of MVMextractModes.c that showcases a number of interesting approaches for future development.
Summary of changes:

  • Refactor linalgebra/MVMextractModes.c to finish removing unused or irrelevant attributes. Sanitize a few segfaults, remove references to fetching images into milk-data by ImageID. [must go to upstream]

  • Remove test file that was not actually testing our production code [upstream]

  • There's also some fixing to streamDelay.c that got bundled here through a few merges.

  • Create a separate module linalgebra2 really just for linalgebra2/MVMextractModes.cpp as a workspace

  • Create a test environment adapted to pytest used in conjuction with pyMilk under /testing

What I showcase

C++

  • We can introduce C++ code TUs in a (strong) C environment. That needed ~5 changes to the rest of the codebase (restrict / __restrict and (char*) forceful string literal casts)
  • Functionalization and a moderate amount of object-oriented C++ greatly helps readability of large functions, and not getting lost in long if/else block.
  • Calculations are displaced to a secondary file. API contracts are clear: mask is loaded, matrix is loaded, and the function backend::matrixMul() has a clear API contract: take data from the float-casted input array and write it to the output stream. See mvm_auxiliaries.hpp|cpp
  • Functions that are extracted, some of which are purely technical in nature e.g. the cuda initialization, can be library-fied for better reuse.
  • All data in C++ object can be arranged to be de'alloced automatically (that's the unique_ptr). Since the logic is in the destructor, when arguments are optionally used / optionally nullptr, it's much easier to keep track.
  • The rewrite helped to see that the current MVMextract was not abiding by it's arguments very well, and not consistently (masking = 1 + axmode = 1, etc)

TEST SUITE

  • testing/ is configured as a python module, for tests. testing/tests is a pytest suite (that contains nothing meaningful but the layout) that can be run with /testing>$ pytest
  • testing/tests/mains files are not executed by default, but pytest can be used to invoke them and run dev sessions. This is what I did for now, in a single file experiment_testing.
  • Using parametrization, it's easy to run the same test function many times with different options
  • Using fixtures, it's easy to cause changes to the environment, e.g. spoofing MILK_SHM_DIR during the tests
  • Using fixtures, it's easy to offer reproducible / recyclable setup/teardown codes to a number of tests. Here I use it to serve the FPS for the MVM to 72 different executions of the function
def mvm_correctness_all_params(
    fps_m: FPS,
    axmode: int,
    normalize: bool,
    GPUindex: int,
    masking: bool,
    input_dtype: np.typing.DTypeLike,
):
  • Finally, this has allowed me to 1/ check that the new MVM in cpp is correct compared to a very concise python implementation for all combinations of axmode, normalization, gpu/cpu/OPENMP/BLAS, input dtype, masking.
  • and that the existing MVM is correct at least for axmode 0, BLAS/GPU, masking, normalization
  • and get some timing printouts:
>$ # LINALG_ORIGINAL CPU
>$ pytest -sx tests/mains/experiment_deploy.py -k 'test_mvm_correctness[False-99-False] and TestsAxmode0' | grep ELA
ELAPSED [MODE EXTRACT] [GPUindex=99  normalize=0     masking=0     input_dtype=float32 ] 1477.60 ms
>$ # LINALG_ORIGINAL GPU
>$ pytest -sx tests/mains/experiment_deploy.py -k 'test_mvm_correctness[False-0-False] and TestsAxmode0' | grep ELA
ELAPSED [MODE EXTRACT] [GPUindex=0   normalize=0     masking=0     input_dtype=float32 ] 507.18 ms
>$ # LINALG_NEW CPU
>$ pytest -sx tests/mains/experiment_deploy.py -k 'test_mvm_correctness[False-99-False] and TestsAxmode0' | grep ELA
ELAPSED [MODE EXTRACT] [GPUindex=99  normalize=0     masking=0     input_dtype=float32 ] 1006.73 ms
>$ # LINALG_NEW GPU
>$ pytest -sx tests/mains/experiment_deploy.py -k 'test_mvm_correctness[False-0-False] and TestsAxmode0' | grep ELA
ELAPSED [MODE EXTRACT] [GPUindex=0   normalize=0     masking=0     input_dtype=float32 ] 499.78 ms
>$ # LINALG_ORIGINAL GPU MASKED
>$ pytest -sx tests/mains/experiment_deploy.py -k 'test_mvm_correctness[True-0-False] and TestsAxmode0' | grep ELA
ELAPSED [MODE EXTRACT] [GPUindex=0   normalize=0     masking=1     input_dtype=float32 ] 292.73 ms
>$ # LINALG_NEW GPU MASKED
>$ pytest -sx tests/mains/experiment_deploy.py -k 'test_mvm_correctness[True-0-False] and TestsAxmode0' | grep ELA
ELAPSED [MODE EXTRACT] [GPUindex=0   normalize=0     masking=1     input_dtype=float32 ] 296.04 ms

Timings 1/3 better on CPU (mostly because masking got factored in), a few % on GPU.

What's not great (in this impementation of the MVM): error checking.

Nits:

  • Re-added Ctrl+A in fpsCTRL to attach tmux session.
  • Many string buffers for printing need increases in size to get rid of compiler warnings. I just touched one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants