Matrix Multiplication Kernel Specification

Version: v1.2.3 | Status: Active | Last Updated: March 2026

Overview

Provides a tiled (cache-efficient) matrix multiplication kernel in pure Python/NumPy. Implements BLAS-style blocked matmul for improved cache locality, plus batched multiplication and FLOP counting.

Functional Requirements

Tiled matrix multiplication with configurable block size for cache efficiency
Batched matrix multiplication support for 3D tensor inputs
FLOP counting via matmul_flops(M, K, N) = 2*M*K*N
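The first and third requirements can be illustrated with a minimal sketch. The names tiled_matmul_sketch and matmul_flops_sketch are hypothetical and are not the module's actual implementation. The sketch assumes 2D NumPy inputs of shape (M, K) and (K, N), and it delegates each small tile product to NumPy's `@` operator. The FLOP count follows the formula above, which counts one multiply and one add per inner-product term.

```python
import numpy as np


def tiled_matmul_sketch(A, B, tile_size=32):
    """Blocked matmul sketch: accumulate C one (tile_size x tile_size) block at a time."""
    M, K = A.shape
    K2, N = B.shape
    if K != K2:
        raise ValueError(f"inner dimensions differ: {K} vs {K2}")

    C = np.zeros((M, N), dtype=np.result_type(A, B))
    for i in range(0, M, tile_size):          # row blocks of A and C
        for j in range(0, N, tile_size):      # column blocks of B and C
            for k in range(0, K, tile_size):  # blocks along the shared dimension
                # NumPy clips slices that run past the array edge, so shapes
                # that are not multiples of tile_size need no special handling.
                C[i:i + tile_size, j:j + tile_size] += (
                    A[i:i + tile_size, k:k + tile_size]
                    @ B[k:k + tile_size, j:j + tile_size]
                )
    return C


def matmul_flops_sketch(M, K, N):
    """FLOPs for an (M, K) @ (K, N) product: one multiply and one add per term."""
    return 2 * M * K * N
```

Each block of C is updated from one block of A and one block of B at a time, which is the access pattern that gives blocked matmul its cache locality. In pure Python the loop overhead is significant, so a tile size that is too small may be slower than a single `A @ B` call.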

Interface

from codomyrmex.matmul_kernel import tiled_matmul, batched_matmul, matmul_flops

C = tiled_matmul(A, B, tile_size=32)
flops = matmul_flops(M=128, K=64, N=256)
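The interface above does not show a call to batched_matmul, so its signature is not confirmed here. The following is a sketch only, assuming inputs of shape (batch, M, K) and (batch, K, N) and a hypothetical function name batched_matmul_sketch; the real function may accept different arguments.

```python
import numpy as np


def batched_matmul_sketch(A, B):
    """Multiply matching 2D slices of two 3D arrays (assumed layout: batch first)."""
    if A.ndim != 3 or B.ndim != 3:
        raise ValueError("expected 3D inputs of shape (batch, M, K) and (batch, K, N)")
    if A.shape[0] != B.shape[0]:
        raise ValueError("batch sizes differ")
    if A.shape[2] != B.shape[1]:
        raise ValueError("inner dimensions differ")

    batch, M, _ = A.shape
    N = B.shape[2]
    C = np.empty((batch, M, N), dtype=np.result_type(A, B))
    for b in range(batch):
        # Each batch element is an independent 2D product; a tiled 2D kernel
        # could be substituted for the @ operator here.
        C[b] = A[b] @ B[b]
    return C


A = np.ones((4, 8, 3))
B = np.ones((4, 3, 5))
C = batched_matmul_sketch(A, B)  # C has shape (4, 8, 5)
```

Under these assumptions the total FLOP count is the per-matrix count multiplied by the batch size, that is batch * 2*M*K*N.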

Exports

tiled_matmul, batched_matmul, matmul_flops

Navigation

Source README | AGENTS.md
