[Proposal] Introducing a Standard Library for Leo #29478

Description

@mitchmindtree

Context

The Leo compiler currently hardcodes a large public surface that conceptually belongs in a library, not the compiler. Concretely:

  • ~600 intrinsic symbols registered in crates/ast/src/functions/intrinsic.rs (hashing: BHP/Poseidon/Keccak/SHA3/Pedersen across multiple output types; commitments; ECDSA; SNARK verification; ChaCha randomness; Mapping/Vector/Optional ops; serialize/deserialize).
  • ~30 env/context accessors (self.caller, block.height, network.id, etc.) wired through Intrinsic::* variants and the type checker.
  • Namespace-style call dispatch like Poseidon2::hash_to_field(x) is not real type-method resolution - the compiler recognises Poseidon2 as a magic name in crates/span/symbols.txt and dispatches via Intrinsic::from_symbol(...) in crates/passes/src/type_checking/visitor.rs.
  • No prelude / no Leo-visible declarations for any of this. The compiler is the documentation.

This is a maintenance burden, makes evolution painful (every new hash or container op is a compiler change), prevents users from reading their stdlib as source, and conflates "language" with "library".

Goal: push as much of this as possible into a Leo-source std library, leaving the compiler with a thin intrinsic layer (analogous to Rust's core::intrinsics::*). std ships embedded in the leo binary, materialises to a well-known path on first use, and is implicitly available to every package unless explicitly disabled in the manifest.

User decisions captured upfront:

  • Backward compat: existing Poseidon2::hash_to_field-style names remain, lowered to std calls, with deprecation warnings. Hard cutover deferred to a future major version.
  • Std structure: one std library package with multi-file modules (requires adding mod declarations to the language).
  • Distribution: source bundled via include_dir!, extracted on first use to ~/.aleo/std/<leo-version>/std/. Version directory prevents stale caches across upgrades.

Approach

A four-layer architecture, introduced over staged phases. Each phase is independently shippable.

┌─────────────────────────────────────────────────────────────┐
│ Layer 4: User Leo programs                                  │
│   import std;  std::hash::poseidon2_to_field(x)             │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: leo-std crate (Rust shell embedding Leo source)    │
│   crates/leo-std/std/src/{lib,hash,mapping,env,crypto,…}.leo│
├─────────────────────────────────────────────────────────────┤
│ Layer 2: Multi-file library + module support (parser/pkg)   │
│   `mod hash;` in lib.leo  →  src/hash.leo                   │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: Compiler intrinsics (thin, unstable, std-only)     │
│   __intrinsic_poseidon2_hash_to_field, __intrinsic_…        │
└─────────────────────────────────────────────────────────────┘

Layer 1 - Thin compiler intrinsic surface

  • Today crates/ast/src/functions/intrinsic.rs maps two flavours of names to Intrinsic variants: the user-facing Poseidon2::hash_to_field form and an underscore-prefixed form like _poseidon2_hash_to_field (already present in crates/span/symbols.txt). Promote the underscored form to the canonical, unstable intrinsic surface (rename pass: _* → __intrinsic_*, to make the boundary unmistakable).
  • Add a per-file or per-package marker that permits calling __intrinsic_* symbols (analogous to #![feature(core_intrinsics)]). Default-deny in user code; std's package manifest declares "unstable_intrinsics": true.
  • Keep the existing namespace dispatch (Poseidon2::hash_to_field, Mapping::get, ChaCha::rand_field, signature::verify, etc.) but reroute it to emit a deprecation warning and lower it to a call into std (std::hash::poseidon2_to_field, std::mapping::get, …). Implementation: in crates/passes/src/type_checking/visitor.rs, when Intrinsic::from_symbol(...) succeeds via a non-intrinsic name, attach a LeoWarning::DeprecatedIntrinsicNamespace { old, replacement } warning (the same variant described under "Backward compatibility & deprecation") and rewrite the call node to a std path in a new pass.
  • Member-style calls on values that currently dispatch to intrinsics (s.verify(addr, msg) on Signature) follow the same model: rewrite to std::crypto::signature::verify(s, addr, msg).
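
The lowering described in the bullets above can be sketched as a table lookup. This is a minimal illustration rather than the compiler's actual pass: the function name lower_namespace_call, the table, and the plain struct standing in for the warning are all hypothetical, and only the two std paths already named in this proposal are included.

```rust
/// Hypothetical stand-in for the warning payload. The proposal names the variant
/// `LeoWarning::DeprecatedIntrinsicNamespace { old, replacement }`; a plain
/// struct is used here so the sketch stays self-contained.
#[derive(Debug, PartialEq)]
pub struct DeprecatedIntrinsicNamespace {
    pub old: String,
    pub replacement: String,
}

/// Illustrative subset of the namespace -> std mapping: (namespace, method, std path).
/// Only paths that appear in the proposal are listed.
const LOWERING_TABLE: &[(&str, &str, &str)] = &[
    ("Poseidon2", "hash_to_field", "std::hash::poseidon2_to_field"),
    ("Mapping", "get", "std::mapping::get"),
];

/// Returns the std path a legacy namespace call lowers to, together with the
/// warning to emit. `None` means the call is not in the legacy table.
pub fn lower_namespace_call(
    namespace: &str,
    method: &str,
) -> Option<(&'static str, DeprecatedIntrinsicNamespace)> {
    LOWERING_TABLE
        .iter()
        .find(|(ns, m, _)| *ns == namespace && *m == method)
        .map(|(ns, m, path)| {
            let warning = DeprecatedIntrinsicNamespace {
                old: format!("{}::{}", ns, m),
                replacement: (*path).to_string(),
            };
            (*path, warning)
        })
}
```

Calling lower_namespace_call("Poseidon2", "hash_to_field") yields the std path plus a warning whose old field is the legacy spelling. Member-style calls such as s.verify(addr, msg) would need an extra step to move the receiver into the first argument position, which this sketch leaves out.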

Layer 2 - Multi-file libraries and modules

Required to make std readable as more than one huge file.

  • Parser: extend crates/parser-rowan/src/parser/items.rs to accept mod <ident>; (and pub mod <ident>;) at file scope inside a Library package. Wire up the matching LALRPOP grammar in crates/parser/ for legacy compat. Mirror in crates/ast/src/ with a new ModuleDecl item kind.
  • Package resolver: in crates/package/src/compilation_unit.rs (from_package_path, currently expecting a single lib.leo), recurse mod declarations: mod hash; in src/lib.leo resolves to src/hash.leo (or src/hash/mod.leo if the directory form is needed). Build a tree of source files belonging to one logical compilation unit.
  • Path resolution: extend Leo's expression-path syntax so that std::hash::poseidon2_to_field parses as a 3-segment path: <imported program/lib>::<module path>::<item>. Today the parser only handles 2 segments (Type::method or program.aleo::fn). Add an N-segment path resolved in the new name-resolution pass; modules form intermediate namespaces. Use :: consistently (no slashes, no .aleo suffix for libraries).
  • Visibility: introduce pub/pub(crate) on mod and on items inside modules. Default-private. Walk the module tree during name resolution.
  • Scope: keep this minimal - just enough to support std's needs. Generics over types are not required (Leo only has const generics today); std will have one function per hash variant × output type, accepting a degree of repetition that mirrors the underlying intrinsic surface.
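
To make the package-resolver rule concrete, here is a rough sketch of how mod <ident>; could map to a file on disk. The function names are hypothetical, and the lookup order (flat file first, then the directory form) is an assumption: the proposal only says the directory form is used "if needed".

```rust
use std::path::{Path, PathBuf};

/// Candidate source files for `mod <name>;` declared in a file that lives in
/// `dir`, in lookup order: `<dir>/<name>.leo`, then `<dir>/<name>/mod.leo`.
pub fn module_candidates(dir: &Path, name: &str) -> Vec<PathBuf> {
    vec![
        dir.join(format!("{}.leo", name)),
        dir.join(name).join("mod.leo"),
    ]
}

/// Picks the first candidate accepted by `exists`. The predicate is injected so
/// the rule can be exercised without touching the filesystem; a real resolver
/// would pass something like `|p| p.is_file()`.
pub fn resolve_module(
    dir: &Path,
    name: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    module_candidates(dir, name)
        .into_iter()
        .find(|p| exists(p.as_path()))
}
```

With this rule, mod hash; in src/lib.leo resolves to src/hash.leo, and pub mod crypto; falls through to src/crypto/mod.leo when no flat src/crypto.leo exists. Whether having both forms present should be an error is left open here.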

Layer 3 - The leo-std crate (Rust shell + Leo source)

New workspace member at crates/leo-std/. The crate is a thin Rust shell whose sole job is to embed the Leo std source so it can be re-used by any other Rust crate in the workspace (and, in principle, published independently). The leo CLI crate depends on leo-std and pulls the embedded source from it.

crates/leo-std/
├── Cargo.toml                # depends on include_dir
├── src/
│   └── lib.rs                # `pub static STD_SOURCE: include_dir::Dir = include_dir!("$CARGO_MANIFEST_DIR/std");`
│                             #  plus a small helper `pub fn materialise(dest: &Path) -> io::Result<()>`
└── std/                      # The Leo std package itself - the tree that gets extracted at runtime
    ├── program.json          # Library manifest, name "std", version pinned to leo version
    ├── src/
    │   ├── lib.leo           # pub mod hash; pub mod commit; pub mod mapping; ...
    │   ├── hash.leo          # poseidon2_to_field, bhp256_to_field, keccak256_to_field, ...
    │   ├── commit.leo        # bhp256_commit_to_address, pedersen64_commit_to_field, ...
    │   ├── crypto/
    │   │   ├── mod.leo       # pub mod signature; pub mod ecdsa; pub mod snark;
    │   │   ├── signature.leo
    │   │   ├── ecdsa.leo
    │   │   └── snark.leo
    │   ├── mapping.leo       # get, set, remove, contains, get_or_use
    │   ├── vector.leo        # get, set, push, pop, len, clear, swap_remove
    │   ├── option.leo        # unwrap, unwrap_or
    │   ├── serialize.leo     # to_bits, from_bits, *_raw variants
    │   ├── rand.leo          # ChaCha::rand_* wrappers
    │   └── env.leo           # self_caller, self_signer, block_height, network_id, ...
    └── tests/                # Test programs that exercise every public std fn

Rationale for the Rust crate wrapper:

  • Keeps the embedding mechanism (include_dir!) and the materialisation helper out of the CLI crate, so any tooling that needs the std source (LSP, formatter, doc generator, future test runners) can depend on leo-std directly.
  • A single source of truth: leo_std::STD_SOURCE is the only place the bundled tree appears. Bumping the std contents touches one crate.
  • The Leo package root is crates/leo-std/std/, named std from the Leo side - so user-facing import std; and the on-disk crate name leo-std stay distinct and unambiguous (Rust crate vs. Leo package).
  • The crate is small enough to publish to crates.io independently if Provable ever wants out-of-tree consumers (e.g. a third-party Leo IDE) to embed the same std.

Each std function is a small inline fn that calls a single __intrinsic_* and is generic only over const params where the underlying intrinsic varies by const. Example:

// crates/leo-std/std/src/hash.leo
inline fn poseidon2_to_field(input: field) -> field {
    return __intrinsic_poseidon2_hash_to_field(input);
}

inline fn bhp256_to_address(input: field) -> address {
    return __intrinsic_bhp256_hash_to_address(input);
}

Because every wrapper is an inline fn that forwards to a single intrinsic, std should add no runtime cost: the generated Aleo bytecode is expected to be identical to today's (verification step 7 checks this).

Layer 4 - Packaging: implicit dep, opt-out, bundling

  • Embed source: crates/leo-std/src/lib.rs calls include_dir!("$CARGO_MANIFEST_DIR/std") and exposes the tree as leo_std::STD_SOURCE. The same file provides a materialise(dest: &Path) -> io::Result<()> helper that writes the embedded tree to disk, plus a VERSION constant taken from CARGO_PKG_VERSION.
  • Materialisation: crates/leo/Cargo.toml adds leo-std as a workspace dep. At the start of leo build (or any command that resolves a package), crates/leo/src/cli/helpers/context.rs calls leo_std::materialise(...) to ensure ~/.aleo/std/<leo-version>/std/{program.json,src/...} exists; if the tree is missing or its recorded version does not match, the helper rewrites it. Materialisation is idempotent and uses a sentinel file (e.g. .leo-version) to detect stale extractions.
  • Implicit dep injection: in crates/package/src/package.rs::collect_declared_deps_recursive, after reading the user manifest, prepend a synthetic Dependency { name: "std", location: Location::Local, path: Some(materialised_path), edition: None } unless the manifest opts out.
  • Manifest opt-out: extend Manifest in crates/package/src/manifest.rs with an optional "std": false field (default true). Used by std's own manifest (std cannot depend on itself) and by minimal compiler-test fixtures. Name bikeshed: "no_std": true (Rust-familiar) is the alternative.
  • Network deps unchanged: std is Location::Local only - never fetched from the network and never deployed on-chain. This keeps the implicit dep cost zero and prevents accidental on-chain coupling.
  • CLI: add leo std path (prints materialised dir) and leo std reset (force re-materialise). Optionally leo std doc later to render the docs from the .leo sources.
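
The sentinel-based materialisation above could look roughly like the following. This is a sketch under stated assumptions, not the proposed leo_std::materialise: it takes the file list and version as parameters (the real helper would walk the include_dir! tree and use its own VERSION constant), and it returns a bool purely so the idempotence is observable.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sentinel file name; the proposal gives `.leo-version` as an example.
const SENTINEL: &str = ".leo-version";

/// Writes `files` (relative path, contents) under `dest` unless the sentinel
/// already records `version`. Returns `true` when the tree was (re)written.
pub fn materialise(dest: &Path, version: &str, files: &[(&str, &str)]) -> io::Result<bool> {
    let sentinel = dest.join(SENTINEL);
    let up_to_date = fs::read_to_string(&sentinel)
        .map(|recorded| recorded.trim() == version)
        .unwrap_or(false);
    if up_to_date {
        return Ok(false); // nothing to do: extraction matches this version
    }

    // Missing or stale: remove whatever is there and rewrite from scratch.
    if dest.exists() {
        fs::remove_dir_all(dest)?;
    }
    fs::create_dir_all(dest)?;
    for (rel, contents) in files {
        let path = dest.join(rel);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&path, contents)?;
    }

    // Written last, so an interrupted extraction is retried on the next run.
    fs::write(&sentinel, version)?;
    Ok(true)
}
```

Writing the sentinel last is a design choice of this sketch: a crash midway leaves no sentinel, so the next invocation rewrites the tree instead of trusting a partial extraction. Concurrent leo invocations racing on the same directory are not handled here.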

Backward compatibility & deprecation

  • Phase out the namespace-syntax intrinsics by emitting a LeoWarning::DeprecatedIntrinsicNamespace { old, replacement } from the type-checker pass that recognises them.
  • Warnings include the suggested std path and a stable opt-out flag for downstream tooling that wants to suppress them during migration.
  • Removal is not in scope for this plan - it should be a separate decision in a future Leo major release once the ecosystem has migrated.

Phased rollout

Five phases. Each ends with a working compiler and a runnable test suite.

  1. Layer 2: multi-file libraries alone. No std yet. Add mod, multi-file resolution, N-segment paths, visibility. Land with parser tests + one fixture library that exercises the new surface.
  2. Layer 1: intrinsic boundary. Rename _foo symbols to __intrinsic_foo; gate them behind a per-package unstable flag; keep existing namespace syntax working as today (no rewrite yet, no deprecation).
  3. Layer 3: leo-std crate. Create crates/leo-std/ (Rust shell + Leo source under crates/leo-std/std/) with wrappers for the full intrinsic surface, exercising layers 1 + 2. Hand-build tests that import std as a local path dep before implicit injection lands.
  4. Layer 4: bundling + implicit dep. Add include_dir!, materialisation, implicit dep injection, manifest opt-out, CLI commands.
  5. Deprecation pass: rewrite namespace-syntax calls into std calls during type checking; emit warnings. Update existing tests and tutorials to use std::*.

Critical files to modify

Single representative path per concern - others follow the same pattern.

Intrinsic boundary (Phase 2):

  • crates/ast/src/functions/intrinsic.rs - rename symbol-table entries, add per-symbol "internal" marker.
  • crates/span/symbols.txt - introduce __intrinsic_* symbols alongside existing ones.
  • crates/passes/src/type_checking/visitor.rs::visit_call - require the unstable flag for __intrinsic_* calls.
  • crates/package/src/manifest.rs - add unstable_intrinsics: bool for std's manifest only.
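
As a rough illustration of the Phase 2 changes listed above, the rename and the call-site gate can each be expressed as a small pure function. The names rename_intrinsic and check_intrinsic_call and the error wording are hypothetical; the real check would live in visit_call and report through the compiler's error infrastructure.

```rust
/// Prefix the proposal reserves for the unstable intrinsic surface.
const INTRINSIC_PREFIX: &str = "__intrinsic_";

/// Phase 2 rename: `_poseidon2_hash_to_field` becomes
/// `__intrinsic_poseidon2_hash_to_field`. Names that are already prefixed, or
/// that do not start with exactly one underscore, are returned unchanged.
pub fn rename_intrinsic(symbol: &str) -> String {
    if symbol.starts_with(INTRINSIC_PREFIX) {
        return symbol.to_string();
    }
    match symbol.strip_prefix('_') {
        Some(rest) if !rest.is_empty() && !rest.starts_with('_') => {
            format!("{}{}", INTRINSIC_PREFIX, rest)
        }
        _ => symbol.to_string(),
    }
}

/// Call-site gate: `__intrinsic_*` is allowed only when the calling package's
/// manifest sets `unstable_intrinsics` to true (default-deny otherwise).
pub fn check_intrinsic_call(callee: &str, unstable_intrinsics: bool) -> Result<(), String> {
    if callee.starts_with(INTRINSIC_PREFIX) && !unstable_intrinsics {
        return Err(format!(
            "`{}` is an unstable compiler intrinsic; call the `std` wrapper instead",
            callee
        ));
    }
    Ok(())
}
```

In this model std's own package passes unstable_intrinsics = true and compiles, while a user package calling __intrinsic_poseidon2_hash_to_field directly is rejected, matching verification step 2.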

Multi-file libraries (Phase 1):

  • crates/parser-rowan/src/parser/items.rs - parse mod foo; and pub mod foo;.
  • crates/parser/src/... (LALRPOP) - mirror the grammar.
  • crates/ast/src/program/ - new Module / ModuleDecl AST nodes.
  • crates/package/src/compilation_unit.rs::from_package_path - walk module declarations, load referenced files into one CompilationUnit.
  • crates/passes/src/symbol_table_creation/ and crates/passes/src/type_checking/ - resolve N-segment paths through module trees and apply visibility.
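
N-segment resolution through a module tree, with visibility applied at each step, might be modelled as below. This is a simplified sketch: Module, ResolveError and resolve_path are hypothetical names, visibility is reduced to a single pub flag (pub(crate) is not modelled), and the lookup is always performed from outside the library.

```rust
use std::collections::HashMap;

/// A module: nested modules and items, each tagged with an `is_pub` flag.
#[derive(Default)]
pub struct Module {
    pub modules: HashMap<String, (bool, Module)>,
    pub items: HashMap<String, bool>,
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    TooShort,
    NotFound(String),
    Private(String),
}

/// Resolves `<lib>::<module path>::<item>` as seen from another package: every
/// intermediate module and the final item must exist and be `pub`.
pub fn resolve_path(libs: &HashMap<String, Module>, path: &str) -> Result<(), ResolveError> {
    let segments: Vec<&str> = path.split("::").collect();
    if segments.len() < 2 {
        return Err(ResolveError::TooShort);
    }
    let (item, rest) = segments.split_last().expect("length checked above");
    let (lib, modules) = rest.split_first().expect("length checked above");

    let mut current = libs
        .get(*lib)
        .ok_or_else(|| ResolveError::NotFound(lib.to_string()))?;
    // Walk the intermediate namespaces one segment at a time.
    for seg in modules {
        let (is_pub, next) = current
            .modules
            .get(*seg)
            .ok_or_else(|| ResolveError::NotFound(seg.to_string()))?;
        if !*is_pub {
            return Err(ResolveError::Private(seg.to_string()));
        }
        current = next;
    }
    match current.items.get(*item).copied() {
        Some(true) => Ok(()),
        Some(false) => Err(ResolveError::Private(item.to_string())),
        None => Err(ResolveError::NotFound(item.to_string())),
    }
}
```

With a std library whose pub module hash contains a pub item poseidon2_to_field, the path std::hash::poseidon2_to_field resolves, while a private item or a private intermediate module is reported as Private with the offending segment.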

leo-std crate + Leo source (Phase 3):

  • New Rust crate crates/leo-std/ with Cargo.toml, src/lib.rs (exposes STD_SOURCE, VERSION, materialise).
  • Leo package tree under crates/leo-std/std/: program.json, src/lib.leo, src/hash.leo, ... See Layer 3 above. Pattern: one inline fn per intrinsic, no logic beyond the call.
  • Register leo-std in top-level Cargo.toml workspace members.

Bundling + implicit dep (Phase 4):

  • crates/leo-std/src/lib.rs - STD_SOURCE: include_dir::Dir and materialise helper (already added in Phase 3; wired into the CLI here).
  • crates/leo/Cargo.toml - add leo-std as a workspace dep.
  • crates/leo/src/cli/helpers/context.rs::home - add std_path() returning ~/.aleo/std/<version>/std/ and a materialise_std() that delegates to leo_std::materialise on first use.
  • crates/package/src/package.rs::collect_declared_deps_recursive - prepend synthetic std dep unless manifest opts out.
  • crates/package/src/manifest.rs::Manifest - add std: Option<bool> (defaulting to true).
  • crates/leo/src/cli/commands/ - new std.rs for leo std path / leo std reset.
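
The implicit-dependency step above reduces to "prepend std unless the manifest opts out". The sketch below uses simplified, hypothetical stand-ins for Manifest, Dependency and Location (the edition field and any other Location variants are omitted) and a free function declared_deps in place of collect_declared_deps_recursive.

```rust
use std::path::PathBuf;

/// Stand-in for the package crate's location enum; only the variant std uses.
#[derive(Debug, Clone, PartialEq)]
pub enum Location {
    Local,
}

/// Simplified dependency record (the proposal's version also has `edition`).
#[derive(Debug, Clone, PartialEq)]
pub struct Dependency {
    pub name: String,
    pub location: Location,
    pub path: Option<PathBuf>,
}

/// Simplified manifest: `std: None` means the field was not written.
pub struct Manifest {
    pub std: Option<bool>,
    pub dependencies: Vec<Dependency>,
}

/// Returns the declared deps with a synthetic local `std` dep in front,
/// unless the manifest sets `"std": false`. An absent field defaults to true.
pub fn declared_deps(manifest: &Manifest, materialised_path: PathBuf) -> Vec<Dependency> {
    let mut deps = Vec::with_capacity(manifest.dependencies.len() + 1);
    if manifest.std.unwrap_or(true) {
        deps.push(Dependency {
            name: "std".to_string(),
            location: Location::Local,
            path: Some(materialised_path),
        });
    }
    deps.extend(manifest.dependencies.iter().cloned());
    deps
}
```

std's own manifest would set std to Some(false) so that it never depends on itself. A manifest that also declares a dependency named std by hand is not handled here and would need a conflict rule.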

Deprecation (Phase 5):

  • crates/passes/src/type_checking/visitor.rs - new sub-pass that, after intrinsic resolution, rewrites the call expression to a std path and emits LeoWarning::DeprecatedIntrinsicNamespace.
  • crates/errors/src/... - new warning variant.
  • tests/tests/**/*.leo - migrate to std::* calls; keep a small fixture covering the deprecation warning path.

Reused utilities

  • aleo_std::aleo_dir() for ~/.aleo resolution (already used in context.rs).
  • include_dir::Dir for embedding source (add as a dep in crates/leo-std/Cargo.toml).
  • Existing Package::from_directory / CompilationUnit::from_package_path for treating ~/.aleo/std/<version>/std/ exactly like any local library dep.
  • Intrinsic::from_symbol for symbol-to-intrinsic mapping (kept; only the gating changes).
  • LeoWarning infra in crates/errors/ for deprecation messages.

Verification

End-to-end checks at each phase. Run as cargo test -p leo-parser, cargo test -p leo-passes, cargo test -p leo-package, plus integration tests in tests/tests/.

  1. Multi-file libs: a fixture library tests/tests/library-modules/ with src/lib.leo declaring mod foo; and src/foo.leo defining pub fn bar(). Verify that a downstream program can call lib::foo::bar() and that the compiler rejects a call to lib::foo::private_fn().
  2. Intrinsics gating: a fixture with unstable_intrinsics: false calling __intrinsic_poseidon2_hash_to_field must fail with a clear error; std (with the flag) must compile.
  3. std self-test: cargo test -p leo-std (or a sibling leo-std-tests crate) runs every public function with a representative input and asserts the output matches the existing namespace-syntax intrinsic for that operation. This is the regression net for the whole migration.
  4. Implicit dep: on a fresh project (leo new foo), write a src/main.leo that uses std and run leo build; the build succeeds with no manual dep entry. An opt-out fixture ("std": false in the manifest) builds for trivial programs and fails when std is referenced.
  5. Materialisation: delete ~/.aleo/std/<version>/, run leo build, confirm files are recreated. Bump leo-std's Cargo.toml version (mirrored into crates/leo-std/std/program.json), confirm the old dir is left in place and a new one is created at the new version path.
  6. Deprecation warnings: build the entire tests/tests/execution/ fixture set; assert that every previously-passing program still passes, but with the expected deprecation warning count. Existing fixtures updated to use std should produce zero new warnings.
  7. Smoke test against a real example: pick a representative program (e.g. tests/tests/execution/hashes.leo) and confirm the compiled Aleo bytecode is byte-identical before and after migration to std (because std calls are inline fns lowering to the same intrinsic).

Open questions / non-goals

  • Generics over types: not added here. If/when Leo grows them, std can collapse the per-output-type duplication (*_to_field / *_to_group / ...) into one generic function.
  • On-chain std: std is not a deployable Aleo program; it is purely compile-time. If a future requirement is to deploy std (so multiple programs share its bytecode on-chain), that is a separate, larger design.
  • Doc generation: leo std doc is mentioned above as a stretch goal, not a required deliverable.
  • Manifest field name: "std": false vs "no_std": true - decide during Phase 4 implementation; either works.

Labels

  • proposal: A proposal for something new.
  • 🧱 Core Compiler: Anything related to the core compiler including parsing, analysis, transforms, codegen, etc.
  • 🧹 Code Quality: Anything related to code refactoring, repo enhancements, etc.
