
feat: blob storage for Mops packages #477

Draft
rvanasa wants to merge 3 commits into main from registry-blob-storage

Conversation

Collaborator

@rvanasa commented Apr 9, 2026

Migrates the Mops registry from on-chain per-file storage (in dynamically spawned Storage canisters) to Caffeine Immutable Object Storage. Packages are now archived as .tar.gz, uploaded to the Caffeine gateway, and referenced by a single SHA-256 content hash stored on-chain. Existing on-chain packages remain fully accessible via the legacy download path.


Copilot AI left a comment


Pull request overview

This PR migrates the Mops registry publish/download flow from per-file on-chain storage canisters to Caffeine Immutable Object Storage by uploading a single .tar.gz blob and persisting its SHA-256 content hash on-chain, while keeping legacy on-chain packages accessible.

Changes:

  • Bump API version to 1.4 and extend the main canister interface with blob publish and blob-hash query endpoints.
  • Update CLI publishing to archive package files and upload the archive to blob storage, then finalize publish with the blob hash.
  • Extend mops.lock to v4 to support blob-hash-locked packages alongside legacy per-file hashes.

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 8 comments.

Show a summary per file
  • package-lock.json: Dependency lockfile adjustments.
  • cli/mops.ts: Bump CLI API version to 1.4.
  • cli/integrity.ts: Add lockfile v4 with blobHashes support and blob-hash integrity checks.
  • cli/declarations/main/main.did.js: Regenerated Candid JS declarations for new blob APIs.
  • cli/declarations/main/main.did.d.ts: Regenerated TS declarations for new blob APIs.
  • cli/declarations/main/main.did: Regenerated Candid interface with blob APIs.
  • cli/commands/publish.ts: Publish via .tar.gz + blob upload + finalize with blob hash.
  • cli/api/storageClient.ts: New client for merkle-tree hashing + chunked blob upload/download via gateway.
  • cli/api/downloadPackageFiles.ts: Add blob-based download path (tar extraction) with legacy fallback.
  • backend/main/types.mo: Add CreateCertificateResult type for blob storage protocol.
  • backend/main/registry/Registry.mo: Store blobHashByPackageId and create blob-based package releases.
  • backend/main/PackagePublisher.mo: Add startBlobPublish/finishBlobPublish flow for blob publishing.
  • backend/main/main-canister.mo: Blob storage protocol endpoints, blob publish endpoints, and backup schema bump.
Comments suppressed due to low confidence (1)

backend/main/main-canister.mo:485

  • Returning #ok([]) from getFileIds when a blob hash exists will cause legacy clients that still depend on getFileIds for installation to silently proceed with an empty file set (resulting in broken installs without an explicit error). Consider returning an error that instructs clients to upgrade/use the blob download path, or providing a backwards-compatible way to download blob-based packages.
  };

  public query func getPackageVersionHistory(name : PackageName) : async [PackageSummaryWithChanges] {
    let ?versions = registry.getPackageVersions(name) else Debug.trap("Package '" # name # "' not found");
    versions
    |> Array.reverse(_)
    |> Array.map<PackageVersion, PackageSummaryWithChanges>(
      _,
      func(version) {
        let ?summary = _getPackageSummaryWithChanges(name, version) else Debug.trap("Package '" # name # "' not found");
        summary;
      },


Comment on lines +410 to +423
// Create tar.gz archive of package files (excluding docs.tgz)
let sourceFiles = files.filter((f) => f !== docsFile);
console.log("Creating package archive...");
let archivePath = path.join(rootDir, ".mops/.publish-archive.tar.gz");
fs.mkdirSync(path.dirname(archivePath), { recursive: true });
await tarCreate(
  {
    gzip: true,
    file: archivePath,
    cwd: rootDir,
    portable: true,
  },
  sourceFiles,
);

Copilot AI Apr 9, 2026


sourceFiles comes from globbySync() (which uses process.cwd() by default), but the archive is created with cwd: rootDir. If mops publish is executed from a subdirectory, the file paths may not be relative to rootDir, causing tarCreate to miss files or throw. Consider calling globbySync(..., { cwd: rootDir }) (or normalizing each path with path.relative(rootDir, ...)) so the tar input paths match the archive cwd consistently.
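
As a sketch of the suggested normalization (the helper name and `rootDir` usage here are illustrative, not the PR's actual code), glob results can be rebased onto the archive root before being passed to tar:

```typescript
import * as path from "path";

// Hypothetical helper: makes glob output (which may be relative to
// process.cwd()) relative to rootDir, so the paths match tar's cwd.
function toArchivePaths(files: string[], rootDir: string): string[] {
  return files.map((f) => {
    // Resolve against the invocation directory, then rebase onto rootDir.
    const abs = path.isAbsolute(f) ? f : path.resolve(process.cwd(), f);
    return path.relative(rootDir, abs);
  });
}
```

Passing `{ cwd: rootDir }` to globbySync would avoid the rebasing step entirely, which is likely the simpler fix.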

Copilot uses AI. Check for mistakes.
Comment on lines +466 to +468
rootHash = await uploadBlob(archiveData, identity, (pct) => {
  logUpdate(
    `Uploading to blob storage ${progressBar(Math.round((pct / 100) * 80) + 2 * (100 / total), 100)}`,

Copilot AI Apr 9, 2026


The progress bar math inside the uploadBlob callback can exceed the declared total (e.g. the computed value can go above 100), which makes progressBar() render beyond its intended width and conflicts with the step-based progress updates. Consider using a single consistent scale (either step-based or percent-based) and clamping the value passed to progressBar() to the chosen total.

Suggested change
-rootHash = await uploadBlob(archiveData, identity, (pct) => {
-  logUpdate(
-    `Uploading to blob storage ${progressBar(Math.round((pct / 100) * 80) + 2 * (100 / total), 100)}`,
+const uploadStepStart = 2 * (100 / total);
+const uploadStepSize = 100 / total;
+rootHash = await uploadBlob(archiveData, identity, (pct) => {
+  const uploadProgress = Math.min(
+    100,
+    Math.max(
+      uploadStepStart,
+      Math.round(uploadStepStart + (pct / 100) * uploadStepSize),
+    ),
+  );
+  logUpdate(
+    `Uploading to blob storage ${progressBar(uploadProgress, 100)}`,

Comment on lines +34 to +48

let filesData = new Map<string, Array<number>>();

await new Promise<void>((resolve, reject) => {
  let parser = new TarParser();
  parser.on("entry", (entry: ReadEntry) => {
    let entryPath = sanitizeTarPath(entry.path);
    if (!entryPath) {
      entry.resume();
      return;
    }
    let chunks: Buffer[] = [];
    entry.on("data", (chunk: Buffer) => chunks.push(chunk));
    entry.on("end", () => {
      let data = Buffer.concat(chunks);

Copilot AI Apr 9, 2026


archiveData is a .tar.gz, but it is fed directly into tar's parser without gunzipping or enabling gzip support. This will fail to parse (or emit garbage) for gzipped archives. Consider either gunzipping first (e.g. via zlib.gunzipSync) or constructing the parser with gzip support (if supported by the tar version).
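
A minimal sketch of the gunzip-first option (assuming archiveData is the gzipped tar bytes as a Buffer or Uint8Array):

```typescript
import * as zlib from "zlib";

// Decompress the .tar.gz synchronously so the raw tar bytes can be fed
// into the tar parser; zlib.gunzipSync throws on invalid gzip input,
// which surfaces corrupt archives early instead of producing garbage.
function gunzipArchive(archiveData: Uint8Array): Buffer {
  return zlib.gunzipSync(archiveData);
}
```

Whether the tar package's parser can instead be constructed with gzip support depends on the tar version in use.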

Comment on lines +36 to +42

await new Promise<void>((resolve, reject) => {
  let parser = new TarParser();
  parser.on("entry", (entry: ReadEntry) => {
    let entryPath = sanitizeTarPath(entry.path);
    if (!entryPath) {
      entry.resume();

Copilot AI Apr 9, 2026


Entries from the tar archive are stored using entry.path without validation. A crafted archive can include paths like ../... or absolute paths, which could lead to path traversal when callers later write these files to disk. Consider rejecting absolute paths, normalizing with path.posix.normalize, and ensuring the final path stays within the intended package root; also consider skipping non-file entries (directories, symlinks) explicitly.
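
One way to sanitize entry paths along the lines suggested above (this is an illustrative sketch, not the PR's sanitizeTarPath implementation):

```typescript
import * as path from "path";

// Returns a safe, normalized relative path, or null to skip the entry.
function sanitizeTarPath(entryPath: string): string | null {
  // Reject absolute paths, including Windows drive-letter paths.
  if (path.posix.isAbsolute(entryPath) || /^[a-zA-Z]:/.test(entryPath)) {
    return null;
  }
  const normalized = path.posix.normalize(entryPath);
  // Reject anything that escapes the package root after normalization.
  if (normalized === ".." || normalized.startsWith("../")) {
    return null;
  }
  return normalized;
}
```

Skipping non-file entries (directories, symlinks) via `entry.type` would be a complementary check at the parser level.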

Comment on lines +218 to +223
): Promise<Response> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
  return fetch(url, { ...init, signal: controller.signal }).finally(() =>
    clearTimeout(timeout),
  );

Copilot AI Apr 9, 2026


The default MOPS_STORAGE_PROJECT_ID fallback ("0000000-0000-0000-0000-00000000000") does not match the typical 8-4-4-4-12 UUID-like format (it appears to be the wrong length). If the gateway expects a strict format, this default will cause hard-to-debug upload/download failures. Consider correcting the default value (or making it required and failing fast with a clear error).
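
A sketch of the fail-fast option (the env var name comes from this comment; the strict 8-4-4-4-12 regex is an assumption about what the gateway accepts):

```typescript
// UUID-like format: 8-4-4-4-12 hex groups.
const PROJECT_ID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Require a well-formed project ID instead of falling back to a
// malformed default that fails later with an opaque gateway error.
function getStorageProjectId(): string {
  const id = process.env.MOPS_STORAGE_PROJECT_ID;
  if (!id || !PROJECT_ID_RE.test(id)) {
    throw new Error(
      "MOPS_STORAGE_PROJECT_ID must be set to a UUID-formatted project ID",
    );
  }
  return id;
}
```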

Comment on lines 530 to +551
    publishingFiles.delete(publishingId);
    publishingPackages.delete(publishingId);
    publishingPackageFileStats.delete(publishingId);
    publishingTestStats.delete(publishingId);
    publishingNotes.delete(publishingId);
    publishingBenchmarks.delete(publishingId);
    publishingDocsCoverage.delete(publishingId);

    #ok({
      config = publishing.config;
      publication;
      isNewPackage;
    });
  };

  public func finishBlobPublish(caller : Principal, publishingId : PublishingId, blobHash : Text) : async Result.Result<{ config : PackageConfigV3; publication : PackagePublication; isNewPackage : Bool }, PublishingErr> {
    assert (not Principal.isAnonymous(caller));

    if (not _isValidBlobHash(blobHash)) {
      return #err("Invalid blob hash format. Expected 'sha256:<64-lowercase-hex-chars>'");
    };


Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finishBlobPublish bypasses the existing package size enforcement (MAX_PACKAGE_SIZE) used for per-file uploads, and publishingPackageFileStats stays at its default values for blob publishes. This enables publishing arbitrarily large archives and makes stored file stats inaccurate. Consider passing blob size (and ideally file count) from the CLI to the canister and validating it here (or adding an equivalent enforcement mechanism for blob uploads).
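
As a complementary client-side guard (a sketch only; the value of MAX_PACKAGE_SIZE here is an assumption for illustration, and the canister must still enforce the limit server-side since client checks can be bypassed):

```typescript
// Illustrative limit; the real constant lives in the canister code.
const MAX_PACKAGE_SIZE = 50 * 1024 * 1024; // 50 MiB (assumed value)

// Reject oversized archives before uploading, giving a clear error
// instead of a late failure at the gateway or canister.
function assertBlobWithinLimit(byteLength: number): void {
  if (byteLength > MAX_PACKAGE_SIZE) {
    throw new Error(
      `Package archive is ${byteLength} bytes, exceeding the ${MAX_PACKAGE_SIZE}-byte limit`,
    );
  }
}
```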

Comment on lines +134 to +142
    let publication = {
      user = newRelease.userId;
      time = Time.now();
      storage = Principal.fromText("aaaaa-aa");
    };
    packagePublications.put(packageId, publication);

    blobHashByPackageId.put(packageId, newRelease.blobHash);


Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blob package publications set publication.storage to the management canister principal (aaaaa-aa). This value is used by the CLI (e.g. getPackageFilesInfo()/install flow) to create a storage actor, and can lead to confusing failures or silent installs when blob-based packages are encountered. Consider using a dedicated sentinel value that clients can reliably detect (or extending the API to omit/optionally provide storage for blob-based packages).

Comment on lines +99 to +165

function buildMerkleTree(
  fileData: Uint8Array,
  contentType: string,
): {
  chunks: Uint8Array[];
  chunkHashes: Uint8Array[];
  blobTree: BlobHashTreeJSON;
  rootHash: string;
} {
  if (fileData.length === 0) {
    throw new Error("Cannot build merkle tree from empty data");
  }

  const chunks = splitChunks(fileData);
  const chunkHashes = chunks.map((c) => chunkHash(c));

  let level: TreeNode[] = chunkHashes.map((h) => ({
    hash: h,
    left: null,
    right: null,
  }));

  while (level.length > 1) {
    const nextLevel: TreeNode[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i]!;
      const right = level[i + 1] ?? null;
      const parent = nodeHash(left.hash, right ? right.hash : null);
      nextLevel.push({ hash: parent, left, right });
    }
    level = nextLevel;
  }

  const chunksRoot = level[0]!;

  const headers: Record<string, string> = {
    "Content-Type": contentType,
    "Content-Length": fileData.length.toString(),
  };
  const headerLines = Object.entries(headers).map(
    ([k, v]) => `${k.trim()}: ${v.trim()}`,
  );
  headerLines.sort();

  const metaHash = metadataHash(headers);
  const metaNode: TreeNode = { hash: metaHash, left: null, right: null };
  const combinedHash = nodeHash(chunksRoot.hash, metaNode.hash);
  const combinedRoot: TreeNode = {
    hash: combinedHash,
    left: chunksRoot,
    right: metaNode,
  };

  const blobTree: BlobHashTreeJSON = {
    tree_type: "DSBMTWH",
    chunk_hashes: chunkHashes.map(hashToShaString),
    tree: nodeToJSON(combinedRoot),
    headers: headerLines,
  };

  return {
    chunks,
    chunkHashes,
    blobTree,
    rootHash: hashToShaString(combinedRoot.hash),
  };

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new storage client implements hashing, certificate retrieval, and multipart upload logic, but there are currently no CLI tests exercising it (couldn’t find any references under cli/tests). Given the risk of subtle protocol/encoding regressions, consider adding unit tests for buildMerkleTree (root hash stability) and mocked-network tests for uploadBlob/downloadBlob error handling/retries.
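
The shape of a root-hash stability test could look like the following. This is a self-contained stand-in, not the gateway's DSBMTWH scheme: it hashes fixed-size chunks with sha256 and pairs them upward, purely to illustrate the determinism and sensitivity assertions a real test of buildMerkleTree would make.

```typescript
import { createHash } from "crypto";

const CHUNK_SIZE = 4; // tiny chunk size, for illustration only

// Stand-in merkle root: sha256 over chunks, then pairwise up to a root.
function toyMerkleRoot(data: Uint8Array): string {
  if (data.length === 0) {
    throw new Error("Cannot build merkle tree from empty data");
  }
  let level: Buffer[] = [];
  for (let i = 0; i < data.length; i += CHUNK_SIZE) {
    level.push(
      createHash("sha256").update(data.subarray(i, i + CHUNK_SIZE)).digest(),
    );
  }
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const h = createHash("sha256").update(level[i]);
      if (level[i + 1]) {
        h.update(level[i + 1]); // odd tail nodes are promoted as-is
      }
      next.push(h.digest());
    }
    level = next;
  }
  return level[0].toString("hex");
}
```

A real test would additionally pin a known input to a known root hash, so protocol or encoding regressions fail loudly.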

