Conversation
There was a problem hiding this comment.
Pull request overview
This PR migrates the Mops registry publish/download flow from per-file on-chain storage canisters to Caffeine Immutable Object Storage by uploading a single .tar.gz blob and persisting its SHA-256 content hash on-chain, while keeping legacy on-chain packages accessible.
Changes:
- Bump API version to
1.4and extend the main canister interface with blob publish and blob-hash query endpoints. - Update CLI publishing to archive package files and upload the archive to blob storage, then finalize publish with the blob hash.
- Extend
mops.lockto v4 to support blob-hash-locked packages alongside legacy per-file hashes.
Reviewed changes
Copilot reviewed 20 out of 22 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| package-lock.json | Dependency lockfile adjustments. |
| cli/mops.ts | Bump CLI API version to 1.4. |
| cli/integrity.ts | Add lockfile v4 with blobHashes support and blob-hash integrity checks. |
| cli/declarations/main/main.did.js | Regenerated Candid JS declarations for new blob APIs. |
| cli/declarations/main/main.did.d.ts | Regenerated TS declarations for new blob APIs. |
| cli/declarations/main/main.did | Regenerated Candid interface with blob APIs. |
| cli/commands/publish.ts | Publish via .tar.gz + blob upload + finalize with blob hash. |
| cli/api/storageClient.ts | New client for merkle-tree hashing + chunked blob upload/download via gateway. |
| cli/api/downloadPackageFiles.ts | Add blob-based download path (tar extraction) with legacy fallback. |
| backend/main/types.mo | Add CreateCertificateResult type for blob storage protocol. |
| backend/main/registry/Registry.mo | Store blobHashByPackageId and create blob-based package releases. |
| backend/main/PackagePublisher.mo | Add startBlobPublish/finishBlobPublish flow for blob publishing. |
| backend/main/main-canister.mo | Blob storage protocol endpoints, blob publish endpoints, and backup schema bump. |
Comments suppressed due to low confidence (1)
backend/main/main-canister.mo:485
- Returning
#ok([])fromgetFileIdswhen a blob hash exists will cause legacy clients that still depend ongetFileIdsfor installation to silently proceed with an empty file set (resulting in broken installs without an explicit error). Consider returning an error that instructs clients to upgrade/use the blob download path, or providing a backwards-compatible way to download blob-based packages.
};
public query func getPackageVersionHistory(name : PackageName) : async [PackageSummaryWithChanges] {
let ?versions = registry.getPackageVersions(name) else Debug.trap("Package '" # name # "' not found");
versions
|> Array.reverse(_)
|> Array.map<PackageVersion, PackageSummaryWithChanges>(
_,
func(version) {
let ?summary = _getPackageSummaryWithChanges(name, version) else Debug.trap("Package '" # name # "' not found");
summary;
},
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Create tar.gz archive of package files (excluding docs.tgz) | ||
| let sourceFiles = files.filter((f) => f !== docsFile); | ||
| console.log("Creating package archive..."); | ||
| let archivePath = path.join(rootDir, ".mops/.publish-archive.tar.gz"); | ||
| fs.mkdirSync(path.dirname(archivePath), { recursive: true }); | ||
| await tarCreate( | ||
| { | ||
| gzip: true, | ||
| file: archivePath, | ||
| cwd: rootDir, | ||
| portable: true, | ||
| }, | ||
| sourceFiles, | ||
| ); |
There was a problem hiding this comment.
sourceFiles comes from globbySync() (which uses process.cwd() by default), but the archive is created with cwd: rootDir. If mops publish is executed from a subdirectory, the file paths may not be relative to rootDir, causing tarCreate to miss files or throw. Consider calling globbySync(..., { cwd: rootDir }) (or normalizing each path with path.relative(rootDir, ...)) so the tar input paths match the archive cwd consistently.
| rootHash = await uploadBlob(archiveData, identity, (pct) => { | ||
| logUpdate( | ||
| `Uploading to blob storage ${progressBar(Math.round((pct / 100) * 80) + 2 * (100 / total), 100)}`, |
There was a problem hiding this comment.
The progress bar math inside the uploadBlob callback can exceed the declared total (e.g. the computed value can go above 100), which makes progressBar() render beyond its intended width and conflicts with the step-based progress updates. Consider using a single consistent scale (either step-based or percent-based) and clamping the value passed to progressBar() to the chosen total.
| rootHash = await uploadBlob(archiveData, identity, (pct) => { | |
| logUpdate( | |
| `Uploading to blob storage ${progressBar(Math.round((pct / 100) * 80) + 2 * (100 / total), 100)}`, | |
| const uploadStepStart = 2 * (100 / total); | |
| const uploadStepSize = 100 / total; | |
| rootHash = await uploadBlob(archiveData, identity, (pct) => { | |
| const uploadProgress = Math.min( | |
| 100, | |
| Math.max( | |
| uploadStepStart, | |
| Math.round(uploadStepStart + (pct / 100) * uploadStepSize), | |
| ), | |
| ); | |
| logUpdate( | |
| `Uploading to blob storage ${progressBar(uploadProgress, 100)}`, |
|
|
||
| let filesData = new Map<string, Array<number>>(); | ||
|
|
||
| await new Promise<void>((resolve, reject) => { | ||
| let parser = new TarParser(); | ||
| parser.on("entry", (entry: ReadEntry) => { | ||
| let entryPath = sanitizeTarPath(entry.path); | ||
| if (!entryPath) { | ||
| entry.resume(); | ||
| return; | ||
| } | ||
| let chunks: Buffer[] = []; | ||
| entry.on("data", (chunk: Buffer) => chunks.push(chunk)); | ||
| entry.on("end", () => { | ||
| let data = Buffer.concat(chunks); |
There was a problem hiding this comment.
archiveData is a .tar.gz, but it is fed directly into tar's parser without gunzipping or enabling gzip support. This will fail to parse (or emit garbage) for gzipped archives. Consider either gunzipping first (e.g. via zlib.gunzipSync) or constructing the parser with gzip support (if supported by the tar version).
|
|
||
| await new Promise<void>((resolve, reject) => { | ||
| let parser = new TarParser(); | ||
| parser.on("entry", (entry: ReadEntry) => { | ||
| let entryPath = sanitizeTarPath(entry.path); | ||
| if (!entryPath) { | ||
| entry.resume(); |
There was a problem hiding this comment.
Entries from the tar archive are stored using entry.path without validation. A crafted archive can include paths like ../... or absolute paths, which could lead to path traversal when callers later write these files to disk. Consider rejecting absolute paths, normalizing with path.posix.normalize, and ensuring the final path stays within the intended package root; also consider skipping non-file entries (directories, symlinks) explicitly.
| ): Promise<Response> { | ||
| const controller = new AbortController(); | ||
| const timeout = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS); | ||
| return fetch(url, { ...init, signal: controller.signal }).finally(() => | ||
| clearTimeout(timeout), | ||
| ); |
There was a problem hiding this comment.
The default MOPS_STORAGE_PROJECT_ID fallback ("0000000-0000-0000-0000-00000000000") does not match the typical 8-4-4-4-12 UUID-like format (it appears to be the wrong length). If the gateway expects a strict format, this default will cause hard-to-debug upload/download failures. Consider correcting the default value (or making it required and failing fast with a clear error).
| publishingFiles.delete(publishingId); | ||
| publishingPackages.delete(publishingId); | ||
| publishingPackageFileStats.delete(publishingId); | ||
| publishingTestStats.delete(publishingId); | ||
| publishingNotes.delete(publishingId); | ||
| publishingBenchmarks.delete(publishingId); | ||
| publishingDocsCoverage.delete(publishingId); | ||
|
|
||
| #ok({ | ||
| config = publishing.config; | ||
| publication; | ||
| isNewPackage; | ||
| }); | ||
| }; | ||
|
|
||
| public func finishBlobPublish(caller : Principal, publishingId : PublishingId, blobHash : Text) : async Result.Result<{ config : PackageConfigV3; publication : PackagePublication; isNewPackage : Bool }, PublishingErr> { | ||
| assert (not Principal.isAnonymous(caller)); | ||
|
|
||
| if (not _isValidBlobHash(blobHash)) { | ||
| return #err("Invalid blob hash format. Expected 'sha256:<64-lowercase-hex-chars>'"); | ||
| }; | ||
|
|
There was a problem hiding this comment.
finishBlobPublish bypasses the existing package size enforcement (MAX_PACKAGE_SIZE) used for per-file uploads, and publishingPackageFileStats stays at its default values for blob publishes. This enables publishing arbitrarily large archives and makes stored file stats inaccurate. Consider passing blob size (and ideally file count) from the CLI to the canister and validating it here (or adding an equivalent enforcement mechanism for blob uploads).
| let publication = { | ||
| user = newRelease.userId; | ||
| time = Time.now(); | ||
| storage = Principal.fromText("aaaaa-aa"); | ||
| }; | ||
| packagePublications.put(packageId, publication); | ||
|
|
||
| blobHashByPackageId.put(packageId, newRelease.blobHash); | ||
|
|
There was a problem hiding this comment.
Blob package publications set publication.storage to the management canister principal (aaaaa-aa). This value is used by the CLI (e.g. getPackageFilesInfo()/install flow) to create a storage actor, and can lead to confusing failures or silent installs when blob-based packages are encountered. Consider using a dedicated sentinel value that clients can reliably detect (or extending the API to omit/optionally provide storage for blob-based packages).
|
|
||
| function buildMerkleTree( | ||
| fileData: Uint8Array, | ||
| contentType: string, | ||
| ): { | ||
| chunks: Uint8Array[]; | ||
| chunkHashes: Uint8Array[]; | ||
| blobTree: BlobHashTreeJSON; | ||
| rootHash: string; | ||
| } { | ||
| if (fileData.length === 0) { | ||
| throw new Error("Cannot build merkle tree from empty data"); | ||
| } | ||
|
|
||
| const chunks = splitChunks(fileData); | ||
| const chunkHashes = chunks.map((c) => chunkHash(c)); | ||
|
|
||
| let level: TreeNode[] = chunkHashes.map((h) => ({ | ||
| hash: h, | ||
| left: null, | ||
| right: null, | ||
| })); | ||
|
|
||
| while (level.length > 1) { | ||
| const nextLevel: TreeNode[] = []; | ||
| for (let i = 0; i < level.length; i += 2) { | ||
| const left = level[i]!; | ||
| const right = level[i + 1] ?? null; | ||
| const parent = nodeHash(left.hash, right ? right.hash : null); | ||
| nextLevel.push({ hash: parent, left, right }); | ||
| } | ||
| level = nextLevel; | ||
| } | ||
|
|
||
| const chunksRoot = level[0]!; | ||
|
|
||
| const headers: Record<string, string> = { | ||
| "Content-Type": contentType, | ||
| "Content-Length": fileData.length.toString(), | ||
| }; | ||
| const headerLines = Object.entries(headers).map( | ||
| ([k, v]) => `${k.trim()}: ${v.trim()}`, | ||
| ); | ||
| headerLines.sort(); | ||
|
|
||
| const metaHash = metadataHash(headers); | ||
| const metaNode: TreeNode = { hash: metaHash, left: null, right: null }; | ||
| const combinedHash = nodeHash(chunksRoot.hash, metaNode.hash); | ||
| const combinedRoot: TreeNode = { | ||
| hash: combinedHash, | ||
| left: chunksRoot, | ||
| right: metaNode, | ||
| }; | ||
|
|
||
| const blobTree: BlobHashTreeJSON = { | ||
| tree_type: "DSBMTWH", | ||
| chunk_hashes: chunkHashes.map(hashToShaString), | ||
| tree: nodeToJSON(combinedRoot), | ||
| headers: headerLines, | ||
| }; | ||
|
|
||
| return { | ||
| chunks, | ||
| chunkHashes, | ||
| blobTree, | ||
| rootHash: hashToShaString(combinedRoot.hash), | ||
| }; |
There was a problem hiding this comment.
This new storage client implements hashing, certificate retrieval, and multipart upload logic, but there are currently no CLI tests exercising it (couldn’t find any references under cli/tests). Given the risk of subtle protocol/encoding regressions, consider adding unit tests for buildMerkleTree (root hash stability) and mocked-network tests for uploadBlob/downloadBlob error handling/retries.
Migrates the Mops registry from on-chain per-file storage (in dynamically spawned
Storagecanisters) to Caffeine Immutable Object Storage. Packages are now archived as.tar.gz, uploaded to the Caffeine gateway, and referenced by a single SHA-256 content hash stored on-chain. Existing on-chain packages remain fully accessible via the legacy download path.