feat(cantor-multicloudj): add multi-cloud object storage module via multicloudj #172
Open
p-konduru wants to merge 1 commit into
Thanks for the contribution! Unfortunately we can't verify the commit author(s): p-konduru <p***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.
e9e1c82 to f9a0527
…server integration

New Maven module cantor-multicloudj that implements Cantor's Objects and Events interfaces on top of com.salesforce.multicloudj BucketClient. Supports AWS S3, Alibaba Cloud OSS, and GCP Cloud Storage through a single cloud-agnostic abstraction.

Module (cantor-multicloudj):
- CantorOnMulticloudj facade with BucketClient and convenience constructors
- ObjectsOnMulticloudj: full Objects contract (store/get/delete/keys/stream)
- EventsOnMulticloudj: buffer-and-flush with client-side filtering
- MulticloudjUtils: shared helpers (listing, batched deletes, namespace trimming)
- Security hardening: bounded downloads, restrictive buffer permissions, path traversal validation
- 118 tests using blob-inmemory provider (no cloud credentials needed)
- Module README with Known Limitations / Trade-offs

Server integration (cantor-server):
- CantorFactory wired with 'multicloudj' storage type
- Full config support: provider, bucket, region, endpoint override, proxy, buffer directory
- Default cantor-server.conf template
- Provider runtimes (blob-aws, blob-ali, blob-gcp) as optional runtime deps
f9a0527 to 7a1a8e8
Overview
Introduces `cantor-multicloudj`, a new Maven module that implements Cantor's `Objects` and `Events` interfaces on top of `com.salesforce.multicloudj` `BucketClient`. One codebase targets AWS S3, Alibaba Cloud OSS, and GCP Cloud Storage through a single cloud-agnostic abstraction, so deployments can switch backends by swapping the underlying `BucketClient` without touching Cantor application code.

This change is purely additive. The only edit outside the new module is registering it in the parent `pom.xml`; no existing module's behavior changes.

What's added
- `CantorOnMulticloudj`: top-level facade exposing `objects()` and `events()`, with `BucketClient` and convenience constructors.
- `ObjectsOnMulticloudj`: full Objects contract (`store`, `get`, `delete`, `keys`, `size`, and streaming variants), backed by object-storage primitives.
- `EventsOnMulticloudj`: Events contract via a buffer-and-flush model with client-side filtering on metadata/dimensions; flushes buffered events before namespace expiry so in-flight writes are not lost.
- `MulticloudjUtils`: shared helpers for listing, batched deletes, and namespace key trimming.
- `AbstractBaseMulticloudjNamespaceable`: hoisted base class for shared namespace `create`/`drop`/`exists` logic and constants, de-duplicated across Objects and Events.
- `README.md` with usage, supported backends, and an explicit Known Limitations / Trade-offs section.

Security hardening
- Restrictive buffer permissions (`0700`).
- Path traversal validation against `..`-style escapes.

Tests
- 118 tests run against the `blob-inmemory` multicloudj provider, so CI and local dev work offline.

Known limitations / trade-offs
- `multicloudj`'s `BucketClient` does not expose an S3 Select-style API across backends, so Events filtering is performed client-side after fetching the candidate blobs. This is the main perf trade-off versus a native S3 implementation and is called out in the module README.
- … `keys()` should account for backend behavior.

Test plan
- `mvn -pl cantor-multicloudj -am clean install` builds cleanly from a fresh checkout
- `mvn -pl cantor-multicloudj test` passes all 118 tests with no cloud credentials configured
- `cantor-multicloudj` appears as a module in the root `pom.xml` and no other module's build output changes
- … `ObjectsOnMulticloudj` against the `Objects` interface contract (store/get/delete/keys/stream round-trip)
- … `EventsOnMulticloudj` buffer-and-flush behavior and client-side filtering on metadata + dimensions
- … `0700` perms and rejects path-traversal inputs

Server Integration
`cantor-server` is now wired to construct a `cantor-multicloudj` Cantor at runtime based on config.

- `CantorFactory`: new `multicloudj` branch alongside the existing `s3`/`mysql`/`h2` types. Reads `provider`, `bucket`, `region` from config; builds a `BucketClient` via the multicloudj builder API; supports optional `endpoint.override` (`.withEndpoint(URI)`), optional proxy (`.withProxyEndpoint(URI)`), and optional `buffer.directory` for `EventsOnMulticloudj`. Because `Sets` is not implemented by this backend (matching `cantor-s3`), the factory sources `Sets` from another cantor type configured under `multicloudj.sets.type`.
- `Constants.java`: 8 new keys under the `multicloudj` config namespace: `provider`, `bucket`, `region`, `proxy.host`, `proxy.port`, `endpoint.override`, `buffer.directory`, `sets.type`.
- `cantor-server.conf`: default template block.
- `cantor-server/pom.xml`: adds `cantor-multicloudj` as a compile dep, and declares `blob-aws`/`blob-ali`/`blob-gcp` (multicloudj 0.4.0) as `runtime` + `optional` deps so deployers can choose which cloud provider(s) to ship.

Usage
Set `cantor.storage.type = multicloudj` in `cantor-server.conf`, configure the `multicloudj { ... }` block, and ensure the matching `blob-<provider>` runtime dependency is on the classpath (they are declared `optional` in the server pom so consumers pick per-deployment).

Server integration test plan
- `mvn -pl cantor-server -am compile` succeeds
- `cantor-server` with `storage.type = multicloudj` and a valid `multicloudj` config block produces a working Cantor that routes Objects/Events to the configured cloud provider and Sets to the type named in `multicloudj.sets.type`
- … `provider` or `bucket` with clear error messages
- `endpoint.override`, proxy, and `buffer.directory` are honored when present and skipped when absent
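
For reviewers, here is a rough sketch of the config handling the server integration test plan checks: clear errors for `provider` and `bucket`, and optional settings that are skipped when absent. It is not the code in `CantorFactory`. The class and method names are hypothetical, the config is modeled as a plain `Map` rather than the server's real config type, and treating `region` as optional is an assumption made only because the test plan names just `provider` and `bucket`.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names are taken from the PR's CantorFactory.
final class MulticloudjConfigSketch {

    /** Plain holder for the parsed settings. */
    static final class Settings {
        final String provider;
        final String bucket;
        final Optional<String> region;          // assumption: optional in this sketch
        final Optional<URI> endpointOverride;
        final Optional<String> bufferDirectory;

        Settings(String provider, String bucket, Optional<String> region,
                 Optional<URI> endpointOverride, Optional<String> bufferDirectory) {
            this.provider = provider;
            this.bucket = bucket;
            this.region = region;
            this.endpointOverride = endpointOverride;
            this.bufferDirectory = bufferDirectory;
        }
    }

    /** Returns the trimmed value, or fails with a message naming the missing key. */
    static String require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required setting: multicloudj." + key);
        }
        return value.trim();
    }

    /** Returns the trimmed value if present and non-empty, otherwise empty. */
    static Optional<String> optional(Map<String, String> conf, String key) {
        return Optional.ofNullable(conf.get(key)).map(String::trim).filter(v -> !v.isEmpty());
    }

    static Settings parse(Map<String, String> conf) {
        return new Settings(
                require(conf, "provider"),
                require(conf, "bucket"),
                optional(conf, "region"),
                optional(conf, "endpoint.override").map(URI::create),
                optional(conf, "buffer.directory"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("provider", "aws");
        conf.put("bucket", "my-bucket");
        Settings settings = parse(conf);
        System.out.println(settings.provider + " / " + settings.bucket
                + " / endpoint override set: " + settings.endpointOverride.isPresent());
    }
}
```

Validating everything up front, before any client is built, means a bad deployment config fails at startup with the offending key in the message rather than on the first request.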
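
The security hardening items (a `0700` buffer directory and rejection of `..`-style escapes) can be illustrated with JDK-only code. This is a sketch under stated assumptions, not the module's implementation: the class and method names are hypothetical, and it assumes a POSIX file system (on other file systems `Files.setPosixFilePermissions` throws `UnsupportedOperationException`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not taken from the module.
final class BufferDirectorySketch {

    private static final Set<PosixFilePermission> OWNER_ONLY =
            PosixFilePermissions.fromString("rwx------"); // 0700

    /** Creates the directory if needed and restricts it to the owner. */
    static Path createPrivateDirectory(Path dir) {
        try {
            Files.createDirectories(dir);
            // Set permissions explicitly so a pre-existing directory is tightened too.
            Files.setPosixFilePermissions(dir, OWNER_ONLY);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the directory's permissions are exactly rwx------. */
    static boolean isOwnerOnly(Path dir) {
        try {
            return Files.getPosixFilePermissions(dir).equals(OWNER_ONLY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves name under base and rejects anything that lands outside it. */
    static Path resolveInside(Path base, String name) {
        Path root = base.toAbsolutePath().normalize();
        Path resolved = root.resolve(name).normalize();
        // Path.startsWith compares whole path elements, so "/buf-evil" does not match "/buf".
        if (resolved.equals(root) || !resolved.startsWith(root)) {
            throw new IllegalArgumentException("path escapes buffer directory: " + name);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path base = createPrivateDirectory(
                Path.of(System.getProperty("java.io.tmpdir"), "buffer-sketch"));
        System.out.println(resolveInside(base, "namespace/events.json"));
    }
}
```

Normalizing before the prefix check is what catches inputs such as `a/../../b`, which look harmless segment by segment but resolve outside the base directory.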
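
To make the client-side filtering trade-off concrete, the sketch below filters already-fetched events in memory by time range, metadata, and dimensions. It is a simplified illustration, not `EventsOnMulticloudj`: the `Event` class is a hypothetical stand-in, and the exact-match metadata test and minimum-value dimension test are assumptions chosen for brevity rather than Cantor's actual query semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a stand-in event type and a deliberately simple filter.
final class ClientSideFilterSketch {

    static final class Event {
        final long timestampMillis;
        final Map<String, String> metadata;
        final Map<String, Double> dimensions;

        Event(long timestampMillis, Map<String, String> metadata, Map<String, Double> dimensions) {
            this.timestampMillis = timestampMillis;
            this.metadata = metadata;
            this.dimensions = dimensions;
        }
    }

    /** True if the event is inside [start, end] and satisfies every metadata and dimension condition. */
    static boolean matches(Event event, long start, long end,
                           Map<String, String> metadataEquals,
                           Map<String, Double> dimensionAtLeast) {
        if (event.timestampMillis < start || event.timestampMillis > end) {
            return false;
        }
        for (Map.Entry<String, String> want : metadataEquals.entrySet()) {
            if (!want.getValue().equals(event.metadata.get(want.getKey()))) {
                return false;
            }
        }
        for (Map.Entry<String, Double> want : dimensionAtLeast.entrySet()) {
            Double actual = event.dimensions.get(want.getKey());
            if (actual == null || actual < want.getValue()) {
                return false;
            }
        }
        return true;
    }

    /** Filters candidates that were already downloaded; every candidate is paid for in transfer. */
    static List<Event> filter(List<Event> candidates, long start, long end,
                              Map<String, String> metadataEquals,
                              Map<String, Double> dimensionAtLeast) {
        List<Event> kept = new ArrayList<>();
        for (Event event : candidates) {
            if (matches(event, start, end, metadataEquals, dimensionAtLeast)) {
                kept.add(event);
            }
        }
        return kept;
    }
}
```

The cost profile follows directly from the shape of `filter`: work and bandwidth scale with the number of candidates fetched, not with the number of matches, which is why narrowing the candidate set by namespace and time window before download matters.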