-
Notifications
You must be signed in to change notification settings - Fork 111
Add documentation for stage extension API [CTT-830] [CTT-1103] #2087
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
c4c6ca2
Add documentation for stage extension API [CTT-830]
k-jamroz c6342a3
Fix formatting, navigation
k-jamroz b59eb2c
Document mapUsingPutIfAbsent
k-jamroz ed0f18d
Add and fix example, fix ee label
k-jamroz 838a9ea
Apply suggestions from code review
k-jamroz ed7ba6e
Improve examples
k-jamroz d8a0bca
Merge branch 'main' into v5.7/stage-extension-api
k-jamroz 6ddb097
Update tutorial
k-jamroz 76362a5
Restore PythonTransforms examples
k-jamroz 52c129c
Merge branch 'main' into v5.7/stage-extension-api
k-jamroz 10a7013
Map map transforms to separate page
k-jamroz 05c64b6
Extension parameters, links
k-jamroz e5d65b6
Move reference, fix list
k-jamroz aaaf041
Add links to python extension code
k-jamroz 2020723
Fix
k-jamroz f41d6ff
Update docs/modules/pipelines/pages/map-transforms.adoc
k-jamroz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,202 @@ | ||
| = Create a Custom Stage Extension | ||
|
|
||
| It is possible to implement custom reusable xref:extensions.adoc[Pipeline API stage extension]. | ||
| This tutorial is based on `PythonExtension` available in Hazelcast and walks step-by-step | ||
| how it is created. | ||
|
|
||
| == Before you begin | ||
|
|
||
| `PythonExtension` already exists, but in order to recreate it in steps according to this tutorial, you will need: | ||
|
|
||
| 1. A Java project with dependency to `hazelcast` and `hazelcast-jet-python` modules | ||
| 2. Python in version compatible with Jet (see xref:transforms.adoc#_mapusingpython[mapUsingPython requirements]) | ||
| 3. A Python script with function to invoke | ||
|
|
||
| You can reuse setup from xref:python.adoc#_before_you_begin[Apply a Custom Transform with Python] tutorial. | ||
|
|
||
| == Step 1. Define supported stage types | ||
|
|
||
| First step is to decide to what stages the extension will be applicable. | ||
| It is done by implementing `StageExtension` interface from appropriate Stage class. | ||
| Most commonly `StreamStage` and `BatchStage` will be used and often it is relatively easy | ||
| to implement extension for both kinds of stages and it makes it more flexible. | ||
| In some cases you may want to also support `StreamStageWithKey` and `BatchStageWithKey`. | ||
|
|
||
| [source,java] | ||
| ---- | ||
| public interface PythonExtension extends | ||
| StreamStage.StageExtension<String, /*...*/>, | ||
| BatchStage.StageExtension<String, /*...*/> { | ||
| } | ||
| ---- | ||
|
|
||
| The Python extension supports both stream and batch stage. | ||
| Python transformation works only with `String`. | ||
| In more general cases this should be a generic type parameter. | ||
|
|
||
| == Step 2. Define API for the extension | ||
|
|
||
| Next you need to define the fluent API that will be available thanks to the extension. | ||
| It is recommended to use interfaces to clearly separate API from implementation. | ||
|
|
||
| Usually the extension will ultimately return to the original `StreamStage` and `BatchStage` API, | ||
| but possibly with changed item type eg. due to mapping performed via extension. | ||
|
|
||
| [source,java] | ||
| ---- | ||
| public interface PythonExtension extends | ||
| StreamStage.StageExtension<String, PythonExtension.PythonStage<StreamStage<String>>>, | ||
| BatchStage.StageExtension<String, PythonExtension.PythonStage<BatchStage<String>>> { // (1) | ||
|
|
||
| interface PythonStage<S extends GeneralStage<String>> { // (2) | ||
| S map(PythonServiceConfig cfg); // (3) | ||
| } | ||
| } | ||
| ---- | ||
|
|
||
| Initially the example Python extension will support only single `map` method *(3)*. | ||
| In order to handle both `StreamStage` and `BatchStage` in type-safe way we use generic `S` parameter *(2)* | ||
| so that `map` method returns correct stage type (stream or batch) for further chaining. | ||
|
|
||
| `StageExtension` needs to know the type that will be used as stage API for extensions, so it is configured in *(1)*. | ||
|
|
||
| == Step 3. Define entry point for the extension | ||
|
|
||
| Static parameterless method *(1)* is a recommended way to provide entry point to the extension for `stage.using()` invocation | ||
| as it can be statically imported and produces a readable, fluent syntax. | ||
|
|
||
| Note that the method provides implementations for different stage types *(2)* - Java compiler will choose appropriate variant automatically. | ||
| If there are generic types (e.g. stream item type), the compiler will infer them properly avoiding the need to specify them explicitly. | ||
|
|
||
| [source,java] | ||
| ---- | ||
| public interface PythonExtension extends | ||
| StreamStage.StageExtension<String, PythonExtension.PythonStage<StreamStage<String>>>, | ||
| BatchStage.StageExtension<String, PythonExtension.PythonStage<BatchStage<String>>> { | ||
|
|
||
| static PythonExtension python() { // (1) | ||
|
TomaszGaweda marked this conversation as resolved.
|
||
| return new PythonExtensionImpl(); // (2) | ||
| } | ||
| } | ||
|
|
||
| final class PythonExtensionImpl implements PythonExtension { | ||
| } | ||
| ---- | ||
|
|
||
| == Step 4. Implement the extension | ||
|
|
||
| There are two pieces left to be implemented: the extension class (`PythonExtensionImpl`) and the custom stage (`PythonStage`). | ||
|
|
||
| [source,java] | ||
| ---- | ||
| final class PythonExtensionImpl implements PythonExtension { | ||
|
|
||
| @Override | ||
| public PythonStage<StreamStage<String>> extend(StreamStage<String> streamStage) { // (1) | ||
| return new GeneralPythonStage<>(streamStage); | ||
| } | ||
|
|
||
| @Override | ||
| public PythonStage<BatchStage<String>> extend(BatchStage<String> batchStage) { // (2) | ||
| return new GeneralPythonStage<>(batchStage); | ||
| } | ||
| } | ||
|
|
||
| static class GeneralPythonStage<S extends GeneralStage<String>> implements PythonStage<S> { | ||
| private final S stage; | ||
|
|
||
| GeneralPythonStage(S stage) { | ||
| this.stage = stage; // (3) | ||
| } | ||
|
|
||
| @Override | ||
| public S map(PythonServiceConfig cfg) { | ||
| return (S) stage.mapUsingServiceAsyncBatched( // (4) | ||
| PythonService.factory(cfg), // (5) | ||
| PythonTransforms.DEFAULT_MAX_BATCH_SIZE, | ||
| PythonService::sendRequest) | ||
| .setName("mapUsingPython"); // (6) | ||
| } | ||
| } | ||
| ---- | ||
|
|
||
| Extension class is just an implementation of a visitor pattern *(1)* *(2)* that creates appropriate custom stage implementation. | ||
| It gets a reference to current stage in the pipeline which can be used to implement the extensions logic using existing infrastructure, | ||
| for example `mapUsingService`. | ||
|
|
||
| The custom stage remembers the `stage` *(3)* so it can be used to implement the `map` method. | ||
| `map` methods invokes standard `mapUsingServiceAsyncBatched` method *(4)* wiring extension-specific logic (service factory, mapping method) *(5)*. | ||
| We also provide a default stage name *(6)*. | ||
| The created stage is returned, so it is possible to use standard `Stage` methods like `setName` or `setLocalParallelism` to customize the just-created stage. | ||
|
|
||
| The result must be cast to `S` because in this context `mapUsingServiceAsyncBatched` returns `GeneralStage` *(4)*. | ||
|
k-jamroz marked this conversation as resolved.
|
||
| In case of the Python extension, functions will always return the same stage type as it was before applying the extension as `mapUsingServiceAsyncBatched` does not change `StreamStage` to `BatchStage`; item type will still be `String` too. | ||
| In general however, especially if the item type changes, you may need to handle `StreamStage` and `BatchStage` explicitly. | ||
|
Rob-Hazelcast marked this conversation as resolved.
|
||
| You can see how this can be implemented for example in `mapUsingIMap`: | ||
|
|
||
| 1. `GeneralStage` provides base method definitions, javadoc and some shared default implementations. Methods return `GeneralStage` | ||
| 2. `StreamStage` and `BatchStage` override the methods to return correct stage type, invoke base method and cast the result | ||
|
|
||
| == Step 5. Test the extension | ||
|
|
||
| Extensions can be tested by using them in a test pipeline. | ||
|
|
||
| [source,java] | ||
| ---- | ||
| import static com.hazelcast.jet.python.PythonExtension.python; | ||
|
|
||
| public void streamStage_mapUsingPython_extension() { | ||
| // Given | ||
| PythonServiceConfig cfg = new PythonServiceConfig() | ||
| .setBaseDir(baseDir.toString()) | ||
| .setHandlerModule("echo") | ||
| .setHandlerFunction("handle"); | ||
| List<String> items = IntStream.range(0, ITEM_COUNT).mapToObj(Integer::toString).collect(toList()); | ||
| Pipeline p = Pipeline.create(); | ||
| var stage = p.readFrom(TestSources.items(items)).addTimestamps(x -> 0, 0); | ||
|
|
||
| // When | ||
| var mapped = stage.using(python()) | ||
| .map(cfg) // (1) | ||
| .setName("python-echo") // (2) | ||
| .setLocalParallelism(2); | ||
|
|
||
| // Then | ||
| mapped.writeTo(AssertionSinks.assertAnyOrder( | ||
| "Python didn't map the items correctly", items.stream().map(i -> "echo-" + i).collect(toList()) | ||
| )); | ||
| instance().getJet().newJob(p).join(); | ||
| } | ||
| ---- | ||
|
|
||
| The extension invocation *(1)* is very simple: `stage.using(python()).map(cfg)`. | ||
|
|
||
| `map` returns one of the standard `Stage` interfaces, so you can customize it in the same way as with standard transforms *(2)*, by giving it a more specific name (`setName`) or configuring parallelism (`setLocalParallelism`). | ||
|
|
||
| == Step 6. Add more capabilities | ||
|
|
||
| Once the basic structure works, you can add more methods and capabilities. | ||
| You can leverage the fact that you control the return type, so any fluent API can be implemented as long as you ultimately return to one of the standard stages. | ||
|
|
||
| Python extension for example has a fluent builder API to prepare `PythonServiceConfig` on the fly: | ||
|
|
||
| [source,java] | ||
| ---- | ||
| StreamStage<String> sourceStage; | ||
| var mapped = sourceStage.using(python()) | ||
| .baseDir("/tmp") | ||
| .handlerModule("echo") | ||
| .handlerFunction("handle") | ||
| .maxBatchSize(1) | ||
| .map().setName("python-echo").setLocalParallelism(2); | ||
| ---- | ||
|
|
||
| == Summary | ||
|
|
||
| In this tutorial, you learned how to implement and use custom stage extension in Jet Pipeline API. | ||
|
|
||
| == Next steps | ||
|
|
||
| 1. Review the `PythonExtension` production-ready https://github.com/hazelcast/hazelcast/blob/master/extensions/python/src/main/java/com/hazelcast/jet/python/PythonExtension.java[API] and https://github.com/hazelcast/hazelcast/blob/master/extensions/python/src/main/java/com/hazelcast/jet/python/PythonExtensionImpl.java[implementation] in the Hazelcast repository | ||
| 2. Check other xref:extensions.adoc#_builtin_extensions[available extensions] | ||
| 3. Implement your own extension to simplify your pipeline, wrap a complex service or reuse some logic | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| = Stage Extensions | ||
|
|
||
| Stage extensions provide a way to add more capabilities to Pipeline API. | ||
| They maintain the familiar and convenient fluent API for defining the pipeline. | ||
| Extensions are applied in the pipeline by xref:transforms.adoc#_using[`using` transform]. | ||
|
|
||
| [#_builtin_extensions] | ||
| == Builtin extensions | ||
|
|
||
| There are some extensions available in the Hazelcast distribution. | ||
|
|
||
| .Extensions | ||
| |=== | ||
| |Extension | Module | Batch/Streaming | Description | ||
|
|
||
| |xref:python.adoc[PythonExtension] | ||
| |`hazelcast-jet-python` | ||
| |batch, streaming | ||
| |Fluent API for Python invocation | ||
|
|
||
| |xref:pipelines:map-transforms.adoc#_map_transforms[IMapExtension] ([.enterprise]*{enterprise-product-name}*) | ||
| |`hazelcast-enterprise` | ||
| |batch, streaming | ||
| |Convenience methods for using IMaps in pipeline transformations | ||
|
|
||
| |=== | ||
|
|
||
| == Custom extensions | ||
|
|
||
| It is possible to implement custom extensions. For a tutorial see xref:custom-extension.adoc[]. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| = Map Transforms | ||
|
|
||
| You can use IMap to perform transformations of the items in the pipeline, for example to lookup dictionary values or enrich the data. | ||
| The simplest xref:pipelines:transforms.adoc#_mapusingimap[`mapUsingIMap` transformation] is available in the stage API itself. | ||
|
|
||
| [#_map_transforms] | ||
| == IMapExtension ([.enterprise]*{enterprise-product-name}*) | ||
|
|
||
| More advanced transformations are provided in `IMapExtension`: | ||
|
|
||
| - `mapUsingPutIfAbsent`: performs `putIfAbsent` and provides previous value to custom mapping function | ||
|
|
||
| All these transformations provide `AT_LEAST_ONCE` processing guarantees. | ||
| They are not transactional, but rely on sources to retry events in case of failure. | ||
|
|
||
| It is possible to customize some parameters of the transformation: | ||
|
|
||
| - `maxConcurrentOps` - sets maximum number of concurrent async operations | ||
| - `doNotPreserveOrder` - disables preservation of ordering of the input items | ||
|
|
||
| These parameters must be configured before `mapUsingPutIfAbsent` is invoked. | ||
| Some standard stage parameters can also be configured for the created stage: | ||
|
|
||
| - `setName` - more meaningful stage name | ||
| - `setLocalParallelism` - local parallelism of the stage | ||
|
|
||
|
|
||
| The following example records first occurrence of a key from given bucket | ||
| and detects subsequent occurrences allowing to react differently to them. | ||
|
|
||
| [source,java] | ||
| ---- | ||
| import static com.hazelcast.jet.pipeline.IMapExtension.iMapExtension; | ||
|
|
||
| Pipeline p = Pipeline.create(); | ||
| p.readFrom(TestSources.items(1, 2, 3, 11, 12, 13, 21, 22, 23)) | ||
| // rebalance is used to guarantee order of items mapping to the same IMap key | ||
| .rebalance(e -> e % 10) | ||
| .using(iMapExtension()) | ||
| .mapUsingPutIfAbsent("some-map", | ||
| e -> e % 10, | ||
| e -> e, | ||
| Tuple3::tuple3) | ||
| .writeTo(Sinks.logger()); | ||
| ---- | ||
|
|
||
| It should produce output similar as below. The tuple contains in order: item, previous IMap value, value attempted to be inserted in `putIfAbsent`. | ||
| Notice that some entries appear in different order due to parallelism, but the invariant is preserved - only first occurrence updates the IMap. | ||
|
|
||
| [source] | ||
| ---- | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (2, null, 2) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (3, null, 3) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (12, 2, 12) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (13, 3, 13) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (22, 2, 22) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (23, 3, 23) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (1, null, 1) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (11, 1, 11) | ||
| [127.0.0.1]:5702 [dev] [0ee0-e20c-9700-0001/loggerSink#0] (21, 1, 21) | ||
|
|
||
| Final IMap contents: [1=1, 2=2, 3=3] | ||
| ---- | ||
|
|
||
| The following example demonstrates how to use all available settings: | ||
|
|
||
| [source,java] | ||
| ---- | ||
| import static com.hazelcast.jet.pipeline.IMapExtension.iMapExtension; | ||
|
|
||
| Pipeline p = Pipeline.create(); | ||
| p.readFrom(TestSources.items(1, 2, 3, 11, 12, 13, 21, 22, 23)) | ||
| .rebalance(e -> e % 10) | ||
| .using(iMapExtension()) | ||
| .doNotPreserveOrder() | ||
| .maxConcurrentOps(32) | ||
| .mapUsingPutIfAbsent("some-map", | ||
| e -> e % 10, | ||
| e -> e, | ||
| Tuple3::tuple3) | ||
| .setName("store-first-by-bucket") | ||
| .setLocalParallelism(8) | ||
| .writeTo(Sinks.logger()); | ||
| ---- |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.