Generate data stream input configuration and documentation #2826

mjwolf · 2025-08-14T18:34:21Z

This change introduces a new interactive prompt to the create data-stream command. When a user creates a data stream of type logs, they can now select from a list of common input types (e.g., aws-s3, gcp-pubsub, udp, tcp).

Based on the user's selection, the command now automatically populates the data_stream//manifest.yml with the corresponding input streams and their required variables. This streamlines the creation process for new log-based data streams by providing sensible defaults and reducing manual configuration.

This also updates the package README generation step to support a new function "inputDocs". When this is used in the readme template, all inputs used in the package will be found, and all will be listed in the readme. For inputs which have additional documentation added to elastic-package, the input documentation will also be inserted into the rendered readme.

Key changes include:

Added a multi-select prompt to the create data-stream command for choosing log input types.

Added internal/packages/_static/inputs/*.yml files, which contains information on each input type supported by the create command, with information on default name, title, description and variables.

Introduced internal/packages/archetype/data_stream_inputs.go to manage the logic for populating input variables from the new static inputs yml files.

Updated the dataStream-manifest.yml.tmpl to dynamically render streams and their associated variables.

Refactored the packages.DataStreamManifest to use a new packages.Stream struct for better organization.

Add "inputDocs" to the readme render functions, which will find all inputs used in the package, and render documentation for them.

Add docs/_static/inputs/*.yml for keeping input documentation (this is separate from archetype yml file, to avoid a circular dependancy between the packages).

How to test:

Create command
Run the elastic-package create data-stream command.

Select logs as the data stream type.

Select one or more input types from the list.

Verify that the generated data_stream//manifest.yml contains the correct streams configuration for the inputs you selected.

Build readme
Update a package readme to have the {{inputDocs}} function in the readme template

Build the package

Check the generated readme for the input documentation.

With the create data-stream command, ask the user for inputs they will use, and populate the created manifest with the selected values.

Render data stream manifests with the complete and valid input streams, based on the data stream descriptor provided to the renderer.

Fix an error generating data stream manifest (by setting subojects default to true) and fix system test_runner to use new Stream struct.

cmd/create_data_stream.go

vinit-chauhan · 2025-08-14T19:51:25Z

internal/packages/archetype/_static/input-variables.yml

+          The partition consumer waits up to a "receive count" or a "receive timeout", whichever comes first.
+
+          Default is `100` messages.
+  - cel:


Suggested change

- cel:

- name: cel

vinit-chauhan · 2025-08-15T14:42:54Z

cmd/create_data_stream.go

+				Prompt: &survey.MultiSelect{
+					Message: "Select input types which will be used in this data stream",
+					Options: []string{
+						"aws-cloudwatch",


UX question,
Do we want to present a more descriptive text to user?
That way we can add notes such as "deprecated" or additional context right when the user selects the input

ex:
etw -> Event Tracing for Windows
aws-s3 -> AWS S3 (S3 polling / SQS Notifications)

I thought about that too. It would be nice, but I don't think it's possible with the survey module we're using.

It might be nice to switch to another TUI module which can do more in the future, but that'll be a big job. For now I can add a link to the input documentation in the first prompt.

I think that's the best that can be done with this module.

It might be nice to switch to another TUI module which can do more in the future, but that'll be a big job. For now I can add a link to the input documentation in the first prompt.

Yep, actually the one we are using now is archived 😕 But we haven't considered yet the effort to migrate to another one.

I think that's the best that can be done with this module.

Multiselect accepts a Description function that can be used to give more details about each option. Not sure though how this is displayed.

Thanks, I didn't know about Description.

I've added that, and it shows the description next to each option:

kgeller

Some general feedback/questions:

It feels a bit awkward without the changes to the agent/stream template since the manifest will align with what's there. I know we have that slotted for a subtask, but I feel it would make more sense to include all together. Unless we can guarantee they'll be merged together around the same time and will definitely make it into the same release
I appreciate that the user doesn't have to select an input, but then we get a pretty empty generated manifest file (and later no agent/stream files). I feel it could be beneficial if no input is selected, we fall back to what gets generated today with the currently existing logfile details
Not sure how possible this is, but I think it could also be great if the manifest sections could be broken out into their own files. The one big massive file feels less ideal for maintaining
Is there anything we can do sync up the package manifest with the new datastream details?

kgeller · 2025-08-15T18:09:46Z

cmd/create_data_stream.go

+						"gcp-pubsub",
+						"gcs",
+						"http_endpoint",
+						"httpjson",


should we drop httpjson since cel is now the preferred?

I've taken out httpjson

I still see it when testing locally and in the PR?

cmd/create_data_stream.go

mjwolf · 2025-08-15T18:43:56Z

Some general feedback/questions:

It feels a bit awkward without the changes to the agent/stream template since the manifest will align with what's there. I know we have that slotted for a subtask, but I feel it would make more sense to include all together. Unless we can guarantee they'll be merged together around the same time and will definitely make it into the same release

I agree with that now. I was thinking that it'll be a big PR to have both changes in the same PR. But now that I'm working on the 2nd, a lot of the changes are closely related, and I think it makes sense to do them both at the same time.

So I converted this PR to draft, and I'll keep working on it to do them both at the same time.

jsoriano

With all the information on this PR we could create many input packages, that could be later reused as part of "composable" packages. Input packages would then become the source of truth for these settings and docs.

I am fine with adding something like this by now, till we have more input packages, and composable packages. But I am a bit concerned about the long-term maintenance of docs and variables.

cmd/create_data_stream.go

jsoriano · 2025-08-18T17:36:16Z

cmd/create_data_stream.go

+				Prompt: &survey.MultiSelect{
+					Message: "Select input types which will be used in this data stream",
+					Options: []string{
+						"aws-cloudwatch",


It might be nice to switch to another TUI module which can do more in the future, but that'll be a big job. For now I can add a link to the input documentation in the first prompt.

Yep, actually the one we are using now is archived 😕 But we haven't considered yet the effort to migrate to another one.

I think that's the best that can be done with this module.

Multiselect accepts a Description function that can be used to give more details about each option. Not sure though how this is displayed.

jsoriano · 2025-08-18T17:37:01Z

cmd/create_data_stream.go

+						"redis",
+						"tcp",
+						"udp",
+						"winlog",


Do you know if there is some source of truth that could allow us to keep this list automatically updated?

I don't know of a source of truth that could be used for this. The inputs here were selected because they are the most-used. I think the idea of composable inputs could be good to become the source of truth. It would be good if that had some concept of deprecated/discouraged inputs, since there's now some inputs that we don't want to add new usages.

jsoriano · 2025-08-18T17:40:35Z

internal/docs/_static/inputs/aws-cloudwatch.yml

Is there a plan to keep these docs updated in packages?

jsoriano · 2025-08-18T17:41:06Z

internal/packages/archetype/_static/inputs/aws-cloudwatch.yml

Is there a plan to keep these list of variables updated with the actual inputs?

I didn't know about "composable" packages. I think that could be a better way to handle this. These files were added here mainly because there isn't currently a good way to get information on all inputs in a single way. If there was a common location to get information on all inputs that could be used instead.

Improving scalability of integration maintenance is something my team will work on more in the near future, and there's this ticket, which has a similar problem: https://github.com/elastic/integration-experience/issues/164. So we don't have a very solid plan to maintain this yet, but it's something we plan to improve very soon.

mjwolf · 2025-08-18T18:32:56Z

This is what the panw integration rendered README looks like in Kibana when using this markdown in the "References" section:

### Inputs Used

These inputs can be used with this integration:

{{ inputDocs }}

(This isn't published in panw yet)

kgeller

This is what the panw integration rendered README looks like in Kibana when using this markdown in the "References" section:

@mjwolf is it possible to take advantage of the collapsable sections here with md now that we have support for that in kibana? I worry these big sections at the bottom will create unnecessary noise for the users when browsing the page

Kibana supports collapsable in the standard md format per elastic/kibana#223916 and we could add support to the docs page to identify that and collapse them as well (they don't support the standard md collapse, only a dropdown directive that they insert

Can we also have the top-level package manifest get when we add the datastream?

mjwolf · 2025-08-21T23:36:26Z

This is what the panw integration rendered README looks like in Kibana when using this markdown in the "References" section:

@mjwolf is it possible to take advantage of the collapsable sections here with md now that we have support for that in kibana? I worry these big sections at the bottom will create unnecessary noise for the users when browsing the page

I've changed to use collapsible sections for each input's doc (and updated the screenshot above)

jsoriano · 2025-08-22T11:34:13Z

cmd/create_data_stream.go

+						switch value {
+						case "aws-cloudwatch":
+							return "AWS Cloudwatch"
+						case "aws-s3":
+							return "AWS S3"
+						case "azure-blob-storage":
+							return "Azure Blob Storage"
+						case "azure-eventhub":
+							return "Azure Eventhub"
+						case "cel":
+							return "Common Expression Language (CEL)"
+						case "entity-analytics":
+							return "Entity Analytics"
+						case "etw":
+							return "Event Tracing for Windows (ETW)"
+						case "filestream":
+							return "Filestream"
+						case "gcp-pubsub":
+							return "GCP PubSub"
+						case "gcs":
+							return "Google Cloud Storage (GCS)"
+						case "http_endpoint":
+							return "HTTP Endpoint"
+						case "journald":
+							return "Journald"
+						case "netflow":
+							return "Netflow"
+						case "redis":
+							return "Redis"
+						case "tcp":
+							return "TCP"
+						case "udp":
+							return "UDP"
+						case "winlog":
+							return "WinLogBeat"
+						}


To ensure that we keep this list updated with the options of the multiselect, we could keep this info in a map, and use the ordered keys as options for the multiselect.

Suggested change

switch value {

case "aws-cloudwatch":

return "AWS Cloudwatch"

case "aws-s3":

return "AWS S3"

case "azure-blob-storage":

return "Azure Blob Storage"

case "azure-eventhub":

return "Azure Eventhub"

case "cel":

return "Common Expression Language (CEL)"

case "entity-analytics":

return "Entity Analytics"

case "etw":

return "Event Tracing for Windows (ETW)"

case "filestream":

return "Filestream"

case "gcp-pubsub":

return "GCP PubSub"

case "gcs":

return "Google Cloud Storage (GCS)"

case "http_endpoint":

return "HTTP Endpoint"

case "journald":

return "Journald"

case "netflow":

return "Netflow"

case "redis":

return "Redis"

case "tcp":

return "TCP"

case "udp":

return "UDP"

case "winlog":

return "WinLogBeat"

}

return inputOptions[value]

Thanks, it's a good idea to use a map. I've done that, so there's now just one map for the names and descriptions, instead of the duplicated lists.

jsoriano · 2025-08-22T11:36:29Z

internal/docs/resources.go

+	inputTcp,
+	inputUdp,
+	inputWinlog,
+}


This list is duplicated in internal/packages/archetype/resources.go? Could we reuse it?

I've removed this list completely, and switched to embed.FS. It will now walk all yaml files in the embedded dir to find inputs, so maintenance should be a bit less now.

I also moved all static input files into the docs module (it had to be in docs to avoid a circular dependency), so now we also won't have to maintain two sets of yaml files.

jsoriano · 2025-08-22T11:38:45Z

internal/docs/resources.go

+
+// Input definitions
+
+//go:embed _static/inputs/aws-cloudwatch.yml


Is it also possible to embed a whole directory. In case this helps, maybe when adding new inputs in the future.

//go:embed _static/inputs var inputResources embed.FS

…pdates

Changes are: * Move all static input docs into `docs` module * Use embed.FS instead of embedding individual files * Remove list of inputs; instead the file dir will be walked, removing the need to maintain this list. * In `cmd/create_data_stream.go`, use a map for input names and descriptions.

When a new data stream is created, add a simple policy to the main package manifest, to help users get started with the manifest.

mjwolf · 2025-08-25T04:47:37Z

Is there anything we can do sync up the package manifest with the new datastream details?

@kgeller, I had updated this to add a new policy_template to the package manifest when a new data stream is created, but it then caused some unit tests to fail, so I took it out. I think there's some interaction with the package-spec that makes it a bit more complicated then I thought. What do you think of leaving this out of this PR, since it's already pretty big.

We could look at updating the package spec later.

(This is what I did to update the package manifest when a data stream is added: 05acf87)

kgeller · 2025-08-25T15:24:56Z

@mjwolf

I had updated this to add a new policy_template to the package manifest when a new data stream is created, but it then caused some unit tests to fail, so I took it out. I think there's some interaction with the package-spec that makes it a bit more complicated then I thought. What do you think of leaving this out of this PR, since it's already pretty big.

We could look at updating the package spec later.

(This is what I did to update the package manifest when a data stream is added: 05acf87)

Thanks for giving it a shot, I think it's fine to leave for now as you mentioned, especially since that is today's current behavior anyway. Definitely feel free to consider it for phase two 😉

I've changed to use collapsible sections for each input's doc (and updated the screenshot above)

Do we have any issue/sub-task to track the support being also added to the docs?

jsoriano · 2025-08-26T11:40:08Z

internal/packages/archetype/archetype.go

+	lines := strings.Split(s, "\n")
+	indent := ""
+	for range numSpaces {
+		indent = indent + " "
+	}
+	for i, line := range lines {
+		lines[i] = indent + line
+	}
+	return strings.Join(lines, "\n")


Nit. A scanner could be used to split the lines, and a string builder to append them. Something like this:

Suggested change

lines := strings.Split(s, "\n")

indent := ""

for range numSpaces {

indent = indent + " "

}

for i, line := range lines {

lines[i] = indent + line

}

return strings.Join(lines, "\n")

indent := strings.Repeat(" ", numSpaces)

var b strings.Builder

var s scanner.Scanner

s.Init(strings.NewReader(s))

for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {

fmt.Fprintln(&b, indent, s.TokenText())

}

return b.String()

jsoriano · 2025-08-26T11:53:27Z

internal/docs/_static/inputs/cel.yml

+    description: Client secret used for OAuth2 authentication
+    show_user: true
+    required: false
+    secret: true


I tried to create a data stream with the cel input, and some settings like secret are not preserved 🤔

- name: oauth_secret type: password show_user: true description: |- Client secret used for OAuth2 authentication

In the case of secret this leads to errors like:

variable identified as possible secret, secret parameter required to be set to true or false

The Secret field wasn't in the updated manifest template, so it wasn't added into the resulting manifest, even though it was in the inputs yaml definition.

I've added secrets, and other missing fields to the manifest template now.

jsoriano · 2025-08-26T11:54:08Z

internal/packages/archetype/_static/package-docs-readme.md.tmpl

+(Remove the spaces between curly braces when using)
+{ { inputDocs } }


We still have to take a look at how to avoid this, but this can go in a follow up.

…pdates

elasticmachine · 2025-08-28T23:58:03Z

💚 Build Succeeded

Buildkite Build
Commit: 3b6483b

History

💚 Build #5963 succeeded 50f0b72
💔 Build #5962 failed c66ca35
💚 Build #5959 succeeded fc75add
💚 Build #5955 succeeded 5c747e8
💔 Build #5954 failed d320f9e
💔 Build #5953 failed f411f9b

mjwolf added 6 commits August 12, 2025 17:15

Create data-stream inputs with create command

229b8a7

With the create data-stream command, ask the user for inputs they will use, and populate the created manifest with the selected values.

Populate data stream descriptor with proper variable for inputs

236567e

Render data stream manifests with complete inputs

c2675c7

Render data stream manifests with the complete and valid input streams, based on the data stream descriptor provided to the renderer.

Fix errors

1c8d105

Fix an error generating data stream manifest (by setting subojects default to true) and fix system test_runner to use new Stream struct.

Add additional tests for data_stream_inputs

cfe4b6a

Remove doc_link

85f2e4c

mjwolf requested review from jsoriano, kgeller, vinit-chauhan and haetamoudi August 14, 2025 18:34

mjwolf added the enhancement New feature or request label Aug 14, 2025

mjwolf commented Aug 14, 2025

View reviewed changes

cmd/create_data_stream.go Outdated Show resolved Hide resolved

reformat

498cee7

vinit-chauhan reviewed Aug 15, 2025

View reviewed changes

kgeller reviewed Aug 15, 2025

View reviewed changes

mjwolf marked this pull request as draft August 15, 2025 18:41

mjwolf added 4 commits August 15, 2025 11:48

Add input documentation

fa005fd

Split input definitions into separate yml files

e895dd0

Update data-stream-inputs to work with separate files

69c6854

Move static documentation content into docs package

f72eeeb

jsoriano reviewed Aug 18, 2025

View reviewed changes

mjwolf added 2 commits August 18, 2025 10:53

Automatically render input documentation when building package readme

181d6e7

Do not render input docs in the archetype package

4deb7c4

Add AzureEventhub to input lists

4d0e01e

mjwolf changed the title ~~Allow selecting inputs for log data streams in data stream create wizard~~ Generate data stream input configuration and documentation Aug 18, 2025

Add descriptions to inputs in create data-stream command

eb5a251

mjwolf marked this pull request as ready for review August 18, 2025 23:49

kgeller reviewed Aug 20, 2025

View reviewed changes

Use collapsible sections for input docs, remove httpjson

d320f9e

mjwolf force-pushed the create-datastream-updates branch from f411f9b to d320f9e Compare August 21, 2025 23:41

Remove embedded httpjson doc

5c747e8

jsoriano reviewed Aug 22, 2025

View reviewed changes

mjwolf and others added 6 commits August 22, 2025 09:27

Merge remote-tracking branch 'upstream/main' into create-datastream-u…

ad06043

…pdates

Merge remote-tracking branch 'upstream/main' into create-datastream-u…

86af1ea

…pdates

Update package manifest with new data-stream policy

05acf87

When a new data stream is created, add a simple policy to the main package manifest, to help users get started with the manifest.

move static content in internal/docs dir

ab84d6f

Insert agent template_path into data stream manifest

c66ca35

Remove more unused code

50f0b72

mjwolf requested a review from vinit-chauhan August 25, 2025 14:16

added common agent code. added hbs files for {etw,redis}.

d9eaf8e

jsoriano reviewed Aug 26, 2025

View reviewed changes

mjwolf added 3 commits August 27, 2025 16:35

Merge remote-tracking branch 'upstream/main' into create-datastream-u…

7e69dd5

…pdates

Add additional fields to vars in data stream manifest

5c9bea9

Add additional fields to data stream manifest vars

3b6483b


		// Input definitions

		//go:embed _static/inputs/aws-cloudwatch.yml

		(Remove the spaces between curly braces when using)
		{ { inputDocs } }

Generate data stream input configuration and documentation #2826

Are you sure you want to change the base?

Generate data stream input configuration and documentation #2826

Conversation

mjwolf commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kgeller left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mjwolf commented Aug 15, 2025

Uh oh!

jsoriano left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mjwolf commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kgeller left a comment

Choose a reason for hiding this comment

Uh oh!

mjwolf commented Aug 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mjwolf Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mjwolf commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kgeller commented Aug 25, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elasticmachine commented Aug 28, 2025

mjwolf commented Aug 14, 2025 •

edited

Loading

mjwolf commented Aug 18, 2025 •

edited

Loading

mjwolf Aug 22, 2025 •

edited

Loading

mjwolf commented Aug 25, 2025 •

edited

Loading