-
Notifications
You must be signed in to change notification settings - Fork 126
Generate data stream input configuration and documentation #2826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
With the create data-stream command, ask the user for inputs they will use, and populate the created manifest with the selected values.
Render data stream manifests with the complete and valid input streams, based on the data stream descriptor provided to the renderer.
Fix an error generating data stream manifest (by setting subojects default to true) and fix system test_runner to use new Stream struct.
The partition consumer waits up to a "receive count" or a "receive timeout", whichever comes first. | ||
|
||
Default is `100` messages. | ||
- cel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- cel: | |
- name: cel |
cmd/create_data_stream.go
Outdated
Prompt: &survey.MultiSelect{ | ||
Message: "Select input types which will be used in this data stream", | ||
Options: []string{ | ||
"aws-cloudwatch", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UX question,
Do we want to present a more descriptive text to user?
That way we can add notes such as "deprecated" or additional context right when the user selects the input
ex:
etw -> Event Tracing for Windows
aws-s3 -> AWS S3 (S3 polling / SQS Notifications)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that too. It would be nice, but I don't think it's possible with the survey module we're using.
It might be nice to switch to another TUI module which can do more in the future, but that'll be a big job. For now I can add a link to the input documentation in the first prompt.
I think that's the best that can be done with this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to switch to another TUI module which can do more in the future, but that'll be a big job. For now I can add a link to the input documentation in the first prompt.
Yep, actually the one we are using now is archived 😕 But we haven't considered yet the effort to migrate to another one.
I think that's the best that can be done with this module.
Multiselect accepts a Description
function that can be used to give more details about each option. Not sure though how this is displayed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general feedback/questions:
-
It feels a bit awkward without the changes to the agent/stream template since the manifest will align with what's there. I know we have that slotted for a subtask, but I feel it would make more sense to include all together. Unless we can guarantee they'll be merged together around the same time and will definitely make it into the same release
-
I appreciate that the user doesn't have to select an input, but then we get a pretty empty generated manifest file (and later no agent/stream files). I feel it could be beneficial if no input is selected, we fall back to what gets generated today with the currently existing logfile details
-
Not sure how possible this is, but I think it could also be great if the manifest sections could be broken out into their own files. The one big massive file feels less ideal for maintaining
-
Is there anything we can do sync up the package manifest with the new datastream details?
cmd/create_data_stream.go
Outdated
"gcp-pubsub", | ||
"gcs", | ||
"http_endpoint", | ||
"httpjson", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we drop httpjson since cel is now the preferred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken out httpjson
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see it when testing locally and in the PR?
I agree with that now. I was thinking that it'll be a big PR to have both changes in the same PR. But now that I'm working on the 2nd, a lot of the changes are closely related, and I think it makes sense to do them both at the same time. So I converted this PR to draft, and I'll keep working on it to do them both at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all the information on this PR we could create many input packages, that could be later reused as part of "composable" packages. Input packages would then become the source of truth for these settings and docs.
I am fine with adding something like this by now, till we have more input packages, and composable packages. But I am a bit concerned about the long-term maintenance of docs and variables.
cmd/create_data_stream.go
Outdated
Prompt: &survey.MultiSelect{ | ||
Message: "Select input types which will be used in this data stream", | ||
Options: []string{ | ||
"aws-cloudwatch", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to switch to another TUI module which can do more in the future, but that'll be a big job. For now I can add a link to the input documentation in the first prompt.
Yep, actually the one we are using now is archived 😕 But we haven't considered yet the effort to migrate to another one.
I think that's the best that can be done with this module.
Multiselect accepts a Description
function that can be used to give more details about each option. Not sure though how this is displayed.
cmd/create_data_stream.go
Outdated
"redis", | ||
"tcp", | ||
"udp", | ||
"winlog", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there is some source of truth that could allow us to keep this list automatically updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know of a source of truth that could be used for this. The inputs here were selected because they are the most-used. I think the idea of composable inputs could be good to become the source of truth. It would be good if that had some concept of deprecated/discouraged inputs, since there's now some inputs that we don't want to add new usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a plan to keep these docs updated in packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a plan to keep these list of variables updated with the actual inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about "composable" packages. I think that could be a better way to handle this. These files were added here mainly because there isn't currently a good way to get information on all inputs in a single way. If there was a common location to get information on all inputs that could be used instead.
Improving scalability of integration maintenance is something my team will work on more in the near future, and there's this ticket, which has a similar problem: https://github.com/elastic/integration-experience/issues/164. So we don't have a very solid plan to maintain this yet, but it's something we plan to improve very soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the
panw
integration rendered README looks like in Kibana when using this markdown in the "References" section:
@mjwolf is it possible to take advantage of the collapsable sections here with md now that we have support for that in kibana? I worry these big sections at the bottom will create unnecessary noise for the users when browsing the page
Kibana supports collapsable in the standard md format per elastic/kibana#223916 and we could add support to the docs page to identify that and collapse them as well (they don't support the standard md collapse, only a dropdown directive that they insert
Can we also have the top-level package manifest get when we add the datastream?
I've changed to use collapsible sections for each input's doc (and updated the screenshot above) |
f411f9b
to
d320f9e
Compare
cmd/create_data_stream.go
Outdated
switch value { | ||
case "aws-cloudwatch": | ||
return "AWS Cloudwatch" | ||
case "aws-s3": | ||
return "AWS S3" | ||
case "azure-blob-storage": | ||
return "Azure Blob Storage" | ||
case "azure-eventhub": | ||
return "Azure Eventhub" | ||
case "cel": | ||
return "Common Expression Language (CEL)" | ||
case "entity-analytics": | ||
return "Entity Analytics" | ||
case "etw": | ||
return "Event Tracing for Windows (ETW)" | ||
case "filestream": | ||
return "Filestream" | ||
case "gcp-pubsub": | ||
return "GCP PubSub" | ||
case "gcs": | ||
return "Google Cloud Storage (GCS)" | ||
case "http_endpoint": | ||
return "HTTP Endpoint" | ||
case "journald": | ||
return "Journald" | ||
case "netflow": | ||
return "Netflow" | ||
case "redis": | ||
return "Redis" | ||
case "tcp": | ||
return "TCP" | ||
case "udp": | ||
return "UDP" | ||
case "winlog": | ||
return "WinLogBeat" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure that we keep this list updated with the options of the multiselect, we could keep this info in a map, and use the ordered keys as options for the multiselect.
switch value { | |
case "aws-cloudwatch": | |
return "AWS Cloudwatch" | |
case "aws-s3": | |
return "AWS S3" | |
case "azure-blob-storage": | |
return "Azure Blob Storage" | |
case "azure-eventhub": | |
return "Azure Eventhub" | |
case "cel": | |
return "Common Expression Language (CEL)" | |
case "entity-analytics": | |
return "Entity Analytics" | |
case "etw": | |
return "Event Tracing for Windows (ETW)" | |
case "filestream": | |
return "Filestream" | |
case "gcp-pubsub": | |
return "GCP PubSub" | |
case "gcs": | |
return "Google Cloud Storage (GCS)" | |
case "http_endpoint": | |
return "HTTP Endpoint" | |
case "journald": | |
return "Journald" | |
case "netflow": | |
return "Netflow" | |
case "redis": | |
return "Redis" | |
case "tcp": | |
return "TCP" | |
case "udp": | |
return "UDP" | |
case "winlog": | |
return "WinLogBeat" | |
} | |
return inputOptions[value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's a good idea to use a map. I've done that, so there's now just one map for the names and descriptions, instead of the duplicated lists.
internal/docs/resources.go
Outdated
inputTcp, | ||
inputUdp, | ||
inputWinlog, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list is duplicated in internal/packages/archetype/resources.go
? Could we reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this list completely, and switched to embed.FS. It will now walk all yaml files in the embedded dir to find inputs, so maintenance should be a bit less now.
I also moved all static input files into the docs module (it had to be in docs to avoid a circular dependency), so now we also won't have to maintain two sets of yaml files.
internal/docs/resources.go
Outdated
|
||
// Input definitions | ||
|
||
//go:embed _static/inputs/aws-cloudwatch.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it also possible to embed a whole directory. In case this helps, maybe when adding new inputs in the future.
//go:embed _static/inputs
var inputResources embed.FS
Changes are: * Move all static input docs into `docs` module * Use embed.FS instead of embedding individual files * Remove list of inputs; instead the file dir will be walked, removing the need to maintain this list. * In `cmd/create_data_stream.go`, use a map for input names and descriptions.
When a new data stream is created, add a simple policy to the main package manifest, to help users get started with the manifest.
@kgeller, I had updated this to add a new policy_template to the package manifest when a new data stream is created, but it then caused some unit tests to fail, so I took it out. I think there's some interaction with the package-spec that makes it a bit more complicated then I thought. What do you think of leaving this out of this PR, since it's already pretty big. We could look at updating the package spec later. (This is what I did to update the package manifest when a data stream is added: 05acf87) |
Thanks for giving it a shot, I think it's fine to leave for now as you mentioned, especially since that is today's current behavior anyway. Definitely feel free to consider it for phase two 😉
Do we have any issue/sub-task to track the support being also added to the docs? |
lines := strings.Split(s, "\n") | ||
indent := "" | ||
for range numSpaces { | ||
indent = indent + " " | ||
} | ||
for i, line := range lines { | ||
lines[i] = indent + line | ||
} | ||
return strings.Join(lines, "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. A scanner could be used to split the lines, and a string builder to append them. Something like this:
lines := strings.Split(s, "\n") | |
indent := "" | |
for range numSpaces { | |
indent = indent + " " | |
} | |
for i, line := range lines { | |
lines[i] = indent + line | |
} | |
return strings.Join(lines, "\n") | |
indent := strings.Repeat(" ", numSpaces) | |
var b strings.Builder | |
var s scanner.Scanner | |
s.Init(strings.NewReader(s)) | |
for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() { | |
fmt.Fprintln(&b, indent, s.TokenText()) | |
} | |
return b.String() |
description: Client secret used for OAuth2 authentication | ||
show_user: true | ||
required: false | ||
secret: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to create a data stream with the cel input, and some settings like secret
are not preserved 🤔
- name: oauth_secret
type: password
show_user: true
description: |-
Client secret used for OAuth2 authentication
In the case of secret
this leads to errors like:
variable identified as possible secret, secret parameter required to be set to true or false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Secret field wasn't in the updated manifest template, so it wasn't added into the resulting manifest, even though it was in the inputs yaml definition.
I've added secrets, and other missing fields to the manifest template now.
(Remove the spaces between curly braces when using) | ||
{ { inputDocs } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have to take a look at how to avoid this, but this can go in a follow up.
💚 Build Succeeded
History
|
This change introduces a new interactive prompt to the create data-stream command. When a user creates a data stream of type logs, they can now select from a list of common input types (e.g., aws-s3, gcp-pubsub, udp, tcp).
Based on the user's selection, the command now automatically populates the data_stream//manifest.yml with the corresponding input streams and their required variables. This streamlines the creation process for new log-based data streams by providing sensible defaults and reducing manual configuration.
This also updates the package README generation step to support a new function "inputDocs". When this is used in the readme template, all inputs used in the package will be found, and all will be listed in the readme. For inputs which have additional documentation added to elastic-package, the input documentation will also be inserted into the rendered readme.
Key changes include:
Added a multi-select prompt to the create data-stream command for choosing log input types.
Added
internal/packages/_static/inputs/*.yml
files, which contains information on each input type supported by the create command, with information on default name, title, description and variables.Introduced internal/packages/archetype/data_stream_inputs.go to manage the logic for populating input variables from the new static inputs yml files.
Updated the dataStream-manifest.yml.tmpl to dynamically render streams and their associated variables.
Refactored the packages.DataStreamManifest to use a new packages.Stream struct for better organization.
Add "inputDocs" to the readme render functions, which will find all inputs used in the package, and render documentation for them.
Add
docs/_static/inputs/*.yml
for keeping input documentation (this is separate from archetype yml file, to avoid a circular dependancy between the packages).How to test:
Create command
Run the elastic-package create data-stream command.
Select logs as the data stream type.
Select one or more input types from the list.
Verify that the generated data_stream//manifest.yml contains the correct streams configuration for the inputs you selected.
Build readme
Update a package readme to have the {{inputDocs}} function in the readme template
Build the package
Check the generated readme for the input documentation.