Add a new section for transitioning indices to data streams #2216

Open. Wants to merge 6 commits into base: main.


@yetanothertw (Contributor) commented Jul 22, 2025

The goal of this PR is to include the content from the Migrate from Indices to DataStreams knowledgebase article into our ILM-related documentation.

Initially, the suggestion was to include this new content as a new section on the Manage existing indices page. Upon further review of existing content, the Tutorial: Automate rollover seems like a more appropriate home for this content as it already includes two other equivalent use cases:

I think these use cases are equivalent because they both use ILM policies to migrate from periodic indices to an automated way of managing rollover, which replaces the need to schedule or script index creation (one option migrates from indices to data streams, and the other migrates to aliases that manage the backing indices). The new content adds a use case that's equivalent in scope:

Fixes #1571


github-actions bot commented Jul 22, 2025

@yetanothertw added the documentation label (Improvements or additions to documentation) on Jul 23, 2025

@yetanothertw (Contributor Author) commented:

I'm also wondering whether this tutorial is related to the goal we're trying to achieve in the Migrate from Indices to DataStreams knowledgebase article.
Is the overall goal to automate rollover for static data indexes with data streams? Is that why we'd be migrating from indices to data streams?

Also, applying some editorial changes to blend in with the structure of the page
…ios described

Hopefully this helps someone decide which procedure to follow.
## Manage general content with data streams [manage-general-content-with-data-streams]

[Data streams](/manage-data/data-store/data-streams.md) are specifically designed for time series data.
If you want to manage general content (data without timestamps) with data streams, you can set up [ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) to transform and enrich your general content at [ingest](/manage-data/ingest.md) time, so that you can transition from periodic indices to a data stream and get the benefits of time-based data management.
Contributor

I don't understand the final part of the sentence:

"so that you can transition from periodic indices to a data stream and get the benefits of time-based data management."

The word "periodic" feels really weird here.

I understand that we propose setting up an ingest pipeline that transforms and enriches the document at ingest time, but the final part (which is the key) is not clear in my opinion. I would suggest something direct like:

"by adding a timestamp field and get the benefits of time-based data management."

I'd add a warning note telling users to double-check that this makes sense and adds value for the customer's use case (I'll share one use case example in private so you can consider adding something like that too).
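
For readers following the thread, the "add a timestamp field at ingest time" idea could be sketched roughly as below. This is only an illustration, not the content of the PR: the helper name `build_timestamp_pipeline` is hypothetical, and the choice of `override: False` is an assumption. The sketch builds the JSON body that would be sent to `PUT _ingest/pipeline/<name>`, using a `set` processor to copy the ingest time into `@timestamp`.

```python
import json


def build_timestamp_pipeline(description: str = "Add @timestamp to general content") -> dict:
    """Build an ingest pipeline body that stamps each document at ingest time.

    Hypothetical helper for illustration; it only builds the request body
    and does not contact a cluster.
    """
    return {
        "description": description,
        "processors": [
            {
                "set": {
                    # Data streams require an @timestamp field on every document.
                    "field": "@timestamp",
                    # _ingest.timestamp is the time the document entered the pipeline.
                    "value": "{{{_ingest.timestamp}}}",
                    # Assumption: keep an existing @timestamp if the document has one.
                    "override": False,
                }
            }
        ],
    }


pipeline_body = build_timestamp_pipeline()
print(json.dumps(pipeline_body, indent=2))
```

Whether stamping documents with the ingest time is meaningful depends on the use case, which is the concern raised in the comment above.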

Contributor Author

Thanks, Edu! Instead of a warning I added an example for context to (hopefully) help users determine if this procedure fits their scenario/use case.


1. [Reindex with a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-reindex) to copy your documents from an existing index to the data stream you created.

1. [Roll over](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-roll-over) the reindexed data stream so that the lifecycle policy and ingest pipeline you created will be applied to new data.
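
As a rough illustration of the reindex step listed above, the request body for `POST _reindex` might look like the following sketch. The index and data stream names and the helper `build_reindex_body` are hypothetical. The one detail that is not optional is `op_type: create`, because data streams are append-only.

```python
from typing import Optional


def build_reindex_body(source_index: str, dest_data_stream: str,
                       pipeline: Optional[str] = None) -> dict:
    """Build a _reindex request body that copies an index into a data stream.

    Hypothetical helper for illustration; it only builds the request body.
    """
    dest = {
        "index": dest_data_stream,
        # Data streams only accept documents indexed with op_type "create".
        "op_type": "create",
    }
    if pipeline is not None:
        # Optional: route reindexed documents through a named ingest pipeline.
        dest["pipeline"] = pipeline
    return {"source": {"index": source_index}, "dest": dest}


# Names below are placeholders, not values from the PR.
reindex_body = build_reindex_body("my-old-index", "my-data-stream",
                                  pipeline="add-timestamp")
print(reindex_body)
```

If the data stream's index template already sets a default pipeline, the `pipeline` argument can presumably be left out.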
Contributor

Rollover is supposed to happen automatically, so I don't see the point of this step.
It would only prove to the user that rollover works, and if they have ILM set up to roll over after 10GB, this will cause an extra rollover.

Contributor Author

Thank you. That's a good point, I'm not sure why this step was included in the knowledgebase tutorial, but I will reach out to Zoia to double-check with her.

Comment on lines 445 to 447
### Roll over the reindexed data stream [manage-general-content-with-data-streams-roll-over]

Use the [_rollover API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover) to create a new write index for the stream. This ensures that the lifecycle policy and ingest pipeline you've created will apply to any new documents that you index.
Contributor

If we have already created an index and we have (maybe) performed a reindex, everything should be in place. What's the point of a rollover API call?

The user just needs to send data to the data stream. The rollover API should be used manually only when you want an exceptional or extra rollover for some reason. You can definitely do it, and it's harmless, but you will cause a rollover regardless of whether the backing index actually needs to be rolled over based on the ILM settings.
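
To make the distinction in this comment concrete, here is a small sketch contrasting the two mechanisms. The helper names are hypothetical, the `10gb` threshold is taken from the earlier review comment, and using `max_primary_shard_size` as the rollover condition is an assumption about how such a policy might be written. The first function builds a body for `PUT _ilm/policy/<name>`; the second only describes the manual request.

```python
from typing import Tuple


def build_ilm_policy(max_primary_shard_size: str = "10gb") -> dict:
    """Build an ILM policy body whose hot phase rolls over automatically.

    Hypothetical helper; the threshold and condition are assumptions.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # ILM checks this condition itself; no manual call is needed.
                        "rollover": {"max_primary_shard_size": max_primary_shard_size}
                    }
                }
            }
        }
    }


def manual_rollover_request(data_stream: str) -> Tuple[str, str]:
    """Return the HTTP method and path for a one-off manual rollover."""
    # With no conditions in the request, this rolls over unconditionally,
    # whether or not the ILM threshold has been reached.
    return ("POST", f"/{data_stream}/_rollover")


policy_body = build_ilm_policy()
method, path = manual_rollover_request("my-data-stream")
print(policy_body)
print(method, path)
```

With a policy like this attached, new documents sent to the data stream trigger rollover on their own once the condition is met, so the manual request is only useful for an exceptional extra rollover.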

Contributor Author

Thanks, my plan is to check with Zoia on the intention of this step and then remove this guidance if the step is just checking that indices get rolled over.

I can mention that info in the procedure description, as an optional step and remove this section as there's no need to elaborate on it.

@eedugon (Contributor) left a comment

In general it feels great, just added some minor comments for your consideration, including the removal of one step.

@yetanothertw (Contributor Author) commented:

Hi @kilfoyle, I've implemented most of the changes that Edu suggested (one item is pending, I'm checking with the author about the goal of the manual rollover step), so depending on that, I might need to remove that step.

Other than that, it should be ready for another peer review, whenever you get the chance. Please and thank you! 😃

P.S. The build seems to be broken now, but you should be able to use the preview regardless.

@kilfoyle (Contributor) left a comment

LGTM! ⛵
Very nice work on this @yetanothertw!

Successfully merging this pull request may close these issues.

Data lifecycle docs: Document migrating from indices to data streams
3 participants