Description
Hi all,
I followed this tutorial to roll over an index without data streams.
If I go step by step with DevTools directly in Kibana, it works fine.
In my use case, the data is in a pandas DataFrame, and I want to use the pandas_to_eland() function to ingest it into an aliased index. It seems that pandas_to_eland() does not work with an aliased index.
I tried two eland versions:
eland version - 8.15.3
eland version - 8.17.0
Elasticsearch Cloud version - 8.15.3
Steps to reproduce in DevTools
This manual flow in DevTools works as expected (it is the prerequisite for the eland step):
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "1GB",
            "max_age": "1m"
          }
        }
      }
    }
  }
}

PUT _index_template/my_template
{
  "index_patterns": [
    "my-index-*"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-rollover-policy",
      "index.lifecycle.rollover_alias": "my-alias"
    }
  }
}

PUT /my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true
    }
  }
}

POST /my-alias/_doc
{
  "message": "Hello, World!",
  "@timestamp": "2024-01-01T12:00:00Z"
}
Error with pandas_to_eland()
After the index is configured and created with one document, I want to ingest a second document with pandas_to_eland:
import eland as ed
import pandas as pd
from elasticsearch import Elasticsearch
from elastic_transport import RequestsHttpNode

if __name__ == "__main__":
    es_client = Elasticsearch(
        cloud_id="ES_CLOUD_ID",
        api_key=("ES_API_ID", "ES_API_KEY"),
        node_class=RequestsHttpNode,
        request_timeout=60,
        max_retries=10,
        retry_on_timeout=True,
    )
    data = {
        "message": "Hello, World 2!",
        "@timestamp": "2025-01-01T12:00:00Z"
    }
    df = pd.DataFrame([data])
    print(df)
    ed.pandas_to_eland(
        pd_df=df,
        es_client=es_client,
        es_dest_index="my-alias",
        es_if_exists="append",
    )

Traceback (most recent call last):
  File "/fs075/sd19a/destech/devel/aklein/lib_cell_device_stats/trunk/src/eland_ingest_github.py", line 44, in <module>
    ed.pandas_to_eland(
  File "/fs075/sd19a/destech/devel/CAD_shared/python_environments/libs_statistics_tool/lib/python3.10/site-packages/eland/etl.py", line 180, in pandas_to_eland
    dest_mapping = es_client.indices.get_mapping(index=es_dest_index)[
  File "/fs075/sd19a/destech/devel/CAD_shared/python_environments/libs_statistics_tool/lib/python3.10/site-packages/elastic_transport/_response.py", line 186, in __getitem__
    return self.body[item]  # type: ignore[index]
KeyError: 'my-alias'
Output of self.body in elastic_transport/_response.py, line 186:
{'my-index-000002': {'mappings': {}}, 'my-index-000001': {'mappings': {'properties': {'@timestamp': {'type': 'date'}, 'message': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}
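The failing lookup can be reproduced without a cluster. The sketch below copies the response body printed above into a plain dict and performs the same key access with the alias name; it only illustrates the lookup and does not call eland or Elasticsearch.

```python
# Response body copied verbatim from the self.body output above.
body = {
    "my-index-000002": {"mappings": {}},
    "my-index-000001": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                },
            }
        }
    },
}

es_dest_index = "my-alias"

# The body is keyed by concrete index names, not by the alias.
print(sorted(body))

try:
    dest_mapping = body[es_dest_index]
except KeyError as exc:
    # Same failure as in the traceback above.
    print(f"KeyError: {exc}")
```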
It seems that pandas_to_eland() expects the es_dest_index name my-alias as a key in the response body (self.body). However, self.body only contains the concrete indices that the alias points to (my-index-000001, my-index-000002), so a KeyError is raised for my-alias.
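A possible workaround, sketched here under assumptions and not verified against a cluster, is to resolve the alias to its concrete write index first and pass that name as es_dest_index. The helper resolve_write_index and the example_body dict are hypothetical; example_body only mimics the shape that a get-alias response would presumably have after one rollover.

```python
from typing import Any, Mapping


def resolve_write_index(alias_body: Mapping[str, Any], alias: str) -> str:
    """Return the concrete write index for `alias` (hypothetical helper).

    `alias_body` is assumed to have the shape of a get-alias response:
    {index_name: {"aliases": {alias_name: {...options...}}}}.
    """
    # Collect the is_write_index flag (True, False, or None if unset)
    # for every concrete index that carries the alias.
    flags = {
        index: info["aliases"][alias].get("is_write_index")
        for index, info in alias_body.items()
        if alias in info.get("aliases", {})
    }
    write_indices = [index for index, flag in flags.items() if flag is True]
    if len(write_indices) == 1:
        return write_indices[0]
    # An alias that points to exactly one index, with the flag unset,
    # uses that index as its write index.
    if len(flags) == 1:
        index, flag = next(iter(flags.items()))
        if flag is None:
            return index
    raise ValueError(f"no unique write index found for alias {alias!r}")


# Illustrative data only (assumed shape, not captured from a real cluster).
example_body = {
    "my-index-000001": {"aliases": {"my-alias": {"is_write_index": False}}},
    "my-index-000002": {"aliases": {"my-alias": {"is_write_index": True}}},
}
print(resolve_write_index(example_body, "my-alias"))

# Intended usage against a live cluster (untested sketch):
# alias_body = es_client.indices.get_alias(name="my-alias").body
# write_index = resolve_write_index(alias_body, "my-alias")
# ed.pandas_to_eland(pd_df=df, es_client=es_client,
#                    es_dest_index=write_index, es_if_exists="append")
```

Two caveats apply. Writing to the concrete index bypasses the alias, so a rollover that happens between the lookup and the ingest could leave the documents in the previous index. It is also untested whether the append-mode mapping check accepts a freshly rolled-over index whose mappings are still empty, as my-index-000002 is in the output above.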
Could anyone confirm that this behavior is a bug, or advise me on what I'm doing wrong?
Thanks and best regards