pandas_to_eland - cannot write into index by alias name #747

@akerfx

Hi all,

I followed this tutorial to roll over an index without data streams.

If I go step by step with DevTools directly in Kibana, everything works fine.

In my use case, the data is in a pandas DataFrame and I want to use the pandas_to_eland() function to ingest it into an aliased index. It seems that pandas_to_eland() can't work with an aliased index.

Versions (I tried two eland versions):
eland version - 8.15.3
eland version - 8.17.0
Elasticsearch Cloud version - 8.15.3

Steps to reproduce in DevTools

This manual flow in DevTools works as expected. It is also the prerequisite for the eland step:

PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "1GB",
            "max_age": "1m"
          }
        }
      }
    }
  }
}


PUT _index_template/my_template
{
  "index_patterns": [
    "my-index-*"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-rollover-policy",
      "index.lifecycle.rollover_alias": "my-alias"
    }
  }
}

PUT /my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true
    }
  }
}

POST /my-alias/_doc
{
  "message": "Hello, World!",
  "@timestamp": "2024-01-01T12:00:00Z"
}
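For reference, the same four requests could be issued from Python. This is a minimal sketch, assuming the elasticsearch-py 8.x client; the helper name `create_rollover_setup` is hypothetical and not part of eland or the client.

```python
def create_rollover_setup(es):
    """Replay the DevTools requests above through an elasticsearch-py 8.x client.

    `es` is assumed to be an `elasticsearch.Elasticsearch` instance.
    """
    # PUT _ilm/policy/my-rollover-policy
    es.ilm.put_lifecycle(
        name="my-rollover-policy",
        policy={
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": "1GB", "max_age": "1m"}
                    }
                }
            }
        },
    )

    # PUT _index_template/my_template
    es.indices.put_index_template(
        name="my_template",
        index_patterns=["my-index-*"],
        template={
            "settings": {
                "index.lifecycle.name": "my-rollover-policy",
                "index.lifecycle.rollover_alias": "my-alias",
            }
        },
    )

    # PUT /my-index-000001 with my-alias as the write alias
    es.indices.create(
        index="my-index-000001",
        aliases={"my-alias": {"is_write_index": True}},
    )

    # POST /my-alias/_doc (indexing through the alias)
    return es.index(
        index="my-alias",
        document={
            "message": "Hello, World!",
            "@timestamp": "2024-01-01T12:00:00Z",
        },
    )


# Usage (needs a reachable cluster and credentials):
# create_rollover_setup(Elasticsearch(cloud_id="ES_CLOUD_ID", api_key=("ES_API_ID", "ES_API_KEY")))
```

Indexing through the alias works here because Elasticsearch routes the write to the index flagged with `is_write_index`.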

Error with pandas_to_eland()

After the index is configured and created with one document, I want to ingest a second document with pandas_to_eland:

import eland as ed
import pandas as pd

from elasticsearch import Elasticsearch
from elastic_transport import RequestsHttpNode

if __name__ == "__main__":

    es_client = Elasticsearch(
        cloud_id="ES_CLOUD_ID",
        api_key=("ES_API_ID", "ES_API_KEY"),
        node_class=RequestsHttpNode,
        request_timeout=60,
        max_retries=10,
        retry_on_timeout=True,
    )

    data = {
        "message": "Hello, World 2!",
        "@timestamp": "2025-01-01T12:00:00Z"
    }
    
    df = pd.DataFrame([data])

    print(df)

    ed.pandas_to_eland(
        pd_df=df,
        es_client=es_client,
        es_dest_index="my-alias",
        es_if_exists="append",
    )

Traceback (most recent call last):
  File "/fs075/sd19a/destech/devel/aklein/lib_cell_device_stats/trunk/src/eland_ingest_github.py", line 44, in <module>
    ed.pandas_to_eland(
  File "/fs075/sd19a/destech/devel/CAD_shared/python_environments/libs_statistics_tool/lib/python3.10/site-packages/eland/etl.py", line 180, in pandas_to_eland
    dest_mapping = es_client.indices.get_mapping(index=es_dest_index)[
  File "/fs075/sd19a/destech/devel/CAD_shared/python_environments/libs_statistics_tool/lib/python3.10/site-packages/elastic_transport/_response.py", line 186, in __getitem__
    return self.body[item]  # type: ignore[index]
KeyError: 'my-alias'

Output of self.body at elastic_transport/_response.py, line 186:

{'my-index-000002': {'mappings': {}}, 'my-index-000001': {'mappings': {'properties': {'@timestamp': {'type': 'date'}, 'message': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}

It seems that pandas_to_eland() looks up the es_dest_index name (my-alias) as a key in the get_mapping response body. That body contains only the concrete indices behind the alias (my-index-000001, my-index-000002), so a KeyError is raised for my-alias.
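The lookup failure can be reproduced without a cluster, using the response body printed above. The sketch below also shows one possible way to resolve the alias to its write index first. Two things are assumptions and not taken from eland: `alias_response` is the shape I would expect from `es_client.indices.get_alias(name="my-alias")` after one rollover, and `resolve_write_index` is a hypothetical helper.

```python
# get_mapping response from above: keyed by concrete index name, not by alias.
mapping_response = {
    "my-index-000002": {"mappings": {}},
    "my-index-000001": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    },
}

# ASSUMED get_alias response after one rollover (not captured in this report).
alias_response = {
    "my-index-000001": {"aliases": {"my-alias": {"is_write_index": False}}},
    "my-index-000002": {"aliases": {"my-alias": {"is_write_index": True}}},
}


def resolve_write_index(alias_response, alias):
    """Hypothetical helper: return the concrete write index behind an alias."""
    write_indices = [
        name
        for name, info in alias_response.items()
        if info.get("aliases", {}).get(alias, {}).get("is_write_index")
    ]
    if len(write_indices) == 1:
        return write_indices[0]
    # An alias that points to a single index writes to it implicitly.
    if not write_indices and len(alias_response) == 1:
        return next(iter(alias_response))
    raise ValueError(f"no unique write index for alias {alias!r}")


print("my-alias" in mapping_response)  # False, hence the KeyError
write_index = resolve_write_index(alias_response, "my-alias")
print(write_index)                     # my-index-000002
print(mapping_response[write_index])   # {'mappings': {}}
```

Passing the resolved name as es_dest_index would avoid the alias key lookup. Whether the append then succeeds still depends on eland's mapping check against that index, which has an empty mapping in this case.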

Could anyone confirm that this behavior is a bug, or tell me what I'm doing wrong?

Thanks and best regards

Labels: enhancement (New feature or request), topic:dataframe (Issue or PR about eland.DataFrame)