Faster perspective flattening #22

@timkpaine

Right now, to ingest into perspective we go from GatewayStruct to JSON via to_json, then from JSON to plain Python objects via orjson, then we flatten, then we dump back to JSONL for ingestion into pyarrow, and finally go from pyarrow into perspective. This balances performance against the fact that we want to flatten things: pyarrow JSON loading and perspective ingestion of pyarrow are both very fast, GatewayStruct to JSON is done in C++ and so is also faster than flattening as structured objects, and orjson is very fast. In increasing order of preference, we can:

  • enhance csp to flatten
  • try to emit record batches directly instead of flattening into json and moving back and forth between python objects
  • have perspective flatten for us
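
For reference, here is a minimal sketch of the flattening step and of what the second option could look like. It is not the actual gateway code: the function names, the "." key separator, and the sample payloads are all hypothetical, and the stdlib json module stands in for to_json/orjson so the snippet runs without extra dependencies. `current_path` mimics the existing round trip back to JSONL, while `to_columns` builds a column-oriented dict directly, which is the shape `pyarrow.RecordBatch.from_pydict` accepts.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep` (assumed)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested structs; lists and scalars are kept as-is.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def current_path(json_payloads):
    """Mimic today's flow: JSON text -> Python objects -> flatten -> JSONL."""
    rows = [flatten(json.loads(payload)) for payload in json_payloads]
    return "\n".join(json.dumps(row) for row in rows)


def to_columns(json_payloads):
    """Hypothetical alternative: flatten straight into columns, no JSONL dump."""
    rows = [flatten(json.loads(payload)) for payload in json_payloads]
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # Missing fields become None so every column has the same length.
    return {name: [row.get(name) for row in rows] for name in names}


# Made-up payloads standing in for the output of to_json.
payloads = [
    '{"id": 1, "order": {"px": 10.5, "qty": 3}}',
    '{"id": 2, "order": {"px": 11.0}}',
]
jsonl = current_path(payloads)
columns = to_columns(payloads)
```

The column dict could then be handed to `pyarrow.RecordBatch.from_pydict(columns)`, skipping the dump to JSONL and the pyarrow JSON parse. Whether that is actually faster than the current path would need measuring, since the flatten itself still runs in Python here.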
