Improve Display for DataType #8351

@emilk

The Display implementation for DataType produces poor output for several types, including:

  • FixedSizeBinary
  • Timestamp
  • Struct (non-reversible / lossy)
  • Union
  • Dictionary
  • Decimal*
  • Map
  • RunEndEncoded

We need a good overall design for these first, and then implement it.

Design considerations

Readable

The output should be short and readable for common cases, e.g. List<nullable u8>

Reversible

We should be able to parse back the original DataType.

Open question: Should that also include metadata on any embedded Fields?
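
To make the round-trip requirement concrete, here is a minimal sketch on a toy type. MiniType is a hypothetical, heavily reduced stand-in for DataType, and the List(nullable UInt8) syntax is only one possible choice, not a proposal for the final format. The point is the property itself: parsing the Display output should give back the original value.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical, heavily reduced stand-in for arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum MiniType {
    UInt8,
    Utf8,
    /// A list of items; the flag records whether the items are nullable.
    List(Box<MiniType>, bool),
}

impl fmt::Display for MiniType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MiniType::UInt8 => write!(f, "UInt8"),
            MiniType::Utf8 => write!(f, "Utf8"),
            MiniType::List(item, nullable) => {
                let prefix = if *nullable { "nullable " } else { "" };
                write!(f, "List({prefix}{item})")
            }
        }
    }
}

impl FromStr for MiniType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        match s {
            "UInt8" => return Ok(MiniType::UInt8),
            "Utf8" => return Ok(MiniType::Utf8),
            _ => {}
        }
        // Anything else must look like `List(<inner>)`.
        let inner = s
            .strip_prefix("List(")
            .and_then(|rest| rest.strip_suffix(')'))
            .ok_or_else(|| format!("unrecognized type: {s:?}"))?;
        // An optional `nullable ` prefix describes the list items.
        let (nullable, item) = match inner.strip_prefix("nullable ") {
            Some(rest) => (true, rest),
            None => (false, inner),
        };
        Ok(MiniType::List(Box::new(item.parse::<MiniType>()?), nullable))
    }
}
```

A property test over the real DataType could assert the same thing for every variant: dt.to_string().parse() returns dt. Whether Field metadata takes part in that equality is exactly the open question above.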

Safe

Strings like field names need to be escaped to avoid string injection bugs (e.g. strings containing commas, quotes, newlines, …). We could consider omitting the quotes for the common case of "safe" strings ([_a-zA-Z][_a-zA-Z0-9]*).
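
A rough sketch of what the escaping rule could look like, assuming the "safe" pattern is meant to be [_a-zA-Z][_a-zA-Z0-9]* and that backslash escapes are used inside double quotes. Both function names are hypothetical and nothing like them is claimed to exist in arrow-rs today.

```rust
/// Returns true for names matching `[_a-zA-Z][_a-zA-Z0-9]*`.
fn is_safe_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {}
        // Empty names and names starting with anything else are not safe.
        _ => return false,
    }
    chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}

/// Hypothetical helper: formats a field name for embedding in a type string.
/// Safe names are emitted bare; everything else is quoted and escaped.
fn format_field_name(name: &str) -> String {
    if is_safe_name(name) {
        return name.to_string();
    }
    let mut out = String::with_capacity(name.len() + 2);
    out.push('"');
    for c in name.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}
```

With this rule, user_id stays bare, while a name such as a, b comes out as "a, b", so a comma inside a name can no longer be mistaken for a field separator. A parser would need the matching unescape step for the output to stay reversible.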

Consistent

We currently use parentheses for complex datatypes, e.g. List(UInt8) and Struct("field": UInt8).
We can switch that for [], {}, or <>, but I believe we should use the same thing for every type, i.e. NOT mix List<u8> and Struct { … }

Familiar

We currently use long names such as UInt8, which are familiar to users of other Arrow libraries (e.g. PyArrow),
but we could consider using shorter names such as u8, which are more familiar to Rust users.
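
For illustration, the short-name option could be a simple lookup over the primitive types. The function short_name and the exact set of names are assumptions made for this sketch; types without an obvious Rust counterpart (e.g. Utf8) would keep their long name.

```rust
/// Hypothetical mapping from Arrow-style long names to Rust-style short names.
/// Returns `None` when there is no obvious Rust primitive counterpart.
fn short_name(long: &str) -> Option<&'static str> {
    Some(match long {
        "Boolean" => "bool",
        "Int8" => "i8",
        "Int16" => "i16",
        "Int32" => "i32",
        "Int64" => "i64",
        "UInt8" => "u8",
        "UInt16" => "u16",
        "UInt32" => "u32",
        "UInt64" => "u64",
        "Float32" => "f32",
        "Float64" => "f64",
        _ => return None,
    })
}
```

If both spellings were accepted when parsing, the output format could switch to short names without breaking strings written by older versions.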
