Description
The Display implementation for DataType is pretty bad for some things, including:
- FixedSizeBinary
- Timestamp
- Struct (non-reversible / lossy)
- Union
- Dictionary
- Decimal*
- Map
- RunEndEncoded
We need to have a good overall design for these, and then implement it.
Design considerations
Readable
The output should be short and readable for common cases, e.g. List<nullable u8>
Reversible
We should be able to parse back the original DataType.
Open question: Should that also include metadata on any embedded Fields?
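To make the round-trip requirement concrete, here is a minimal sketch on a toy type. MiniType, its variants, and the List<nullable …> syntax are all hypothetical stand-ins chosen for illustration; none of this is the arrow-rs DataType or a proposed final format. It only shows the property we want: parsing the Display output yields the original value.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical toy type, NOT arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum MiniType {
    UInt8,
    Utf8,
    List { item: Box<MiniType>, nullable: bool },
}

impl fmt::Display for MiniType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MiniType::UInt8 => write!(f, "UInt8"),
            MiniType::Utf8 => write!(f, "Utf8"),
            MiniType::List { item, nullable } => {
                // Nullability is part of the output, so it survives a round trip.
                let prefix = if *nullable { "nullable " } else { "" };
                write!(f, "List<{prefix}{item}>")
            }
        }
    }
}

impl FromStr for MiniType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        match s {
            "UInt8" => return Ok(MiniType::UInt8),
            "Utf8" => return Ok(MiniType::Utf8),
            _ => {}
        }
        // List takes exactly one parameter, so stripping the outer
        // `List<` and the final `>` is enough even for nested lists.
        let inner = s
            .strip_prefix("List<")
            .and_then(|rest| rest.strip_suffix('>'))
            .ok_or_else(|| format!("unrecognized type: {s:?}"))?;
        let (nullable, item) = match inner.trim_start().strip_prefix("nullable ") {
            Some(rest) => (true, rest),
            None => (false, inner),
        };
        Ok(MiniType::List { item: Box::new(item.parse()?), nullable })
    }
}
```

A real implementation would need a proper tokenizer for types with several parameters (Struct, Map, Union), but the test it has to pass is the same: value.to_string().parse() == Ok(value).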
Safe
Strings like field names need to be escaped to avoid string injection bugs (e.g. strings containing commas, quotes, newlines, …). We could consider omitting the quotes for the common case of "safe" strings ([_a-zA-Z][_a-zA-Z0-9]*).
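As a rough sketch of what that could look like, assuming the intended "safe" pattern is [_a-zA-Z][_a-zA-Z0-9]*: the helper names is_safe_name and format_field_name are made up for this example, and the escape set (backslash escapes inside double quotes) is just one possible choice.

```rust
use std::fmt::Write;

/// True if `name` matches `[_a-zA-Z][_a-zA-Z0-9]*` (assumed "safe" pattern).
fn is_safe_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {}
        // Empty names and names starting with anything else are not safe.
        _ => return false,
    }
    chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}

/// Prints safe names bare; everything else is quoted and escaped.
fn format_field_name(name: &str) -> String {
    if is_safe_name(name) {
        return name.to_string();
    }
    let mut out = String::with_capacity(name.len() + 2);
    out.push('"');
    for c in name.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters become \u{..} escapes.
            c if c.is_control() => {
                write!(out, "\\u{{{:x}}}", c as u32).unwrap();
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

With this, my_field prints bare, while a,b prints as "a,b", so a comma inside a name can never be mistaken for a field separator.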
Consistent
We currently use parentheses for complex datatypes, e.g. List(UInt8) and Struct("field": UInt8).
We can switch that for [], {}, or <>, but I believe we should use the same thing for every type, i.e. NOT mix List<u8> and Struct { … }.
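For illustration, here is a small sketch where every nested type uses the same delimiter (parentheses here, but the same shape works for any of the alternatives). SketchType is a hypothetical toy enum, not arrow's DataType. Field names are written with Rust's {:?} formatting, which quotes and escapes strings.

```rust
use std::fmt;

/// Hypothetical toy type, NOT arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    UInt8,
    List(Box<SketchType>),
    Struct(Vec<(String, SketchType)>),
}

impl fmt::Display for SketchType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchType::UInt8 => write!(f, "UInt8"),
            // Same delimiter pair for List ...
            SketchType::List(item) => write!(f, "List({item})"),
            // ... and for Struct.
            SketchType::Struct(fields) => {
                write!(f, "Struct(")?;
                for (i, (name, ty)) in fields.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    // `{:?}` on a string adds quotes and escapes.
                    write!(f, "{name:?}: {ty}")?;
                }
                write!(f, ")")
            }
        }
    }
}
```

Swapping the delimiter then means changing one pair of characters per variant, and the output never mixes styles.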
Familiar
We currently use the long names, e.g. UInt8, which are familiar to users of other Arrow libraries (e.g. py-arrow), but we could consider using shorter names like u8, which are more familiar to Rust users.
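To compare the two naming schemes side by side, here is a tiny sketch. Prim, long_name, and short_name are hypothetical names for this example only, and the short names simply follow Rust's primitive type names.

```rust
/// Hypothetical subset of primitive types, NOT arrow's DataType.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Prim {
    Int8,
    UInt8,
    Int32,
    UInt64,
    Float32,
    Float64,
}

impl Prim {
    /// Arrow-style long name.
    fn long_name(self) -> &'static str {
        match self {
            Prim::Int8 => "Int8",
            Prim::UInt8 => "UInt8",
            Prim::Int32 => "Int32",
            Prim::UInt64 => "UInt64",
            Prim::Float32 => "Float32",
            Prim::Float64 => "Float64",
        }
    }

    /// Rust-style short name.
    fn short_name(self) -> &'static str {
        match self {
            Prim::Int8 => "i8",
            Prim::UInt8 => "u8",
            Prim::Int32 => "i32",
            Prim::UInt64 => "u64",
            Prim::Float32 => "f32",
            Prim::Float64 => "f64",
        }
    }
}
```

The mapping is one-to-one for these primitives, so either scheme can be parsed back. The trade-off is purely about which audience finds the output more natural.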