Commit 4b8cbe2
Add Decimal32 and Decimal64 support to arrow-avro Reader (apache#8255)
# Which issue does this PR close?
- Part of apache#4886
# Rationale for this change
Apache Avro’s `decimal` logical type annotates either `bytes` or `fixed`
and carries `precision` and `scale`. Implementations should reject
invalid combinations such as `scale > precision`, and the underlying
bytes are the two’s‑complement big‑endian representation of the unscaled
integer. On the Arrow side, Rust now exposes first‑class `Decimal32`,
`Decimal64`, `Decimal128`, and `Decimal256` data types with documented
maximum precisions (9, 18, 38, 76 respectively). Until now, `arrow-avro`
decoded all Avro decimals to 128/256‑bit Arrow decimals, even when a
narrower type would suffice.
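To make the encoding concrete, here is a minimal sketch of how a big-endian two's-complement payload becomes an unscaled integer and then a decimal value with `scale` fractional digits. The helper names `unscaled_from_be_bytes` and `format_decimal` are hypothetical and are not part of `arrow-avro`. Treating an empty payload as zero is an assumption of this sketch, not something the PR states.

```rust
/// Interpret big-endian two's-complement bytes as an unscaled i128.
/// Returns None when the payload is wider than 16 bytes.
/// Assumption: an empty payload is treated as zero.
fn unscaled_from_be_bytes(bytes: &[u8]) -> Option<i128> {
    if bytes.len() > 16 {
        return None;
    }
    // A set top bit in the first byte means the value is negative.
    let negative = bytes.first().map_or(false, |b| b & 0x80 != 0);
    let fill: u8 = if negative { 0xFF } else { 0x00 };
    // Pre-fill with the sign byte, then copy the payload into the low-order end.
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}

/// Render an unscaled integer with `scale` digits after the decimal point.
fn format_decimal(unscaled: i128, scale: u32) -> String {
    if scale == 0 {
        return unscaled.to_string();
    }
    let sign = if unscaled < 0 { "-" } else { "" };
    let digits = unscaled.unsigned_abs().to_string();
    let scale = scale as usize;
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = format!("{:0>width$}", digits, width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{sign}{int_part}.{frac_part}")
}
```

For example, the two bytes `0x30 0x39` decode to the unscaled integer 12345, which at scale 2 reads as `123.45`.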
# What changes are included in this PR?
**`arrow-avro/src/codec.rs`**
* Map `Codec::Decimal(precision, scale, _size)` to Arrow’s
`Decimal32`/`64`/`128`/`256` **by precision**, preferring the narrowest
type (≤9→32, ≤18→64, ≤38→128, otherwise 256).
* Strengthen decimal attribute parsing:
  * Error if `scale > precision`.
  * Error if `precision` exceeds Arrow’s maximum (Decimal256).
  * If Avro uses `fixed`, check that declared `precision` fits the byte
    width (≤4→max 9, ≤8→18, ≤16→38, ≤32→76).
* Update docstring of `Codec::Decimal` to mention `Decimal32`/`64`.
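The validation and mapping rules listed above can be sketched roughly as follows. This is an illustration using only the standard library, not the code in `codec.rs`: `DecimalWidth` and `select_width` are hypothetical names, errors are plain strings, and rejecting `fixed` sizes above 32 bytes is an assumption of the sketch. A check that `precision` is positive is omitted because the PR description does not mention one.

```rust
/// Hypothetical stand-in for Arrow's four decimal widths.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecimalWidth {
    D32,
    D64,
    D128,
    D256,
}

/// Validate decimal attributes, then pick the narrowest width by precision.
/// `fixed_size` is Some(n) for a `fixed[n]`-backed decimal, None for `bytes`.
fn select_width(
    precision: usize,
    scale: usize,
    fixed_size: Option<usize>,
) -> Result<DecimalWidth, String> {
    if scale > precision {
        return Err(format!("scale {scale} exceeds precision {precision}"));
    }
    // 76 is the documented maximum precision of Decimal256.
    if precision > 76 {
        return Err(format!("precision {precision} exceeds the Decimal256 maximum of 76"));
    }
    if let Some(size) = fixed_size {
        // Largest precision that fits in a fixed field of this byte width.
        let max_precision = match size {
            0..=4 => 9,
            5..=8 => 18,
            9..=16 => 38,
            17..=32 => 76,
            // Assumption: wider fixed fields are rejected outright.
            _ => return Err(format!("fixed size {size} is too wide for a decimal")),
        };
        if precision > max_precision {
            return Err(format!(
                "precision {precision} does not fit in fixed[{size}] (max {max_precision})"
            ));
        }
    }
    Ok(match precision {
        0..=9 => DecimalWidth::D32,
        10..=18 => DecimalWidth::D64,
        19..=38 => DecimalWidth::D128,
        _ => DecimalWidth::D256,
    })
}
```

Under these rules a bytes-backed `decimal(4, 2)` selects the 32-bit width, while `decimal(10, 2)` declared on `fixed[4]` is rejected because four bytes hold at most 9 digits.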
**`arrow-avro/src/reader/record.rs`**
* Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with
corresponding builders (`Decimal32Builder`, `Decimal64Builder`).
* Builder selection:
  * If Avro uses **fixed**: choose by size (≤4→Decimal32, ≤8→Decimal64,
    ≤16→Decimal128, ≤32→Decimal256).
  * If Avro uses **bytes**: choose by declared precision (≤9/≤18/≤38/≤76).
* Implement decode paths that sign‑extend Avro’s two’s‑complement
payload to 4/8 bytes and append values to the new builders; update
`append_null`/`flush` for 32/64‑bit decimals.
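As a rough illustration of the builder selection and the sign-extension step described above, the sketch below pads a two's-complement payload to a fixed width and reads it as `i32` or `i64`. The names `builder_width`, `sign_extend_be`, `decode_i32`, and `decode_i64` are hypothetical. The sketch rejects any payload longer than the target width and treats an empty payload as zero; both are simplifying assumptions rather than documented behavior of `record.rs`.

```rust
/// Pick a decimal bit width: by byte size for `fixed`, by precision for `bytes`.
/// Returns None when no supported width applies.
fn builder_width(fixed_size: Option<usize>, precision: usize) -> Option<u16> {
    match fixed_size {
        Some(size) => match size {
            0..=4 => Some(32),
            5..=8 => Some(64),
            9..=16 => Some(128),
            17..=32 => Some(256),
            _ => None,
        },
        None => match precision {
            0..=9 => Some(32),
            10..=18 => Some(64),
            19..=38 => Some(128),
            39..=76 => Some(256),
            _ => None,
        },
    }
}

/// Sign-extend big-endian two's-complement bytes to exactly N bytes.
fn sign_extend_be<const N: usize>(raw: &[u8]) -> Result<[u8; N], String> {
    if raw.len() > N {
        return Err(format!("payload of {} bytes does not fit in {} bytes", raw.len(), N));
    }
    // Negative values are padded with 0xFF, non-negative values with 0x00.
    let negative = raw.first().map_or(false, |b| b & 0x80 != 0);
    let fill: u8 = if negative { 0xFF } else { 0x00 };
    let mut out = [fill; N];
    out[N - raw.len()..].copy_from_slice(raw);
    Ok(out)
}

/// Decode a payload into the native integer backing a 32-bit decimal.
fn decode_i32(raw: &[u8]) -> Result<i32, String> {
    Ok(i32::from_be_bytes(sign_extend_be::<4>(raw)?))
}

/// Decode a payload into the native integer backing a 64-bit decimal.
fn decode_i64(raw: &[u8]) -> Result<i64, String> {
    Ok(i64::from_be_bytes(sign_extend_be::<8>(raw)?))
}
```

The decoded integer is the unscaled value that would be appended to the corresponding builder. Note that a `fixed[16]` field selects the 128-bit width even when its declared precision is small, because `fixed` selection goes by size.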
**`arrow-avro/src/reader/mod.rs` (tests)**
* Expand `test_decimal` to assert that:
  * bytes‑backed decimals with precision 4 map to `Decimal32`, and those
    with precision 10 map to `Decimal64`;
  * legacy fixed\[8] decimals map to `Decimal64`;
  * fixed\[16] decimals map to `Decimal128`.
* Add a nulls path test for bytes‑backed `Decimal32`.
# Are these changes tested?
Yes. Unit tests under `arrow-avro/src/reader/mod.rs` construct expected
`Decimal32Array`/`Decimal64Array`/`Decimal128Array` with
`with_precision_and_scale`, and compare against batches decoded from
Avro files (including legacy fixed and bytes‑backed cases). The tests
also exercise small batch sizes to cover buffering paths, and new Avro
data files are added for higher‑width decimals.
New Avro test file details:
- `test/data/int256_decimal.avro`: bytes, logicalType decimal(precision=76, scale=10)
- `test/data/fixed256_decimal.avro`: fixed[32], logicalType decimal(precision=76, scale=10)
- `test/data/fixed_length_decimal_legacy_32.avro`: fixed[4], logicalType decimal(precision=9, scale=2)
- `test/data/int128_decimal.avro`: bytes, logicalType decimal(precision=38, scale=2)
These new Avro test files were created using this script:
https://gist.github.com/jecsand838/3890349bdb33082a3e8fdcae3257eef7
There is also an arrow-testing PR for these new files:
apache/arrow-testing#112
# Are there any user-facing changes?
N/A, because `arrow-avro` is not public.
Parent commit: 911940f
9 files changed (+611, -142 lines) under `arrow-avro`: `src`, `src/reader`, and `test/data`.