Description
Which part is this question about
arrow-rs/arrow-array/src/array/mod.rs
Lines 176 to 191 in 814ee42
    /// Returns the offset into the underlying data used by this array(-slice).
    /// Note that the underlying data can be shared by many arrays.
    /// This defaults to `0`.
    ///
    /// # Example:
    ///
    /// ```
    /// use arrow_array::{Array, BooleanArray};
    ///
    /// let array = BooleanArray::from(vec![false, false, true, true]);
    /// let array_slice = array.slice(1, 3);
    ///
    /// assert_eq!(array.offset(), 0);
    /// assert_eq!(array_slice.offset(), 1);
    /// ```
    fn offset(&self) -> usize;
Describe your question
What exactly is this supposed to represent, and what is the use case of this function?
If we consider a simple case, it might seem obvious, from the docstring:
arrow-rs/arrow-array/src/array/mod.rs
Lines 185 to 189 in 814ee42
    /// let array = BooleanArray::from(vec![false, false, true, true]);
    /// let array_slice = array.slice(1, 3);
    ///
    /// assert_eq!(array.offset(), 0);
    /// assert_eq!(array_slice.offset(), 1);
- If we slice an array by an offset, calling offset() on the sliced array returns that offset; simple!
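To make that intuition concrete, here is a minimal sketch in plain Rust of a window over shared data. `SharedSlice` is a hypothetical toy type, not the arrow-rs implementation; it only assumes that slicing is zero-copy and that the offset is measured from the start of the underlying data.

```rust
use std::sync::Arc;

/// Toy stand-in for a sliced array: a window over shared data.
/// Hypothetical type for illustration only, not part of arrow-rs.
#[derive(Clone, Debug)]
pub struct SharedSlice<T> {
    data: Arc<Vec<T>>, // underlying data, possibly shared by many slices
    offset: usize,     // index of the first visible element in `data`
    len: usize,        // number of visible elements
}

impl<T: Clone> SharedSlice<T> {
    /// Wraps freshly allocated data; the offset starts at 0.
    pub fn new(values: Vec<T>) -> Self {
        let len = values.len();
        Self { data: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: the new offset is relative to the underlying data,
    /// so slicing a slice accumulates offsets.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    pub fn offset(&self) -> usize {
        self.offset
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Copies out only the visible elements.
    pub fn to_vec(&self) -> Vec<T> {
        self.data[self.offset..self.offset + self.len].to_vec()
    }
}
```

With this model, `SharedSlice::new(vec![false, false, true, true]).slice(1, 3)` reports an offset of 1, mirroring the docstring example. A single offset is well defined here only because there is exactly one underlying buffer.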
But if we look at primitive arrays, we don't even support this:
arrow-rs/arrow-array/src/array/primitive_array.rs
Lines 1229 to 1231 in 814ee42
    fn offset(&self) -> usize {
        0
    }
So regardless of whether a primitive array gets sliced, it always reports an offset of 0. We might consider this a bug to be fixed, but on closer inspection, which offset should it return? Technically a primitive array has two buffers: the values buffer and the null buffer. If we use slice, this is trivial since the same offset is applied to both. However, if we manually construct a primitive array by passing in values and null buffers that were each pre-sliced by a different amount, what does the offset become?
    let values: ScalarBuffer<i64> = vec![1, 2, 3].into();
    let nulls: NullBuffer = vec![true, true, true].into();
    let values = values.slice(1, 1);
    let nulls = nulls.slice(2, 1);
    let array = Int64Array::new(values, Some(nulls));
- What should the offset be?
We could sidestep this by defining an offset to be valid only if preceded by a slice (so pre-slicing the buffers and then creating an array from them is not considered slicing), but I feel this would be inconsistent.
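The ambiguity can be illustrated with a small stdlib-only sketch. `BufView` and `ToyPrimitive` are hypothetical names invented for this example and do not mirror the real `ScalarBuffer`, `NullBuffer`, or `PrimitiveArray` internals; the only assumption is that each buffer tracks its own offset independently.

```rust
use std::sync::Arc;

/// Hypothetical buffer view: a window over shared storage (not an arrow-rs type).
#[derive(Clone)]
pub struct BufView<T> {
    data: Arc<Vec<T>>,
    offset: usize,
    len: usize,
}

impl<T> BufView<T> {
    pub fn new(v: Vec<T>) -> Self {
        let len = v.len();
        Self { data: Arc::new(v), offset: 0, len }
    }

    /// Zero-copy slice; the offset is relative to the shared storage.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    pub fn offset(&self) -> usize {
        self.offset
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Reads the i-th visible element.
    pub fn get(&self, i: usize) -> &T {
        assert!(i < self.len, "index out of bounds");
        &self.data[self.offset + i]
    }
}

/// Toy primitive array: values plus an optional validity buffer of equal length.
pub struct ToyPrimitive {
    values: BufView<i64>,
    nulls: Option<BufView<bool>>,
}

impl ToyPrimitive {
    pub fn new(values: BufView<i64>, nulls: Option<BufView<bool>>) -> Self {
        if let Some(n) = &nulls {
            assert_eq!(n.len(), values.len(), "length mismatch");
        }
        Self { values, nulls }
    }

    /// Returns `Some(offset)` only when every buffer agrees on it,
    /// and `None` when no single value describes the array.
    pub fn common_offset(&self) -> Option<usize> {
        match &self.nulls {
            None => Some(self.values.offset()),
            Some(n) if n.offset() == self.values.offset() => Some(n.offset()),
            Some(_) => None,
        }
    }
}

/// Mirrors the snippet above: values pre-sliced by 1, nulls pre-sliced by 2.
pub fn pre_sliced_example() -> ToyPrimitive {
    let values = BufView::new(vec![1_i64, 2, 3]).slice(1, 1);
    let nulls = BufView::new(vec![true, true, true]).slice(2, 1);
    ToyPrimitive::new(values, Some(nulls))
}
```

In this model the pre-sliced array is perfectly valid (both views have length 1), yet `common_offset()` has nothing sensible to return. Returning an `Option` is just one way to surface that; it is not a proposal for the actual trait signature.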
Additional context
Arrays that implement offset
Run array
arrow-rs/arrow-array/src/array/run_array.rs
Lines 297 to 299 in 814ee42
    fn offset(&self) -> usize {
        self.run_ends.offset()
    }
Boolean array
arrow-rs/arrow-array/src/array/boolean_array.rs
Lines 325 to 327 in 814ee42
    fn offset(&self) -> usize {
        self.values.offset()
    }
Dictionary array
arrow-rs/arrow-array/src/array/dictionary_array.rs
Lines 734 to 736 in 814ee42
    fn offset(&self) -> usize {
        self.keys.offset()
    }
- This just delegates to the keys array, but the keys array is always a primitive array, so this is effectively always 0
Arrays that always leave offset as 0
Byte array
arrow-rs/arrow-array/src/array/byte_array.rs
Lines 502 to 504 in 814ee42
    fn offset(&self) -> usize {
        0
    }
List view array
arrow-rs/arrow-array/src/array/list_view_array.rs
Lines 456 to 458 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Map array
arrow-rs/arrow-array/src/array/map_array.rs
Lines 401 to 403 in 814ee42
    fn offset(&self) -> usize {
        0
    }
List array
arrow-rs/arrow-array/src/array/list_array.rs
Lines 565 to 567 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Struct array
arrow-rs/arrow-array/src/array/struct_array.rs
Lines 440 to 442 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Fixed size binary array
arrow-rs/arrow-array/src/array/fixed_size_binary_array.rs
Lines 641 to 643 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Null array
arrow-rs/arrow-array/src/array/null_array.rs
Lines 108 to 110 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Fixed size list array
arrow-rs/arrow-array/src/array/fixed_size_list_array.rs
Lines 501 to 503 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Union array
arrow-rs/arrow-array/src/array/union_array.rs
Lines 781 to 783 in 814ee42
    fn offset(&self) -> usize {
        0
    }
Byte view array
arrow-rs/arrow-array/src/array/byte_view_array.rs
Lines 895 to 897 in 814ee42
    fn offset(&self) -> usize {
        0
    }