What does `Array::offset` actually represent? #9068

**Which part is this question about**


https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/mod.rs#L176-L191

**Describe your question**


What exactly is `Array::offset` supposed to represent, and what is the use case for this function?

In the simple case, the answer seems obvious from the docstring:

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/mod.rs#L185-L189

- If we slice an array by some offset, calling `offset` on the sliced array returns that offset; simple!

But if we look at primitive arrays, we don't even support this:

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/primitive_array.rs#L1229-L1231

So regardless of whether a primitive array has been sliced, it always reports an offset of 0. We might consider this a bug to be fixed, but if we think about it more, **which** offset should we return? Technically a primitive array has two buffers: the values buffer and the null buffer. If we use `slice`, this is trivial, since the same offset is applied to both. However, if we manually construct a primitive array from a values buffer and a null buffer that were each pre-sliced by a different amount, what does the offset become?

```rust
let values: ScalarBuffer<i64> = vec![1, 2, 3].into();
let nulls: NullBuffer = vec![true, true, true].into();

let values = values.slice(1, 1);
let nulls = nulls.slice(2, 1);

let array = Int64Array::new(values, Some(nulls));
```

- What should the offset be?

We could sidestep this by defining the offset to be meaningful only when the array itself was produced by `slice` (so pre-slicing the buffers and then creating an array from them does not count as slicing), but I feel this would be inconsistent.
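To make the ambiguity concrete, here is a minimal std-only sketch. `ToyBuffer` and `ToyPrimitive` are hypothetical stand-ins invented for illustration, not arrow-rs types, and returning `Option<usize>` from `offset` is just one possible answer, not a proposal for the real API. It shows that an array produced by `slice` and an array built from pre-sliced buffers can hold the same logical value while only one of them has a single well-defined offset.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a shared, sliceable buffer (not an arrow-rs type).
#[derive(Clone)]
struct ToyBuffer<T> {
    data: Rc<Vec<T>>,
    offset: usize,
    len: usize,
}

impl<T> ToyBuffer<T> {
    fn new(data: Vec<T>) -> Self {
        let len = data.len();
        Self { data: Rc::new(data), offset: 0, len }
    }

    /// Zero-copy slice: shares `data`; only the offset and length change.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: Rc::clone(&self.data), offset: self.offset + offset, len }
    }
}

/// Hypothetical primitive array: a values buffer plus a validity buffer.
struct ToyPrimitive {
    values: ToyBuffer<i64>,
    nulls: ToyBuffer<bool>,
}

impl ToyPrimitive {
    fn new(values: ToyBuffer<i64>, nulls: ToyBuffer<bool>) -> Self {
        assert_eq!(values.len, nulls.len, "buffers must have equal length");
        Self { values, nulls }
    }

    /// Slicing the array applies the same offset to both buffers.
    fn slice(&self, offset: usize, len: usize) -> Self {
        Self::new(self.values.slice(offset, len), self.nulls.slice(offset, len))
    }

    /// Logical value at index `i`, or `None` if that slot is null.
    fn value(&self, i: usize) -> Option<i64> {
        assert!(i < self.values.len, "index out of bounds");
        if self.nulls.data[self.nulls.offset + i] {
            Some(self.values.data[self.values.offset + i])
        } else {
            None
        }
    }

    /// One possible answer: a single offset exists only when both buffers agree.
    fn offset(&self) -> Option<usize> {
        (self.values.offset == self.nulls.offset).then_some(self.values.offset)
    }
}

fn main() {
    let values = ToyBuffer::new(vec![1_i64, 2, 3]);
    let nulls = ToyBuffer::new(vec![true, true, true]);

    // Built whole, then sliced: both buffers share offset 1.
    let sliced = ToyPrimitive::new(values.clone(), nulls.clone()).slice(1, 1);
    println!("{:?} {:?}", sliced.value(0), sliced.offset()); // Some(2) Some(1)

    // Built from buffers pre-sliced by different amounts (1 and 2).
    let pre_sliced = ToyPrimitive::new(values.slice(1, 1), nulls.slice(2, 1));
    println!("{:?} {:?}", pre_sliced.value(0), pre_sliced.offset()); // Some(2) None
}
```

Both toy arrays contain the single value `2`, yet only the first can name one offset, which is the inconsistency described above.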

**Additional context**


**Arrays that implement `offset`**

Run array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/run_array.rs#L297-L299

Boolean array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/boolean_array.rs#L325-L327

Dictionary array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/dictionary_array.rs#L734-L736

- Just delegates to the keys array; but since the keys array is always a primitive array, this is effectively always 0
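The delegation point can be sketched with a toy trait. `ToyArray`, `ToyKeys`, and `ToyDictionary` are hypothetical names made up for this illustration, and the `sliced_by` field exists only to show that the slicing information is not reflected in the result; this is a sketch of the behaviour described in this issue, not the actual arrow-rs implementation.

```rust
/// Hypothetical version of the trait method under discussion (not arrow-rs).
trait ToyArray {
    fn offset(&self) -> usize;
}

/// Stand-in for a primitive keys array: it knows how far it was sliced,
/// but its `offset` implementation never reports it.
struct ToyKeys {
    sliced_by: usize,
}

impl ToyArray for ToyKeys {
    fn offset(&self) -> usize {
        0 // mirrors the hard-coded 0 described for primitive arrays
    }
}

/// Stand-in for a dictionary array that forwards `offset` to its keys.
struct ToyDictionary {
    keys: ToyKeys,
}

impl ToyArray for ToyDictionary {
    fn offset(&self) -> usize {
        self.keys.offset()
    }
}

fn main() {
    let dict = ToyDictionary { keys: ToyKeys { sliced_by: 2 } };
    // prints "keys sliced by 2, reported offset 0"
    println!("keys sliced by {}, reported offset {}", dict.keys.sliced_by, dict.offset());
}
```

Under these assumptions, the delegating implementation can never return anything other than 0, no matter how the keys were sliced.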

**Arrays that always leave `offset` as 0**

Byte array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/byte_array.rs#L502-L504

List view array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/list_view_array.rs#L456-L458

Map array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/map_array.rs#L401-L403

List array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/list_array.rs#L565-L567

Struct array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/struct_array.rs#L440-L442

Fixed size binary array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/fixed_size_binary_array.rs#L641-L643

Null array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/null_array.rs#L108-L110

Fixed size list array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/fixed_size_list_array.rs#L501-L503

Union array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/union_array.rs#L781-L783

Byte view array

https://github.com/apache/arrow-rs/blob/814ee4227c01fce478bdd3594dd156250286b46e/arrow-array/src/array/byte_view_array.rs#L895-L897

For reference, the trait method and docstring linked at the top of this issue:

	/// Returns the offset into the underlying data used by this array(-slice).
	/// Note that the underlying data can be shared by many arrays.
	/// This defaults to `0`.
	///
	/// # Example:
	///
	/// ```
	/// use arrow_array::{Array, BooleanArray};
	///
	/// let array = BooleanArray::from(vec![false, false, true, true]);
	/// let array_slice = array.slice(1, 3);
	///
	/// assert_eq!(array.offset(), 0);
	/// assert_eq!(array_slice.offset(), 1);
	/// ```
	fn offset(&self) -> usize;

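	// Arrays that always report 0 (e.g. primitive arrays)
	fn offset(&self) -> usize {
		0
	}

	// Run array
	fn offset(&self) -> usize {
		self.run_ends.offset()
	}

	// Boolean array
	fn offset(&self) -> usize {
		self.values.offset()
	}

	// Dictionary array
	fn offset(&self) -> usize {
		self.keys.offset()
	}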
