What does Array::offset actually represent? #9068

@Jefffrey

Which part is this question about

/// Returns the offset into the underlying data used by this array(-slice).
/// Note that the underlying data can be shared by many arrays.
/// This defaults to `0`.
///
/// # Example:
///
/// ```
/// use arrow_array::{Array, BooleanArray};
///
/// let array = BooleanArray::from(vec![false, false, true, true]);
/// let array_slice = array.slice(1, 3);
///
/// assert_eq!(array.offset(), 0);
/// assert_eq!(array_slice.offset(), 1);
/// ```
fn offset(&self) -> usize;

Describe your question

What exactly is this supposed to represent, and what is the use case of this function?

In the simple case from the docstring, the answer might seem obvious:

/// let array = BooleanArray::from(vec![false, false, true, true]);
/// let array_slice = array.slice(1, 3);
///
/// assert_eq!(array.offset(), 0);
/// assert_eq!(array_slice.offset(), 1);

  • If we slice an array by an offset, calling offset on the sliced array returns the offset; simple!

But if we look at primitive arrays, we don't even support this:

fn offset(&self) -> usize {
    0
}

So regardless of whether a primitive array has been sliced, it always reports an offset of 0. We might consider this a bug to be fixed, but on closer inspection, which offset should it return? Technically a primitive array has two buffers: the values buffer and the null buffer. If we use slice, this is trivial, since the same offset applies to both. However, if we manually construct a primitive array from values and null buffers that were each pre-sliced by a different amount, what does the offset become?

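use arrow_array::Int64Array;
use arrow_buffer::{NullBuffer, ScalarBuffer};

let values: ScalarBuffer<i64> = vec![1, 2, 3].into();
let nulls: NullBuffer = vec![true, true, true].into();

// Pre-slice each buffer by a different amount before building the array.
let values = values.slice(1, 1);
let nulls = nulls.slice(2, 1);

let array = Int64Array::new(values, Some(nulls));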
  • What should the offset be?
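
To make the ambiguity concrete without relying on arrow internals, here is a minimal std-only sketch. `ToyBuffer` and `ToyPrimitive` are hypothetical stand-ins, not arrow-rs types, and modelling the array offset as an `Option<usize>` is an assumption made purely for illustration, not a proposal for the real API.

```rust
use std::rc::Rc;

/// Hypothetical buffer view: shared data plus the window currently visible.
struct ToyBuffer<T> {
    data: Rc<[T]>,
    offset: usize,
    len: usize,
}

impl<T> ToyBuffer<T> {
    fn new(data: Vec<T>) -> Self {
        let len = data.len();
        Self { data: data.into(), offset: 0, len }
    }

    /// Narrow the window without copying; offsets accumulate across slices.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: Rc::clone(&self.data), offset: self.offset + offset, len }
    }

    fn get(&self, i: usize) -> &T {
        assert!(i < self.len, "index out of bounds");
        &self.data[self.offset + i]
    }
}

/// Hypothetical primitive array: a values buffer and a validity buffer.
struct ToyPrimitive {
    values: ToyBuffer<i64>,
    nulls: ToyBuffer<bool>,
}

impl ToyPrimitive {
    fn new(values: ToyBuffer<i64>, nulls: ToyBuffer<bool>) -> Self {
        assert_eq!(values.len, nulls.len, "buffers must have equal length");
        Self { values, nulls }
    }

    /// A single offset exists only when both buffers agree on it.
    fn offset(&self) -> Option<usize> {
        (self.values.offset == self.nulls.offset).then_some(self.values.offset)
    }
}

/// Mirrors the example above: each buffer is pre-sliced by a different amount.
fn pre_sliced() -> ToyPrimitive {
    let values = ToyBuffer::new(vec![1, 2, 3]).slice(1, 1);
    let nulls = ToyBuffer::new(vec![true, true, true]).slice(2, 1);
    ToyPrimitive::new(values, nulls)
}

/// Both buffers sliced by the same amount, as an array-level slice would do.
fn uniformly_sliced() -> ToyPrimitive {
    let values = ToyBuffer::new(vec![1, 2, 3]).slice(1, 2);
    let nulls = ToyBuffer::new(vec![true, true, true]).slice(1, 2);
    ToyPrimitive::new(values, nulls)
}

fn main() {
    let array = pre_sliced();
    assert_eq!(*array.values.get(0), 2);
    assert!(*array.nulls.get(0));
    assert_eq!(array.offset(), None); // offsets 1 and 2 disagree
    assert_eq!(uniformly_sliced().offset(), Some(1));
}
```

In this toy model the pre-sliced array is perfectly valid, yet there is no single number that describes where it sits in its underlying data.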

We could sidestep this by defining the offset to be valid only when the array was produced by slice (so pre-slicing the buffers and then creating an array from them would not count as slicing), but I feel this would be inconsistent.
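
The inconsistency can be sketched with another std-only toy. `ToyArray` is hypothetical and copies its values for simplicity; it only records an offset when `slice` is called on the array itself, which is the definition being considered here.

```rust
/// Hypothetical array that tracks an offset only through its own `slice`.
#[derive(Debug)]
struct ToyArray {
    values: Vec<i64>, // visible elements (copied here for simplicity)
    offset: usize,
}

impl ToyArray {
    /// Build from values that may already have been sliced elsewhere;
    /// the array has no way to know, so it starts at offset 0.
    fn from_values(values: &[i64]) -> Self {
        Self { values: values.to_vec(), offset: 0 }
    }

    /// Slice the array and remember how far we moved.
    fn slice(&self, offset: usize, len: usize) -> Self {
        Self {
            values: self.values[offset..offset + len].to_vec(),
            offset: self.offset + offset,
        }
    }
}

fn main() {
    let data = [1, 2, 3, 4];

    // Path A: build the array, then slice it.
    let a = ToyArray::from_values(&data).slice(1, 3);
    // Path B: slice the data first, then build the array.
    let b = ToyArray::from_values(&data[1..4]);

    // Same logical contents, different reported offsets.
    assert_eq!(a.values, b.values);
    assert_eq!(a.offset, 1);
    assert_eq!(b.offset, 0);
}
```

Two arrays with identical contents report different offsets depending only on how they were constructed, which is the inconsistency described above.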

Additional context

Arrays that implement offset

Run array

fn offset(&self) -> usize {
    self.run_ends.offset()
}

Boolean array

fn offset(&self) -> usize {
    self.values.offset()
}

Dictionary array

fn offset(&self) -> usize {
    self.keys.offset()
}

  • This just delegates to the keys array; but the keys array is always a primitive array, so the result is always 0

Arrays that always leave offset as 0

Byte array

fn offset(&self) -> usize {
    0
}

List view array

fn offset(&self) -> usize {
    0
}

Map array

fn offset(&self) -> usize {
    0
}

List array

fn offset(&self) -> usize {
    0
}

Struct array

fn offset(&self) -> usize {
    0
}

Fixed size binary array

fn offset(&self) -> usize {
    0
}

Null array

fn offset(&self) -> usize {
    0
}

Fixed size list array

fn offset(&self) -> usize {
    0
}

Union array

fn offset(&self) -> usize {
    0
}

Byte view array

fn offset(&self) -> usize {
    0
}
