Add CSD

`CSD` stands for Compressed Sparse Dimensions. It is a multidimensional generalization of `CSR`, `CSC` and `COO`.

`CSD` has a few "compressed" dimensions and a few "uncompressed" dimensions. The "compressed" dimensions are linearized and go into `indptr` similar to CSR/CSC. The "uncompressed" dimensions (corresponding to `indices` in CSR/CSC, but there can be multiple) will be stored in `coords`.

The plan is to:

- Write `CSD`.
- Test it.
- Make `COO` and others inherit from it.

Task checklist:

Implement CSD (parametrize tests over compressed axes).:

- [ ] Element-wise
- [ ] Reductions
- [ ] Slicing
- [ ] Change compressed dimensions

Post-Implementation (parametrize tests over types):

- [ ] Make COO inherit from CSD
- [ ] Make CSR inherit from CSD
- [ ] Make CSC inherit from CSD

#  Compressed Sparse Dimensions — Format Specification
Compressed Sparse Dimensions (CSD) is an multidimensional sparse array storage format.

Let’s say that the number of dimensions is `ndim` and `compressed_axes` represents the axes/dimensions that are compressed, as a tuple. 

It contains three main arrays, one of them two-dimensional:
* `data` (one dimensional, `(nnz,)`)
* `coords`  (two-dimensional, `(ndim - len(compressed_axes), nnz)`)
* `indptr` (one dimensional `(reduce(operator.mul, (shape[a] for a in compressed_axes), 1) + 1,)`)
## `data`/`coords`/`indptr` from an `ndarray`
Relating to a normal NumPy array, we can specify `data` and `coords` directly. Let’s say `arr` is the Numpy array to convert.

```python
non_compressed_axes = tuple(a for a in range(ndim) if a not in set(compressed_axis))
coo_coords = np.where(arr)
data = arr[coo_coords]
axes_order = compressed_axes + non_compressed_axes
if axes_order != tuple(range(ndim)):
    lin_coo_coords = linear_loc(coo_coords[np.asarray(axes_order)], tuple(self.shape[a] for a in axes_order))
    idx = np.argsort(lin_coo_coords)
    data = data[idx]
    coords = coords[:, idx]

coords = coo_coords[np.asarray(non_compressed_axes)].astype(np.min_scalar_type(max(non_compressed_axes)))
compressed_linear_shape = reduce(operator.mul, (shape[a] for a in compressed_axes), 1)
compressed_linearized_coords = linear_loc(coo_coords[np.array(compressed_axes)], compressed_axes)
indptr = np.empty(compressed_linear_shape + 1, dtype=np.min_scalar_type(compressed_linear_shape))
indptr[0] = 0
np.cumsum(np.bincount(compressed_linearized_coords, minlength=compressed_linear_shape), out=indptr[1:])
```
## Interpreting the arrays
For CSR, the interpretation is simple: `indices` represents the `columns`, `data` the respective data. `indptr[i]:indptr[i+1]` represents the slice corresponding to row `i` for which `columns` and `data`  are valid.

We can extend this definition to CSD as well.  `coords` represent the non-compressed axes, and `data` the respective data. `indptr` can be interpreted the same way, replacing the word “row” with “linearized compressed axes index”, but it turns out that there is a simpler way to work with things.
### Alternative representation for `indptr`
Consider the following two arrays produced from `indptr`:
```python
starts = indptr[:-1].reshape((shape[a] for a in compressed_axes))
stops indptr[1:].reshape((shape[a] for a in compressed_axes))
```
Note that these are views, no copies are made. This gives a simpler and more intuitive interpretation for `indptr`. For every possible index corresponding to the compressed axes: Indexing into `starts` gives the starts for that index and `stops` gives the stops.
## The “multiple-COO” view
For every value in `starts`/`stops`, `coords[:, start_val:stop_val]` and `data[start_val:stop_val]` can be seen as a mini-COO array, and can thus have any COO-related algorithms applied to it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add CSD #125

Compressed Sparse Dimensions — Format Specification

`data`/`coords`/`indptr` from an `ndarray`

Interpreting the arrays

Alternative representation for `indptr`

The “multiple-COO” view

14 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Add CSD #125

Description

Compressed Sparse Dimensions — Format Specification

data/coords/indptr from an ndarray

Interpreting the arrays

Alternative representation for indptr

The “multiple-COO” view

Activity

mrocklin commented on Apr 18, 2018

hameerabbasi commented on Apr 18, 2018

hameerabbasi commented on Apr 18, 2018

stefanv commented on Apr 18, 2018

hameerabbasi commented on Apr 18, 2018

hameerabbasi commented on Apr 18, 2018

stefanv commented on Apr 18, 2018

hameerabbasi commented on Apr 18, 2018

mrocklin commented on Apr 18, 2018

mrocklin commented on Apr 18, 2018

stefanv commented on Apr 19, 2018

14 remaining items

daletovar commented on Jun 10, 2019

hameerabbasi commented on Jun 11, 2019

daletovar commented on Jun 11, 2019

hameerabbasi commented on Jun 11, 2019

daletovar commented on Jun 11, 2019

ivirshup commented on Nov 13, 2019

daletovar commented on Nov 13, 2019

ivirshup commented on Nov 13, 2019

daletovar commented on Nov 13, 2019

luk-f-a commented on Jun 27, 2020

hameerabbasi commented on Jun 27, 2020

BorisTheBrave commented on Jan 1, 2021

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions

`data`/`coords`/`indptr` from an `ndarray`

Alternative representation for `indptr`