Chronos: Some new API suggestions for `TSDataset`

### Why do we need this change?
pandas performs poorly, and the API would be easier to use with fewer cascaded calls.

### How can we modify it?

1. roll + to_numpy:
`roll` followed by `to_numpy` is a classic combination that has to be called almost every time. Users probably don't need to know what `roll` is, so we could keep only `to_numpy` (or `numpy`) and do the rolling internally. These changes should not affect the use of `to_torch_data_loader`.

```python
# API change
from bigdl.chronos.data import TSDataset
tsdata = TSDataset.from_pandas(..., lookback=48, horizon=1, with_split=False)
x, y = tsdata.to_numpy()  # like to_torch_data_loader
```
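For readers unfamiliar with `roll`, the sketch below shows the kind of windowing a merged `to_numpy` would have to do internally. It is a dependency-free illustration using plain lists, not the Chronos implementation; the function name `roll` and its return shape are assumptions made for this example, and a real version would return NumPy arrays.

```python
from typing import List, Sequence, Tuple


def roll(values: Sequence[float], lookback: int, horizon: int
         ) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a series into (lookback, horizon) sample pairs (illustrative only)."""
    if lookback < 1 or horizon < 0:
        raise ValueError("lookback must be >= 1 and horizon must be >= 0")
    # Number of complete windows that fit into the series.
    n_samples = len(values) - lookback - horizon + 1
    x, y = [], []
    for start in range(max(n_samples, 0)):
        split = start + lookback
        x.append(list(values[start:split]))            # past `lookback` steps
        y.append(list(values[split:split + horizon]))  # next `horizon` steps
    return x, y


series = [0, 1, 2, 3, 4, 5]
x, y = roll(series, lookback=3, horizon=1)
print(x)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(y)  # [[3], [4], [5]]
```

With `lookback` and `horizon` supplied once at construction time, the user never has to call this step explicitly.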

2. Optimize some existing APIs:
Many of the cascaded calls are probably unnecessary, and some of them could become properties instead. The table below groups the methods by the framework they rely on and gives advice for each group, followed by a usage example.

|Category|pandas|tsfresh|scikit-learn|other|
|--|--|--|--|--|
|Method|deduplicate/impute/resample|gen_dt_feature/gen_global_feature/gen_rolling_feature|scale/unscale/unscale_numpy|to_tf_dataset/to_numpy/to_torch_data_loader/to_pandas|
|Advice|Change to attributes|No change|Calling `scale` changes the source data. Can we leave the original data unchanged, so that `unscale` and `unscale_numpy` are no longer needed?|Merge `roll` into these methods (except `to_pandas`/`to_torch_data_loader`)|

```python
# Change pandas-related methods to attributes.
tsdata = TSDataset.from_pandas(..., impute=True, impute_mode="const",
                               const_num=0, deduplicate=True,
                               resample=True, interval='s', start_time=None,
                               end_time=None, merge_mode='mean', with_split=False)
```
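The scikit-learn column of the table asks whether `scale` could leave the source data untouched. A minimal sketch of that idea follows, using only the standard library. `StandardScalerSketch` is a hypothetical stand-in written for this example; it is not a bigdl or scikit-learn class.

```python
from statistics import mean, pstdev
from typing import List, Sequence


class StandardScalerSketch:
    """Hypothetical scaler that never modifies its input."""

    def fit(self, values: Sequence[float]) -> "StandardScalerSketch":
        self.mean_ = mean(values)
        # Guard against constant series, where the deviation is zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        # Build a new list, so the caller's data stays as it was.
        return [(v - self.mean_) / self.std_ for v in values]


raw = [2.0, 4.0, 6.0]
scaler = StandardScalerSketch().fit(raw)
scaled = scaler.transform(raw)

print(raw)     # the original values are still available
print(scaled)  # scaled copy, centered on zero
```

Because the original values are kept, the dataset itself never has to be unscaled. Predictions made in the scaled space would still need an inverse transform of some kind.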

3. We can use descriptors and `property` to manage attributes and methods. For more information, please refer to #5656.
```python
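class TSDataset:  # sketch only; the real class has many more members
    @property
    def get_cycle_length(self):
        # Getter: the actual computation is elided in this proposal.
        cycle_length = (...)
        return cycle_length

    @get_cycle_length.setter
    def get_cycle_length(self, value):
        # Check for illegal input
        if not isinstance(value, str):
            raise TypeError("cycle_length mode must be a str")
        # `_cycle_length_mode` is a hypothetical name for the stored mode.
        self._cycle_length_mode = value

# Usage
tsdataset = TSDataset()  # simplified; real instances come from from_pandas
tsdataset.get_cycle_length = 'min'  # Set the mode of cycle_length.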
```
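The `property` approach above validates one attribute at a time. When several attributes share the same check, a descriptor lets that logic be written once. The sketch below illustrates the mechanism; `StringOption`, `DatasetSketch`, and the attribute names are hypothetical and are not taken from #5656 or from Chronos.

```python
class StringOption:
    """Descriptor that only accepts string values (hypothetical helper)."""

    def __set_name__(self, owner, name):
        # Store the value under a private attribute on the instance.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.private_name, None)

    def __set__(self, instance, value):
        # Check for illegal input, once for every attribute that uses this.
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        setattr(instance, self.private_name, value)


class DatasetSketch:
    # Each option reuses the same validation logic.
    cycle_length_mode = StringOption()
    merge_mode = StringOption()


ds = DatasetSketch()
ds.cycle_length_mode = "min"
print(ds.cycle_length_mode)  # min
```

Assigning a non-string, for example `ds.merge_mode = 5`, raises `TypeError` without any per-attribute setter code.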

4. Because of the poor performance of pandas, we can add `polars` as a new backend. `polars` has good parallel performance and supports a lazy API.
```python
tsdata = TSDataset.from_pandas(df, ..., use_polars=True)
```
`pandas` and `polars` performance comparison: https://h2oai.github.io/db-benchmark/
Differences between pandas and polars:
1. `polars` does not have indexes.
2. `groupby` can only return a single data column.

