Commit f293de5: Update README.md
1 parent 140065b

File tree

1 file changed (+10, -12 lines)


README.md

Lines changed: 10 additions & 12 deletions
@@ -3,52 +3,50 @@
 This repository provides the code to construct the `HORIZON benchmark` — a large-scale, cross-domain benchmark built by refactoring the popular **Amazon-Reviews 2023 dataset** for evaluating sequential recommendation and user behavior modeling.
 We do not release any new data; instead, we share reproducible scripts and guidelines to regenerate the benchmark, enabling rigorous evaluation of generalization across time, unseen users, and long user histories. The benchmark supports modern research needs by focusing on temporal robustness, out-of-distribution generalization, and long-horizon user modeling beyond next-item prediction.
 
-## 1. Transparency
-
-### Overview
+## Overview
 
 HORIZON is a benchmark for in-the-wild user modeling in the e-commerce domain. This repository provides the necessary code to load a publicly available dataset, process it into a benchmark, and then run a diverse set of user modeling algorithms on it. The publicly available dataset was collected from amazon.com, likely representing users from the United States.
 
-### Objective
+## Objective
 
 Our objective is to provide a standardized testbed for user modeling.
 
-### Audience
+## Audience
 
 The HORIZON benchmark is intended for researchers, AI practitioners, and industry professionals who are interested in evaluating user modeling algorithms.
 
-### Intended Uses
+## Intended Uses
 
 The HORIZON benchmark can be used as a standardized evaluation platform to evaluate the performance of both existing and new algorithms. Our results may be most useful for settings involving products in categories similar to those in the dataset we used. For a list of these 33 categories, see https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023/blob/main/all_categories.txt.
 
-### Out of Scope Uses
+## Out of Scope Uses
 
 The HORIZON benchmark is not intended to be used to circumvent any policies adopted by LLM providers.
 
 The user modeling algorithms provided in HORIZON are for e-commerce product scenarios only and may not translate to other kinds of products or buying behavior.
 
-### Evaluation
+## Evaluation
 
 We have evaluated many state-of-the-art algorithms on the HORIZON benchmark. For details, please refer to the accompanying [arXiv paper](TBD).
 
-### Limitations
+## Limitations
 
 - HORIZON provides an offline evaluation. In real-world applications, offline evaluation results may differ from online evaluation, which involves deploying a user modeling algorithm.
 
 - The HORIZON benchmark contains only English-language items.
 
 - The accuracy of HORIZON evaluation metrics for a real-world application depends on the diversity and representativeness of the underlying data.
 
-### Usage
+## Usage
 
 This project is primarily designed for research and experimental purposes. We strongly recommend conducting further testing and validation before considering its application in industrial or real-world scenarios.
 
-### Feedback and Collaboration
+## Feedback and Collaboration
 
 We welcome feedback and collaboration from our audience. If you have suggestions, questions, or would like to contribute to the project, please feel free to raise an [issue](https://github.com/microsoft/horizon-benchmark/issues) or open a [pull request](https://github.com/microsoft/horizon-benchmark/pulls).
 
 ---
-## 2. HORIZON Benchmark Construction
+## HORIZON Benchmark Construction
 
 ### a. Curating the Full Dataset
 The scripts for constructing the `HORIZON` benchmark are provided in the `data` folder. Follow these steps to reproduce the benchmark:
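The README above emphasizes generalization across time and held-out user histories. As a rough illustration of the kind of construction this implies — not the repository's actual scripts, and with all names hypothetical — here is a minimal temporal leave-last-out split over per-user review sequences, plus a Recall@K check on the held-out item:

```python
# Illustrative sketch only (NOT the HORIZON repository's code): a temporal
# leave-last-out split of per-user review records, a common way to build
# next-item prediction benchmarks. All function and variable names are
# hypothetical.
from collections import defaultdict


def temporal_split(reviews, min_history=3):
    """Split (user, item, timestamp) records into train histories and test targets.

    Each user's interactions are sorted chronologically; the last item is
    held out as the test target and the rest form the training history.
    """
    by_user = defaultdict(list)
    for user, item, ts in reviews:
        by_user[user].append((ts, item))

    train, test = {}, {}
    for user, events in by_user.items():
        if len(events) < min_history:
            continue  # too few interactions to form a history plus a target
        events.sort()  # chronological order by timestamp
        items = [item for _, item in events]
        train[user] = items[:-1]  # history
        test[user] = items[-1]    # held-out next item
    return train, test


def recall_at_k(ranked, target, k=10):
    """1 if the held-out item appears in the top-k ranked list, else 0."""
    return int(target in ranked[:k])


reviews = [
    ("u1", "book", 1), ("u1", "lamp", 3), ("u1", "mug", 2),
    ("u2", "pen", 1),  # only one event: dropped by min_history
]
train, test = temporal_split(reviews)
print(train)  # {'u1': ['book', 'mug']}
print(test)   # {'u1': 'lamp'}
```

Real benchmark construction would additionally handle global time-based splits, unseen-user partitions, and long-history truncation, as described in the repository's `data` folder scripts.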
