
Data

The dataset is now public and accessible via Zenodo!

The data on which the tested systems are required to operate consist of product reviews from the Amazon website, already converted into vector form.

In all the above cases, the predictor is thus asked to estimate how many of the reviews in the test sample carry a given class label, and to do so for every class in the set.
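Estimating how many reviews in a sample carry each label amounts to estimating each class's prevalence, that is, its count divided by the sample size. As an illustration only, the sketch below implements Classify and Count, the simplest such estimator: it counts the labels that some already-trained classifier assigns to the reviews in a sample. This is not a method prescribed by this page, and the function name `classify_and_count` and its list-of-labels input are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def classify_and_count(
    predicted_labels: Sequence[Hashable], classes: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Hypothetical helper (not part of the official utilities).

    Estimates class prevalences in one sample by counting the labels a
    classifier predicted for its datapoints (Classify and Count).
    """
    if not predicted_labels:
        raise ValueError("the sample must contain at least one datapoint")
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    # One entry per class in the label set, including classes never predicted,
    # so the returned prevalences always sum to 1.
    return {c: counts.get(c, 0) / n for c in classes}


if __name__ == "__main__":
    labels = ["pos", "neg", "pos", "pos"]
    print(classify_and_count(labels, ["neg", "pos"]))
```

Classify and Count is known to be biased when class prevalences in the test sample differ from those in the training data, which is why dedicated quantification methods are generally preferred over it.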

This version of the dataset consists of the training and validation splits for all four tasks. By May 1, participants will be provided with the samples of unlabelled (test) datapoints.

On GitHub, participants can find detailed information on the dataset format, along with useful Python functions for easily reading and iterating over the data samples. The repository also contains a format checker for verifying that a submission is correctly formatted, and the official evaluation script that will be used on the test samples once they are released (it can also be run on the development samples to simulate the conditions of the evaluation phase).
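The official evaluation script on GitHub is the reference for how submissions are scored. As a rough sketch of the kind of computation such a script performs, the block below compares a true and a predicted prevalence vector using absolute error (AE) and relative absolute error (RAE), two measures commonly used to evaluate quantifiers. That the official script uses exactly these measures, and the smoothing factor ε = 1/(2 · sample_size) applied before computing RAE, are assumptions here, not details taken from this page; all function names are hypothetical.

```python
from typing import List, Sequence


def absolute_error(true_prev: Sequence[float], pred_prev: Sequence[float]) -> float:
    """Mean absolute difference between true and predicted class prevalences."""
    if len(true_prev) != len(pred_prev):
        raise ValueError("prevalence vectors must have the same length")
    return sum(abs(t - p) for t, p in zip(true_prev, pred_prev)) / len(true_prev)


def _smooth(prev: Sequence[float], eps: float) -> List[float]:
    # Additive smoothing so that no prevalence is zero, which would make
    # the relative error undefined (division by zero).
    denom = eps * len(prev) + 1.0
    return [(eps + p) / denom for p in prev]


def relative_absolute_error(
    true_prev: Sequence[float], pred_prev: Sequence[float], sample_size: int
) -> float:
    """Mean relative absolute error between smoothed prevalence vectors.

    ASSUMPTION: the smoothing factor eps = 1 / (2 * sample_size) is a common
    convention, not something confirmed for the official script.
    """
    if len(true_prev) != len(pred_prev):
        raise ValueError("prevalence vectors must have the same length")
    eps = 1.0 / (2.0 * sample_size)
    t_s = _smooth(true_prev, eps)
    p_s = _smooth(pred_prev, eps)
    return sum(abs(t - p) / t for t, p in zip(t_s, p_s)) / len(t_s)


if __name__ == "__main__":
    true_p, pred_p = [0.5, 0.5], [0.25, 0.75]
    print(absolute_error(true_p, pred_p))
    print(relative_absolute_error(true_p, pred_p, sample_size=250))
```

RAE penalizes a given absolute mistake more heavily on rare classes than on frequent ones, which is why the smoothing step is needed when a class has zero true prevalence in a sample.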