Evaluation
- Evaluation measures / scorer
- The performance of the predictors will be evaluated:
- (for Tasks T1, T2, T4): in terms of the RAE (relative absolute error) and AE (absolute error) measures; however, only RAE will be used for the final ranking;
- (for Task T3): in terms of NMD (Normalized Match Distance), a special case of the Earth Mover’s Distance. We consider two variants of NMD: the classical mean NMD (MNMD) and the “macro-NMD”; however, only MNMD will be used for the final ranking.
The evaluation script that will be used to evaluate the participants’ submissions will be made available here by Feb 15, 2024. Check Chapter 3 of this (open-access) book for a thorough discussion of RAE, AE, and NMD, and of their suitability for evaluating quantification systems.
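As a rough guide while the official evaluation script is not yet available, the three measures can be sketched as follows. This is a minimal illustration, not the official scorer: the smoothing scheme for RAE (replacing each prevalence p with (p + ε)/(1 + εn), with ε = 1/(2·sample size)) and the (n − 1) normalization for NMD follow common definitions in the quantification literature, and all function names are made up here.

```python
import numpy as np

def absolute_error(p_true, p_pred):
    """AE: mean absolute difference between true and predicted prevalences."""
    p_true, p_pred = np.asarray(p_true, float), np.asarray(p_pred, float)
    return float(np.mean(np.abs(p_pred - p_true)))

def relative_absolute_error(p_true, p_pred, sample_size):
    """RAE with additive smoothing to avoid division by zero:
    each prevalence p becomes (p + eps) / (1 + eps * n),
    with eps = 1 / (2 * sample_size)."""
    p_true, p_pred = np.asarray(p_true, float), np.asarray(p_pred, float)
    n = len(p_true)
    eps = 1.0 / (2.0 * sample_size)
    p_true = (p_true + eps) / (1.0 + eps * n)
    p_pred = (p_pred + eps) / (1.0 + eps * n)
    return float(np.mean(np.abs(p_pred - p_true) / p_true))

def normalized_match_distance(p_true, p_pred):
    """NMD for ordinal classes: L1 distance between the two cumulative
    distributions, divided by (n - 1) so the result lies in [0, 1]."""
    P, Q = np.cumsum(p_true), np.cumsum(p_pred)
    n = len(P)
    return float(np.sum(np.abs(P[:-1] - Q[:-1])) / (n - 1))
```

For example, `normalized_match_distance([1, 0, 0], [0, 0, 1])` yields 1.0, the maximum value, since all the mass must travel across the whole ordinal scale. MNMD would then be the average of NMD over all test samples.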
- The test set will consist of a number of test samples (i.e., sets of documents), some of which are characterized by prior probability shift (T1, T2, T3) or covariate shift (T4); this is done in order to test the quantifiers’ robustness in predicting class prevalence values under conditions different from those they were trained on.
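To make the notion of prior probability shift concrete, a test sample with a chosen prevalence vector can be drawn from a labelled pool by sampling each class in proportion to that vector, as in the "artificial prevalence protocol" often used to evaluate quantifiers. The sketch below is only illustrative (the function name and the rounding of class counts are our own choices, not part of the task specification):

```python
import numpy as np

def sample_with_prevalence(labels, prevalence, sample_size, rng):
    """Draw a sample of document indices whose class prevalences match
    `prevalence`, simulating prior probability shift with respect to
    the distribution the labels were drawn from."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # number of documents to draw from each class (rounded)
    counts = np.round(np.asarray(prevalence) * sample_size).astype(int)
    idx = []
    for c, k in zip(classes, counts):
        pool = np.flatnonzero(labels == c)
        idx.extend(rng.choice(pool, size=k, replace=True))
    return np.array(idx)
```

Repeating this with prevalence vectors drawn, e.g., uniformly from the probability simplex yields a collection of test samples whose class distributions differ from the training one, which is exactly the condition the ranking measures above are meant to probe.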
- Tools and baselines
- Required format for submissions