Each subtask uses metrics well-suited to the nature of the prediction: ranking for Subtask 1 and text similarity for Subtask 2.
MRR@10
Subtask 1: Find It
Mean Reciprocal Rank at 10 measures how highly the correct original title is ranked among the top 10 candidates submitted by a system.
F1 + BERTScore
Subtask 2: Fix It
Mean token-level F1 Score measures lexical overlap, while BERTScore captures semantic similarity between the reconstructed and original title.
Metric Details
- MRR@10: The mean of the reciprocal ranks of the correct title across all queries, considering only the top 10 ranked results. A query whose correct title does not appear in the top 10 contributes 0.
- Mean F1 Score: The harmonic mean of token-level precision and recall between the predicted and reference titles, averaged over all examples.
- BERTScore: Uses contextual BERT embeddings to measure semantic similarity, capturing meaning beyond surface-level token overlap.
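
The first two metrics can be sketched in a few lines of Python. This is a minimal illustration and not the official scorer: the function names `mrr_at_10`, `token_f1`, and `mean_token_f1` are hypothetical, candidates are assumed to match the correct title by exact string equality, and tokens are assumed to be lowercased, whitespace-separated words. The task's actual normalization and tokenization may differ. BERTScore is left out because it needs a pretrained model.

```python
from collections import Counter


def mrr_at_10(ranked_lists, gold_titles, k=10):
    """Mean reciprocal rank, looking only at the top-k candidates per query.

    Assumption: a candidate counts as correct only on exact string match.
    """
    if not gold_titles:
        return 0.0
    total = 0.0
    for candidates, gold in zip(ranked_lists, gold_titles):
        for rank, candidate in enumerate(candidates[:k], start=1):
            if candidate == gold:
                total += 1.0 / rank
                break  # only the first correct hit counts
        # A query with no hit in the top k adds 0 to the total.
    return total / len(gold_titles)


def token_f1(prediction, reference):
    """Token-level F1 between one predicted title and one reference title.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection, so repeated tokens are counted correctly.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions, references):
    """Average of token_f1 over paired predictions and references."""
    if not references:
        return 0.0
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(references)
```

For example, if the correct title is ranked second for one query and third for another, the reciprocal ranks are 1/2 and 1/3, so MRR@10 is their mean, 5/12.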