Evaluation — FIFI 2026

Each subtask is scored with metrics matched to its prediction type: a ranking metric for Subtask 1 and text-similarity metrics for Subtask 2.

🔍

MRR@10

Mean Reciprocal Rank at 10 measures how highly the correct original title is ranked among the top 10 candidates submitted by a system.

✍️

F1 + BERTScore

Mean token-level F1 Score measures lexical overlap, while BERTScore captures semantic similarity between the reconstructed and original titles.

Metric Details

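MRR@10: The mean, across all queries, of the reciprocal rank of the correct title, considering only the top 10 ranked results. A query whose correct title does not appear in the top 10 contributes 0.
Mean F1 Score: The harmonic mean of token-level precision and recall between each predicted title and its reference title, averaged over all titles.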
BERTScore: Matches the tokens of the predicted and reference titles by the similarity of their contextual BERT embeddings, capturing meaning beyond surface-level token overlap.
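
The two overlap-based metrics above can be sketched in a few lines of Python. This is an illustrative sketch only, not the official scorer: the function names are hypothetical, the correct title is assumed to be matched by exact string equality, and tokens are assumed to be lowercased, whitespace-separated words. The official evaluation may normalize or tokenize titles differently.

```python
from collections import Counter


def mrr_at_k(ranked_lists, gold_titles, k=10):
    """Mean reciprocal rank of the gold title within the top-k candidates.

    Assumption: a candidate counts as correct only on exact string equality.
    """
    total = 0.0
    for candidates, gold in zip(ranked_lists, gold_titles):
        for rank, candidate in enumerate(candidates[:k], start=1):
            if candidate == gold:
                total += 1.0 / rank
                break  # only the first (highest-ranked) match counts
        # a gold title missing from the top k adds 0 to the total
    return total / len(gold_titles) if gold_titles else 0.0


def token_f1(prediction, reference):
    """Token-level F1 between one predicted title and its reference.

    Assumption: tokens are lowercased, whitespace-separated words, and
    overlap is counted as a multiset intersection.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # both empty -> 1.0, exactly one empty -> 0.0
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions, references):
    """Average of per-title token F1 over all prediction/reference pairs."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, if the correct title is ranked 2nd for one query and 1st for another, MRR@10 is (1/2 + 1) / 2 = 0.75. A predicted title that shares 2 of its 3 tokens with a 2-token reference has precision 2/3, recall 1, and F1 of 0.8.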