sigtrec_eval is a Python wrapper for trec_eval that captures its output, processes it, and presents it in a clear, configurable format.
sigtrec_eval uses numpy, scipy, and pandas to compute its output. To install these dependencies:
pip install -r requirements.txt
usage: sigtrec_eval.py [-h] [-m M [M ...]] [-t [T]]
                       qrel baseline_result
                       [result_to_compare [result_to_compare ...]]
                       [-s [{ttest,welchttest,None} [{ttest,welchttest,None} ...]]]
                       [-f [{csv,html,json,latex,sql,string}]] [-o [O]]
positional arguments:
  qrel                  qrel file in trec_eval format
  baseline_result       The baseline result to evaluate
  result_to_compare     The results to compare with the baseline

optional arguments:
  -h, --help            show this help message and exit
  -m M [M ...]          Evaluation measure
  -t [T]                The trec_eval executable path
  -s [{ttest,welchttest,None} [{ttest,welchttest,None} ...]]
                        Statistical test
  -f [{csv,html,json,latex,sql,string}]
                        Output format
  -o [O]                Output file
The input files follow the trec_eval formats.
qrel is the ground-truth file; each of its lines is a tuple of the form:
qid iter docno rel
Each result file is in trec_top_file format; each of its lines is a tuple of the form:
qid iter docno rank sim run_id
e.g.: 030 Q0 ZF08-175-870 0 4238 prise1
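For reference, both line formats can be parsed with a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of sigtrec_eval:

```python
def parse_qrel_line(line):
    """Split one qrel line: qid iter docno rel."""
    qid, it, docno, rel = line.split()
    return {"qid": qid, "iter": it, "docno": docno, "rel": int(rel)}

def parse_result_line(line):
    """Split one trec_top_file line: qid iter docno rank sim run_id."""
    qid, it, docno, rank, sim, run_id = line.split()
    return {"qid": qid, "iter": it, "docno": docno,
            "rank": int(rank), "sim": float(sim), "run_id": run_id}

print(parse_result_line("030 Q0 ZF08-175-870 0 4238 prise1"))
```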
For more information, including the available measures and their descriptions, see the trec_eval README.
Compute precision@10:
$ python3 sigtrec_eval.py example/qrelFile.qrel example/baseline example/result_to_compare1 -m P.10
Approach P_10
0 baseline 0.1960 bl
1 result_to_compare1 0.2071
Compute precision@10 and recall@10:
$ python3 sigtrec_eval.py example/qrelFile.qrel example/baseline example/result_to_compare1 -m P.10 recall.10
Approach P_10 recall_10
0 baseline 0.1960 bl 0.1669 bl
1 result_to_compare1 0.2071 0.1711
Using the LaTeX output format:
$ python3 sigtrec_eval.py example/qrelFile.qrel example/baseline example/result_to_compare1 -m ndcg_cut.10 map_cut.10 -f latex
\begin{tabular}{llll}
\toprule
{} & Approach & ndcg\_cut\_10 & map\_cut\_10 \\
\midrule
0 & baseline & 0.2482 bl & 0.0979 bl \\
1 & result\_to\_compare1 & 0.2231 & 0.0805 \\
\bottomrule
\end{tabular}
Run a Student's t-test:
$ python3 sigtrec_eval.py example/qrelFile.qrel example/baseline example/result_to_compare1 example/result_to_compare2 -m P.10 recip_rank -s ttest
Approach P_10 recip_rank
0 baseline 0.1960 bl 0.5467 bl
1 result_to_compare1 0.2071 ▲ 0.4004 ▼
2 result_to_compare2 0.0002 ▼ 0.0529 ▼
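The ▲/▼ markers flag statistically significant gains or losses against the baseline. A paired t-test over per-query scores can be sketched with scipy; the score arrays below are illustrative, not taken from the example data:

```python
from scipy import stats

# Illustrative per-query scores for the same query set
baseline = [0.20, 0.40, 0.10, 0.50, 0.30]
system = [0.35, 0.50, 0.20, 0.55, 0.45]

# Paired (related-samples) Student's t-test
t_stat, p_value = stats.ttest_rel(baseline, system)

# A Welch-style unpaired variant would be:
# stats.ttest_ind(baseline, system, equal_var=False)
print(t_stat, p_value)
```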
Save the output to a file:
$ python3 sigtrec_eval.py example/qrelFile.qrel example/baseline example/result_to_compare1 -m Rprec bpref -f html -o output.html
$ cat output.html
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Approach</th>
<th>Rprec</th>
<th>bpref</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>baseline</td>
<td>0.1832 bl</td>
<td>0.2418 bl</td>
</tr>
<tr>
<th>1</th>
<td>result_to_compare1</td>
<td>0.2031</td>
<td>0.2787</td>
</tr>
</tbody>
</table>
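The HTML above is the pandas rendering of a DataFrame; a minimal sketch with the values copied from the example run:

```python
import pandas as pd

# Values copied from the Rprec / bpref example above
df = pd.DataFrame({
    "Approach": ["baseline", "result_to_compare1"],
    "Rprec": ["0.1832 bl", "0.2031"],
    "bpref": ["0.2418 bl", "0.2787"],
})
print(df.to_html())
```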