Skip to content

Using the prover

Fibo Kowalsky edited this page Nov 16, 2020 · 3 revisions

You will need only SWI-Prolog to run the prover. Go to the main directory and run the following command in order to run the prover on the trial portion of the SICK dataset (included in ccg_sen_d):

swipl -f prolog/main.pl -l ccg_sen_d/SICK_trial_ccg.pl ccg_sen_d/SICK_trial_sen.pl

*_ccg.pl contains CCG derivations by the C&C CCG parser (Clark&Curran,2007) while *_sen.pl lists RTE problems. Choose SICK_trial_eccg.pl for the CCG derivations by EasyCCG (Lewis&Steedman). Now already in the Prolog environment, run

gentail(211).              % gui/graphical entailment check

This will trigger a tableau in the XPCE GUI for the SICK-211 problem to verify its gold answer. If you prefer an HTML representation of the proof, add the html parameter before gentail/1:

parList([html]), gentail(211).

This first creates xml/tableau-211-yes-no_align.xml and then xml/tableau-211-yes-no_align.html from the latter via xsltproc. The file names stand for "a tableau for problem-211 to verify entailment (i.e. yes) with no alignment techniques used". To reset the parameter list for the next run use parList([]). If you want to force the alignment of LLFs, run (because SICK-253 shows the effect of alignment well):

parList([]), gentail(align, 253).       % gentail(253) is the same as gentail(no_align, 253)

In the example, the phrase is on top of the mountain shared by the premise and the hypothesis is treated as an atomic term by the alignment technique.
If you want to get both tableaux for entailment (yes) and contradiction (no) checks, run

gentail_no_answer(211).    % or gentail_no_answer(align, 211) to force alignment of LLFs

Be aware not all IDs have corresponding problems in SICK-trial as it is just a portion of SICK. You can try: 4, 24, 105, 116, 119, 185, 197, ... In order to run the prover on several problems without GUI, by listing them or giving an interval, try:

entail_some([4, 24, 105, 116, 119, 185, 197]).

or equivalently

entail_some(1-200).

For each RTE problem, these commands will print the problem ID, the [gold answer], the prover's judgment and a tableau status (open/closed, terminated/limited, #rule applications). In the end, a confusion matrix with some stats is printed. Something like this:

?- entail_some(1-200).
   4:      [no],      no,  closed, 17         
  24: [unknown], unknown,    open, 'Ter',175  
 105: [unknown], unknown,    open, 'Ter',56   
 116: [unknown], unknown,    open, 'Ter',27   
 119: [unknown], unknown,    open, 'Ter',205  
 185: [unknown], unknown,    open, 'Ter',71   
 197: [unknown], unknown,    open, 'Ter',68   
------------------------------------------------------ 
 Gold\Prover       YES       NO       UNK       DEF
------------------------------------------------------ 
 ENTAILMENT:        0        0        0 (0)      0 
------------------------------------------------------ 
 CONTRADICTION:     0        1        0 (0)      0 
------------------------------------------------------ 
 NEUTRAL:           0        0        6 (0)      0 
------------------------------------------------------ 
Total #problems:  7
Accuracy (pure):  1.00000    (1.00000)
Precision:        1.00000
Recall (pure):    1.00000    (1.00000)
------------ STATS -------------
Total #clTabperPrb:            1
Total #ruleApps for clTab:     17
Average #ruleApps for clTab:   17.00000
true.

Run the prover on all the problems of SICK-trial by:

entail_all.

The above predcates entail_some/1 and entail_all/0 build tableaux both for aligned and non-aligned LLFs. If you want to terminate the run, type Ctrl+C and then press a for abort. If you have a multi-core CPU, you can use parallel in parList to prove problems in a concurrent way.

parList([parallel]), entail_all.    % similarly: parList([parallel]), entail_some(1-200). 

You can make theorem prover faster by building tableaux only for aligned LLFs (use parList([]) to cancel the parallel parameter asserted by the previous command):

parList([]), entail_all(align).   % similarly: entail_some(align, 1-200)    

or even setting the rule application limit (RAL) in order to stop a tableau after the rule application limit is reached. The default RAL is 400.

parList([ral(50)]), entail_all(align).   % or combine it with parallel proving

Optimal parameters

The theorem prover has several parameters (see main.pl). Part of these parameters control the actual proof procedure, For example:

  • rule application limit (ral/1),
  • wordnet relations (wn_ant/0, wn_sim/0, wn_der/0),
  • treatment of bare, plural and definite NPs (the/0, a2the/0, s2the/0, thE/0, noPl/0, noThe/0, noHOQ/0),
  • word senses filtering (ss/1),
  • rule efficiency criterion (effCr/1),
  • intersectivity of unknown noun modifiers (allInt/0),
  • strong alignment of LLFs (align all, aall/0),
  • check consistency of each sentence constchck/0.

To run the theorem prover with the optimal set of parameters for SICK (according to Abzianidze (2016b)), run:

parList([aall, wn_ant, wn_sim, wn_der, constchck, allInt, ral(800), waif('file_with_answers_inside.txt')]), entail_all.

For FraCaS run the command below. fracFilter guarantees that the problems with undefined gold answers are excluded.

parList([fracFilter, aall, wn_ant, ral(400), waif('fr_answers_all_sections.txt')]), entail_all.

wn_sim, wn_der, and constchck doesn't affect the answers. ral(800) is decreased to ral(400) as 400 rule applications seem enough for FraCaS problems. This decrease also makes the prover faster. allInt is omitted as it hurts the performance because FraCaS contains many non-intersective adjectives.
If you want to run the prover on certain sections of FraCas, e.g., 1, 2, 5 and 9, run:

parList([fracFilter, aall, wn_ant, ral(400), waif('fr_answers_sec1259.txt')]), findall(X, (sen_id(_,X,'h',_,_), (between(1,80,X); between(81,113,X);  between(197,219,X); between(334,346,X))), List), entail_some(List).

The command makes sure that the problem IDs fall in the specific sections, e.g., the Plurals section contains 81,...,113 problems.

Evaluating or combining answers of prover(s)

After the prover generates all the answers in a file passed by the waif/1 predicate, the evaluation can be done in the following way, depending on the dataset:

./perl/evaluate.pl  SICK_dataset/SICK_test_annotated.txt   file_with_answers_inside.txt  
./perl/evaluate.pl  FraCaS_dataset/FraCaS_gold.txt   file_with_answers_inside.txt  

It is also possible to combine two answer sets of the prover obtained based on different LLFs (e.g., resulted from different CCG parsers).

./perl/combine_two_classifiers.pl  SICK_dataset/SICK_test_annotated.txt  file_with_answers_inside_1.txt  file_with_answers_inside_2.txt

The script also creates the HYBRID_ANSWERS files that list the problemID-combinedAnswer pairs. Notice that if you have files with the answers for a part of a dataset, e.g., for the FraCaS sections 1 and 2, the evaluation and combining scripts will still work for them.

Clone this wiki locally