eflomal crashes during filtering #63

yvesscherrer · 2023-05-31T08:35:41Z

Alignment model creation works fine, but during filtering Eflomal crashes with the following error message:

INFO:opusfilter.opusfilter:Running step 5: filter
20343327it [10:23, 32615.14it/s]
INFO:eflomal:Prepared 20343327 sentences for alignment
INFO:eflomal:Reading lexical priors...
INFO:eflomal:1618911 (of 2174631) pairs of lexical priors used
Traceback (most recent call last):
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/bin/opusfilter", line 31, in <module>
    of.execute_steps(overwrite=args.overwrite, last=args.last)
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/opusfilter/opusfilter.py", line 224, in execute_steps
    self._run_step(step, num + 1, overwrite)
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/opusfilter/opusfilter.py", line 289, in _run_step
    self.step_functions[step['type']](parameters, overwrite=overwrite)
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/opusfilter/opusfilter.py", line 96, in wrapper
    return self.parallelize(*args, **kwargs)
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/opusfilter/opusfilter.py", line 141, in parallelize
    self.func(obj, parameters, overwrite)
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/opusfilter/opusfilter.py", line 380, in filter_data
    for idx, pair in enumerate(pairs):
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/opusfilter/word_alignment.py", line 170, in _filtergen
    self.aligner.align(
  File "/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/eflomal/__init__.py", line 72, in align
    align(srcf.name, trgf.name,
  File "python/eflomal/eflomal.pyx", line 161, in eflomal.cython.align
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/eflomal/bin/eflomal', '-m', '3', '-s', '/tmp/tmpawsij1rg', '-t', '/tmp/tmpphsceo43', '-n', '3', '-N', '0.2', '-1', '2', '-q', '-2', '1', '-3', '2', '-F', '/tmp/tmpyamo5usj', '-R', '/tmp/tmps4d0ndvi', '-p', '/tmp/tmp18jxqkax']' died with <Signals.SIGKILL: 9>.
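The last line of the traceback says the eflomal binary "died with <Signals.SIGKILL: 9>". In Python's subprocess module, a negative return code -N means the child process was terminated by signal N. That makes it possible to tell this case apart from an ordinary non-zero exit. The following is a minimal sketch for POSIX systems. run_and_diagnose is a hypothetical helper and is not part of eflomal or OpusFilter. Note that SIGKILL only hints at memory exhaustion and does not prove it.

```python
import signal
import subprocess
import sys


def run_and_diagnose(cmd):
    """Run cmd and report whether it succeeded, exited with an error, or was killed by SIGKILL."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        # A negative return code -N means the child was terminated by signal N (POSIX only).
        if err.returncode == -signal.SIGKILL:
            return "killed by SIGKILL (possibly the out-of-memory killer)"
        return f"failed with exit status {err.returncode}"
    return "ok"


if __name__ == "__main__":
    # Placeholder command: a Python child process that kills itself with SIGKILL.
    print(run_and_diagnose([sys.executable, "-c",
                            "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]))
```

On Linux, the kernel log (for example the output of dmesg) usually records an out-of-memory kill, which can confirm the diagnosis.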

The eflomal unit tests (test_eflomal.py) run fine:

/mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/eflomal/bin/eflomal -m 3 -s /tmp/tmpst1zbe0v -t /tmp/tmps4j5_0m8 -n 3 -N 0.2 -1 721 -2 721 -3 2887 -f /tmp/tmpf50or8p5 -r /tmp/tmp98dw6njz
Read texts (3 sentences): 0.000 s
Vocabulary sizes are 9 (source), 9 (target)
Created alignment structures: 0.000 s
Created alignment structures: 0.000 s
Randomized alignment: 0.002 s
Aligning with model 1 (721 iterations)
Randomized alignment: 0.000 s
Aligning with model 1 (721 iterations)
Done: 0.002 s
Aligning with model 2 (721 iterations)
Done: 0.002 s
Aligning with model 2 (721 iterations)
Done: 0.001 s
Aligning with model 3 (2887 iterations)
Done: 0.001 s
Aligning with model 3 (2887 iterations)
Done: 0.019 s
Final argmax iteration: 0.000 s
Writing alignments to /tmp/tmpf50or8p5 for 3 sentencess
Done: 0.019 s
Final argmax iteration: 0.000 s
Writing alignments to /tmp/tmp98dw6njz for 3 sentencess
./mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/eflomal/bin/eflomal -m 3 -s /tmp/tmpfpe3h_i5 -t /tmp/tmpqggbus3t -n 3 -N 0.2 -1 721 -2 721 -3 2887 -f /tmp/tmp4y0_3tw1 -r /tmp/tmpk5nynnwy -p /tmp/tmp4yygknic
Read texts (3 sentences): 0.000 s
Vocabulary sizes are 9 (source), 9 (target)
Created alignment structures: 0.000 s
Created alignment structures: 0.000 s
Randomized alignment: 0.001 s
Aligning with model 1 (721 iterations)
Randomized alignment: 0.001 s
Aligning with model 1 (721 iterations)
Done: 0.001 s
Aligning with model 2 (721 iterations)
Done: 0.002 s
Aligning with model 2 (721 iterations)
Done: 0.002 s
Aligning with model 3 (2887 iterations)
Done: 0.001 s
Aligning with model 3 (2887 iterations)
Done: 0.019 s
Final argmax iteration: 0.000 s
Writing alignments to /tmp/tmpk5nynnwy for 3 sentencess
Done: 0.019 s
Final argmax iteration: 0.000 s
Writing alignments to /tmp/tmp4y0_3tw1 for 3 sentencess
./mnt/c/Users/yvessche/work/americasnlp2023-st/myenv/lib/python3.8/site-packages/eflomal/bin/eflomal -m 3 -s /tmp/tmpdd0kzzqb -t /tmp/tmpex4wlj51 -n 3 -N 0.2 -1 721 -2 721 -3 2887 -f /tmp/tmpjxe0px3n -r /tmp/tmpu0jpju0y
Read texts (3 sentences): 0.000 s
Vocabulary sizes are 9 (source), 9 (target)
Created alignment structures: 0.000 s
Created alignment structures: 0.000 s
Randomized alignment: 0.002 s
Aligning with model 1 (721 iterations)
Randomized alignment: 0.002 s
Aligning with model 1 (721 iterations)
Done: 0.002 s
Aligning with model 2 (721 iterations)
Done: 0.003 s
Aligning with model 2 (721 iterations)
Done: 0.002 s
Aligning with model 3 (2887 iterations)
Done: 0.001 s
Aligning with model 3 (2887 iterations)
Done: 0.019 s
Final argmax iteration: 0.000 s
Writing alignments to /tmp/tmpu0jpju0y for 3 sentencess
Done: 0.019 s
Final argmax iteration: 0.000 s
Writing alignments to /tmp/tmpjxe0px3n for 3 sentencess
.
----------------------------------------------------------------------
Ran 3 tests in 0.182s

OK

The OpusFilter unit tests also seem to run fine:

.........
----------------------------------------------------------------------
Ran 9 tests in 0.911s

OK


svirpioj · 2023-06-21T10:43:29Z

It seems most likely that the process was killed for exceeding memory limits. SIGKILL (signal 9) is the signal the Linux out-of-memory killer uses. Eflomal uses a considerable amount of memory for large inputs, and the usage apparently grows linearly with the corpus size. For a corpus of 20 million sentence pairs, it used 10 gigabytes of memory.
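If memory really does grow linearly, the single data point above (about 10 GB for 20 million sentence pairs) gives a rough rule of thumb of about 500 bytes per pair. The following sketch uses that rule to extrapolate memory use and to pick a chunk size for a given memory budget. The linear model and the reference numbers are assumptions taken from this comment, not measured eflomal behaviour, and the function names are hypothetical.

```python
# Reference point from the comment: about 10 GB for 20 million sentence pairs.
# Assumption: memory use grows linearly with the number of sentence pairs.
REF_PAIRS = 20_000_000
REF_GB = 10.0


def estimate_memory_gb(num_pairs: int, ref_pairs: int = REF_PAIRS, ref_gb: float = REF_GB) -> float:
    """Linearly extrapolate the memory use (in GB) for num_pairs sentence pairs."""
    return num_pairs * ref_gb / ref_pairs


def max_pairs_for_budget(budget_gb: float, ref_pairs: int = REF_PAIRS, ref_gb: float = REF_GB) -> int:
    """Largest number of pairs whose extrapolated memory use fits within budget_gb."""
    return int(budget_gb * ref_pairs / ref_gb)


if __name__ == "__main__":
    # The failing run prepared 20,343,327 sentences for alignment.
    print(f"{estimate_memory_gb(20_343_327):.2f} GB")
    print(max_pairs_for_budget(4.0))
```

Under these assumptions, the failing run needs roughly 10.17 GB, and a 4 GB budget allows chunks of about 8 million pairs. In practice, it is safer to leave headroom below the available memory.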

Possible solutions:

- Split the files into smaller subsets before filtering
- If you use multiple filters, put WordAlignFilter last (so that less data remains for it to align)
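The first suggestion, splitting the files before filtering, can be sketched in Python as follows. The sketch writes two line-aligned files into numbered chunk pairs of at most chunk_size lines each, and each chunk pair can then be filtered separately. split_parallel and the chunkNNNN.src/.trg naming are hypothetical choices for illustration only. The sketch is written to run on Python 3.8, the version in the traceback.

```python
from itertools import islice, zip_longest
from pathlib import Path


def split_parallel(src_path, trg_path, out_dir, chunk_size):
    """Split two line-aligned files into chunk pairs of at most chunk_size lines each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(src_path, encoding="utf-8") as src, open(trg_path, encoding="utf-8") as trg:
        # zip_longest pads the shorter file with None, so a length mismatch is detected.
        pairs = zip_longest(src, trg)
        index = 0
        while True:
            block = list(islice(pairs, chunk_size))
            if not block:
                break
            if any(s is None or t is None for s, t in block):
                raise ValueError("source and target files have different numbers of lines")
            src_out = out_dir / f"chunk{index:04d}.src"
            trg_out = out_dir / f"chunk{index:04d}.trg"
            with open(src_out, "w", encoding="utf-8") as fs, open(trg_out, "w", encoding="utf-8") as ft:
                for s, t in block:
                    fs.write(s)
                    ft.write(t)
            chunks.append((src_out, trg_out))
            index += 1
    return chunks
```

The chunk size could be chosen from the available memory. The filtered chunks are then concatenated in order afterwards.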

The score step, and the filter step with filterfalse=True, automatically process the data in chunks, but the normal filter step does not. Maybe there should be an option for that.
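A chunking option for the normal filter step could look roughly like the following generator. It reads the pairs lazily and hands them to the scorer one bounded chunk at a time, so the aligner never sees the whole corpus at once. This is a hypothetical sketch, not OpusFilter's implementation. keep_chunk stands in for a per-chunk decision such as word-alignment scoring, and the example scorer used with it is a toy length check.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def filter_in_chunks(pairs: Iterable[Pair],
                     keep_chunk: Callable[[List[Pair]], List[bool]],
                     chunk_size: int) -> Iterator[Pair]:
    """Yield the pairs that keep_chunk accepts, calling it on at most chunk_size pairs at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    iterator = iter(pairs)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        decisions = keep_chunk(chunk)
        if len(decisions) != len(chunk):
            raise ValueError("keep_chunk must return one decision per pair")
        for pair, keep in zip(chunk, decisions):
            if keep:
                yield pair


def keep_similar_length(chunk: List[Pair]) -> List[bool]:
    """Toy stand-in scorer: keep pairs whose token counts differ by at most one."""
    return [abs(len(s.split()) - len(t.split())) <= 1 for s, t in chunk]
```

For example, list(filter_in_chunks(pairs, keep_similar_length, 2)) processes pairs two at a time. Because the output is a generator, peak memory is bounded by the chunk size rather than the corpus size. For an alignment-based filter, the trade-off is that each chunk gets its own alignment run, which may give slightly different scores than aligning everything together.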
