Installation and setup

  • download the code
git clone https://github.com/Helsinki-NLP/OPUS-MT-train.git
  • make sure that you have pip (for Python libraries) and cpan (for Perl modules) available on your system. For cpan you may need to set up local::lib to install modules locally in your user environment.
  • install the prerequisites, either manually or via submodules:
git submodule update --init --recursive --remote
make install
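
Before running the steps above, it can help to confirm that the tools they rely on are actually on your PATH. The following is a minimal sketch, not part of OPUS-MT-train: `missing_tools` is a hypothetical helper, and on some systems pip may only be installed under the name `pip3`.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPUS-MT-train): print every command
# from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# git, make, pip and cpan are the tools used in the steps above.
missing=$(missing_tools git make pip cpan)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "git, make, pip and cpan are available"
fi
```

The script only reports what is missing and never aborts, so it is safe to run before anything has been installed.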

Prerequisites

The installation procedure should set up the necessary software for running the OPUS-MT recipes. Be aware that the scripts do not work out of the box, because many settings are tailored to the local installation on our IT infrastructure at CSC. Here is an incomplete list of prerequisites needed for running a process:

Optional (recommended) software:

  • terashuf: efficiently shuffle massive data sets
  • pigz: multithreaded gzip
  • eflomal (needed for word alignment when transformer-align is used)
  • fast_align
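
Because these tools are optional, your own helper scripts may want to degrade gracefully when one is absent. As an illustration only (this is not how the OPUS-MT-train makefiles select tools), the sketch below prefers pigz and falls back to plain gzip; `pick_gzip` and `GZIP_CMD` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: choose a gzip-compatible compressor,
# preferring multithreaded pigz when it is installed.
pick_gzip() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

GZIP_CMD=$(pick_gzip)
echo "Using compressor: $GZIP_CMD"

# Round trip: compress to stdout, then decompress again.
printf 'hello\n' | "$GZIP_CMD" -c | "$GZIP_CMD" -dc
```

Both tools accept `-c` (write to stdout) and `-d` (decompress), which is why the same command line works with either one.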

Adjust environment setup

Environment variables are mostly specified in lib/env.mk. Adjust the settings to match your environment! You may have to re-run make install after the adjustments, or compile and install tools manually.

CSC users

OPUS-MT-train is developed to run on the CSC HPC infrastructure and supports puhti and mahti. Some hard-coded settings match our particular environment in our CSC project. You need to adjust those settings in lib/env/puhti.mk and lib/env/mahti.mk. This includes at least the paths to important tools such as Marian-NMT. You also need to set the CSC project identifier (CSCPROJECT) to match the project that you use for requesting billing units!
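
As a general property of make, a variable given on the command line takes precedence over an ordinary assignment inside a makefile. The throwaway demo below shows this with a one-variable makefile that stands in for a settings file. The values `project_default` and `project_mine` are placeholders, and the sketch does not check how CSCPROJECT is actually defined or used in the OPUS-MT-train makefiles, so editing the files as described above remains the documented route.

```shell
#!/bin/sh
# Throwaway demo; it does not touch the OPUS-MT-train makefiles.
demo_mk=$(mktemp)

# A one-variable makefile standing in for a settings file.
printf 'CSCPROJECT = project_default\nshow:\n\t@echo $(CSCPROJECT)\n' > "$demo_mk"

make -s -f "$demo_mk" show                          # prints: project_default
make -s -f "$demo_mk" show CSCPROJECT=project_mine  # prints: project_mine
```

Remove the temporary file with `rm -f "$demo_mk"` when you are done.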

Mac OSX

  • for Marian-NMT: make sure that you have Xcode, protobuf and MKL installed. Protobuf can be installed with, for example, MacPorts:
sudo port install protobuf3-cpp

For MKL libraries, check https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library/choose-download.html

  • for eflomal: compile with gcc:
sudo port install gcc10
gcc-mp-10 -Ofast -march=native -Wall --std=gnu99 -Wno-unused-function -g -fopenmp -c eflomal.c
gcc-mp-10 -lm -lgomp -fopenmp  eflomal.o   -o eflomal
##
sudo port install llvm-devel py-cython py-numpy
sudo port select --set python python38
sudo port select --set python3 python38
sudo port select --set cython cython38
cd tools/eflomal
sudo env python3 setup.py install

Troubleshooting