Convert an image of a formula to LaTeX.
Under the project root directory, run

```
python training/run_experiment.py --max_epochs=3 --gpus='0,' --num_workers=2 --model_class=ResnetTransformer --data_class=Im2Latex100K --batch_size=16
```
Use the `wandb init` command to set up a new W&B project; we can then add `--wandb` to record our experiments with the Weights & Biases service.
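For first-time setup, something like the following should work. This is a sketch that assumes you already have a W&B account; `wandb login` prompts for your API key, and `wandb init` interactively creates or selects a project:

```
pip install wandb  # skip if already installed via the project requirements
wandb login
wandb init
```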
```
python training/run_experiment.py --wandb --max_epochs=3 --gpus='0,' --num_workers=2 --model_class=ResnetTransformer --data_class=Im2Latex100K --batch_size=8
```
If you want to sanity-check your model, you can add the `--overfit_batches` argument, as shown below.
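For example (the flag values here are illustrative, not project defaults): `--overfit_batches=1` tells the Lightning Trainer to repeatedly fit a single batch, so the loss should approach zero if the model is wired correctly.

```
python training/run_experiment.py --overfit_batches=1 --max_epochs=100 --gpus='0,' --num_workers=2 --model_class=ResnetTransformer --data_class=Im2Latex100K --batch_size=16
```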
For more arguments and their usage, refer to the PyTorch Lightning Trainer documentation.
Available `--model_class` options:

- CNNLSTM
- ResnetTransformer

Available `--data_class` options:

- Im2Latex100K
Under the project root directory, run

```
python training/save_best_model.py --entity=zhengkai --project=im2latex --trained_data_class=Im2Latex100K
```
- `--entity`: your W&B username
- `--project`: your W&B project name
Under the project root directory, run `python im2latex/im2latex_inference.py <image_path>`, for example:

```
python im2latex/im2latex_inference.py im2latex/tests/support/im2latex_100k/7944775fc9.png
```
Under the project root directory, run

```
docker build -t im2latex/api-server -f api_server/Dockerfile .
```
If you want to rebuild the image, you can use the following command to remove the existing image first.

```
docker rmi -f im2latex/api-server
```
Under the project root directory, run

```
docker run -p 60000:60000 -p 60001:60001 -it --rm --name im2latex-api im2latex/api-server
```
Then we can reach the model API on port 60000 and the Streamlit app on port 60001.
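As a quick smoke test, a request along these lines should exercise the model API. The endpoint path and form field name below are assumptions for illustration, so check the `api_server` code for the actual route:

```
# Hypothetical route and field name; verify against api_server before use.
curl -X POST -F "image=@im2latex/tests/support/im2latex_100k/7944775fc9.png" http://localhost:60000/v1/predict
```

The Streamlit app can be opened directly in a browser at http://localhost:60001.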
If the container is already running, you can use the following command to remove the existing container.

```
docker rm -f im2latex-api
```
Under the project root directory, run

```
pytest -s ./im2latex/tests/test_im2latex_inference.py
```
Under the project root directory, run

```
pytest -s ./im2latex/evaluation/evaluate_im2latex_inference.py
```
Under the project root directory, run

```
pytest -s api_server/tests/test_app.py
```
You can try the Image to LaTeX app on Hugging Face Spaces. Note, however, that the model still performs poorly on images outside the training dataset.
- Small training data can cause exposure bias. We may be able to mitigate exposure bias through scheduled sampling (see the first sketch below).
- The decoding algorithm can lead to different results. We might be able to use nucleus sampling instead of beam search to reduce the chance of repeated tokens (see the second sketch below).
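To make these ideas concrete, here are two minimal PyTorch sketches. Neither is this project's actual code; the function names, signatures, and decay schedule are illustrative. First, scheduled sampling: during training, the decoder's next input is the gold token with a probability that decays over epochs (the inverse-sigmoid schedule from Bengio et al., 2015), and the model's own previous prediction otherwise, so train-time inputs come to resemble test-time inputs:

```python
import math
import random

import torch


def teacher_forcing_prob(epoch: int, k: float = 10.0) -> float:
    """Inverse-sigmoid decay: starts near 1.0 and decays toward 0."""
    return k / (k + math.exp(epoch / k))


def choose_next_input(gold: torch.Tensor, pred: torch.Tensor, epoch: int) -> torch.Tensor:
    """Feed the gold token with probability eps, otherwise the model's
    own previous prediction (scheduled sampling)."""
    eps = teacher_forcing_prob(epoch)
    return gold if random.random() < eps else pred
```

Second, nucleus (top-p) sampling for one decoding step; `logits` is assumed to be the unnormalized distribution over the LaTeX token vocabulary:

```python
import torch


def nucleus_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    """Sample one token id from the smallest set of top tokens whose
    cumulative probability exceeds p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep just enough top tokens to reach probability mass p.
    cutoff = int((cumulative < p).sum().item()) + 1
    nucleus = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    choice = torch.multinomial(nucleus, num_samples=1)
    return int(sorted_ids[choice].item())
```

Sampling from the truncated head of the distribution keeps some diversity while cutting off the low-probability tail, which is where degenerate token repeats tend to come from.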