SONAR Subtitling Evaluator

Code to evaluate the quality of SRT files using the multilingual multimodal SONAR sentence embedding model.

The evaluation measures the semantic similarity, computed as cosine similarity, between each subtitle block and the audio segment the block is assigned to through its SRT timestamps. The returned scores lie in [-1, 1]; higher is better.
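As a rough illustration of the two ingredients described above, the sketch below converts an SRT timing line into a span in seconds and computes the cosine similarity of two vectors. It is not the SubSONAR implementation: the helper names `parse_timing` and `cosine_similarity` are hypothetical, and the toy vectors stand in for the SONAR text and speech embeddings, which are not computed here.

```python
import math
import re

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,250".
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_timing(line):
    """Return (start, end) in seconds for one SRT timing line (hypothetical helper)."""
    match = TIMING_RE.search(line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    return start, end


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; the result lies in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for a subtitle-block embedding and its audio-segment embedding.
text_embedding = [1.0, 0.0]
audio_embedding = [1.0, 0.0]
span = parse_timing("00:00:01,500 --> 00:00:04,250")
score = cosine_similarity(text_embedding, audio_embedding)
```

In this picture, `span` selects the portion of audio that would be embedded and compared with the subtitle text; identical directions score 1, orthogonal ones 0, and opposite ones -1.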

Installation

Ensure that libsndfile is installed in your environment. Then run:

pip install SubSONAR

or, in the source root of this repository:

pip install -e .

The installation has been tested with Python 3.8 and 3.10.

Usage

Example usage for two files (1 and 2) with Italian SRTs and English audio:

subsonar \
  --srt-files 1.srt 2.srt \
  --audio-files 1.wav 2.wav \
  --text-lang ita_Latn --audio-lang eng \
  -bs 32

Set the batch size (-bs) according to your GPU capacity.
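When many files are involved, it can help to assemble the command programmatically. The sketch below builds the same invocation as the example from a list of (SRT, audio) pairs, using only the flags shown above. The function name `build_subsonar_command` is hypothetical, and the assumption that the i-th SRT file is matched with the i-th audio file is inferred from the example rather than stated by the tool.

```python
import shlex


def build_subsonar_command(pairs, text_lang, audio_lang, batch_size):
    """Build the subsonar argument list from (srt, audio) pairs (hypothetical helper)."""
    # Assumption: SRT and audio files are matched by position, as in the example.
    srt_files = [srt for srt, _ in pairs]
    audio_files = [audio for _, audio in pairs]
    return [
        "subsonar",
        "--srt-files", *srt_files,
        "--audio-files", *audio_files,
        "--text-lang", text_lang,
        "--audio-lang", audio_lang,
        "-bs", str(batch_size),
    ]


command = build_subsonar_command(
    [("1.srt", "1.wav"), ("2.srt", "2.wav")],
    text_lang="ita_Latn",
    audio_lang="eng",
    batch_size=32,
)
# Show the shell-quoted command line that would be run.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the package is installed; lowering `batch_size` is the first thing to try if the GPU runs out of memory.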

The available languages for the speech encoder (--audio-lang) can be found in the SONAR repository, while the text encoder (--text-lang) supports the 200 languages of NLLB.

License

SONAR Subtitling Evaluator is licensed under Apache Version 2.0.

However, the SONAR encoders have a dedicated license, which can be found in the LICENSE file of their repository. Please check the license for the encoders you are using.

Citation

If you find this project useful, please cite:

@inproceedings{gaido-et-al-2024-sbaam,
title = {{SBAAM! Eliminating Transcript Dependency in Automatic Subtitling}},
author = {Gaido, Marco and Papi, Sara and Negri, Matteo and Cettolo, Mauro and Bentivogli, Luisa},
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2024",
address = "Bangkok, Thailand",
}
