LandmarkNER - Identify Bavarian Landmarks in Text

This repo contains code to identify landmarks in subtitles of videos from Bayerischer Rundfunk (BR). To this end, a custom Named Entity Recognition (NER) model was trained in spaCy. It builds on spaCy's pretrained German transformer pipeline de_dep_news_trf, which is based on bert-base-german-cased: an NER component was created for that pipeline and fine-tuned on a corpus of subtitles annotated with Prodigy. The original subtitle files are not included in this repo.

Scripts

Create the corpus from subtitle .txt files
Create a pattern.jsonl file from a collection of landmark names
Create initial NER training data from corpus and patterns
Custom Prodigy annotation recipe
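
As an illustration of the pattern step, the following minimal sketch turns a list of landmark names into token-based match patterns in the JSONL shape used by spaCy and Prodigy pattern files. It is not the script from this repo: the label `LANDMARK`, the helper names, and the whitespace tokenization are assumptions made for the example, and names containing hyphens or punctuation would need spaCy's tokenizer to produce matching tokens.

```python
import json


def make_patterns(names, label="LANDMARK"):
    """Build one case-insensitive token pattern per landmark name.

    `label` is an assumed label name; use whatever label your model predicts.
    """
    patterns = []
    for name in names:
        tokens = name.split()  # assumption: plain whitespace tokenization
        if not tokens:
            continue  # skip empty or whitespace-only names
        patterns.append(
            {
                "label": label,
                "pattern": [{"lower": token.lower()} for token in tokens],
            }
        )
    return patterns


def write_jsonl(records, path):
    """Write one JSON object per line, keeping umlauts readable."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example names chosen for illustration only.
names = ["Schloss Neuschwanstein", "Walhalla"]
patterns = make_patterns(names)

if __name__ == "__main__":
    write_jsonl(patterns, "pattern.jsonl")
```

Matching on the lowercased token form keeps the patterns robust against the inconsistent capitalization that is common in subtitles.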

Notebooks

Training notebook for fine-tuning of pretrained German transformer pipeline (bert-base-german-cased)
Multimodal landmark extraction notebook for combined analysis of video and text

Data

Pattern dictionary of landmark names
Base config for training the transformer NER model

Usage

You can adapt the scripts to a custom NER label of your choice. Start by creating a corpus from .txt files. Then create a pattern.jsonl file from a collection of example entity names for your label. From the corpus and the patterns, create training data for an initial model, and train that model with the spaCy command line interface. Next, correct the initial model's predictions on the corpus with the custom Prodigy annotation recipe to generate high-quality NER training data, and split the result into training and validation data. With these data, you can fine-tune the German BERT model. For this project, the LandmarkNER model was fine-tuned in a notebook in Google Colab, using the base config. Finally, create a test corpus from fresh subtitles, annotate it fully with the standard Prodigy recipe ner.manual, and evaluate your model on this test set.
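
The split into training and validation data could look roughly like the sketch below. It is an assumption-laden illustration, not code from this repo: the function name `split_examples`, the 80/20 ratio, and the fixed seed are all choices made for the example, and the annotated examples are represented as plain dictionaries.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle annotated examples reproducibly and hold out a validation part.

    Returns a (train, dev) pair of lists. `dev_fraction` and `seed` are
    illustrative defaults, not values taken from this project.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Stand-in records; real ones would carry the annotated entity spans.
examples = [{"text": f"Untertitel {i}"} for i in range(10)]
train, dev = split_examples(examples)
```

A fixed seed keeps the split stable across runs, so validation scores from different training runs remain comparable. Prodigy's `data-to-spacy` command can also hold out an evaluation split when exporting annotations.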

Model

The trained LandmarkNER model is available on the Hugging Face hub.

To disambiguate detected entities to Wikipedia titles, mGENRE can be used. The resulting pipeline can be tested in an interactive web app on Hugging Face Spaces.
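
Once the pipeline is loaded with spaCy, collecting the detected landmarks from a text might look like the following sketch. The helper `extract_landmarks`, the label name `LANDMARK`, and the model path in the usage comment are assumptions for illustration; check the model card for the actual package name and label. The disambiguation step is not shown.

```python
def extract_landmarks(nlp, text, label="LANDMARK"):
    """Return (text, start_char, end_char) for every entity with `label`.

    `nlp` is any loaded spaCy pipeline with an NER component.
    `label` is an assumed label name and may differ in the published model.
    """
    doc = nlp(text)
    return [
        (ent.text, ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.label_ == label
    ]


# Usage with a real model (requires spaCy and the downloaded pipeline):
#   import spacy
#   nlp = spacy.load("path/to/landmark_ner")  # hypothetical local path
#   print(extract_landmarks(nlp, "Wir besuchen heute Schloss Neuschwanstein."))
```

Returning character offsets alongside the entity text makes it easy to hand the same spans to a downstream disambiguation step.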

PoC for multimodal video understanding

Furthermore, this model is used in a proof of concept that extracts landmark names and the corresponding frames from a video, using subtitles or descriptive texts with associated timecodes. OWL-ViT (an open-vocabulary object detection model) serves as a building detector. The text is analyzed with the LandmarkNER model, and its output is disambiguated to Wikipedia titles by mGENRE. For timecodes at which OWL-ViT detects a building in the video and the text contains a landmark name, the frames and the associated landmark names are extracted. The proof-of-concept notebook can be executed in Colab.
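
The matching of video and text can be sketched as a simple timecode join. The example below is a simplified illustration and not the notebook's implementation: it assumes the building detector has already produced a list of timecodes in seconds and that each text segment carries its start time, end time, and detected landmark names. `TextSegment` and `match_frames_to_landmarks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSegment:
    """A subtitle or description segment with its detected landmark names."""

    start: float  # seconds
    end: float  # seconds
    landmarks: tuple  # landmark names found in the segment text


def match_frames_to_landmarks(building_timecodes, segments):
    """Pair each timecode showing a building with the landmarks named then.

    Keeps only timecodes that fall inside a segment that mentions at least
    one landmark. Returns (timecode, landmarks) pairs sorted by timecode.
    """
    matches = []
    for timecode in sorted(building_timecodes):
        for segment in segments:
            if segment.start <= timecode <= segment.end and segment.landmarks:
                matches.append((timecode, segment.landmarks))
    return matches


# Made-up values for illustration.
segments = [
    TextSegment(0.0, 5.0, ("Walhalla",)),
    TextSegment(5.5, 10.0, ()),
    TextSegment(10.5, 15.0, ("Schloss Neuschwanstein",)),
]
building_timecodes = [12.0, 2.0, 7.0, 20.0]
matches = match_frames_to_landmarks(building_timecodes, segments)
```

Here the timecodes 7.0 and 20.0 are dropped: the first falls in a segment without a landmark name, the second outside every segment.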

License

MIT License
