🐶 Murre 🐕

The amazing Murre (genitive Murren 🐕) will normalize non-standard Finnish (puhekieli) to standard Finnish (kirjakieli). This repository is maintained by Mika Hämäläinen.

Installation

This library is designed for Python 3 and it may not work on Python 2.

pip3 install murre
python3 -m murre.download

Normalize

To normalize a Finnish sentence, run:

from murre import normalize_sentence

normalize_sentence("mä syön paljo karkkii")
>> minä syön paljon karkkia

To use the same chunk-level BRNN model as described in the paper, pass wnut19_model=True. Note that this model might only work on Linux.

You can normalize multiple sentences at the same time by running:

from murre import normalize_sentences

sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
normalize_sentences(sents)
>> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
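
For longer inputs it may be convenient to feed sentences to the model in fixed-size batches and keep each result paired with its original. The following is only a sketch: normalize_in_batches and batch_size are hypothetical names that are not part of Murre, and the normalizer is passed in as a function (for example normalize_sentences from above) so that the helper itself makes no assumptions about the library internals.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def normalize_in_batches(
    sentences: Iterable[str],
    normalize_fn: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> Iterator[Tuple[str, str]]:
    """Yield (original, normalized) pairs, calling normalize_fn once per batch.

    normalize_fn is assumed to take a list of sentences and return a list of
    the same length, as normalize_sentences does in the example above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for sentence in sentences:
        batch.append(sentence)
        if len(batch) == batch_size:
            yield from zip(batch, normalize_fn(batch))
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield from zip(batch, normalize_fn(batch))
```

With Murre installed, list(normalize_in_batches(sents, normalize_sentences, batch_size=2)) would give one (original, normalized) tuple per input sentence.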

Swedish

You can use the Swedish model by passing language="swe":

from murre import normalize_sentence

normalize_sentence("int vet ja", language="swe")
>> inte vet jag
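
If your data mixes Finnish and Swedish, one option is to route each sentence to the right model based on a language tag you already have. This is a minimal sketch under stated assumptions: normalize_mixed is a hypothetical helper, the "fin" tag is a label chosen here for illustration, and the normalizer is passed in as a function (for example normalize_sentence). Finnish sentences are sent without a language argument and Swedish ones with language="swe", mirroring the two calls shown above.

```python
from typing import Callable, Iterable, List, Tuple


def normalize_mixed(
    items: Iterable[Tuple[str, str]],
    normalize_fn: Callable[..., str],
) -> List[str]:
    """Normalize (text, language_tag) pairs with the matching model.

    The tags "fin" and "swe" are this sketch's own convention; only
    language="swe" is taken from the README.
    """
    results: List[str] = []
    for text, lang in items:
        if lang == "swe":
            # Swedish model, as in normalize_sentence(..., language="swe")
            results.append(normalize_fn(text, language="swe"))
        elif lang == "fin":
            # Finnish is the call without a language argument
            results.append(normalize_fn(text))
        else:
            raise ValueError(f"Unsupported language tag: {lang!r}")
    return results
```

With Murre installed, normalize_mixed([("mä syön paljo karkkii", "fin"), ("int vet ja", "swe")], normalize_sentence) would run each sentence through the corresponding model.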

Generate

Murre can also generate text in different dialects. To do so, run:

from murre import dialectalize_sentence
dialectalize_sentence("kodin takana on koira", "Inkerinsuomalaismurteet")
>> 'kojin takan on koira'

Or for multiple sentences:

from murre import dialectalize_sentences
sents = ["kissa syö karkkia", "kädellä on perhonen", "kettu juoksee sutta karkuun"]
dialectalize_sentences(sents, "Kainuu")
>> ['kissa syöpi karkkia', 'käellä om perhonej', 'kettu juoksee sutta karkuu']

The list of available dialects can be obtained by:

from murre import supported_dialects
supported_dialects()
>> ['Pohjois-Satakunta', 'Keski-Karjala', 'Kainuu', 'Etelä-Pohjanmaa', 'Etelä-Satakunta', 'Pohjois-Savo', 'Pohjois-Karjala', 'Keski-Pohjanmaa', 'Kaakkois-Häme', 'PohjoinenKeski-Suomi', 'Pohjois-Pohjanmaa', 'PohjoinenVarsinais-Suomi', 'Etelä-Karjala', 'Länsi-Uusimaa', 'Inkerinsuomalaismurteet', 'LäntinenKeski-Suomi', 'Länsi-Satakunta', 'Etelä-Savo', 'Länsipohja', 'Pohjois-Häme', 'EteläinenKeski-Suomi', 'Etelä-Häme', 'Peräpohjola']
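
Dialect names have to match the list exactly (note, for instance, 'PohjoinenKeski-Suomi' without a space), so it can help to check a name before generating. The sketch below assumes nothing about how Murre itself reacts to an unknown name. resolve_dialect is a hypothetical helper that takes the list returned by supported_dialects() and uses the standard library's difflib to suggest close matches.

```python
import difflib
from typing import Sequence


def resolve_dialect(name: str, supported: Sequence[str]) -> str:
    """Return name if it is a supported dialect, else raise with suggestions.

    supported is expected to be the list returned by supported_dialects().
    """
    if name in supported:
        return name
    # Offer up to three similarly spelled dialect names, if any
    suggestions = difflib.get_close_matches(name, supported, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unsupported dialect {name!r}.{hint}")
```

With Murre installed, a call such as dialectalize_sentence(text, resolve_dialect("Kainuu", supported_dialects())) would fail early with a readable message if the name were misspelled.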

Business solutions

Non-standard dialectal data can be a mess to deal with when you are processing your customer data. Let us help! Rootroo offers consulting related to text normalization. We have a strong academic background in state-of-the-art AI solutions for every NLP need. Just contact us, we won't bite.

Cite

Normalization (Finnish)

Niko Partanen, Mika Hämäläinen, and Khalid Alnajjar. 2019. Dialect Text Normalization to Normative Standard Finnish. In the Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT).

Normalization (Swedish)

Mika Hämäläinen, Niko Partanen and Khalid Alnajjar. Accepted. Normalization of Different Swedish Dialects Spoken in Finland. In the Proceedings of the 4th ACM SIGSPATIAL International Workshop on Geospatial Humanities.

Dialect generation

Hämäläinen, M., Partanen, N., Alnajjar, K., Rueter, J. & Poibeau, T. (2020). Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity. In Proceedings of the 11th International Conference on Computational Creativity, pp. 204-211.

Data

The data used in the paper describing dialect generation has been published on Zenodo.
