PoemesProfonds is a project that automatically writes French poetry with neural networks. It relies on two main neural networks: a text-to-phonemes converter and a predictor that builds the best sequence of verses from a single given verse or a given sequence of verses.
Contents:
- Requirements
- Text-to-phonemes converter
- Best sequence of verses
- Possible evolutions
- References
- License

Requirements:
- Python >= 3.5
- Tensorflow 2
- Keras >= 2.3.1
- Java 1.8
To extract the phonemes of the verses, a text-to-phonemes converter for French had to be developed. Some words, especially proper nouns, may appear in poetry but not in any database. A model based on neural networks was therefore created so that every verse has a phonetic representation.
The training data comes from [1]. Instead of the International Phonetic Alphabet (IPA), a substitution alphabet is used, as in [1]. It is described below:
IPA symbol | k | p | l | t | ʁ | f | s | d | ʒ | n | b | v | g | m | z | ʃ | ɲ | ŋ | x |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Corresponding character | k | p | l | t | R | f | s | d | Z | n | b | v | g | m | z | S | N | G | x |
IPA symbol | a | ɛ | ɔ~ | j | o | ɔ | i | œ~ | ɛ~ | e | u | ɑ~ | ə | œ | w | y | ɥ | ø |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Corresponding character | a | E | § | j | o | O | i | 1 | 5 | e | u | @ | ° | 9 | w | y | 8 | 2 |
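The two tables can be turned into a lookup for readers who want to move from the substitution alphabet back to IPA. The sketch below is only an illustration following the tables above: `CHAR_TO_IPA` and `to_ipa` are hypothetical names and are not part of the project's code.

```python
# Substitution character -> IPA symbol, following the first table.
TABLE_1 = {
    "k": "k", "p": "p", "l": "l", "t": "t", "R": "ʁ", "f": "f", "s": "s",
    "d": "d", "Z": "ʒ", "n": "n", "b": "b", "v": "v", "g": "g", "m": "m",
    "z": "z", "S": "ʃ", "N": "ɲ", "G": "ŋ",  # G is the velar nasal
    "x": "x",
}
# Substitution character -> IPA symbol, following the second table.
TABLE_2 = {
    "a": "a", "E": "ɛ", "§": "ɔ~", "j": "j", "o": "o", "O": "ɔ", "i": "i",
    "1": "œ~", "5": "ɛ~", "e": "e", "u": "u", "@": "ɑ~", "°": "ə", "9": "œ",
    "w": "w", "y": "y", "8": "ɥ", "2": "ø",
}
CHAR_TO_IPA = {**TABLE_1, **TABLE_2}


def to_ipa(transcription):
    """Convert a string written in the substitution alphabet to IPA."""
    return "".join(CHAR_TO_IPA[char] for char in transcription)


print(to_ipa("S°val"))  # ʃəval
```

Because every phoneme is a single character in the substitution alphabet, the conversion needs no tokenization, which is presumably also what makes this alphabet convenient as a model output.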
The model takes as input a word up to 25 letters long and returns a phonetic transcription up to 21 phonemes long.
The architecture of the model features an attention mechanism [2].
The model reaches 99.76% accuracy on words it was not trained on. As of August 2020, this seems to be the best result for French.
A class Lecteur was developed to read the texts. The algorithm uses the phonemes in the data [1] through a mapping stored in the dictionary dico_u.
However, some words have several pronunciations (e.g. "est" can be read /e/ or /ɛst/). For these, the algorithm uses a dictionary that maps a word and its part of speech (POS) to the phonemes. This mapping is stored in the dictionary dico_m, whose keys are tuples (word, POS) (e.g. {("est", "AUX"): /e/} and {("est", "NOUN"): /ɛst/}).
Therefore, only words absent from these dictionaries are read by the model.
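The lookup order described above can be sketched as follows. This is a minimal illustration with toy data, not the code of the class Lecteur: the function `transcribe`, the `model_fallback` argument and the order in which the two dictionaries are consulted are assumptions.

```python
def transcribe(word, pos, dico_u, dico_m, model_fallback):
    """Return the phonemes of `word`, consulting the dictionaries before the model."""
    # Words with several pronunciations are keyed by (word, POS).
    if (word, pos) in dico_m:
        return dico_m[(word, pos)]
    # Words with a single known pronunciation.
    if word in dico_u:
        return dico_u[word]
    # Unknown words are left to the neural network (here: any callable).
    return model_fallback(word)


# Toy dictionaries written in the substitution alphabet.
dico_u = {"cheval": "S°val"}
dico_m = {("est", "AUX"): "e", ("est", "NOUN"): "Est"}
fake_model = lambda word: "<model:" + word + ">"  # stand-in for the network

print(transcribe("est", "AUX", dico_u, dico_m, fake_model))          # e
print(transcribe("est", "NOUN", dico_u, dico_m, fake_model))         # Est
print(transcribe("abistouque", "NOUN", dico_u, dico_m, fake_model))  # <model:abistouque>
```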
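import os
import pandas as pd
from keras.models import load_model
import preprocessing as pp
from lecture import *
# Load the pronunciation dictionaries and the word-to-phonemes data
dico_u, dico_m, df_w2p = pd.read_pickle(os.path.join(".", "data", "dicos.pickle"))
# Build the letter and phoneme indexes used by the model
ltr2idx, phon2idx, Tx, Ty = pp.chars2idx(df_w2p)
# Load the trained text-to-phonemes network and wrap it in a Lecteur
model_lire = load_model(os.path.join(".", "models", "lecteur", "CE1_T12_l10.h5"))
lecteur = Lecteur(Tx, Ty, ltr2idx, phon2idx, dico_u, dico_m, n_brnn1=90, n_h1=80, net=model_lire, blank="_")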
There are three main methods in the class Lecteur.
- Method lire_nn returns a dictionary mapping words to phonemes using only the neural network model.
>>> lecteur.lire_nn(["cheval", "abistouque"])
{'cheval': 'S°val', 'abistouque': 'abistuk'}
- Method lire_mots uses the words' POS and the dictionaries dico_u and dico_m in addition to the model to read words. It returns a list containing the phonetic transcriptions of the words.
>>> lecteur.lire_mots(["cheval", "abistouque"])
['S°val', 'abistuk']
- Method lire_vers handles French liaisons. The POS is taken into account when deciding whether to apply a liaison. For instance, in "les enfants ouvrent" there is no liaison between "enfants" (noun) and "ouvrent" (verb). The POS tagger used is StanfordPOSTagger [6]. Because of it, only a single sentence should be passed to the method.
Numbers can also be read thanks to a script broadly inspired by [3].
As this text-to-phonemes converter was developed to read French poetry, the phoneme /ə/ is added when a word ends with a consonant sound followed by a mute e (except at the end of a verse). This is done by the functions e_final and e_final_tokens, which are used in the method lire_vers. These /ə/ are present neither in the dictionaries dico_u and dico_m nor in the model.
>>> lecteur.lire_vers("Les trains arrivent en gare de Jarlitude, voies 14 et 97.")
'letR5aRiv°t@gaR°d°ZaRlityd°vwakatORzekatR°v5disEt'
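The mute e rule can be illustrated with a deliberately simplified sketch. It is not the project's e_final or e_final_tokens, whose signatures are not shown here: `add_final_schwa` and `VOWEL_SOUNDS` are hypothetical names, and the sketch only looks at words written with a final "e", so it ignores endings such as "-es" or "-ent" (as in "arrivent" in the example above).

```python
# Characters of the substitution alphabet that denote vowel sounds
# (the semi-vowels j, w and 8 are treated as consonant sounds here).
VOWEL_SOUNDS = set("aEoOieuy°92§15@")


def add_final_schwa(words, phonemes):
    """Append '°' (/ə/) to words ending in a consonant sound plus a written 'e',
    except for the last word of the verse."""
    result = []
    for i, (word, phon) in enumerate(zip(words, phonemes)):
        is_last = i == len(words) - 1
        ends_with_mute_e = (
            word.lower().endswith("e")
            and bool(phon)
            and phon[-1] not in VOWEL_SOUNDS
        )
        result.append(phon + "°" if ends_with_mute_e and not is_last else phon)
    return result


print(add_final_schwa(["une", "gare", "vide"], ["yn", "gaR", "vid"]))
# ['yn°', 'gaR°', 'vid']
```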
This project was inspired by [4]. Its aim is to obtain a realistic sequence of verses with a neural network. The model scans a sequence of previous verses to find the verse most likely to continue it. Several verses are candidates for the best sequel, and the model predicts a score for each of them. Here, the score is the probability that the candidate is the sequel of the sequence.
Unlike in [4], the neural network treats a verse as a tuple made of its phonemes (obtained with the text-to-phonemes converter) and its FastText embedding. FastText [5] is a word embedding representation from which a single vector can be derived for a whole sentence. It also takes punctuation into account and is case-sensitive. Considering these elements allows the model to make more realistic predictions.
As the neural network creates sequences of verses, the verses need to be stored in a data frame in which each observation is a verse. For training, the verses should be ordered in the data frame as they are in the poems: each verse comes right after the verse that precedes it in its original poem and right before the one that follows it.
The data frame needs at least these four variables (their names can be changed while creating an instance of Chercheur2Vers):
- vers: raw text of the verse
- phonemes: phonemes of the verse
- id: identifying number of the poem the verse comes from. (Only needed for training)
- vect: FastText representation of the verse. The models are built to use 300-dimension vectors.
This is an example of the data:
vers | phonemes | id | vect |
---|---|---|---|
Que les parfums légers de ton air embaumé, | k°lepaRf1leZed°t§nER@bome | 23 | (0.001, ..., 0.03) |
Que tout ce qu'on entend, l'on voit ou l'on respire, | k°tus2k§n@t@l§vwaul§REspiR | 23 | (0.2, ..., 0.004) |
Tout dise : Ils ont aimé ! | tudizilz§teme | 23 | (0.052, ..., 0.14) |
Comme je descendais des Fleuves impassibles, | kOm°Z°des@dEdefl9v°z5pasibl | 55 | (0.092, ..., 0.74) |
Je ne me sentis plus guidé par les haleurs : | Z°n°m°s@tiplygidepaRleal9R | 55 | (0.0001, ..., 0.02) |
Des Peaux-Rouges criards les avaient pris pour cibles | depoRuZ°kRijaRleavEpRipuRsibl | 55 | (0.096, ..., 0.032) |
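The role of the ordering and of the id column can be sketched with the rows of the table above. The code below is an assumption about how training examples could be derived, not the project's training code: `context_target_pairs` and `n_prev` are hypothetical names, and plain dictionaries stand in for the data frame.

```python
def context_target_pairs(rows, n_prev=2):
    """Return (previous verses, next verse) pairs that never cross a poem boundary."""
    pairs = []
    for i in range(1, len(rows)):
        target = rows[i]
        context = []
        j = i - 1
        # Walk backwards while staying inside the same poem.
        while j >= 0 and len(context) < n_prev and rows[j]["id"] == target["id"]:
            context.insert(0, rows[j]["vers"])
            j -= 1
        if context:
            pairs.append((context, target["vers"]))
    return pairs


rows = [
    {"vers": "Que les parfums légers de ton air embaumé,", "id": 23},
    {"vers": "Que tout ce qu'on entend, l'on voit ou l'on respire,", "id": 23},
    {"vers": "Tout dise : Ils ont aimé !", "id": 23},
    {"vers": "Comme je descendais des Fleuves impassibles,", "id": 55},
    {"vers": "Je ne me sentis plus guidé par les haleurs :", "id": 55},
    {"vers": "Des Peaux-Rouges criards les avaient pris pour cibles", "id": 55},
]

for context, target in context_target_pairs(rows):
    print(len(context), "previous verse(s) ->", target)
```

The first verse of poem 55 never appears as a target, because the verse right before it in the data frame belongs to poem 23.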
To gather a large number of verses, French classical plays were used. They were retrieved from the website Théâtre classique [8] as XML files.
The aim of the neural network is to compute, for a set of verses, the probability that a verse is the real one following this set.
These verses are modeled by two different matrices:
- Phoneme matrix: one-hot encoding of the phonemes of the verses. Its shape depends on three quantities: the number of verses in the set, the length of the longest string of phonemes in the data, and the number of characters in the phoneme alphabet (38 with the alphabet used here). One is added to the number of verses because the matrix holds both the verses of the set and the candidate verse whose probability is computed. One is also added to the size of the alphabet because there is always a symbol for a blank.
- FastText matrix: stack of the FastText representations of the verses. Its shape is set by the number of verses (the set plus the candidate) and by the dimension of the FastText vectors.
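A one-hot encoding of this kind can be sketched in plain Python. Everything below is illustrative: `one_hot_verses` and `toy_phon2idx` are hypothetical names, and the order of the dimensions, the padding on the right with the blank symbol and the truncation of overlong verses are assumptions rather than the project's behaviour.

```python
def one_hot_verses(phoneme_strings, phon2idx, t_p, blank="_"):
    """Encode verses as nested lists of shape (number of verses, t_p, len(phon2idx))."""
    n_symbols = len(phon2idx)
    matrix = []
    for phonemes in phoneme_strings:
        # Truncate or pad on the right so that every verse has length t_p.
        padded = phonemes[:t_p].ljust(t_p, blank)
        rows = []
        for char in padded:
            row = [0] * n_symbols
            row[phon2idx[char]] = 1
            rows.append(row)
        matrix.append(rows)
    return matrix


# Toy index: the blank symbol followed by the characters of the substitution alphabet.
ALPHABET = "_" + "kpltRfsdZnbvgmzSNGx" + "aE§joOi15eu@°9wy82"
toy_phon2idx = {char: index for index, char in enumerate(ALPHABET)}

verses = ["k°lepaRf1leZed°t§nER@bome", "tudizilz§teme"]
matrix = one_hot_verses(verses, toy_phon2idx, t_p=50)
```

In the project, the index returned by pp.chars2idx would play the role of `toy_phon2idx`.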
A gated recurrent unit (GRU) layer computes an embedding of each verse's phonemes. This type of layer was chosen because it takes the order of the phonemes into account when building the embedding. GRU layers have proven efficient while having fewer parameters to train than long short-term memory (LSTM) layers. Its activation is the hyperbolic tangent. The layer turns each row of the phoneme matrix, that is the one-hot encoding of one verse, into a fixed-size vector, which yields a matrix with one embedding per verse. The last row is then extracted from this matrix, so that a new matrix representing the set of verses and a vector representing the candidate verse are obtained.
An LSTM layer is applied to the matrix representing the set of verses. It creates an embedding of the set of verses for which the model derives the best sequel. LSTM was chosen over GRU because it has an additional cell state vector, which can help build an embedding that better captures alternate and enclosing rhymes (rimes croisées and rimes embrassées). Once the set of verses is embedded, its embedding is concatenated with the embedding of the candidate verse. This concatenation returns a single vector that represents both the set of verses and the candidate verse.
A fully-connected layer is applied to the output of this concatenation. Another fully-connected layer then derives the final embedding of the phonemes of the set of verses and of the candidate verse. A leaky rectified linear unit with α = 0.2 is used as the activation function for these fully-connected layers. These layers are regularized with a 10% dropout. A batch normalization is also applied to these fully-connected layers to avoid vanishing or exploding gradients.
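For reference, the leaky rectified linear unit mentioned above keeps positive inputs unchanged and scales negative inputs by α. The sketch below shows the scalar function only; `leaky_relu` is an illustrative name, and in the model the activation is applied element-wise by the framework.

```python
def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: identity for positive inputs, slope `alpha` for negative ones."""
    return x if x > 0 else alpha * x


print(leaky_relu(3.0))   # 3.0
print(leaky_relu(-0.5))  # -0.1
```

Unlike a plain ReLU, the negative side still has a non-zero slope, so units that receive negative inputs keep a gradient.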
The only architectures that produced realistic results were the symmetric ones. That is to say, for both the phonemes and the FastText sides, the output of each layer needs to be of the same size. Thus, each verse's FastText representation needs to be turned into a vector of the same dimension as the phoneme embedding. A fully-connected layer therefore derives, for each row of the FastText matrix, an embedding of the same size as the one of the phonemes. The output of this layer is normalized so that no FastText representation activates the units of the next layers more strongly than the others, which would make it more likely to be picked as a sequel even when it is not the most realistic one. Indeed, shorter verses seem to have FastText representations with a norm closer to 1 than longer verses, so shorter verses were more likely to be picked by the model. The normalization fixes this problem. It can be considered as the activation function of the layer. A 10% dropout regularization and a batch normalization are applied to the output of this layer.
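The effect of the normalization can be illustrated with a small sketch. The text does not say which norm is used, so the L2 norm below is an assumption, and `l2_normalize` is a hypothetical helper rather than the layer used in the models.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Rescale `vector` to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


small = l2_normalize([0.03, 0.04])  # vector with a small norm
large = l2_normalize([3.0, 4.0])    # vector with a large norm
print(small)
print(large)
```

Both vectors point in the same direction and end up on the unit sphere, so neither can dominate the next layers simply because of its norm.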
As on the phonemes' side, the resulting matrix is split into a matrix that represents the set of verses and a stand-alone vector that represents the candidate verse. The matrix is fed to a GRU layer, which creates an embedding of the set of verses. The GRU captures a relevant embedding of the theme and of the grammatical features (such as gender and number) of the verses. It may give more weight to the last verse or pick up a pattern across verses. Unlike the phonemes' side, which uses an LSTM, the additional cell state was not needed here, as the theme and the grammatical features should be continuous from one verse to the next.
As on the phonemes' side, the embedding of the candidate verse and the embedding of the set of verses are concatenated into a single vector. It goes through two fully-connected layers, each followed by a leaky rectified linear unit with α = 0.2. They are regularized with a 10% dropout, and a batch normalization is applied to both outputs.
The embeddings from the phonemes and FastText sides are concatenated into a single vector, which is fed to a fully-connected layer. Its activation function is again a leaky rectified linear unit with α = 0.2, and it is again regularized with a 10% dropout. A batch normalization is applied to the output of this layer. Adding more layers here was tried, but it was not as effective as adding a layer to the phoneme and FastText sides.
The resulting vector feeds the last layer, which is fully connected and has a single unit. Its activation function is the sigmoid, so the output is the probability that the candidate verse is the sequel of the verses.
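The sigmoid maps the value computed by the single output unit to a number between 0 and 1, which is why the output can be read as a probability. A minimal scalar sketch, written in a numerically stable form (`sigmoid` is an illustrative helper, not the framework's implementation):

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0.0))  # 0.5
```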
[Figure: diagram of the architecture, generated with Netron]
To generate enclosing rhymes, the poem being written does not keep only the single most likely verse after each application of the neural network (greedy search). As in machine translation, a beam search algorithm is used to keep the best sequences of verses derived by the model. The model then uses each of these sequences as the set of verses and computes the best sequels for it. The code is inspired by [7].
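The idea can be sketched with a generic beam search over candidate verses. This is not the method beam_search_write: the function `beam_search`, the toy `rhyme_score`, the use of summed log-probabilities to rank sequences and the rule that a verse is not reused are all assumptions made for the illustration.

```python
import math


def beam_search(start, candidates, score, n_steps, k):
    """Keep the k best verse sequences at each step.

    `score(sequence, candidate)` must return a probability in (0, 1].
    """
    beams = [(0.0, list(start))]  # (cumulative log-probability, sequence)
    for _ in range(n_steps):
        expanded = []
        for log_prob, sequence in beams:
            for candidate in candidates:
                if candidate in sequence:
                    continue  # assumption: a verse is used at most once
                p = score(sequence, candidate)
                expanded.append((log_prob + math.log(p), sequence + [candidate]))
        # Keep only the k most likely sequences (greedy search is k = 1).
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:k]
    return beams


def rhyme_score(sequence, candidate):
    """Toy stand-in for the network: favours a candidate ending like the last verse."""
    return 0.9 if candidate[-1] == sequence[-1][-1] else 0.1


beams = beam_search(["xa"], ["ya", "zb", "wb"], rhyme_score, n_steps=2, k=2)
for log_prob, sequence in beams:
    print(round(math.exp(log_prob), 3), sequence)
```

With k = 1 the loop reduces to the greedy search mentioned above; a larger k lets a sequence whose first added verse scored lower survive long enough to be completed.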
There are two models available to create the sequences. The first one is the most precise, but its computation time is twelve times longer than that of the second one, which produces slightly worse results. For both of them: = 8, = 38, = 51.
The first model has 2,982,751 parameters.
The second model has 240,231 parameters.
The user has to write at least one verse; the model then writes a poem from it.
import os
import pandas as pd
import fasttext.util
from keras.models import load_model
import preprocessing as pp
from lecture import *
from chercheur2vers import *
# Text-to-phonemes converter
dico_u, dico_m, df_w2p = pd.read_pickle(os.path.join(".", "data", "dicos.pickle"))
ltr2idx, phon2idx, Tx, Ty = pp.chars2idx(df_w2p)
model_lire = load_model(os.path.join(".", "models", "lecteur", "lecteur_mdl.h5"))
lecteur = Lecteur(Tx, Ty, ltr2idx, phon2idx, dico_u, dico_m, n_brnn1=90, n_h1=80, net=model_lire, blank="_")
# FastText vectors and the verses to pick the sequel from
ft = fasttext.load_model(os.path.join("models", "FastText", 'cc.fr.300.bin'))
vers = pd.read_pickle(os.path.join("data", "vers.pkl"))
_, vers_test = split_train_dev(vers, test_size=0.03)
# Verse predictor
model_chercher = load_model(os.path.join(".", "models", "chercher_vers", "fast.h5"))
checheur = Chercheur2Vers(t_p=50, p2idx=phon2idx, net=model_chercher, n_antecedant_vers=8)
poem = checheur.beam_search_write(["Après cette peine, emprisonné par l'ennui,",
                                   "Je suis mon cher ami très heureux de te voir."],
                                  df=vers_test, vers_suivants=5, k=5, batch_size=512, split=100,
                                  lecteur=lecteur, ft=ft)
Two parameters are important for the quality of the generated poem and the speed of execution:
- test_size: share of the poems in the data from which the candidate verses for the sequel are drawn
- k: beam width
If a memory error occurs, the input matrices can be divided into smaller parts with the parameter split. The higher it is, the less memory is required.
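The memory trade-off behind split can be sketched as chunked prediction: the inputs are scored part by part instead of all at once, so only one part is held by the model at a time. The helper below is an assumption about the general technique, not the project's implementation; `predict_in_chunks` and its arguments are hypothetical names.

```python
def predict_in_chunks(inputs, predict, split):
    """Score `inputs` in at most `split` chunks and join the results in order."""
    chunk_size = max(1, -(-len(inputs) // split))  # ceiling division
    scores = []
    for start in range(0, len(inputs), chunk_size):
        scores.extend(predict(inputs[start:start + chunk_size]))
    return scores


batch_sizes = []


def toy_predict(batch):
    """Stand-in for the network: records the batch size and doubles each input."""
    batch_sizes.append(len(batch))
    return [2 * x for x in batch]


scores = predict_in_chunks(list(range(10)), toy_predict, split=3)
print(scores)       # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(batch_sizes)  # [4, 4, 2]
```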
It is also possible to export formatted inputs from verses so that the neural networks can be run elsewhere on a GPU or TPU (for example on Colab):
- Exporting the inputs
phoneme_input, fasttext_input = checheur.vers2matrixes(["Après cette peine emprisonné par l'ennui,",
"Je suis mon cher ami très heureux de te voir."],
lecteur, ft, len_output=8)
- Using them
poem = checheur.beam_search_write(liste_vers=["Après cette peine emprisonné par l'ennui,",
"Je suis mon cher ami très heureux de te voir."],
df=vers_test, vers_suivants=7, k=3, split=100, batch_size=512,
mphon_prec=phoneme_input, mvect_prec=fasttext_input)
Harry :
Après cette peine, emprisonné par l'ennui,
Je suis mon cher ami très heureux de te voir.
J'en attends peu de chose, et brûle de le voir.
Passés auprès de toi, blonde soeur d'Ophélie,
Que se borne en effet le roman de ma vie.
Vous aviez désolé les pays d'alentour,
Égare votre coeur, vous séduit en ce jour.
Dauphin :
Après cette peine, emprisonné par l'ennui,
Je suis mon cher ami très heureux de te voir.
J'en attends peu de chose, et brûle de le voir.
Passés auprès de toi, blonde soeur d'Ophélie,
Que se borne en effet le roman de ma vie.
Vous aviez désolé les pays d'alentour,
Que tous les Citoyens vous rendent tour à tour.
Élu :
Après cette peine, emprisonné par l'ennui,
Je suis mon cher ami très heureux de te voir.
J'en attends peu de chose, et brûle de le voir.
Passés auprès de toi, blonde soeur d'Ophélie,
Que se borne en effet le roman de ma vie.
Vous aviez désolé les pays d'alentour,
Qu'un peu de vanité se sent dans vos atours.
Harry is the third most likely poem, Dauphin the second and Élu the most likely. Three results are displayed thanks to the beam search (when k = 2, Harry is not displayed as it is not computed). These poems were obtained with k = 5, vers_suivants = 5 and the model fast.
A third neural network that generates its own verses could be developed so that the algorithm writes poetry entirely by itself. This could be done with a generative adversarial network or with reinforcement learning.
The text-to-phonemes converter can be used in various applications, such as text-to-speech.
The whole algorithm can also be used to create song lyrics.
[2] Vaswani, A., et al.: Attention is all you need. arXiv (2017). arXiv:1706.03762
[3] Michel Claveau. 2004. "Traduction nombre => texte" from mclaveau.com
© Rémi Desmarescaux, Ryan Ugolini
Licensed under the MIT License.