
High school research in Quote Speaker Identification for novel texts





Hugging Face links

Try the fine-tuned model on Hugging Face without installing it locally.

See the NovelQSI dataset, which was used for fine-tuning.

See the TriviaQA_SQuAD dataset, which was used to compare the mrm8488/longformer-base-4096-finetuned-squadv2 and deepset/deberta-v3-base-squad2 base models.

This is the GitHub repository for my research project. It contains all the important code I used.

  • TriviaQA_to_SQuAD.ipynb contains Python notebook code for downloading the original TriviaQA dataset and converting it to the following JSON format:

    {"id": "<id of the row>", "context": "context text", "question": "question text", "answers": {"text": ["answer1"], "answer_start": [<character index where answer1 starts in the context>]}}
  • Compare_Models.ipynb contains Python notebook code for comparing models. It was used first to compare the two base models, and then to compare the chosen base model with the fine-tuned model.

  • contains Python code for reformatting the project-dialogism-novel-corpus dataset into the JSON format described above.

    Here, the "context" structure is:

    {"context": """
    <characters description>
    <novel summary till the current context window>
    Novel Text:
    <context window with the quote in the middle>"""}

    The "question" structure is:

    {"question": """Which character said "<the quote>"?"""}

    And the "answers" structure is:

    {"answers": {"text": ["<one of the names from the characters description>"], "answer_start": [<character index where the name starts in the context>]}}

    The "id" field is kept essentially unchanged.

  • /TheGambler is a part of the project-dialogism-novel-corpus dataset that was reformatted and used for fine-tuning.

  • train_Longformer.ipynb contains Python notebook code for fine-tuning the chosen base model. Running the notebook requires an A100 GPU runtime in Google Colaboratory.
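
The TriviaQA-to-SQuAD conversion described for TriviaQA_to_SQuAD.ipynb mainly needs one thing: locating the answer inside the context so that "answer_start" can be filled in. The sketch below is a hypothetical illustration, not the notebook's actual code; the helper name, its parameters, and the choice of a case-insensitive search over the answer aliases are all assumptions. Rows whose answer never appears in the context are dropped (returned as None).

```python
import json
import re


def to_squad_record(row_id, question, context, answer_aliases):
    """Return one SQuAD-style record, or None if no alias occurs in the context.

    Hypothetical helper: the parameter names are illustrative and are not
    taken from TriviaQA_to_SQuAD.ipynb.
    """
    for alias in answer_aliases:
        if not alias:
            continue  # an empty alias would "match" at position 0
        # Case-insensitive search; re.escape treats the alias as literal text,
        # and the match offsets refer to the original (un-lowercased) context.
        match = re.search(re.escape(alias), context, flags=re.IGNORECASE)
        if match:
            return {
                "id": row_id,
                "context": context,
                "question": question,
                "answers": {
                    # Keep the exact surface form found in the context.
                    "text": [match.group(0)],
                    "answer_start": [match.start()],
                },
            }
    # TriviaQA answers are not guaranteed to appear verbatim in the evidence text.
    return None


# Toy row, not taken from TriviaQA.
record = to_squad_record(
    "toy-0001",
    "Which city is the capital of France?",
    "Paris is the capital and largest city of France.",
    ["City of Paris", "Paris"],
)
print(json.dumps(record))
```

Searching with a regular expression instead of lowercasing both strings keeps the reported offset valid for the original context even when the case differs.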
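
This README does not state which metrics Compare_Models.ipynb reports. For extractive question answering on SQuAD-format data, the usual choice is Exact Match (EM) and token-level F1, so the sketch below assumes those. It reimplements the standard SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace) in plain Python. The function names are hypothetical.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    return max(float(normalize_answer(prediction) == normalize_answer(g)) for g in gold_answers)


def _token_f1(prediction, gold):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Empty answers only match other empty answers.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def f1_score(prediction, gold_answers):
    """Best token-overlap F1 against any of the gold answers."""
    return max(_token_f1(prediction, g) for g in gold_answers)


def evaluate(predictions, references):
    """Average EM and F1 (as percentages) over parallel lists of predictions and gold-answer lists."""
    n = len(predictions)
    em = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    f1 = sum(f1_score(p, refs) for p, refs in zip(predictions, references))
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}


# Toy predictions and gold names, not model outputs.
print(evaluate(["the General", "Alexei"], [["General"], ["Alexei Ivanovich"]]))
```

For speaker identification, EM is usually the stricter and more telling number, because a partially matched name (for example, a first name only) still earns partial F1 credit.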
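
The reformatting step for project-dialogism-novel-corpus produces records whose context consists of the characters description, the summary so far, a "Novel Text:" marker, and a window with the quote in the middle. The question asks who said the quote, and the answer is a name from the characters description. Below is a minimal sketch of that assembly under stated assumptions. The helpers `centered_window` and `build_qsi_record` are hypothetical. The window is measured in characters, although the real script may use tokens or paragraphs instead. The answer offset is taken from the characters description, which sits at the start of the context.

```python
def centered_window(novel_text, quote_start, quote_end, radius):
    """Return the quote plus up to `radius` characters on each side of it.

    Assumption: the window is measured in characters; the real script may use
    tokens, sentences, or paragraphs instead.
    """
    left = max(0, quote_start - radius)
    right = min(len(novel_text), quote_end + radius)
    return novel_text[left:right]


def build_qsi_record(row_id, characters_description, summary, window_text, quote, speaker):
    """Assemble one record in the context/question/answers layout.

    Hypothetical helper; the argument names are illustrative.
    """
    context = (
        f"{characters_description}\n"
        f"{summary}\n"
        "Novel Text:\n"
        f"{window_text}"
    )
    # The gold name is taken from the characters description, which starts at
    # offset 0 of the context, so its index there is also its index in the context.
    start = characters_description.find(speaker)
    if start == -1:
        raise ValueError(f"{speaker!r} is not listed in the characters description")
    return {
        "id": row_id,
        "context": context,
        "question": f'Which character said "{quote}"?',
        "answers": {"text": [speaker], "answer_start": [start]},
    }


# Toy inputs, not taken from the corpus.
novel = 'Filler before. "Let us go to the casino," she said. Filler after.'
quote = "Let us go to the casino,"
q_start = novel.index(quote)
window = centered_window(novel, q_start, q_start + len(quote), radius=10)
record = build_qsi_record(
    "toy-0001",
    "Characters: Polina Alexandrovna; the General; Mr. Astley",
    "Summary: toy summary of earlier chapters.",
    window,
    quote,
    "Polina Alexandrovna",
)
print(record["question"])
```

Pointing the answer at the characters description, rather than at a mention inside the novel text, gives every record an answer span even when the speaker is never named near the quote.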
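
The README does not describe the preprocessing inside train_Longformer.ipynb. However, fine-tuning an extractive QA model on SQuAD-format data typically requires converting the character-level "answer_start" into start and end token indices. With Hugging Face fast tokenizers, this is usually done with the `offset_mapping` returned when `return_offsets_mapping=True`, combined with `sequence_ids()` to tell question tokens from context tokens. The stdlib sketch below shows only that mapping logic. A toy regex tokenizer stands in for the real model tokenizer, and all function names are hypothetical.

```python
import re


def toy_offsets(text):
    """Stand-in for a tokenizer's offset mapping: (start, end) character spans of
    word and punctuation tokens. A real subword tokenizer gives different spans."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_span_to_token_span(offsets, is_context, answer_start, answer_text):
    """Map a character-level answer span to (start_token, end_token), inclusive.

    offsets:    (start, end) character span of every token in the encoded input
    is_context: True for tokens that come from the context, False for question
                or special tokens (their offsets refer to other text)
    Returns None when the answer is not fully inside the tokenized context.
    """
    answer_end = answer_start + len(answer_text)  # exclusive
    start_token = end_token = None
    for i, ((tok_start, tok_end), in_context) in enumerate(zip(offsets, is_context)):
        if not in_context:
            continue
        if start_token is None and tok_start <= answer_start < tok_end:
            start_token = i
        if tok_start < answer_end <= tok_end:
            end_token = i
    if start_token is None or end_token is None:
        return None
    return start_token, end_token


question = "Which character said it?"
context = "Polina looked at the General."
q_offsets, c_offsets = toy_offsets(question), toy_offsets(context)
# Mimic "[question tokens][context tokens]" without special tokens.
offsets = q_offsets + c_offsets
is_context = [False] * len(q_offsets) + [True] * len(c_offsets)

answer_text = "the General"
answer_start = context.index(answer_text)
print(char_span_to_token_span(offsets, is_context, answer_start, answer_text))
```

The `is_context` mask matters here: the question token "said" spans characters 16 to 20 of the question, which overlaps the answer's context offset 17. Without the mask, it would be picked as the start token.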









