
High school research in Quote Speaker Identification for novel texts


Kkordik/NovelQSI


NovelQSI

HuggingFace links

Try the fine-tuned model on HuggingFace without installing it locally

See the NovelQSI dataset that was used for fine-tuning

See the TriviaQA_SQuAD dataset that was used to compare the mrm8488/longformer-base-4096-finetuned-squadv2 and deepset/deberta-v3-base-squad2 base models

This is the GitHub repository of my research project. It contains all the important code I used.

  • TriviaQA_to_SQuAD.ipynb is a Python notebook that downloads the original TriviaQA dataset and converts it to the following JSON format:

    {"id": "<id of the row>", "context": "context text", "question": "question text", "answers": {"text": ["answer1"], "answer_start": [<character index where answer1 starts in the context>]}}
    
  • Compare_Models.ipynb is a Python notebook for comparing models. It was used both to compare the two base models and to compare the chosen base model with the fine-tuned model.

  • reformat_dialogism.py contains Python code for reformatting the project-dialogism-novel-corpus dataset into the JSON format described above.

    The "context" structure is:

    {"context": """
    Characters:
    <characters description>
    Summary:
    <novel summary till the current context window>
    Novel Text:
    <context window with the quote in the middle>"""}
    

    The "question" structure is:

    {"question": """Which character said "<the quote>"?"""}
    

    And the "answers" structure is:

    {"answers": {"text":["<one of the names from the characters description>"], "answer_start": [<character index where the name starts in the context>]}}
    

    The "id" field is left essentially unchanged.

  • /TheGambler contains the part of the project-dialogism-novel-corpus dataset that was reformatted and used for fine-tuning.

  • train_Longformer.ipynb is a Python notebook for fine-tuning the chosen base model. Running it requires an A100 accelerator in Google Colaboratory.
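The conversion done by TriviaQA_to_SQuAD.ipynb can be sketched as follows. This is a minimal illustration under assumptions, not the notebook's actual code: the flattened input keys (`question_id`, `question`, `context`, `answer_aliases`) and the function name `to_squad_record` are hypothetical, and the sketch keeps the first answer alias that occurs verbatim (case-sensitively) in the context, dropping the example if none does. The essential point is that `answer_start` is a character offset into `context`.

```python
from typing import Optional


def to_squad_record(example: dict) -> Optional[dict]:
    """Convert one flattened TriviaQA-style example to the SQuAD-style format.

    Assumes the hypothetical keys "question_id", "question", "context" and
    "answer_aliases" (a list of acceptable answer strings).
    Returns None when no alias occurs verbatim in the context.
    """
    context = example["context"]
    for alias in example["answer_aliases"]:
        start = context.find(alias)  # character offset of the first match, or -1
        if start != -1:
            return {
                "id": example["question_id"],
                "context": context,
                "question": example["question"],
                "answers": {"text": [alias], "answer_start": [start]},
            }
    return None  # answer not extractable from this context; skip the example


sample = {
    "question_id": "tc_1",
    "question": "Who wrote The Gambler?",
    "context": "The Gambler is a short novel by Fyodor Dostoevsky.",
    "answer_aliases": ["Dostoyevsky", "Fyodor Dostoevsky"],
}
record = to_squad_record(sample)
print(record["answers"])  # {'text': ['Fyodor Dostoevsky'], 'answer_start': [32]}
```

Here the first alias is skipped because its spelling does not appear in the context, which is why TriviaQA's list of aliases is worth trying in order.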
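Compare_Models.ipynb does not say which metric it reports. A common choice for extractive QA models such as the two base models above is SQuAD-style exact match (EM) and token-level F1, so the sketch below reimplements that standard scoring from scratch as an assumption, not as the notebook's code: answers are lowercased, stripped of punctuation and of the articles a/an/the, and each prediction is scored against its best-matching gold answer.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize(prediction) == normalize(truth))


def f1(prediction: str, truth: str) -> float:
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)  # both empty counts as a match
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: dict, references: dict) -> dict:
    """Average EM and F1 (in %) over ids; each reference is a list of gold answers."""
    em_total = f1_total = 0.0
    for qid, gold_answers in references.items():
        pred = predictions.get(qid, "")
        em_total += max(exact_match(pred, g) for g in gold_answers)
        f1_total += max(f1(pred, g) for g in gold_answers)
    n = len(references)
    return {"exact_match": 100 * em_total / n, "f1": 100 * f1_total / n}


preds = {"q1": "Polina", "q2": "the General"}
refs = {"q1": ["Polina"], "q2": ["Alexei"]}
print(evaluate(preds, refs))  # {'exact_match': 50.0, 'f1': 50.0}
```

F1 gives partial credit (for example, "Polina Alexandrovna" against "Polina" scores about 0.67), which is useful when a model extracts a longer form of a character's name.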
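The record layout that reformat_dialogism.py produces (Characters, Summary and Novel Text in the context, a "Which character said ...?" question, and a name from the character list as the answer) can be sketched as below. The helper name `build_record`, its parameters, the `name: description` line format and the fixed character-based `window` are all hypothetical; the real script may build the window and character descriptions differently. What the sketch illustrates is that `answer_start` must point at the speaker's name inside the Characters block, so the answer is always a span of the context.

```python
def build_record(record_id, characters, summary, novel_text,
                 quote_start, quote_end, speaker, window=2000):
    """Build one SQuAD-style quote speaker identification example.

    Hypothetical helper. characters maps each name to a short description;
    quote_start/quote_end delimit the quote (without quotation marks) in
    novel_text; window is the number of characters kept on each side of it.
    """
    quote = novel_text[quote_start:quote_end]
    # Context window with the quote roughly in the middle.
    left = max(0, quote_start - window)
    right = min(len(novel_text), quote_end + window)

    # Build the Characters block, remembering where the speaker's name starts.
    header = "Characters:\n"
    pos = len(header)
    name_offset = None
    lines = []
    for name, description in characters.items():
        if name == speaker:
            name_offset = pos
        line = f"{name}: {description}\n"
        lines.append(line)
        pos += len(line)
    if name_offset is None:
        raise ValueError(f"speaker {speaker!r} is not in the character list")

    context = (header + "".join(lines)
               + "Summary:\n" + summary + "\n"
               + "Novel Text:\n" + novel_text[left:right])
    return {
        "id": record_id,
        "context": context,
        "question": f'Which character said "{quote}"?',
        "answers": {"text": [speaker], "answer_start": [name_offset]},
    }


text = 'Polina looked at me. "Are you going to play today?" she asked.'
start = text.index("Are")
end = text.index('?"') + 1  # include the question mark, exclude the closing quote
rec = build_record(
    "0001",
    {"Alexei": "the narrator, a tutor", "Polina": "the General's stepdaughter"},
    "(summary of the novel up to this point)",
    text, start, end, speaker="Polina",
)
offset = rec["answers"]["answer_start"][0]
print(rec["context"][offset:offset + len("Polina")])  # Polina
```

Computing the offset while the Characters block is built avoids `str.find` picking up the name inside another character's description.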
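Fine-tuning an extractive QA model such as Longformer on these records requires turning each character-level `answer_start` into start and end token positions. The sketch below shows the usual approach under stated assumptions: `offsets` stands in for the per-token character offsets that a fast Hugging Face tokenizer returns with `return_offsets_mapping=True`, with question and special tokens already replaced by `None`; the function name is hypothetical, and train_Longformer.ipynb may implement this step differently.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Optional[Tuple[int, int]]],
    answer_start: int,
    answer_text: str,
) -> Tuple[int, int]:
    """Map a character-level answer span to inclusive (start_token, end_token).

    offsets[i] is the (char_start, char_end) of token i in the context, or None
    for tokens outside the context. Returns (0, 0), i.e. the first special
    token, when the answer is not fully covered by the tokenized context,
    following the common "no answer" convention.
    """
    answer_end = answer_start + len(answer_text)  # exclusive end
    start_tok = end_tok = None
    for i, span in enumerate(offsets):
        if span is None:
            continue
        tok_start, tok_end = span
        if start_tok is None and tok_start <= answer_start < tok_end:
            start_tok = i
        if tok_start < answer_end <= tok_end:
            end_tok = i
    if start_tok is None or end_tok is None:
        return 0, 0
    return start_tok, end_tok


# Toy offsets for: [special] [question] "Polina" "said" "hi" [special]
offsets = [None, None, (0, 6), (7, 11), (12, 14), None]
print(char_span_to_token_span(offsets, 0, "Polina"))   # (2, 2)
print(char_span_to_token_span(offsets, 7, "said hi"))  # (3, 4)
```

Because the answer is a name from the Characters block at the start of the context, it normally stays inside the 4096-token Longformer input even when the novel text window is long.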
