DeBERTa-ELL: Automated Proficiency Assessment for English Language Learners

Overview

DeBERTa-ELL is a natural language processing (NLP) project that leverages the DeBERTa (Decoding-enhanced BERT with Disentangled Attention) model to automatically assess the language proficiency of high school English Language Learners (ELLs) from their essays. The project aims to provide a reliable, efficient, and scalable tool for educators and researchers in second language acquisition and assessment.

Features

  • Utilizes state-of-the-art DeBERTa model for text analysis
  • Assesses multiple aspects of language proficiency:
    • Cohesion
    • Syntax
    • Vocabulary
    • Phraseology
    • Grammar
    • Conventions
  • Implements multi-label stratified k-fold cross-validation for robust model evaluation (see the sketch after this list)
  • Supports both training and inference modes
  • Includes data preprocessing and augmentation techniques
  • Provides detailed logging and model checkpointing
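
Multi-label stratified splitting is commonly done with the iterative-stratification package. The sketch below shows how folds could be assigned across the six score columns; the package choice and the column names (taken from the Kaggle Feedback Prize dataset) are assumptions for illustration, not a description of exactly what train.py does.

    # Hypothetical fold assignment with multi-label stratification.
    # Assumes: pip install iterative-stratification, and the Kaggle column names.
    import pandas as pd
    from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

    TARGETS = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

    df = pd.read_csv("data/feedback-prize-english-language-learning/train.csv")
    mskf = MultilabelStratifiedKFold(n_splits=4, shuffle=True, random_state=42)

    df["fold"] = -1
    for fold, (_, valid_idx) in enumerate(mskf.split(df, df[TARGETS])):
        df.loc[valid_idx, "fold"] = fold  # every essay lands in exactly one validation fold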

Requirements

  • Python 3.10+
  • PyTorch 2.3+
  • Transformers 4.37+

For a complete list of dependencies, please refer to the requirements.txt file.

Installation

  1. Clone this repository:

    git clone https://github.com/arnavs04/deberta-ell.git
    cd deberta-ell
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

Usage

Data Preparation

The training data is already in the data/feedback-prize-english-language-learning/ directory.
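
Assuming the standard Kaggle layout of that directory (a train.csv with a full_text column plus the six score columns), the data can be loaded and inspected as follows; the file and column names are assumptions about the dataset, not something defined by this repository.

    # Load and inspect the training data (column names assume the Kaggle layout).
    import pandas as pd

    TARGETS = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

    train = pd.read_csv("data/feedback-prize-english-language-learning/train.csv")
    print(train[["full_text"] + TARGETS].head())
    print(train[TARGETS].describe())  # in the Kaggle data, scores range from 1.0 to 5.0 in 0.5 steps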

Training

To train the model, run:

python train.py

You can modify the hyperparameters in the configs.py file.
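
The exact settings exposed by configs.py are specific to this repository; the snippet below is only a hypothetical illustration of the kind of hyperparameters such a file typically collects for DeBERTa fine-tuning (every name and value here is illustrative).

    # Hypothetical example of hyperparameters a configs.py might expose.
    # These names and values are illustrative, not the repository's actual settings.
    class CFG:
        model_name = "microsoft/deberta-v3-base"
        max_length = 512
        num_folds = 4
        epochs = 4
        batch_size = 8
        learning_rate = 2e-5
        seed = 42
        target_cols = ["cohesion", "syntax", "vocabulary",
                       "phraseology", "grammar", "conventions"]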

Inference

To run inference on new data:

python inference.py

As with training, you can modify the hyperparameters in the configs.py file.

Model Architecture

This project uses the DeBERTa-v3-base model as the backbone for essay analysis. The model is fine-tuned on the task of multi-aspect proficiency assessment, with a custom head for multi-label regression.
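
The repository's exact head implementation may differ; the following is a minimal sketch, assuming mean pooling over the final hidden states followed by a single linear layer that regresses the six scores (the class name and pooling choice are assumptions for illustration).

    # Minimal sketch: DeBERTa-v3-base backbone + mean pooling + 6-way regression head.
    # Pooling, dropout, and initialization details are assumptions and may differ
    # from the repository's actual implementation.
    import torch
    import torch.nn as nn
    from transformers import AutoConfig, AutoModel

    class FeedbackModel(nn.Module):
        def __init__(self, model_name="microsoft/deberta-v3-base", num_targets=6):
            super().__init__()
            self.config = AutoConfig.from_pretrained(model_name)
            self.backbone = AutoModel.from_pretrained(model_name, config=self.config)
            self.head = nn.Linear(self.config.hidden_size, num_targets)

        def forward(self, input_ids, attention_mask):
            hidden = self.backbone(input_ids=input_ids,
                                   attention_mask=attention_mask).last_hidden_state
            # Mean-pool the token embeddings, ignoring padded positions.
            mask = attention_mask.unsqueeze(-1).float()
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
            return self.head(pooled)  # (batch_size, 6) regression outputs

During training, outputs of this shape are compared against the six target scores with a loss such as Smooth L1 (see Performance below).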


Performance

The performance of the model was evaluated using Smooth L1 Loss for training and validation, and Mean Column-wise Root Mean Square Error (MCRMSE) score for the final evaluation. Below are the summarized results for each fold:

Fold      Score (MCRMSE)
0         0.4493
1         0.4576
2         0.4663
3         0.4529
Overall   0.4566
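
For reference, MCRMSE is the root-mean-square error computed per score column and then averaged over the six columns; a small sketch, assuming NumPy arrays of shape (n_essays, 6):

    # MCRMSE: column-wise RMSE, averaged over the target columns.
    import numpy as np

    def mcrmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        # y_true, y_pred: arrays of shape (n_essays, n_targets)
        rmse_per_column = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
        return float(np.mean(rmse_per_column))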

Contributing

Contributions to improve DeBERTa-ELL are welcome! Please feel free to open issues, fork the repository, and submit pull requests.

Citation

If you use this code for your research, please cite our project:

@software{DeBERTa_ELL2024,
  author = {Arnav Samal},
  title = {DeBERTa-ELL: Automated Proficiency Assessment for English Language Learners},
  year = {2024},
  url = {https://github.com/arnavs04/deberta-ell.git}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any queries, please open an issue or contact [email protected].
