SpeeQ

SpeeQ (pronounced "speekiu") is a Python-based speech recognition framework that lets developers and researchers experiment with and train a variety of speech recognition models. It offers pre-implemented model architectures that can be trained with just a few lines of code, which makes it well suited to quick prototyping and testing.

To get started, refer to the documentation. If you need assistance or want to stay connected, please join our Discord Server.

Installation

To install this package, you can follow the steps below:

  1. Create and activate a Python virtual environment:

    python3 -m venv env
    source env/bin/activate

  2. Install the package from source:

    git clone https://github.com/msalhab96/SpeeQ.git
    cd SpeeQ
    pip install -r requirements.txt
    pip install -e .
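
After the steps above, a quick check that the editable install is visible to the active environment can be useful. The sketch below uses only the Python standard library. The import name `speeq` is an assumption (the README does not state it), so adjust it if the package exposes a different top-level module.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "speeq" is an assumed import name, not confirmed by this README.
    name = "speeq"
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If the module is reported as not found, the most likely causes are that the virtual environment is not activated or that `pip install -e .` was run outside the cloned repository.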

Implemented Models/Papers

| Model name | Paper | Type |
|---|---|---|
| Deep Speech 1 | Deep Speech: Scaling up end-to-end speech recognition | CTC |
| Deep Speech 2 | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | CTC |
| Conformer | Conformer: Convolution-augmented Transformer for Speech Recognition | CTC |
| Jasper | Jasper: An End-to-End Convolutional Neural Acoustic Model | CTC |
| Wav2Letter | Wav2Letter: an End-to-End ConvNet-based Speech Recognition System | CTC |
| QuartzNet | QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions | CTC |
| Squeezeformer | Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | CTC |
| RNNTransducer | Sequence Transduction with Recurrent Neural Networks | Transducer |
| ConformerTransducer | Conformer: Convolution-augmented Transformer for Speech Recognition | Transducer |
| ContextNet | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | Transducer |
| VGGTransformer-Transducer | Transformer-Transducer: End-to-End Speech Recognition with Self-Attention | Transducer |
| Transformer-Transducer | Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss | Transducer |
| BasicAttSeq2SeqRNN | N/A | Seq2Seq (encoder/decoder) |
| LAS | Listen, Attend and Spell | Seq2Seq (encoder/decoder) |
| RNNWithLocationAwareAtt | Attention-Based Models for Speech Recognition | Seq2Seq (encoder/decoder) |
| SpeechTransformer | Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition | Seq2Seq (encoder/decoder) |

Contribution

Your contributions are highly valued and appreciated! Our aim is to create an open and transparent environment that facilitates easy and straightforward contributions to this project. This can include reporting any issues or bugs you encounter, engaging in discussions regarding the current codebase, submitting fixes, proposing new features, or even becoming a maintainer yourself. We believe that your input is crucial to the continued growth and success of this framework. To start contributing to the framework, please consult the guidelines for contributions.

License & Citation

The framework is licensed under the MIT License. If you use it in your work, please consider citing it with the following BibTeX entry.

@software{Salhab_SpeeQ_A_framework_2023,
author = {Salhab, Mahmoud},
doi = {10.5281/zenodo.7708780},
license = {MIT},
month = {3},
title = {{SpeeQ: A framework for automatic speech recognition}},
url = {https://github.com/msalhab96/SpeeQ},
version = {0.0.1},
year = {2023}
}