A comprehensive list of ML and AI acronyms and abbreviations. Feel free to ⭐ it!
Machine learning is growing rapidly, creating ever more mysterious acronyms and abbreviations that can be hard to follow, especially for beginners. This list started when I collected all the acronyms from my Ph.D. thesis. Surprised by their sheer number, I searched the web hoping to copy and paste them and save time on writing. I found a few lists, but none covered everything I needed, so I decided to gather all this information in a single table to make it easier for fellow ML enthusiasts.
- Contributors' knowledge
- A Comprehensive Survey on Machine Learning for Networking: Evolution, Applications and Research Opportunities
- Deep learning acronym cheatsheet
- Machine learning acronyms list
- Awesome deep learning music
- Hearai.pl/paperslang/
Feel free to:
- add any ML-related abbreviation,
- add the definition alone,
- open an issue.
Currently, ~30% of the abbreviations have descriptions, so feel free to add more! Each should be a brief, concise one-liner rather than an explanation of the whole subject. The purpose is to quickly find the meaning of an abbreviation, and the given definition helps you judge whether it matches the context. Abbreviations should be kept in alphabetical order.
I have added a link to the online doc with all abbreviations to make it easier for you to contribute. Feel free to add a new one and sort the table automatically. You can copy the table from Google Sheets to the markdown table generator: https://www.tablesgenerator.com/markdown_tables.
Acronym | Description | Definition |
---|---|---|
ACC | ACCuracy | Accuracy is a metric for evaluating classification models. |
ACE | Alternating Conditional Expectation | An algorithm to find the optimal transformations between the response variable and predictor variables in regression analysis. |
ADA | AdaBoosted Decision Trees | Using AdaBoost to improve performance in decision trees. |
AdaBoost | Adaptive Boosting | A statistical classification meta-algorithm that can be used in conjunction with many other types of learning algorithms to improve performance. |
AdR | AdaBoostRegressor | Using AdaBoost to improve performance in regression. |
ADT | Automatic Drum Transcription | Methods that aim to detect drum events in polyphonic music |
AE | AutoEncoder | A type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning) |
AGI | Artificial General Intelligence | The hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can |
AI | Artificial Intelligence | The simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. |
AIWPSO | Adaptive Inertia Weight Particle Swarm Optimization | An optimization algorithm using an individual search ability (ISA) to indicate whether each particle lacks global exploration or local exploitation abilities in each dimension. |
AM | Activation Maximization | A method to visualize neural networks and aims to maximize the activation of certain neurons. |
AMT | Automatic Music Transcription | Computational algorithms that convert acoustic music signals into some form of music notation |
ANN | Artificial Neural Network | A collection of connected computational units or nodes called neurons arranged in multiple computational layers. |
AR | Augmented Reality | An interactive experience of a real-world environment where the objects that reside in the real world are enhanced by computer-generated perceptual information, sometimes across multiple sensory modalities. |
ARNN | Anticipation Recurrent Neural Network | |
AUC | Area Under the (ROC) Curve | The probability that the model ranks a randomly chosen positive instance higher than a randomly chosen negative one. |
BDT | Boosted Decision Tree | |
BERT | Bidirectional Encoder Representation from Transformers | Commonly used transformer-based language model. |
BiFPN | Bidirectional Feature Pyramid Network | |
BILSTM | Bidirectional Long Short-Term Memory | A bidirectional recurrent neural network architecture (see LSTM). |
BLEU | Bilingual Evaluation Understudy | A score measuring the quality of machine translation from one language into another. |
BN | Bayesian Network | A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). |
BNN | Bayesian Neural Network | A type of artificial neural network built by introducing random variations into the network, either by giving the network's artificial neurons stochastic transfer functions or by giving them stochastic weights |
BP | BackPropagation | A widely used algorithm for training feedforward neural networks. |
BPMF | Bayesian Probabilistic Matrix Factorization | |
BPTT | Backpropagation Through Time | A gradient-based technique for training certain types of recurrent neural networks (e.g. Elman networks). |
BQML | BigQuery Machine Learning | |
BRNN | Bidirectional Recurrent Neural Network | |
BRR | Bayesian Ridge Regression | |
CAE | Contractive AutoEncoder | |
CALA | Continuous Action-set Learning Automata | |
CART | Classification And Regression Tree | |
CAV | Concept Activation Vectors | Explainability method that provides an interpretation of a neural net's internal state in terms of human-friendly concepts. |
CBI | Counterfactual Bias Insertion | |
CBOW | Continuous Bag of Words | |
CDBN | Convolutional Deep Belief Networks | A type of deep artificial neural network composed of multiple layers of convolutional restricted Boltzmann machines stacked together. |
CE | Cross-Entropy | |
CEC | Constant Error Carousel | |
CF | Common Features | |
CLNN | ConditionaL Neural Networks | |
CMAC | Cerebellar Model Articulation Controller | |
CMMs | Conditional Markov Model | A graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. Also known as maximum-entropy Markov model (MEMM). |
CNN | Convolutional Neural Network | A class of artificial neural network (ANN) most commonly applied to analyze visual imagery |
ConvNet | Convolutional Neural Network | A class of artificial neural network (ANN) most commonly applied to analyze visual imagery |
CRBM | Conditional Restricted Boltzmann Machine | |
CRFs | Conditional Random Fields | |
CRNN | Convolutional Recurrent Neural Network | |
CTC | Connectionist Temporal Classification | |
CTR | Collaborative Topic Regression | |
CV | Coefficient of Variation | Ratio of the standard deviation to the mean; used e.g. to measure intra-cluster similarity when evaluating cluster-based unsupervised models |
CV | Computer Vision | |
CV | Cross Validation | Resampling method for training, validating and testing a model across different iterations on portions of the full data set. |
CSLR | Continuous Sign Language Recognition | Recognition and understanding of continuous sign language (whole phrases rather than single words); extracting the meaning of signs is essential for SLT. |
DAAF | Data Augmentation and Auxiliary Feature | |
DAE | Denoising AutoEncoder or Deep AutoEncoder | |
DBM | Deep Boltzmann Machine | |
DBN | Deep Belief Network | |
DBSCAN | Density-Based Spatial Clustering of Applications with Noise | |
DCGAN | Deep Convolutional Generative Adversarial Network | |
DCMDN | Deep Convolutional Mixture Density Network | |
DE | Differential Evolution | |
DeconvNet | DeConvolutional Neural Network | |
DeepLIFT | Deep Learning Important FeaTures | |
DL | Deep Learning | |
DNN | Deep Neural Network | |
DQN | Deep Q-Network | |
DR | Detection Rate | Proportion of actual positives that a model detects; equivalent to sensitivity or recall (see TPR) |
DSN | Deep Stacking Network | |
DT | Decision Tree | |
DTD | Deep Taylor Decomposition | |
DWT | Discrete Wavelet Transform | |
ELECTRA | Efficiently Learning an Encoder that Classifies Token Replacements Accurately | |
ELM | Extreme Learning Machine | |
ELMo | Embeddings from Language Models | |
ELU | Exponential Linear Unit | |
EM | Expectation maximization | |
EMD | Entropy Minimization Discretization | |
ERNIE | Enhanced Representation through kNowledge IntEgration | |
ETL Pipeline | Extract Transform Load Pipeline | |
EXT | Extremely Randomized Trees | |
F1 Score | Harmonic Mean of Precision and Recall | |
FALA | Finite Action-set Learning Automata | |
FC | Fully-Connected | Layers where all the inputs from one layer are connected to every activation unit of the next layer. |
FC-CNN | Fully Convolutional Convolutional Neural Network | A neural network that only performs convolution (and subsampling or upsampling) operations. |
FC-LSTM | Fully Connected Long Short-Term Memory | A fully connected neural network to combine the spatial information of surrounding stations (see LSTM and FC). |
FCM | Fuzzy C-Means | |
FCN | Fully Convolutional Network | A neural network that only performs convolution (and subsampling or upsampling) operations. |
FFT | Fast Fourier transform | |
FLOP | Floating Point Operations | A unit of measure of the amount of mathematical computations often used to describe the complexity of a neural network. |
FLOPS | Floating Point Operations Per Second | A unit of measure of computer performance |
FNN | Feedforward Neural Network | |
FNR | False Negative Rate | Proportion of actual positives predicted as negatives |
FPN | Feature Pyramid Network | |
FPR | False Positive Rate | Proportion of actual negatives predicted as positives |
FST | Finite state transducer | |
FWIoU | Frequency Weighted Intersection over Union | Metric in segmentation/object detection tasks. Weighted average of IoU's over classes, where weights depend on class frequency. |
GA | Genetic Algorithm | |
GALE | Global Aggregations of Local Explanations | Explainability method that aggregates local explanations (of single predictions) into an explanation of how the whole model works. |
GAM | Generalized Additive Model | |
GAM | Global Attribution Mapping | |
GAMLSS | Generalized Additive Models for Location, Scale and Shape | |
GAN | Generative Adversarial Network | A deep-learning-based generative model that uses "indirect" training through a discriminator, another neural network that judges how "realistic" an input is and is itself updated dynamically. |
GAP | Global Average Pooling | |
GBRCN | Gradient-Boosting Random Convolutional Network | |
GD | Gradient Descent | An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient |
GEBI | Global Explanation for Bias Identification | Explainability method that aggregates local explanations (of single prediction) into a global explanation with the goal of finding biases and systematic errors in decision making. |
GFNN | Gradient Frequency Neural Networks | |
GLCM | Gray Level Co-occurrence Matrix | |
Gloss2Text | Gloss to Text | A task of transforming raw glosses into meaningful sentences. |
GloVe | Global Vectors for Word Representation | |
GMM | Gaussian mixture model | A probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. |
GPR | Gaussian Process Regression | |
GPT | Generative Pre-trained Transformer | An autoregressive language model that uses deep learning to produce human-like text. |
GradCAM | GRADient-weighted Class Activation Mapping | |
HamNoSys | Hamburg Sign Language Notation System | An annotation system that describes sign language symbols |
HAN | Hierarchical Attention Network | |
HCA | Hierarchical Clustering Analysis | |
HDP | Hierarchical Dirichlet process | |
HHDS | HipHop Dataset | |
hLDA | Hierarchical Latent Dirichlet allocation | |
HMM | Hidden Markov Model | |
HNN | Hopfield Neural Network | |
i.i.d | Independent and Identically Distributed | |
ID3 | Iterative Dichotomiser 3 | |
IDR | Input dependence rate | |
IIR | Input independence rate | |
INFD | Explanation Infidelity | |
IoU | Jaccard index (intersection over union) | Metric in segmentation/object detection tasks. Ratio of areas of intersection and union of two (segmentation) boxes, corresponding to e.g. prediction and label. |
ISIC | International Skin Imaging Collaboration | |
k-NN | k-Nearest Neighbor | |
KDE | Kernel Density Estimation | |
KL | Kullback-Leibler divergence | |
kNN | k-Nearest Neighbours | A non-parametric supervised learning method used for classification and regression. |
KRR | Kernel Ridge Regression | |
LDA | Latent Dirichlet Allocation | A generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. |
LDA | Linear Discriminant Analysis | |
LDADE | Latent Dirichlet Allocation Differential Evolution | |
LightGBM | Light Gradient-Boosting Machine | Gradient boosting framework that uses tree based learning algorithms, originally developed by Microsoft |
LIME | Local Interpretable Model-agnostic Explanations | |
LRP | Layer-wise Relevance Propagation | |
LSA | Latent semantic analysis | |
LSI | Latent Semantic Indexing | |
LSTM | Long Short-Term Memory | A recurrent neural network that can process not only single data points (such as images) but also entire sequences of data (such as speech or video). |
LTR | Learning To Rank | |
LVQ | Learning Vector Quantization | |
MADE | Masked Autoencoder for Distribution Estimation | |
MAE | Mean Absolute Error | Average of the absolute error between the actual and predicted values |
MAF | Masked Autoregressive Flows | |
MAP | Maximum A Posteriori Estimation | |
MAPE | Mean Absolute Percentage Error | Average percentage error between the actual and predicted values |
MARS | Multivariate Adaptive Regression Spline | Non-parametric regression technique that extends linear models. Note that the name is trademarked; open-source implementations are often called "Earth" |
MART | Multiple Additive Regression Tree | |
MaxEnt | Maximum Entropy | Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. |
MCLNN | Masked ConditionaL Neural Networks | |
MCMC | Markov Chain Monte Carlo | |
MCS | Model contrast score | |
MDL | Minimum Description Length principle | |
MDN | Mixture Density Network | |
MDP | Markov Decision Process | |
MDRNN | Multidimensional recurrent neural network | |
MER | Music Emotion Recognition | |
MINT | Mutual Information based Transductive Feature Selection | |
MIoU | Mean Intersection over Union | Metric in segmentation/object detection tasks. Mean of IoU's over classes. |
ML | Machine Learning | The study of computer algorithms that can improve automatically through experience and by the use of data. |
MLE | Maximum Likelihood Estimation | |
MLM | Music Language Models | |
MLP | Multi-Layer Perceptron | A fully connected class of feedforward artificial neural network |
MPA | Mean Pixel Accuracy | Metric in segmentation/object detection tasks. Average ratio of correctly classified pixels by class. |
MRR | Mean Reciprocal Rank | |
MRS | Music Recommender System | |
MSDAE | Modified Sparse Denoising Autoencoder | |
MSE | Mean Squared Error | Average of the squares of the error between the actual and predicted values |
MSR | Music Style Recognition | |
NAS | Neural Architecture Search | A technique for automating the design of artificial neural networks. |
NB | Naïve Bayes | |
NBKE | Naïve Bayes with Kernel Estimation | |
NER | Named Entity Recognition | |
NERQ | Named Entity Recognition in Query | |
NF | Normalizing Flow | |
NFL | No Free Lunch theorem | |
NLP | Natural Language Processing | |
NMS | Non Maximum Suppression | A technique used in object detection to remove redundant overlapping bounding boxes |
NMT | Neural Machine Translation | An approach to translation that uses a neural network to predict a sequence of words. |
NN | Neural Network | |
NNMODFF | Neural Network based Multi-Onset Detection Function Fusion | |
NPE | Neural Physical Engine | |
NRMSE | Normalized RMSE | RMSE normalized by a statistic of the observed values (e.g. the mean or the range), allowing comparison across datasets with different scales. |
NST | Neural Style Transfer | A method that uses deep neural networks to transfer the style of one image onto another. |
NTM | Neural Turing Machine | |
ODF | Onset Detection Function | |
OLR | Ordinary Linear Regression | |
OLS | Ordinary Least Squares | |
PA | Pixel Accuracy | Metric in segmentation/object detection tasks. Ratio of correctly classified over total number of pixels. |
PACO | Poisson Additive Co-Clustering | |
PCA | Principal Component Analysis | The process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest. |
PEGASUS | Pre-training with Extracted Gap-Sentences for Abstractive Summarization | |
PLSI | Probabilistic Latent Semantic Indexing | |
PM | Project Manager | |
PMF | Probabilistic Matrix Factorization | |
PMI | Pointwise Mutual Information | |
PNN | Probabilistic Neural Network | |
POC | Proof of Concept | |
POMDP | Partially Observable Markov Decision Process | |
POS | Part-of-Speech Tagging | |
PPMI | Positive Pointwise Mutual Information | |
PReLU | Parametric Rectified Linear Unit | |
PU | Positive Unlabeled | A machine learning paradigm for learning from only positive and unlabeled data. |
PYTM | Pitman-Yor Topic Modeling | |
RandNN | Random Neural Network | |
RANSAC | RANdom SAmple Consensus | |
RBF | Radial Basis Function | |
RBFNN | Radial Basis Function Neural Network | |
RBM | Restricted Boltzmann Machine | |
ReLU | Rectified Linear Unit | An activation function that allows fast and effective training of deep neural architectures on large and complex datasets. |
REPTree | Reduced Error Pruning Tree | |
RF | Random Forest | |
RGB | Red Green Blue color model | An additive color model used for display of images |
RICNN | Rotation Invariant Convolutional Neural Network | |
RIM | Recurrent Inference Machines | |
RIPPER | Repeated Incremental Pruning to Produce Error Reduction | |
RL | Reinforcement Learning | |
RLFM | Regression based latent factors | |
RMSE | Root MSE | Square root of MSE |
RNN | Recurrent Neural Network | |
RNNLM | Recurrent Neural Network Language Model | |
RoBERTa | Robustly Optimized BERT Pretraining Approach | Commonly used transformer-based language model. |
ROC | Receiver Operating Characteristic | Curve that plots TPR versus FPR at different parameter settings |
ROI | Region Of Interest | |
RR | Ridge Regression | |
RTRL | Real-Time Recurrent Learning | |
SAE | Stacked AE | |
SARSA | State-Action-Reward-State-Action | |
SBM | Stochastic block model | |
SBO | Structured Bayesian optimization | |
SBSE | Search-based software engineering | |
SCH | Stochastic convex hull | |
SDAE | Stacked DAE | |
seq2seq | Sequence to Sequence Learning | Describes a training approach to convert sequences from one domain (e.g. sentences in English) to sequences in another domain (e.g. the same sentences translated to French). |
SER | Sentence Error Rate | |
SGBoost | Stochastic Gradient Boosting | |
SGD | Stochastic Gradient Descent | |
SGVB | Stochastic Gradient Variational Bayes | |
SHAP | SHapley Additive exPlanation | |
SHLLE | Supervised Hessian Locally Linear Embedding | |
Sign2(Gloss+Text) | Sign to Gloss and Text | A two-step process that requires joint learning of sign language recognition and translation. |
Sign2Gloss | Sign to Gloss | A one-to-one translation from a single sign to a single gloss. |
Sign2Text | Sign to Text | A task of full translation from sign language into the spoken one; grammar and syntax are included. |
SLP | Single-Layer Perceptron | |
SLRT | Sign Language Recognition Transformer | An encoder transformer model trained to predict sign gloss sequences; it takes spatial embeddings and learns spatio-temporal representations. |
SLT | Sign Language Translation | A full translation of signs to a spoken language. |
SLTT | Sign Language Translation Transformer | An autoregressive transformer decoder model, trained on the output of SLRT, that predicts one word at a time to generate the corresponding spoken language sentence. |
SMBO | Sequential Model-Based Optimization | |
SOM | Self-Organizing Map | A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the topological structure of the data |
SpRay | Spectral Relevance Analysis | Global explainability method using spectral clustering and local explanations (LRP). |
SSD | Single-Shot Detector | A type of object detector that consists of a single stage. Some examples are YOLO, RetinaNet, and EfficientDet. |
SSL | Self-Supervised Learning | |
SSVM | Smooth support vector machine | |
ST | Style Transfer | An algorithm that transfers properties of one object to another (e.g. applying a painting style to a photograph). |
STDA | Style Transfer Data Augmentation | A method that uses style transfer to augment a dataset. |
STL | Self-Taught Learning | |
SVD | Singing Voice Detection | |
SVD | Singular Value Decomposition | |
SVM | Support Vector Machine | Supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. |
SVR | Support Vector Regression | Supervised learning models with associated learning algorithms that analyze data for regression analysis. |
SVS | Singing Voice Separation | |
t-SNE | t-distributed stochastic neighbor embedding | |
T5 | Text-To-Text Transfer Transformer | Transformer based language model that uses a text-to-text approach. |
TD | Temporal Difference | |
TDA | Targeted Data Augmentation | |
TGAN | Temporal Generative Adversarial Network | |
THAID | THeta Automatic Interaction Detection | |
TINT | Tree-Interpreter | |
TLFN | Time-Lagged Feedforward Neural Network | |
TNR | True Negative Rate | Proportion of actual negatives that are correctly predicted |
TPR | True Positive Rate | Proportion of actual positives that are correctly predicted |
TRPO | Trust Region Policy Optimization | |
ULMFiT | Universal Language Model Fine-Tuning | |
V-Net | Volumetric Convolutional Neural Network | 3D image segmentation based on a volumetric, fully convolutional neural network |
VAD | Voice Activity Detection | |
VAE | Variational AutoEncoder | An artificial neural network architecture belonging to the families of probabilistic graphical models and variational Bayesian methods. |
VGG | Visual Geometry Group | Popular deep convolutional model designed for classification. |
VPNN | Vector Product Neural Network | |
VQ-VAE | Vector Quantized Variational Autoencoders | |
VR | Virtual Reality | |
WER | Word Error Rate | A metric measuring performance in NLP tasks, e.g. automatic speech recognition (ASR). |
WFST | Weighted Finite-State Transducer | |
WMA | Weighted Majority Algorithm | |
WPE | Weighted Prediction Error | |
XAI | Explainable Artificial Intelligence | A set of processes and methods that make machine learning algorithms and their results more interpretable. |
XGBoost | eXtreme Gradient Boosting | |
YOLO | You Only Look Once | Fast object detection algorithm. |
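Many of the metric entries in the table (ACC, TPR, FPR, F1 Score, IoU) reduce to a few lines of arithmetic over the confusion matrix. The sketch below is illustrative only: plain Python with hypothetical toy data, not tied to any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):  # ACC: fraction of correct predictions
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def tpr(y_true, y_pred):  # TPR: proportion of actual positives found
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def fpr(y_true, y_pred):  # FPR: proportion of actual negatives flagged
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)

def f1(y_true, y_pred):  # F1 Score: harmonic mean of precision and recall
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def iou(box_a, box_b):  # IoU for axis-aligned boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Toy example: 4 of 6 predictions correct.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 4/6 ≈ 0.667
```

The same quantities underpin several other entries: DR is another name for TPR, and MIoU/FWIoU are class-wise averages of the IoU computed here.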