Preprocess [[open-in-colab]] Before you can train a model on a dataset, it needs to be preprocessed into the expected model input format. Whether your data is text, images, or audio, it needs to be converted and assembled into batches of tensors. 🤗 Transformers provides a set of preprocessing classes to help prepare your data for the model. In this tutorial, you'll learn that for:
Text, use a Tokenizer to convert text into a sequence of tokens, create a numerical representation of the tokens, and assemble them into tensors. Speech and audio, use a Feature extractor to extract sequential features from audio waveforms and convert them into tensors. Image inputs, use an ImageProcessor to convert images into tensors. Multimodal inputs, use a Processor to combine a tokenizer and a feature extractor or image processor.
AutoProcessor always works and automatically chooses the correct class for the model you're using, whether you're using a tokenizer, image processor, feature extractor or processor. Before you begin, install 🤗 Datasets so you can load some datasets to experiment with: pip install datasets Natural Language Processing
The main tool for preprocessing textual data is a tokenizer. A tokenizer splits text into tokens according to a set of rules. The tokens are converted into numbers and then tensors, which become the model inputs. Any additional inputs required by the model are added by the tokenizer.
If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer. This ensures the text is split the same way as the pretraining corpus, and that the same token-to-index mapping (usually referred to as the vocab) is used as during pretraining. Get started by loading a pretrained tokenizer with the [AutoTokenizer.from_pretrained] method. This downloads the vocab a model was pretrained with:
from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased") Then pass your text to the tokenizer:
encoded_input = tokenizer("Do not meddle in the affairs of wizards, for they are subtle and quick to anger.") print(encoded_input) {'input_ids': [101, 2079, 2025, 19960, 10362, 1999, 1996, 3821, 1997, 16657, 1010, 2005, 2027, 2024, 11259, 1998, 4248, 2000, 4963, 1012, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
The tokenizer returns a dictionary with three important items: input_ids are the indices corresponding to each token in the sentence. attention_mask indicates whether a token should be attended to or not. token_type_ids identifies which sequence a token belongs to when there is more than one sequence. Return your input by decoding the input_ids: tokenizer.decode(encoded_input["input_ids"]) '[CLS] Do not meddle in the affairs of wizards, for they are subtle and quick to anger. [SEP]'
As you can see, the tokenizer added two special tokens - CLS and SEP (classifier and separator) - to the sentence. Not all models need special tokens, but if they do, the tokenizer automatically adds them for you. If there are several sentences you want to preprocess, pass them as a list to the tokenizer:
batch_sentences = [ "But what about second breakfast?", "Don't think he knows about second breakfast, Pip.", "What about elevensies?", ] encoded_inputs = tokenizer(batch_sentences) print(encoded_inputs) {'input_ids': [[101, 1252, 1184, 1164, 1248, 6462, 136, 102], [101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102], [101, 1327, 1164, 5450, 23434, 136, 102]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}
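To make the three returned items more concrete, here is a minimal pure-Python sketch of how a BERT-style tokenizer might assemble them from already-tokenized ids. It is not the library's implementation: the helper name build_bert_style_inputs is hypothetical, and the ids 101 and 102 for CLS and SEP are taken from the outputs shown above.

```python
def build_bert_style_inputs(ids_a, ids_b=None, cls_id=101, sep_id=102):
    """Hypothetical helper: wrap token ids in [CLS]/[SEP] and build the companion lists."""
    # Single sequence layout: [CLS] A [SEP]
    input_ids = [cls_id] + list(ids_a) + [sep_id]
    token_type_ids = [0] * len(input_ids)
    if ids_b is not None:
        # Pair layout: [CLS] A [SEP] B [SEP]; the second segment gets type id 1
        second = list(ids_b) + [sep_id]
        input_ids += second
        token_type_ids += [1] * len(second)
    # Without padding, every position holds a real token and is attended to
    attention_mask = [1] * len(input_ids)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


# Ids of "What about elevensies?" without special tokens, copied from the output above
single = build_bert_style_inputs([1327, 1164, 5450, 23434, 136])
# An illustrative two-segment input built from ids that appear above
pair = build_bert_style_inputs([1327, 1164], [1252, 1184])
print(single["input_ids"])
print(pair["token_type_ids"])
```

For a single sequence every token_type_ids entry is 0, which matches the outputs above; the second segment of a pair is what receives 1.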
Pad Sentences aren't always the same length which can be an issue because tensors, the model inputs, need to have a uniform shape. Padding is a strategy for ensuring tensors are rectangular by adding a special padding token to shorter sentences. Set the padding parameter to True to pad the shorter sequences in the batch to match the longest sequence:
batch_sentences = [ "But what about second breakfast?", "Don't think he knows about second breakfast, Pip.", "What about elevensies?", ] encoded_input = tokenizer(batch_sentences, padding=True) print(encoded_input) {'input_ids': [[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0], [101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102], [101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]]}
The first and third sentences are now padded with 0's because they are shorter. Truncation On the other end of the spectrum, sometimes a sequence may be too long for a model to handle. In this case, you'll need to truncate the sequence to a shorter length. Set the truncation parameter to True to truncate a sequence to the maximum length accepted by the model:
batch_sentences = [ "But what about second breakfast?", "Don't think he knows about second breakfast, Pip.", "What about elevensies?", ] encoded_input = tokenizer(batch_sentences, padding=True, truncation=True) print(encoded_input) {'input_ids': [[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0], [101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102], [101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]]}
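The effect of padding and truncation on input_ids and attention_mask can be sketched in plain Python. This is only an illustration under simplifying assumptions: the helper pad_and_truncate is hypothetical, the padding id 0 is taken from the output above, and truncation here is a naive slice, whereas the real tokenizer also accounts for special tokens.

```python
def pad_and_truncate(batch, max_length=None, pad_id=0):
    """Hypothetical helper: make a batch of id lists rectangular."""
    if max_length is not None:
        # Naive truncation: cut each sequence at max_length.
        # Assumption: unlike the real tokenizer, this may drop the trailing [SEP].
        batch = [ids[:max_length] for ids in batch]
    target = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = target - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Unpadded ids of the three example sentences, copied from the earlier output
batch = [
    [101, 1252, 1184, 1164, 1248, 6462, 136, 102],
    [101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
    [101, 1327, 1164, 5450, 23434, 136, 102],
]
padded = pad_and_truncate(batch)
truncated = pad_and_truncate(batch, max_length=10)
print([len(row) for row in padded["input_ids"]])
print([len(row) for row in truncated["input_ids"]])
```

In the example above, truncation=True changes nothing because all three sentences are far shorter than the model's maximum length; the max_length=10 call in the sketch shows what happens when a limit actually applies.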
Check out the Padding and truncation concept guide to learn more about the different padding and truncation arguments. Build tensors Finally, you want the tokenizer to return the actual tensors that get fed to the model. Set the return_tensors parameter to either pt for PyTorch, or tf for TensorFlow:
batch_sentences = [ "But what about second breakfast?", "Don't think he knows about second breakfast, Pip.", "What about elevensies?", ] encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt") print(encoded_input) {'input_ids': tensor([[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0], [101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102], [101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])} For TensorFlow: batch_sentences = [ "But what about second breakfast?", "Don't think he knows about second breakfast, Pip.", "What about elevensies?", ] encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf") print(encoded_input) {'input_ids': , 'token_type_ids': , 'attention_mask': }
Different pipelines support tokenizer arguments in their __call__() differently. text-2-text-generation pipelines support (i.e. pass on) only truncation. text-generation pipelines support max_length, truncation, padding and add_special_tokens. In fill-mask pipelines, tokenizer arguments can be passed in the tokenizer_kwargs argument (dictionary).
Audio For audio tasks, you'll need a feature extractor to prepare your dataset for the model. The feature extractor is designed to extract features from raw audio data, and convert them into tensors. Load the MInDS-14 dataset (see the 🤗 Datasets tutorial for more details on how to load a dataset) to see how you can use a feature extractor with audio datasets: from datasets import load_dataset, Audio dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
Access the first element of the audio column to take a look at the input. Calling the audio column automatically loads and resamples the audio file:
dataset[0]["audio"] {'array': array([ 0. , 0.00024414, -0.00024414, , -0.00024414, 0. , 0. ], dtype=float32), 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav', 'sampling_rate': 8000}
This returns three items: array is the speech signal loaded - and potentially resampled - as a 1D array. path points to the location of the audio file. sampling_rate refers to how many data points in the speech signal are measured per second.
For this tutorial, you'll use the Wav2Vec2 model. Take a look at the model card, and you'll learn Wav2Vec2 is pretrained on 16kHz sampled speech audio. It is important your audio data's sampling rate matches the sampling rate of the dataset used to pretrain the model. If your data's sampling rate isn't the same, then you need to resample your data. Use 🤗 Datasets' [~datasets.Dataset.cast_column] method to upsample the sampling rate to 16kHz:
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000)) Call the audio column again to resample the audio file:
dataset[0]["audio"] {'array': array([ 2.3443763e-05, 2.1729663e-04, 2.2145823e-04, , 3.8356509e-05, -7.3497440e-06, -2.1754686e-05], dtype=float32), 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav', 'sampling_rate': 16000}
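Resampling changes how many data points represent the clip, not how long the clip lasts. The short sketch below illustrates that relationship with plain arithmetic. The helper names and the sample count of 86,699 are hypothetical, chosen only for illustration, and a real resampler may differ from this estimate by a sample or so.

```python
def duration_seconds(num_samples, sampling_rate):
    """Clip length in seconds: data points divided by data points per second."""
    return num_samples / sampling_rate


def resampled_length(num_samples, orig_rate, target_rate):
    """Approximate number of samples after resampling (an estimate, not a resampler)."""
    return round(num_samples * target_rate / orig_rate)


n_8k = 86_699  # hypothetical sample count of a clip recorded at 8 kHz
n_16k = resampled_length(n_8k, orig_rate=8_000, target_rate=16_000)

print(n_16k)
print(duration_seconds(n_8k, 8_000))
print(duration_seconds(n_16k, 16_000))
```

Upsampling from 8kHz to 16kHz doubles the number of samples while the duration stays the same, which is why the values in array change after cast_column.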
Next, load a feature extractor to normalize and pad the input. When padding textual data, a 0 is added for shorter sequences. The same idea applies to audio data. The feature extractor adds a 0 - interpreted as silence - to array. Load the feature extractor with [AutoFeatureExtractor.from_pretrained]: from transformers import AutoFeatureExtractor feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
Pass the audio array to the feature extractor. We also recommend adding the sampling_rate argument in the feature extractor in order to better debug any silent errors that may occur.
audio_input = [dataset[0]["audio"]["array"]] feature_extractor(audio_input, sampling_rate=16000) {'input_values': [array([ 3.8106556e-04, 2.7506407e-03, 2.8015103e-03, , 5.6335266e-04, 4.6588284e-06, -1.7142107e-04], dtype=float32)]}
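The input_values differ from the raw array because the feature extractor normalizes the waveform. A minimal sketch of one common scheme, zero-mean unit-variance normalization, is shown below. That this is the scheme applied here, and the epsilon value used, are assumptions rather than something stated in this tutorial; the helper name is hypothetical.

```python
import math


def zero_mean_unit_variance(samples, eps=1e-7):
    """Hypothetical helper: shift to zero mean and scale to (roughly) unit variance."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    # eps (an assumed value) guards against division by zero for silent clips
    scale = math.sqrt(variance + eps)
    return [(x - mean) / scale for x in samples]


waveform = [0.0, 0.5, -0.5, 0.25, -0.25]  # toy signal, not real audio
normalized = zero_mean_unit_variance(waveform)
print(normalized)
```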
Just like the tokenizer, you can apply padding or truncation to handle variable sequences in a batch. Take a look at the sequence length of these two audio samples: dataset[0]["audio"]["array"].shape (173398,) dataset[1]["audio"]["array"].shape (106496,) Create a function to preprocess the dataset so the audio samples are the same lengths. Specify a maximum sample length, and the feature extractor will either pad or truncate the sequences to match it:
def preprocess_function(examples): audio_arrays = [x["array"] for x in examples["audio"]] inputs = feature_extractor( audio_arrays, sampling_rate=16000, padding=True, max_length=100000, truncation=True, ) return inputs Apply the preprocess_function to the first few examples in the dataset: processed_dataset = preprocess_function(dataset[:5])
The sample lengths are now the same and match the specified maximum length. You can pass your processed dataset to the model now! processed_dataset["input_values"][0].shape (100000,) processed_dataset["input_values"][1].shape (100000,)
Computer vision For computer vision tasks, you'll need an image processor to prepare your dataset for the model. Image preprocessing consists of several steps that convert images into the input expected by the model. These steps include but are not limited to resizing, normalizing, color channel correction, and converting images to tensors.
Image preprocessing often follows some form of image augmentation. Both image preprocessing and image augmentation transform image data, but they serve different purposes:
Image augmentation alters images in a way that can help prevent overfitting and increase the robustness of the model. You can get creative in how you augment your data - adjust brightness and colors, crop, rotate, resize, zoom, etc. However, be mindful not to change the meaning of the images with your augmentations. Image preprocessing guarantees that the images match the model's expected input format. When fine-tuning a computer vision model, images must be preprocessed exactly as when the model was initially trained.
You can use any library you like for image augmentation. For image preprocessing, use the ImageProcessor associated with the model. Load the food101 dataset (see the 🤗 Datasets tutorial for more details on how to load a dataset) to see how you can use an image processor with computer vision datasets: Use the 🤗 Datasets split parameter to only load a small sample from the training split since the dataset is quite large! from datasets import load_dataset dataset = load_dataset("food101", split="train[:100]")
Next, take a look at the image with 🤗 Datasets Image feature: dataset[0]["image"] Load the image processor with [AutoImageProcessor.from_pretrained]: from transformers import AutoImageProcessor image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
First, let's add some image augmentation. You can use any library you prefer, but in this tutorial, we'll use torchvision's transforms module. If you're interested in using another data augmentation library, learn how in the Albumentations or Kornia notebooks.
Here we use Compose to chain together a couple of transforms - RandomResizedCrop and ColorJitter. Note that for resizing, we can get the image size requirements from the image_processor. For some models, an exact height and width are expected, for others only the shortest_edge is defined.
from torchvision.transforms import RandomResizedCrop, ColorJitter, Compose size = ( image_processor.size["shortest_edge"] if "shortest_edge" in image_processor.size else (image_processor.size["height"], image_processor.size["width"]) ) _transforms = Compose([RandomResizedCrop(size), ColorJitter(brightness=0.5, hue=0.5)])
The model accepts pixel_values as its input. ImageProcessor can take care of normalizing the images, and generating appropriate tensors. Create a function that combines image augmentation and image preprocessing for a batch of images and generates pixel_values: def transforms(examples): images = [_transforms(img.convert("RGB")) for img in examples["image"]] examples["pixel_values"] = image_processor(images, do_resize=False, return_tensors="pt")["pixel_values"] return examples
In the example above we set do_resize=False because we have already resized the images in the image augmentation transformation, and leveraged the size attribute from the appropriate image_processor. If you do not resize images during image augmentation, leave this parameter out. By default, ImageProcessor will handle the resizing. If you wish to normalize images as a part of the augmentation transformation, use the image_processor.image_mean, and image_processor.image_std values.
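If you do move normalization into your own augmentation step, the arithmetic is per channel: rescale the pixel to the 0 to 1 range, subtract the mean, and divide by the standard deviation. The sketch below shows this for a single RGB pixel. The helper name is hypothetical and the mean and std values are illustrative placeholders; in practice you would read them from image_processor.image_mean and image_processor.image_std.

```python
def normalize_pixel(rgb, image_mean, image_std):
    """Hypothetical helper: rescale an 8-bit RGB pixel to [0, 1], then standardize per channel."""
    return [
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, image_mean, image_std)
    ]


# Illustrative values only; read the real ones from the image processor
image_mean = [0.5, 0.5, 0.5]
image_std = [0.5, 0.5, 0.5]

normalized = normalize_pixel([0, 255, 51], image_mean, image_std)
print(normalized)
```

With these placeholder statistics, 8-bit pixel values end up roughly in the range -1 to 1.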
Then use 🤗 Datasets [~datasets.Dataset.set_transform] to apply the transforms on the fly: dataset.set_transform(transforms) Now when you access the image, you'll notice the image processor has added pixel_values. You can pass your processed dataset to the model now! dataset[0].keys() Here is what the image looks like after the transforms are applied. The image has been randomly cropped and its color properties are different.
import numpy as np import matplotlib.pyplot as plt img = dataset[0]["pixel_values"] plt.imshow(img.permute(1, 2, 0))
For tasks like object detection, semantic segmentation, instance segmentation, and panoptic segmentation, ImageProcessor offers post-processing methods. These methods convert the model's raw outputs into meaningful predictions such as bounding boxes or segmentation maps.
Pad In some cases, for instance, when fine-tuning DETR, the model applies scale augmentation at training time. This may cause images to be different sizes in a batch. You can use [DetrImageProcessor.pad] from [DetrImageProcessor] and define a custom collate_fn to batch images together.
def collate_fn(batch): pixel_values = [item["pixel_values"] for item in batch] encoding = image_processor.pad(pixel_values, return_tensors="pt") labels = [item["labels"] for item in batch] batch = {} batch["pixel_values"] = encoding["pixel_values"] batch["pixel_mask"] = encoding["pixel_mask"] batch["labels"] = labels return batch
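To illustrate what padding a batch of differently sized images involves, here is a pure-Python sketch on tiny single-channel images stored as nested lists. It is not DetrImageProcessor.pad itself: the helper name is hypothetical, and padding on the bottom and right with zeros is an assumption made for the illustration.

```python
def pad_images(images, pad_value=0.0):
    """Hypothetical helper: pad H x W images to the largest height and width in the batch."""
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    pixel_values, pixel_mask = [], []
    for img in images:
        h, w = len(img), len(img[0])
        # Pad each row on the right, then add full rows of padding at the bottom
        padded = [list(row) + [pad_value] * (max_w - w) for row in img]
        padded += [[pad_value] * max_w for _ in range(max_h - h)]
        # The mask marks real pixels with 1 and padded pixels with 0
        mask = [[1] * w + [0] * (max_w - w) for _ in range(h)]
        mask += [[0] * max_w for _ in range(max_h - h)]
        pixel_values.append(padded)
        pixel_mask.append(mask)
    return {"pixel_values": pixel_values, "pixel_mask": pixel_mask}


wide = [[1, 2, 3], [4, 5, 6]]        # 2 x 3 image
tall = [[7, 8], [9, 10], [11, 12]]   # 3 x 2 image
encoding = pad_images([wide, tall])
print(encoding["pixel_values"])
print(encoding["pixel_mask"])
```

The pixel_mask plays the same role as the attention_mask for text: it tells the model which positions hold real data.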
Multimodal For tasks involving multimodal inputs, you'll need a processor to prepare your dataset for the model. A processor couples together two processing objects such as a tokenizer and a feature extractor. Load the LJ Speech dataset (see the 🤗 Datasets tutorial for more details on how to load a dataset) to see how you can use a processor for automatic speech recognition (ASR): from datasets import load_dataset lj_speech = load_dataset("lj_speech", split="train")
For ASR, you're mainly focused on audio and text so you can remove the other columns: lj_speech = lj_speech.map(remove_columns=["file", "id", "normalized_text"]) Now take a look at the audio and text columns:
lj_speech[0]["audio"] {'array': array([-7.3242188e-04, -7.6293945e-04, -6.4086914e-04, , 7.3242188e-04, 2.1362305e-04, 6.1035156e-05], dtype=float32), 'path': '/root/.cache/huggingface/datasets/downloads/extracted/917ece08c95cf0c4115e45294e3cd0dee724a1165b7fc11798369308a465bd26/LJSpeech-1.1/wavs/LJ001-0001.wav', 'sampling_rate': 22050} lj_speech[0]["text"] 'Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition'
Remember you should always resample your audio dataset's sampling rate to match the sampling rate of the dataset used to pretrain a model! lj_speech = lj_speech.cast_column("audio", Audio(sampling_rate=16_000)) Load a processor with [AutoProcessor.from_pretrained]: from transformers import AutoProcessor processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
Create a function to process the audio data contained in array to input_values, and tokenize text to labels. These are the inputs to the model: def prepare_dataset(example): audio = example["audio"] example.update(processor(audio=audio["array"], text=example["text"], sampling_rate=16000)) return example
Apply the prepare_dataset function to a sample: prepare_dataset(lj_speech[0]) The processor has now added input_values and labels, and the sampling rate has also been correctly downsampled to 16kHz. You can pass your processed dataset to the model now!
Run training on Amazon SageMaker The documentation has been moved to hf.co/docs/sagemaker. This page will be removed in transformers 5.0. Table of Content Train Hugging Face models on Amazon SageMaker with the SageMaker Python SDK Deploy Hugging Face models to Amazon SageMaker with the SageMaker Python SDK
How to convert a 🤗 Transformers model to TensorFlow? Having multiple frameworks available to use with 🤗 Transformers gives you flexibility to play to their strengths when designing your application, but it implies that compatibility must be added on a per-model basis. The good news is that adding TensorFlow compatibility to an existing model is simpler than adding a new model from scratch! Whether you wish to have a deeper understanding of large TensorFlow models, make a major open-source contribution, or enable TensorFlow for your model of choice, this guide is for you. This guide empowers you, a member of our community, to contribute TensorFlow model weights and/or architectures to be used in 🤗 Transformers, with minimal supervision from the Hugging Face team. Writing a new model is no small feat, but hopefully this guide will make it less of a rollercoaster 🎢 and more of a walk in the park 🚶. Harnessing our collective experiences is absolutely critical to making this process increasingly easier, and thus we highly encourage you to suggest improvements to this guide! Before you dive deeper, it is recommended that you check the following resources if you're new to 🤗 Transformers: - General overview of 🤗 Transformers - Hugging Face's TensorFlow Philosophy In the remainder of this guide, you will learn what's needed to add a new TensorFlow model architecture, the procedure to convert PyTorch into TensorFlow model weights, and how to efficiently debug mismatches across ML frameworks. Let's get started!
Are you unsure whether the model you wish to use already has a corresponding TensorFlow architecture? Check the model_type field of the config.json of your model of choice (example). If the corresponding model folder in 🤗 Transformers has a file whose name starts with "modeling_tf", it means that it has a corresponding TensorFlow architecture (example).
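If you have a local clone of the repository, this check can be scripted. The following is a rough sketch only: the helper name has_tf_architecture is hypothetical, mapping model_type to a folder name by replacing hyphens with underscores is an assumption, and the demo runs against a throwaway directory rather than a real checkout.

```python
import tempfile
from pathlib import Path


def has_tf_architecture(models_dir, model_type):
    """Hypothetical helper: True if the model folder holds a file starting with 'modeling_tf'."""
    # Assumption: the folder name is the model_type with hyphens turned into underscores
    model_dir = Path(models_dir) / model_type.replace("-", "_")
    if not model_dir.is_dir():
        return False
    return any(path.name.startswith("modeling_tf") for path in model_dir.glob("*.py"))


# Demo on a temporary directory that mimics the layout of a models folder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bert").mkdir()
    (root / "bert" / "modeling_bert.py").touch()
    (root / "bert" / "modeling_tf_bert.py").touch()
    (root / "brand_new_bert").mkdir()
    (root / "brand_new_bert" / "modeling_brand_new_bert.py").touch()

    bert_has_tf = has_tf_architecture(root, "bert")
    new_has_tf = has_tf_architecture(root, "brand-new-bert")

print(bert_has_tf, new_has_tf)
```

Against a real clone, you would pass the path of the models folder inside the source tree as models_dir.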
Step-by-step guide to add TensorFlow model architecture code There are many ways to design a large model architecture, and multiple ways of implementing said design. However, you might recall from our general overview of 🤗 Transformers that we are an opinionated bunch - the ease of use of 🤗 Transformers relies on consistent design choices. From experience, we can tell you a few important things about adding TensorFlow models:
Don't reinvent the wheel! More often than not, there are at least two reference implementations you should check: the PyTorch equivalent of the model you are implementing and other TensorFlow models for the same class of problems. Great model implementations survive the test of time. This doesn't happen because the code is pretty, but rather because the code is clear, easy to debug and build upon. If you make the life of the maintainers easy with your TensorFlow implementation, by replicating the same patterns as in other TensorFlow models and minimizing the mismatch to the PyTorch implementation, you ensure your contribution will be long lived. Ask for help when you're stuck! The 🤗 Transformers team is here to help, and we've probably found solutions to the same problems you're facing.
Here's an overview of the steps needed to add a TensorFlow model architecture: 1. Select the model you wish to convert 2. Prepare transformers dev environment 3. (Optional) Understand theoretical aspects and the existing implementation 4. Implement the model architecture 5. Implement model tests 6. Submit the pull request 7. (Optional) Build demos and share with the world 1.-3. Prepare your model contribution 1. Select the model you wish to convert Let's start off with the basics: the first thing you need to know is the architecture you want to convert. If you don't have your eyes set on a specific architecture, asking the 🤗 Transformers team for suggestions is a great way to maximize your impact - we will guide you towards the most prominent architectures that are missing on the TensorFlow side. If the specific model you want to use with TensorFlow already has a TensorFlow architecture implementation in 🤗 Transformers but is lacking weights, feel free to jump straight into the weight conversion section of this page. For simplicity, the remainder of this guide assumes you've decided to contribute with the TensorFlow version of BrandNewBert (the same example as in the guide to add a new model from scratch).
Before starting the work on a TensorFlow model architecture, double-check that there is no ongoing effort to do so. You can search for BrandNewBert on the pull request GitHub page to confirm that there is no TensorFlow-related pull request. 2. Prepare transformers dev environment Having selected the model architecture, open a draft PR to signal your intention to work on it. Follow the instructions below to set up your environment and open a draft PR.
Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account. Clone your transformers fork to your local disk, and add the base repository as a remote: git clone https://github.com/[your Github handle]/transformers.git cd transformers git remote add upstream https://github.com/huggingface/transformers.git Set up a development environment, for instance by running the following command:
Set up a development environment, for instance by running the following command: python -m venv .env source .env/bin/activate pip install -e ".[dev]" Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a failure with this command. If that's the case make sure to install TensorFlow then do: pip install -e ".[quality]" Note: You don't need to have CUDA installed. Making the new model work on CPU is sufficient.
Create a branch with a descriptive name from your main branch git checkout -b add_tf_brand_new_bert Fetch and rebase to current main git fetch upstream git rebase upstream/main Add an empty .py file in src/transformers/models/brand_new_bert/ named modeling_tf_brand_new_bert.py. This will be your TensorFlow model file. Push the changes to your account using:
git add . git commit -m "initial commit" git push -u origin add_tf_brand_new_bert
Once you are satisfied, go to the webpage of your fork on GitHub. Click on “Pull request”. Make sure to add the GitHub handle of some members of the Hugging Face team as reviewers, so that the Hugging Face team gets notified for future changes. Change the PR into a draft by clicking on “Convert to draft” on the right of the GitHub pull request web page.
Now you have set up a development environment to port BrandNewBert to TensorFlow in 🤗 Transformers. 3. (Optional) Understand theoretical aspects and the existing implementation You should take some time to read BrandNewBert's paper, if such descriptive work exists. There might be large sections of the paper that are difficult to understand. If this is the case, this is fine - don't worry! The goal is not to get a deep theoretical understanding of the paper, but to extract the necessary information required to effectively re-implement the model in 🤗 Transformers using TensorFlow. That being said, you don't have to spend too much time on the theoretical aspects, but rather focus on the practical ones, namely the existing model documentation page (e.g. model docs for BERT). After you've grasped the basics of the models you are about to implement, it's important to understand the existing implementation. This is a great chance to confirm that a working implementation matches your expectations for the model, as well as to foresee technical challenges on the TensorFlow side. It's perfectly natural that you feel overwhelmed with the amount of information that you've just absorbed. It is definitely not a requirement that you understand all facets of the model at this stage. Nevertheless, we highly encourage you to clear any pressing questions in our forum. 4. Model implementation Now it's time to finally start coding. Our suggested starting point is the PyTorch file itself: copy the contents of modeling_brand_new_bert.py inside src/transformers/models/brand_new_bert/ into modeling_tf_brand_new_bert.py. The goal of this section is to modify the file and update the import structure of 🤗 Transformers such that you can import TFBrandNewBert and TFBrandNewBert.from_pretrained(model_repo, from_pt=True) successfully loads a working TensorFlow BrandNewBert model. Sadly, there is no prescription to convert a PyTorch model into TensorFlow. 
You can, however, follow our selection of tips to make the process as smooth as possible: - Prepend TF to the name of all classes (e.g. BrandNewBert becomes TFBrandNewBert). - Most PyTorch operations have a direct TensorFlow replacement. For example, torch.nn.Linear corresponds to tf.keras.layers.Dense, torch.nn.Dropout corresponds to tf.keras.layers.Dropout, etc. If you're not sure about a specific operation, you can use the TensorFlow documentation or the PyTorch documentation. - Look for patterns in the 🤗 Transformers codebase. If you come across a certain operation that doesn't have a direct replacement, the odds are that someone else already had the same problem. - By default, keep the same variable names and structure as in PyTorch. This will make it easier to debug, track issues, and add fixes down the line. - Some layers have different default values in each framework. A notable example is the batch normalization layer's epsilon (1e-5 in PyTorch and 1e-3 in TensorFlow). Double-check the documentation! - PyTorch's nn.Parameter variables typically need to be initialized within TF Layer's build(). See the following example: PyTorch / TensorFlow - If the PyTorch model has a #copied from on top of a function, the odds are that your TensorFlow model can also borrow that function from the architecture it was copied from, assuming it has a TensorFlow architecture. - Assigning the name attribute correctly in TensorFlow functions is critical to do the from_pt=True weight cross-loading. name is almost always the name of the corresponding variable in the PyTorch code. If name is not properly set, you will see it in the error message when loading the model weights. - The logic of the base model class, BrandNewBertModel, will actually reside in TFBrandNewBertMainLayer, a Keras layer subclass (example). TFBrandNewBertModel will simply be a wrapper around this layer. - Keras models need to be built in order to load pretrained weights. 
For that reason, TFBrandNewBertPreTrainedModel will need to hold an example of inputs to the model, the dummy_inputs (example).
- If you get stuck, ask for help - we're here to help you! 🤗

In addition to the model file itself, you will also need to add the pointers to the model classes and related documentation pages. You can complete this part entirely following the patterns in other PRs (example). Here's a list of the needed manual changes:
- Include all public classes of BrandNewBert in src/transformers/__init__.py
- Add BrandNewBert classes to the corresponding Auto classes in src/transformers/models/auto/modeling_tf_auto.py
- Add the lazy loading classes related to BrandNewBert in src/transformers/utils/dummy_tf_objects.py
- Update the import structures for the public classes in src/transformers/models/brand_new_bert/__init__.py
- Add the documentation pointers to the public methods of BrandNewBert in docs/source/en/model_doc/brand_new_bert.md
- Add yourself to the list of contributors to BrandNewBert in docs/source/en/model_doc/brand_new_bert.md
- Finally, add a green tick ✅ to the TensorFlow column of BrandNewBert in docs/source/en/index.md

When you're happy with your implementation, run the following checklist to confirm that your model architecture is ready:
1. All layers that behave differently at train time (e.g. Dropout) are called with a training argument, which is propagated all the way from the top-level classes
2. You have used #copied from whenever possible
3. TFBrandNewBertMainLayer and all classes that use it have their call function decorated with @unpack_inputs
4. TFBrandNewBertMainLayer is decorated with @keras_serializable
5. A TensorFlow model can be loaded from PyTorch weights using TFBrandNewBert.from_pretrained(model_repo, from_pt=True)
6. You can call the TensorFlow model using the expected input format

5. Add model tests

Hurray, you've implemented a TensorFlow model!
Now it's time to add tests to make sure that your model behaves as expected. As in the previous section, we suggest you start by copying the test_modeling_brand_new_bert.py file in tests/models/brand_new_bert/ into test_modeling_tf_brand_new_bert.py, and continue by making the necessary TensorFlow replacements. For now, in all .from_pretrained() calls, you should use the from_pt=True flag to load the existing PyTorch weights. After you're done, it's time for the moment of truth: run the tests! 😬
NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 \
py.test -vv tests/models/brand_new_bert/test_modeling_tf_brand_new_bert.py

The most likely outcome is that you'll see a bunch of errors. Don't worry, this is expected! Debugging ML models is notoriously hard, and the key ingredient to success is patience (and breakpoint()). In our experience, the hardest problems arise from subtle mismatches between ML frameworks, for which we have a few pointers at the end of this guide. In other cases, a general test might not be directly applicable to your model, in which case we suggest an override at the model test class level. Regardless of the issue, don't hesitate to ask for help in your draft pull request if you're stuck.

When all tests pass, congratulations, your model is nearly ready to be added to the 🤗 Transformers library! 🎉

6.-7. Ensure everyone can use your model

6. Submit the pull request

Once you're done with the implementation and the tests, it's time to submit a pull request. Before pushing your code, run our code formatting utility, make fixup 🪄. This will automatically fix any formatting issues, which would cause our automatic checks to fail.

It's now time to convert your draft pull request into a real pull request. To do so, click on the "Ready for review" button and add Joao (@gante) and Matt (@Rocketknight1) as reviewers. A model pull request will need at least 3 reviewers, but they will take care of finding appropriate additional reviewers for your model.

After all reviewers are happy with the state of your PR, the final action point is to remove the from_pt=True flag in .from_pretrained() calls. Since there are no TensorFlow weights, you will have to add them! Check the section below for instructions on how to do it.

Finally, when the TensorFlow weights get merged, you have at least 3 reviewer approvals, and all CI checks are green, double-check the tests locally one last time
NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 \
py.test -vv tests/models/brand_new_bert/test_modeling_tf_brand_new_bert.py

and we will merge your PR! Congratulations on the milestone 🎉

7. (Optional) Build demos and share with the world

One of the hardest parts about open-source is discovery. How can the other users learn about the existence of your fabulous TensorFlow contribution? With proper communication, of course! 📣

There are two main ways to share your model with the community:
- Build demos. These include Gradio demos, notebooks, and other fun ways to show off your model. We highly encourage you to add a notebook to our community-driven demos.
- Share stories on social media like Twitter and LinkedIn. You should be proud of your work and share your achievement with the community - your model can now be used by thousands of engineers and researchers around the world 🌍! We will be happy to retweet your posts and help you share your work with the community.

Adding TensorFlow weights to 🤗 Hub

Assuming that the TensorFlow model architecture is available in 🤗 Transformers, converting PyTorch weights into TensorFlow weights is a breeze! Here's how to do it:
1. Make sure you are logged into your Hugging Face account in your terminal. You can log in using the command huggingface-cli login (you can find your access tokens here)
2. Run transformers-cli pt-to-tf --model-name foo/bar, where foo/bar is the name of the model repository containing the PyTorch weights you want to convert
3. Tag @joaogante and @Rocketknight1 in the 🤗 Hub PR the command above has just created

That's it! 🎉

Debugging mismatches across ML frameworks 🐛

At some point, when adding a new architecture or when creating TensorFlow weights for an existing architecture, you might come across errors complaining about mismatches between PyTorch and TensorFlow. You might even decide to open the model architecture code for the two frameworks, and find that they look identical.
What's going on? 🤔

First of all, let's talk about why understanding these mismatches matters. Many community members will use 🤗 Transformers models out of the box, and trust that our models behave as expected. When there is a large mismatch between the two frameworks, it implies that the model is not following the reference implementation for at least one of the frameworks. This might lead to silent failures, in which the model runs but has poor performance. This is arguably worse than a model that fails to run at all! To that end, we aim at having a framework mismatch smaller than 1e-5 at all stages of the model.

As in other numerical problems, the devil is in the details. And as in any detail-oriented craft, the secret ingredient here is patience. Here is our suggested workflow for when you come across this type of issue:
1. Locate the source of mismatches. The model you're converting probably has near identical inner variables up to a certain point. Place breakpoint() statements in the two frameworks' architectures, and compare the values of the numerical variables in a top-down fashion until you find the source of the problems.
2. Now that you've pinpointed the source of the issue, get in touch with the 🤗 Transformers team. It is possible that we've seen a similar problem before and can promptly provide a solution. As a fallback, scan popular pages like StackOverflow and GitHub issues.
3. If there is no solution in sight, it means you'll have to go deeper. The good news is that you've located the issue, so you can focus on the problematic instruction, abstracting away the rest of the model! The bad news is that you'll have to venture into the source implementation of said instruction. In some cases, you might find an issue with a reference implementation - don't abstain from opening an issue in the upstream repository.

In some cases, in discussion with the 🤗 Transformers team, we might find that fixing the mismatch is infeasible.
When the mismatch is very small in the output layers of the model (but potentially large in the hidden states), we might decide to ignore it in favor of distributing the model. The pt-to-tf CLI mentioned above has a --max-error flag to override the error message at weight conversion time.
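Step 1 of the workflow above (comparing values top-down until the first divergence) can be partly scripted. The following is a minimal sketch, assuming you have already captured intermediate activations from both frameworks and flattened them into plain Python floats. The helper names `max_abs_diff` and `first_mismatch`, as well as the stage names and values in the usage example, are hypothetical and not part of 🤗 Transformers.

```python
from typing import Dict, Optional, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened tensors."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} vs {len(b)} elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_mismatch(
    pt_activations: Dict[str, Sequence[float]],
    tf_activations: Dict[str, Sequence[float]],
    tolerance: float = 1e-5,  # the target quoted in this guide
) -> Optional[str]:
    """Return the first stage (in insertion order) whose difference exceeds the tolerance."""
    for name, pt_values in pt_activations.items():
        if max_abs_diff(pt_values, tf_activations[name]) > tolerance:
            return name
    return None


# Made-up values standing in for activations dumped at each stage of the model.
pt_acts = {"embeddings": [0.1, 0.2], "layer_0": [0.5, 0.6], "layer_1": [1.0, 2.0]}
tf_acts = {"embeddings": [0.1, 0.2], "layer_0": [0.5, 0.600001], "layer_1": [1.0, 2.1]}
print(first_mismatch(pt_acts, tf_acts))
```

Because the stages are checked in order, the returned name points at the earliest place where the two implementations diverge, which is where a breakpoint() is most useful.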
Share a model The last two tutorials showed how you can fine-tune a model with PyTorch, Keras, and 🤗 Accelerate for distributed setups. The next step is to share your model with the community! At Hugging Face, we believe in openly sharing knowledge and resources to democratize artificial intelligence for everyone. We encourage you to consider sharing your model with the community to help others save time and resources. In this tutorial, you will learn two methods for sharing a trained or fine-tuned model on the Model Hub:
- Programmatically push your files to the Hub.
- Drag-and-drop your files to the Hub with the web interface.

To share a model with the community, you need an account on huggingface.co. You can also join an existing organization or create a new one.
Repository features Each repository on the Model Hub behaves like a typical GitHub repository. Our repositories offer versioning, commit history, and the ability to visualize differences. The Model Hub's built-in versioning is based on git and git-lfs. In other words, you can treat one model as one repository, enabling greater access control and scalability. Version control allows revisions, a method for pinning a specific version of a model with a commit hash, tag or branch. As a result, you can load a specific model version with the revision parameter:
model = AutoModel.from_pretrained( "julien-c/EsperBERTo-small", revision="v2.0.1" # tag name, or branch name, or commit hash ) Files are also easily edited in a repository, and you can view the commit history as well as the difference:
Setup Before sharing a model to the Hub, you will need your Hugging Face credentials. If you have access to a terminal, run the following command in the virtual environment where 🤗 Transformers is installed. This will store your access token in your Hugging Face cache folder (~/.cache/ by default):
huggingface-cli login If you are using a notebook like Jupyter or Colaboratory, make sure you have the huggingface_hub library installed. This library allows you to programmatically interact with the Hub. pip install huggingface_hub Then use notebook_login to sign in to the Hub, and follow the link here to generate a token to log in with: from huggingface_hub import notebook_login notebook_login()
Convert a model for all frameworks To ensure your model can be used by someone working with a different framework, we recommend you convert and upload your model with both PyTorch and TensorFlow checkpoints. While users are still able to load your model from a different framework if you skip this step, it will be slower because 🤗 Transformers will need to convert the checkpoint on-the-fly. Converting a checkpoint for another framework is easy. Make sure you have PyTorch and TensorFlow installed (see here for installation instructions), and then find the specific model for your task in the other framework.
Specify from_tf=True to convert a checkpoint from TensorFlow to PyTorch: pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True) pt_model.save_pretrained("path/to/awesome-name-you-picked") Specify from_pt=True to convert a checkpoint from PyTorch to TensorFlow: tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
Then you can save your new TensorFlow model with its new checkpoint: tf_model.save_pretrained("path/to/awesome-name-you-picked") If a model is available in Flax, you can also convert a checkpoint from PyTorch to Flax: flax_model = FlaxDistilBertForSequenceClassification.from_pretrained( "path/to/awesome-name-you-picked", from_pt=True ) Push a model during training
Sharing a model to the Hub is as simple as adding an extra parameter or callback. Remember from the fine-tuning tutorial, the [TrainingArguments] class is where you specify hyperparameters and additional training options. One of these training options includes the ability to push a model directly to the Hub. Set push_to_hub=True in your [TrainingArguments]: training_args = TrainingArguments(output_dir="my-awesome-model", push_to_hub=True)
Pass your training arguments as usual to [Trainer]: trainer = Trainer( model=model, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset, compute_metrics=compute_metrics, )
After you fine-tune your model, call [~transformers.Trainer.push_to_hub] on [Trainer] to push the trained model to the Hub. 🤗 Transformers will even automatically add training hyperparameters, training results and framework versions to your model card!
trainer.push_to_hub() Share a model to the Hub with [PushToHubCallback]. In the [PushToHubCallback] function, add: An output directory for your model. A tokenizer. The hub_model_id, which is your Hub username and model name. from transformers import PushToHubCallback push_to_hub_callback = PushToHubCallback( output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model" )
Add the callback to fit, and 🤗 Transformers will push the trained model to the Hub: model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
Use the push_to_hub function You can also call push_to_hub directly on your model to upload it to the Hub. Specify your model name in push_to_hub: pt_model.push_to_hub("my-awesome-model")
This creates a repository under your username with the model name my-awesome-model. Users can now load your model with the from_pretrained function: from transformers import AutoModel model = AutoModel.from_pretrained("your_username/my-awesome-model")
If you belong to an organization and want to push your model under the organization name instead, just add it to the repo_id: pt_model.push_to_hub("my-awesome-org/my-awesome-model") The push_to_hub function can also be used to add other files to a model repository. For example, add a tokenizer to a model repository: tokenizer.push_to_hub("my-awesome-model")
Or perhaps you'd like to add the TensorFlow version of your fine-tuned PyTorch model: tf_model.push_to_hub("my-awesome-model")
Now when you navigate to your Hugging Face profile, you should see your newly created model repository. Clicking on the Files tab will display all the files you've uploaded to the repository. For more details on how to create and upload files to a repository, refer to the Hub documentation here. Upload with the web interface Users who prefer a no-code approach are able to upload a model through the Hub's web interface. Visit huggingface.co/new to create a new repository:
From here, add some information about your model: Select the owner of the repository. This can be yourself or any of the organizations you belong to. Pick a name for your model, which will also be the repository name. Choose whether your model is public or private. Specify the license usage for your model. Now click on the Files tab and click on the Add file button to upload a new file to your repository. Then drag-and-drop a file to upload and add a commit message.
Add a model card To make sure users understand your model's capabilities, limitations, potential biases and ethical considerations, please add a model card to your repository. The model card is defined in the README.md file. You can add a model card by:
Manually creating and uploading a README.md file. Clicking on the Edit model card button in your model repository. Take a look at the DistilBert model card for a good example of the type of information a model card should include. For more details about other options you can control in the README.md file such as a model's carbon footprint or widget examples, refer to the documentation here.
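For the manual route, the README.md can be generated with a few lines of Python. This is a minimal sketch, assuming the usual layout of a YAML metadata header between `---` markers followed by a Markdown body. The helper names `build_model_card` and `write_model_card` are hypothetical, and the license, tag, and description values are placeholders to replace with your own.

```python
from pathlib import Path
from typing import List


def build_model_card(model_name: str, license_id: str, tags: List[str], description: str) -> str:
    """Return README.md text: a YAML metadata header followed by a Markdown body."""
    lines = ["---", f"license: {license_id}", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines += ["---", "", f"# {model_name}", "", description, ""]
    return "\n".join(lines)


def write_model_card(repo_dir: str, card_text: str) -> Path:
    """Write the card to README.md inside a local copy of the repository."""
    path = Path(repo_dir) / "README.md"
    path.write_text(card_text, encoding="utf-8")
    return path


# Placeholder values for illustration only.
card = build_model_card(
    model_name="my-awesome-model",
    license_id="apache-2.0",
    tags=["text-classification"],
    description="Fine-tuned DistilBERT for sequence classification (placeholder text).",
)
print(card)
```

The resulting file can then be uploaded like any other file, either through the web interface or together with the rest of the repository contents.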
GPU inference GPUs are the standard choice of hardware for machine learning, unlike CPUs, because they are optimized for memory bandwidth and parallelism. To keep up with the larger sizes of modern models or to run these large models on existing and older hardware, there are several optimizations you can use to speed up GPU inference. In this guide, you'll learn how to use FlashAttention-2 (a more memory-efficient attention mechanism), BetterTransformer (a PyTorch native fastpath execution), and bitsandbytes to quantize your model to a lower precision. Finally, learn how to use 🤗 Optimum to accelerate inference with ONNX Runtime on Nvidia and AMD GPUs.
The majority of the optimizations described here also apply to multi-GPU setups!

FlashAttention-2

FlashAttention-2 is experimental and may change considerably in future versions.

FlashAttention-2 is a faster and more efficient implementation of the standard attention mechanism that can significantly speed up inference by:
- additionally parallelizing the attention computation over sequence length
- partitioning the work between GPU threads to reduce communication and shared memory reads/writes between them
FlashAttention-2 is currently supported for the following architectures: * Bark * Bart * DistilBert * Gemma * GPTBigCode * GPTNeo * GPTNeoX * Falcon * Llama * Llava * VipLlava * MBart * Mistral * Mixtral * OPT * Phi * StableLm * Starcoder2 * Qwen2 * Whisper You can request to add FlashAttention-2 support for another model by opening a GitHub Issue or Pull Request. Before you begin, make sure you have FlashAttention-2 installed.
pip install flash-attn --no-build-isolation We strongly suggest referring to the detailed installation instructions to learn more about supported hardware and data types! FlashAttention-2 is also supported on AMD GPUs and current support is limited to Instinct MI210 and Instinct MI250. We strongly suggest using this Dockerfile to use FlashAttention-2 on AMD GPUs.
To enable FlashAttention-2, pass the argument attn_implementation="flash_attention_2" to [~AutoModelForCausalLM.from_pretrained]:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
FlashAttention-2 can only be used when the model's dtype is fp16 or bf16. Make sure to cast your model to the appropriate dtype and load them on a supported device before using FlashAttention-2. You can also set use_flash_attention_2=True to enable FlashAttention-2 but it is deprecated in favor of attn_implementation="flash_attention_2".
FlashAttention-2 can be combined with other optimization techniques like quantization to further speed up inference. For example, you can combine FlashAttention-2 with 8-bit or 4-bit quantization:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load in 8bit
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    attn_implementation="flash_attention_2",
)

# load in 4bit
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    attn_implementation="flash_attention_2",
)
Expected speedups You can benefit from considerable speedups for inference, especially for inputs with long sequences. However, since FlashAttention-2 does not support computing attention scores with padding tokens, you must manually pad/unpad the attention scores for batched inference when the sequence contains padding tokens. This leads to a significant slowdown for batched generations with padding tokens. To overcome this, you should use FlashAttention-2 without padding tokens in the sequence during training (by packing a dataset or concatenating sequences until reaching the maximum sequence length). For a single forward pass on tiiuae/falcon-7b with a sequence length of 4096 and various batch sizes without padding tokens, the expected speedup is:
For a single forward pass on meta-llama/Llama-7b-hf with a sequence length of 4096 and various batch sizes without padding tokens, the expected speedup is: For sequences with padding tokens (generating with padding tokens), you need to unpad/pad the input sequences to correctly compute the attention scores. With a relatively small sequence length, a single forward pass creates overhead leading to a small speedup (in the example below, 30% of the input is filled with padding tokens):
But for larger sequence lengths, you can expect even more speedup benefits: FlashAttention is more memory efficient, meaning you can train on much larger sequence lengths without running into out-of-memory issues. You can potentially reduce memory usage up to 20x for larger sequence lengths. Take a look at the flash-attention repository for more details.
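To see why memory becomes the bottleneck at long sequence lengths, it helps to work out the size of the attention score matrix that a standard implementation materializes, which has shape (batch, heads, seq_len, seq_len). The sketch below is back-of-the-envelope arithmetic only: the 32 heads and 2 bytes per element (fp16/bf16) are assumed illustrative values, the function name is hypothetical, and the figure covers the scores of a single layer rather than a full model. FlashAttention avoids storing this matrix in full, which is where its savings at long sequence lengths come from.

```python
def attention_scores_bytes(
    batch_size: int,
    num_heads: int,
    seq_len: int,
    bytes_per_element: int = 2,  # fp16 / bf16
) -> int:
    """Bytes needed to hold a full (batch, heads, seq_len, seq_len) score matrix."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


# Assumed configuration: batch size 1, 32 attention heads, half precision.
for seq_len in (1024, 4096, 16384):
    gib = attention_scores_bytes(1, 32, seq_len) / 1024**3
    print(f"seq_len={seq_len:>6}: {gib:.4f} GiB per layer")
```

Under these assumptions, a sequence length of 4096 already requires 1 GiB for the scores of one layer, and quadrupling the sequence length multiplies that by sixteen because the cost grows quadratically.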
PyTorch scaled dot product attention PyTorch's torch.nn.functional.scaled_dot_product_attention (SDPA) can also call FlashAttention and memory-efficient attention kernels under the hood. SDPA support is currently being added natively in Transformers and is used by default for torch>=2.1.1 when an implementation is available. For now, Transformers supports SDPA inference and training for the following architectures: * Bart * GPTBigCode * Falcon * Gemma * Llama * Phi * Idefics * Whisper * Mistral * Mixtral * StableLm * Starcoder2 * Qwen2
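For reference, the operation itself is softmax(Q K^T / sqrt(d)) V. The following framework-free sketch computes it for a single head without masking or dropout, using plain Python lists. It is written for readability and is not how PyTorch implements SDPA; the fused kernels mentioned above produce the same result far more efficiently. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    row_max = max(row)
    exps = [math.exp(x - row_max) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(q @ k^T / sqrt(d)) @ v, no mask, no dropout."""
    scale = 1.0 / math.sqrt(len(q[0]))
    output = []
    for q_row in q:
        # Scaled similarity between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return output


# A zero query scores every key equally, so the output is the mean of the values.
result = scaled_dot_product_attention(
    q=[[0.0, 0.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[1.0, 0.0], [0.0, 1.0]],
)
print(result)
```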