Checking object integrity
Amazon S3 uses checksum values to verify the integrity of data that you upload to or download
from Amazon S3. In addition, you can request that another checksum value be calculated for any
object that you store in Amazon S3. You can select from one of several checksum algorithms to use
when uploading or copying your data. Amazon S3 uses this algorithm to compute an additional
checksum value and store it as part of the object metadata. To learn more about how to use additional checksums to verify data integrity, see Tutorial: Checking the integrity of data in Amazon S3 with additional checksums.
When you upload an object, you can optionally include a precalculated checksum as part of your request. Amazon S3 compares the provided checksum to the checksum that it calculates by using your specified algorithm. If the two values don't match, Amazon S3 reports an error.
Using supported checksum algorithms
Amazon S3 offers you the option to choose the checksum algorithm that is used to validate your data during upload or download. You can select one of the following Secure Hash Algorithms (SHA) or Cyclic Redundancy Check (CRC) data-integrity check algorithms:
- CRC-32
- CRC-32C
- SHA-1
- SHA-256
When you upload an object, you can specify the algorithm that you want to use:
- When you're using the AWS Management Console, you select the checksum algorithm that you want to use. When you do, you can optionally specify the checksum value of the object. When Amazon S3 receives the object, it calculates the checksum by using the algorithm that you specified. If the two checksum values don't match, Amazon S3 generates an error.
- When you're using an SDK, you can set the value of the ChecksumAlgorithm parameter to the algorithm that you want Amazon S3 to use when calculating the checksum. Amazon S3 automatically calculates the checksum value.
- When you're using the REST API, you don't use the x-amz-sdk-checksum-algorithm parameter. Instead, you use one of the algorithm-specific headers (for example, x-amz-checksum-crc32).
For more information about uploading objects, see Uploading objects.
To apply any of these checksum values to objects that are already uploaded to Amazon S3, you can copy the object. When you copy an object, you can specify whether you want to use the existing checksum algorithm or use a new one. You can specify a checksum algorithm when using any supported mechanism for copying objects, including S3 Batch Operations. For more information about S3 Batch Operations, see Performing object operations in bulk with Batch Operations.
Important
If you're using a multipart upload with additional checksums, the part numbers must be consecutive. If you try to complete a multipart upload request with nonconsecutive part numbers while using additional checksums, Amazon S3 generates an HTTP 500 Internal Server Error.
After uploading objects, you can get the checksum value and compare it to a precalculated or previously stored checksum value calculated using the same algorithm.
To learn more about using the console and specifying checksum algorithms to use when uploading objects, see Uploading objects and Tutorial: Checking the integrity of data in Amazon S3 with additional checksums.
The following example shows how you can use the AWS SDKs to upload a large file with multipart upload, download a large file, and validate a multipart upload file, all using SHA-256 for file validation.
You can send REST requests to upload an object with a checksum value to verify the integrity of the data with PutObject. You can also retrieve the checksum value for objects using GetObject or HeadObject.
You can send a PUT request to upload an object of up to 5 GB in a single operation. For more information, see put-object in the AWS CLI Command Reference. You can also use get-object and head-object to retrieve the checksum of an already-uploaded object to verify the integrity of the data.
For information, see Amazon S3 CLI FAQ in the AWS Command Line Interface User Guide.
Using Content-MD5 when uploading objects
Another way to verify the integrity of your object after uploading is to provide an MD5 digest of the object when you upload it. If you calculate the MD5 digest for your object, you can provide the digest with the PUT command by using the Content-MD5 header.
After uploading the object, Amazon S3 calculates the MD5 digest of the object and compares it to the value that you provided. The request succeeds only if the two digests match.
Supplying an MD5 digest isn't required, but you can use it to verify the integrity of the object as part of the upload process.
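The Content-MD5 header value is the base64 encoding of the raw 128-bit MD5 digest, not the usual 32-character hex string. A minimal sketch of computing it with the Python standard library (the helper name is hypothetical):

```python
import base64
import hashlib

def content_md5(data: bytes) -> str:
    """Compute the Content-MD5 header value: the base64 encoding of the
    raw 128-bit MD5 digest (not the hex string)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode()

header_value = content_md5(b"hello world")
print(header_value)
```

If the value you send doesn't match the digest Amazon S3 computes on receipt, the PUT request fails with a BadDigest-style error rather than storing corrupted data.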
Using Content-MD5 and the ETag to verify uploaded objects
The entity tag (ETag) for an object represents a specific version of that object. Keep in mind that the ETag reflects changes only to the content of an object, not to its metadata. If only the metadata of an object changes, the ETag remains the same.
Depending on the object, the ETag of the object might be an MD5 digest of the object data:
- If an object is created by the PutObject, PostObject, or CopyObject operation, or through the AWS Management Console, and that object is also plaintext or encrypted by server-side encryption with Amazon S3 managed keys (SSE-S3), that object has an ETag that is an MD5 digest of its object data.
- If an object is created by the PutObject, PostObject, or CopyObject operation, or through the AWS Management Console, and that object is encrypted by server-side encryption with customer-provided keys (SSE-C) or server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS), that object has an ETag that is not an MD5 digest of its object data.
- If an object is created by either the multipart upload process or the UploadPartCopy operation, the object's ETag is not an MD5 digest, regardless of the method of encryption. If an object is larger than 16 MB, the AWS Management Console uploads or copies that object as a multipart upload, and therefore the ETag isn't an MD5 digest.
For objects where the ETag is the Content-MD5 digest of the object, you can compare the ETag value of the object with a calculated or previously stored Content-MD5 digest.
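For such objects, the comparison is straightforward: the ETag is the hex MD5 digest of the object data, returned wrapped in double quotes. A small sketch, assuming a hypothetical helper name:

```python
import hashlib

def etag_matches(data: bytes, etag: str) -> bool:
    """For single-part objects that are plaintext or SSE-S3 encrypted, the
    ETag is the hex MD5 digest of the object data. S3 returns the ETag
    wrapped in double quotes, so strip them before comparing."""
    return hashlib.md5(data).hexdigest() == etag.strip('"')

# MD5 of b"hello world" is 5eb63bbbe01eeed093cb22bb8f5acdc3
ok = etag_matches(b"hello world", '"5eb63bbbe01eeed093cb22bb8f5acdc3"')
print(ok)
```

Keep in mind that this comparison is only valid for the first two cases in the list above; for multipart, SSE-C, or SSE-KMS objects, the ETag is not an MD5 digest of the data.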
Using trailing checksums
When uploading objects to Amazon S3, you can either provide a precalculated checksum for the object or use an AWS SDK to automatically create trailing checksums on your behalf. If you decide to use a trailing checksum, Amazon S3 automatically generates the checksum by using your specified algorithm and uses it to validate the integrity of the object during upload.
To create a trailing checksum when using an AWS SDK, populate the ChecksumAlgorithm parameter with your preferred algorithm. The SDK uses that algorithm to calculate the checksum for your object (or object parts) and automatically appends it to the end of your upload request. This behavior saves you time because Amazon S3 performs both the verification and upload of your data in a single pass.
Important
If you're using S3 Object Lambda, all requests to S3 Object Lambda are signed using s3-object-lambda instead of s3. This behavior affects the signature of trailing checksum values. For more information about S3 Object Lambda, see Transforming objects with S3 Object Lambda.
Using part-level checksums for multipart uploads
When objects are uploaded to Amazon S3, they can either be uploaded as a single object or through the multipart upload process. Objects that are larger than 16 MB and uploaded through the console are automatically uploaded using multipart uploads. For more information about multipart uploads, see Uploading and copying objects using multipart upload.
When an object is uploaded as a multipart upload, the ETag for the object is not an MD5 digest of the entire object. Amazon S3 calculates the MD5 digest of each individual part as it is uploaded. The MD5 digests are used to determine the ETag for the final object. Amazon S3 concatenates the bytes for the MD5 digests together and then calculates the MD5 digest of these concatenated values. The final step for creating the ETag is when Amazon S3 adds a dash with the total number of parts to the end.
For example, consider an object uploaded with a multipart upload that has an ETag of C9A5A6878D97B48CC965C1E41859F034-14. In this case, C9A5A6878D97B48CC965C1E41859F034 is the MD5 digest of all the digests concatenated together. The -14 indicates that there are 14 parts associated with this object's multipart upload.
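The ETag calculation described above can be reproduced locally, which is useful for verifying a multipart object end to end. A minimal sketch with a hypothetical helper name (real parts are at least 5 MiB except the last; tiny byte strings are used here only for illustration):

```python
import hashlib

def multipart_etag(parts: list) -> str:
    """Compute a multipart-upload ETag: the MD5 of the concatenated raw
    MD5 digests of each part, followed by a dash and the part count."""
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"

etag = multipart_etag([b"part one data", b"part two data"])
print(etag)
```

To match the ETag that Amazon S3 reports, you must split the local file at exactly the same part boundaries that were used for the upload.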
If you've enabled additional checksum values for your multipart object, Amazon S3 calculates the checksum for each individual part by using the specified checksum algorithm. The checksum for the completed object is calculated in the same way that Amazon S3 calculates the MD5 digest for the multipart upload. You can use this checksum to verify the integrity of the object.
To retrieve information about the object, including how many parts make up the entire object, you can use the GetObjectAttributes operation. With additional checksums, you can also recover information for each individual part that includes each part's checksum value.
For completed uploads, you can get an individual part's checksum by using the GetObject or HeadObject operations and specifying a part number or byte range that aligns to a single part. If you want to retrieve the checksum values for individual parts of multipart uploads still in progress, you can use ListParts.
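The composite additional checksum follows the same checksum-of-checksums pattern: the raw part digests are concatenated, hashed again with the chosen algorithm, base64-encoded, and suffixed with the part count. The sketch below assumes SHA-256 and a hypothetical function name.

```python
import base64
import hashlib

def composite_sha256_checksum(parts: list) -> str:
    """Compute a composite multipart checksum: SHA-256 over the
    concatenated raw SHA-256 digests of each part, base64-encoded,
    with the part count appended after a dash."""
    digests = b"".join(hashlib.sha256(part).digest() for part in parts)
    checksum = base64.b64encode(hashlib.sha256(digests).digest()).decode()
    return f"{checksum}-{len(parts)}"

composite = composite_sha256_checksum([b"part one", b"part two"])
print(composite)
```

Comparing this value against the object-level checksum returned by GetObjectAttributes (again, using the original part boundaries) lets you verify a multipart object without downloading and rehashing it as a single stream.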
Because of how Amazon S3 calculates the checksum for multipart upload objects, the checksum value
for the object might change if you copy it. If you're using an SDK or the REST API and
you call CopyObject, Amazon S3 copies
any object up to the size limitations of the CopyObject
API operation. Amazon S3
does this copy as a single action, regardless of whether the object was uploaded in a
single request or as part of a multipart upload. With a copy command, the checksum of
the object is a direct checksum of the full object. If the object was originally
uploaded using a multipart upload, then the checksum value changes even though the data
has not.
Note
Objects that are larger than the size limitations of the CopyObject
API operation must use multipart copy commands.
Important
When you perform some operations using the AWS Management Console, Amazon S3 uses a multipart upload if the object is greater than 16 MB in size. In this case, the checksum is not a direct checksum of the full object, but rather a calculation based on the checksum values of each individual part.
For example, consider an object 100 MB in size that you uploaded as a single-part direct upload using the REST API. The checksum in this case is a checksum of the entire object. If you later use the console to rename that object, copy it, change the storage class, or edit the metadata, Amazon S3 uses the multipart upload functionality to update the object. As a result, Amazon S3 creates a new checksum value for the object that is calculated based on the checksum values of the individual parts.
The preceding list of console operations is not a complete list of all the possible actions that you can take in the AWS Management Console that result in Amazon S3 updating the object using the multipart upload functionality. Keep in mind that whenever you use the console to act on objects over 16 MB in size, the checksum value might not be the checksum of the entire object.