Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gguf : add input validation, prevent integer overflows #709

Merged
merged 8 commits into from
Jan 29, 2024
Merged

Conversation

ggerganov
Copy link
Owner

Input validation when reading GGUF files

Without these checks it is possible for crafted GGUF files to cause integer overflows, followed by heap overflows.

Thanks to Databricks team for responsibly reporting these

@slaren
Copy link
Collaborator

slaren commented Jan 27, 2024

Not directly related to this, but shouldn't these cases be default: instead to detect invalid types above GGUF_TYPE_COUNT?

ggml/src/ggml.c

Lines 19360 to 19363 in 866cc68

case GGUF_TYPE_COUNT: GGML_ASSERT(false && "invalid type"); break;
}
} break;
case GGUF_TYPE_COUNT: GGML_ASSERT(false && "invalid type");

@slaren
Copy link
Collaborator

slaren commented Jan 27, 2024

I think this may also cause a buffer overflow if n_dims > 4:

ggml/src/ggml.c

Lines 19391 to 19394 in 866cc68

ok = ok && gguf_fread_el (file, &info->n_dims, sizeof(info->n_dims), &offset);
for (uint32_t j = 0; j < info->n_dims; ++j) {
ok = ok && gguf_fread_el(file, &info->ne[j], sizeof(info->ne[j]), &offset);
}

@slaren
Copy link
Collaborator

slaren commented Jan 27, 2024

This type is not checked and then later is used to call ggml functions such as ggml_type_name that may cause an overflow:

ggml/src/ggml.c

Line 19395 in 866cc68

ok = ok && gguf_fread_el (file, &info->type, sizeof(info->type), &offset);

@ggerganov ggerganov requested a review from slaren January 28, 2024 17:57
src/ggml.c Outdated
ok = ok && gguf_fread_el (file, &info->type, sizeof(info->type), &offset);
ok = ok && gguf_fread_el (file, &info->offset, sizeof(info->offset), &offset);

ok = ok && (info->type < GGML_TYPE_COUNT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee that the underlying type of the enum is unsigned, so I think it would be good to also check for type >= 0.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes look pretty good, I would also suggest checking the return of calloc/malloc for error, currently it will return a null pointer and crash.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed you're using GGUF_TYPE_SIZE indexing using an arbitrary value read from the file. This could be used to index to a large value elsewhere and cause a wrap again, a check needs to be added when reading in the type value.

@slaren
Copy link
Collaborator

slaren commented Jan 28, 2024

ggml/src/ggml.c

Lines 19439 to 19453 in fb345bd

const int64_t ne =
(int64_t) info->ne[0] *
(int64_t) info->ne[1] *
(int64_t) info->ne[2] *
(int64_t) info->ne[3];
if (ne % ggml_blck_size(info->type) != 0) {
fprintf(stderr, "%s: tensor '%s' of type %d (%s) number of elements (%" PRId64 ") is not a multiple of block size (%d)\n",
__func__, info->name.data, (int)info->type, ggml_type_name(info->type), ne, ggml_blck_size(info->type));
fclose(file);
gguf_free(ctx);
return NULL;
}
const size_t size_cur = ggml_row_size(info->type, ne);

There are a few potential integer overflows here when computing the size, and this size may be later used to allocate a buffer, which then may result in a buffer overflow when reading the data.

@slaren
Copy link
Collaborator

slaren commented Jan 28, 2024

ggml/src/ggml.c

Line 19400 in fb345bd

ok = ok && gguf_fread_el (file, &info->offset, sizeof(info->offset), &offset);

We should probably check that the offset and the end of the tensor is within the size of the file.

@slaren
Copy link
Collaborator

slaren commented Jan 28, 2024

ggml/src/ggml.c

Line 19399 in fb345bd

ok = ok && gguf_fread_el (file, &info->type, sizeof(info->type), &offset);

We should check that the type is valid.

@slaren
Copy link
Collaborator

slaren commented Jan 28, 2024

ggml/src/ggml.c

Lines 19466 to 19469 in fb345bd

const size_t mem_size =
params.no_alloc ?
(ctx->header.n_tensors )*ggml_tensor_overhead() :
(ctx->header.n_tensors + 1)*ggml_tensor_overhead() + ctx->size;

It seem unlikely to happen in practice, even with no_alloc a large enough n_tensors could cause an integer overflow. It would probably be good to wrap all the integer operations in functions that check for overflows.

@slaren
Copy link
Collaborator

slaren commented Jan 28, 2024

ggml/src/ggml.c

Line 19288 in fb345bd

ctx->kv = malloc(ctx->header.n_kv * sizeof(struct gguf_kv));

This is another potential integer overflow that may cause a the buffer to be too small.

@ggerganov
Copy link
Owner Author

Think I covered all of the listed concerns. Will merge this and if you spot something else, directly open a PR. Thanks

@ggerganov ggerganov merged commit aec8550 into master Jan 29, 2024
4 of 9 checks passed
@ggerganov ggerganov deleted the gg/gguf-sec branch January 29, 2024 12:00
@ggerganov ggerganov mentioned this pull request Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants