
Elasticsearch: Request Entity Too Large #28117

Closed · MarkusAmshove opened this issue Nov 19, 2023 · 9 comments

@MarkusAmshove (Contributor)

Description

I've tried to enable code indexing in our instance using Elasticsearch, but I get the following error for a lot of repositories:

workergroup.go:102:doWorkerHandle() [E] Queue "code_indexer" failed to handle batch of 20 items, backoff for a few seconds
indexer.go:128:func2() [E] Codes indexer handler: index error for repo 713: elastic: Error 413 (Request Entity Too Large)
workergroup.go:102:doWorkerHandle() [E] Queue "code_indexer" failed to handle batch of 1 items, backoff for a few seconds

I've changed the setting http.max_content_length in the Elasticsearch config to the maximum possible value, 2147483647b, but the error still comes up.
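For reference, a minimal sketch of that setting in elasticsearch.yml (2147483647b, i.e. 2 GiB − 1, should be the upper bound Elasticsearch accepts):

```
# elasticsearch.yml — maximum size of an HTTP request body (default 100mb)
http.max_content_length: 2147483647b
```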

This also comes up for a lot of repositories, not just our biggest ones.

I'm unsure how the indexer works: does it take the whole source code of a branch and pump it into Elasticsearch? Is some kind of batching per x files needed?

Gitea Version

1.21.0

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

Running Gitea on Linux amd64 with the official binary and Elasticsearch within Docker

Database

None

@MarkusAmshove (Contributor, Author) commented Nov 19, 2023

The repository sizes (as reported in the Gitea web UI) of some repositories that I picked out of the log are:

  • 1.1 MiB
  • 7.8 MiB
  • 2 MiB
  • 598 MiB
  • 154 MiB
  • 2.8 MiB

That makes me wonder whether the small repositories are batched together with the big ones, which then exceeds the request limit.

Reducing the max file size to MAX_FILE_SIZE=10000 does not seem to resolve the issue.
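For clarity, a sketch of where that setting lives in Gitea's app.ini (the value is in bytes; REPO_INDEXER_ENABLED is shown only for context):

```
[indexer]
REPO_INDEXER_ENABLED = true
; maximum size in bytes of files to be indexed (default 1048576)
MAX_FILE_SIZE = 10000
```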

@inferno-umar (Contributor)

I'm having the same issue.
My repo is 319 MiB in the UI and is not being indexed.

@wxiaoguang (Contributor) commented Feb 4, 2024

@inferno-umar: Does this answer help?

https://stackoverflow.com/questions/58490210/the-remote-server-returned-an-error-413-request-entity-too-large-elasticsear


Update: MarkusAmshove's report says they have already tried http.max_content_length, so I'm wondering whether these problems are really the same.

@wxiaoguang added and later removed the issue/needs-feedback label (For bugs, we need more details) Feb 4, 2024
@wxiaoguang (Contributor)

Unfortunately, after a quick look, I think your guess is right: Gitea may put everything into one request and send it to Elasticsearch, and has done so since the first Elasticsearch PR, #10273:

if len(reqs) > 0 {
	_, err := b.inner.Client.Bulk().
		Index(b.inner.VersionedIndexName()).
		Add(reqs...).
		Do(ctx)
	return err
}
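For illustration, a minimal sketch of what chunked submission could look like at this point, with a hypothetical batchSize; this is an illustration of the idea, not the patch that was eventually merged:

```
// Submit the bulk requests in fixed-size chunks instead of one giant request.
const batchSize = 50 // hypothetical; would need tuning against http.max_content_length
for i := 0; i < len(reqs); i += batchSize {
	end := i + batchSize
	if end > len(reqs) {
		end = len(reqs)
	}
	if _, err := b.inner.Client.Bulk().
		Index(b.inner.VersionedIndexName()).
		Add(reqs[i:end]...).
		Do(ctx); err != nil {
		return err
	}
}
return nil
```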

@inferno-umar (Contributor) commented Feb 4, 2024

> Unfortunately, after a quick look, I think your guess is right: Gitea may put everything into one request and send it to Elasticsearch, and has done so since the first Elasticsearch PR, #10273: […]

Yeah, you're right: Gitea is putting everything into one request before sending it to Elasticsearch instead of batching it, as shown in my error logs below:

....
...exer/code/indexer.go:128:func2() [E] Codes indexer handler: index error for repo 28: elastic: Error 413 (Request Entity Too Large)
...queue/workergroup.go:102:doWorkerHandle() [E] Queue "code_indexer" failed to handle batch of 1 items, backoff for a few seconds
.....

@inferno-umar (Contributor)

After finding this out, I pushed my Elasticsearch maximum limit (http.max_content_length) to 2147483647b; now I'm getting the following error instead, 429 (Too Many Requests):

 ...exer/code/indexer.go:128:func2() [E] Codes indexer handler: index error for repo 28: elastic: Error 429 (Too Many Requests): [in_flight_requests] Data too large, data for [<http_request>] would be  [1272753908/1.1gb], which is larger than the limit of [1073741824/1gb] [type=circuit_breaking_exception]
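That circuit_breaking_exception is Elasticsearch's in-flight requests circuit breaker: the 1.1 GB bulk request exceeds a limit derived from the JVM heap (apparently 1 GB here), independently of http.max_content_length. Raising the heap or the breaker limit would only mask the problem; smaller batches on the Gitea side are the real fix. For reference, the relevant knob:

```
# elasticsearch.yml — in-flight requests circuit breaker (default: 100% of JVM heap)
network.breaker.inflight_requests.limit: 100%
```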

@inferno-umar (Contributor)

I'm trying to fix this issue in the code by batching the requests:

> Unfortunately, after a quick look, I think your guess is right: Gitea may put everything into one request and send it to Elasticsearch, and has done so since the first Elasticsearch PR, #10273: […]

@lunny added this to the 1.21.6 milestone Feb 6, 2024
lunny pushed a commit that referenced this issue Feb 7, 2024
Fix for Gitea putting everything into one request, without batching, and
sending it to Elasticsearch for indexing, as reported in #28117.

This issue occurred in large repositories while Gitea tried to
index the code using Elasticsearch.

I've applied the necessary changes so that the batch length is taken from
the config below (app.ini)
```
[queue.code_indexer]
BATCH_LENGTH=<length_int>
```
and all requests to Elasticsearch are batched in chunks as configured
above.
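For example, with an illustrative value (each queue batch should become one bulk request to Elasticsearch, so smaller values mean smaller requests):

```
[queue.code_indexer]
BATCH_LENGTH = 20
```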
GiteaBot pushed a commit to GiteaBot/gitea that referenced this issue Feb 7, 2024 (…#29062)
wxiaoguang pushed a commit that referenced this issue Feb 7, 2024

Backport #29062 by @inferno-umar

Co-authored-by: dark-angel <[email protected]>
@lunny (Member) commented Feb 19, 2024

Fixed by #29075

@lunny closed this as completed Feb 19, 2024
silverwind pushed a commit to silverwind/gitea that referenced this issue Feb 20, 2024 (…#29062)
6543 pushed a commit to 6543-forks/gitea that referenced this issue Feb 26, 2024 (…o-gitea#29062; cherry picked from commit 5c0fc90)
@github-actions bot commented Mar 1, 2024

Automatically locked because of our CONTRIBUTING guidelines

@github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024