Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Database timeout on large platform library #1123

Open
Perrylicious opened this issue Aug 27, 2024 · 12 comments
Open

[Bug] Database timeout on large platform library #1123

Perrylicious opened this issue Aug 27, 2024 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@Perrylicious
Copy link

Perrylicious commented Aug 27, 2024

RomM version
RomM 3.4.0

Describe the bug
On platforms with large numbers of entries, the database response seemingly times out

To Reproduce
Steps to reproduce the behavior:

  1. Have the DOS platform
  2. Acquire 8284+ titles for the DOS platform
  3. Try to open the DOS platform
  4. Frontend shows Error "AxiosError: Request failed with status code 504" and a 404 page "platform not found" is displayed

Expected behavior
The platform should open. This is not platform specific, the Nint*ndo DS platform with 7782 titles also times out.

Logs
From the log files of the container (sanitized for local hosts and IPs):

2024/08/27 11:48:05 [error] 27#27: *399 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.x.x, server: localhost, request: "GET /api/roms?platform_id=20&search_term=&order_by=name&order_dir=asc HTTP/1.1", upstream: "https://unix:/tmp/gunicorn.sock/api/roms?platform_id=20&search_term=&order_by=name&order_dir=asc", host: "xxxxxx", referrer: "https://xxxxxxxx/platform/20"

INFO: [nginx][2024-08-27 11:48:05] 192.168.1.88 - - "GET /api/roms?platform_id=20&search_term=&order_by=name&order_dir=asc HTTP/1.1" 504 569 "https://xxxxxxxx/platform/20" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36" rt=60.002 uct="0.000" uht="-" urt="60.003"

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Firefox and Chrome were tested
  • Version: latest on Browsers and current on Windows updates

Additional context
There are requests that append a &limit=15 to their calls but this request seems to pull the entire library at once. The CPU of the virtual machine - 4 cores available and the only thing on the host pulling any significant CPU - goes up to 25% during the abovementioned DB query, then drops back to zero after the timeout.

Maybe options would be to either not pull the entirety of the platform at once OR introduce an env variable to increase timeout

@Perrylicious Perrylicious added the bug Something isn't working label Aug 27, 2024
@gantoine
Copy link
Member

Acquire 8284+ titles for the DOS platform

tenor

@gantoine
Copy link
Member

We don't paginate collections at the moment, but the last release (3.4.0) introduced a regression that doubles gallery fetch times. Can you try the 3.5.0-beta.1 and see if it fixes the timeouts?

@Perrylicious
Copy link
Author

Perrylicious commented Aug 27, 2024

Just pulled 3.5.0-beta.1 and unfortunately timeouts remain. There was some sort of background process running on startup that took quite some time and ended in


INFO:	  [RomM][2024-08-27 20:57:44] ✔️ Scan completed 
20:57:44 high: Job OK (0c511034-9f11-40c7-9966-68097c54d235)
20:57:44 Result is kept for 500 seconds
20:57:44 Cleaning registries for queue: high
20:57:44 Cleaning registries for queue: default
20:57:44 Cleaning registries for queue: low

However, still 404 on the DOS and NDS collections (8284 and 7782 items respectively, according to the ROMM frontend).

Frontend message: Couldn't fetch roms for platform ID 63: AxiosError: Request failed with status code 504

63 is the DOS platform, 20 is NDS (first post)

While the loading circle is spinning, I can browse firmware that I uploaded for the platform, so it knows where it is before the timeout makes it display 404, if that helps. Using the filter to try and limit display items at that stage has no effect though.

Logs:

INFO:	  [nginx][2024-08-27 21:00:56]	192.168.1.88 - - "GET /api/stats HTTP/1.1" 200 91 "https://rom.mylocaldomain.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=1.695 uct="0.000" uht="1.694" urt="1.694"

INFO:	  [nginx][2024-08-27 21:00:57]	192.168.1.88 - - "GET /assets/platforms/undefined.ico HTTP/1.1" 404 153 "https://rom.mylocaldomain.io/platform/63" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=0.000 uct="-" uht="-" urt="-"

[2024-08-27 21:01:22 +0000] [18] [CRITICAL] WORKER TIMEOUT (pid:750)

INFO:	  [nginx][2024-08-27 21:01:23]	192.168.1.88 - - "GET /api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc HTTP/1.1" 502 157 "https://rom.mylocaldomain.io/platform/63" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=25.924 uct="0.000" uht="-" urt="25.925"

2024/08/27 21:01:23 [error] 29#29: *385 upstream prematurely closed connection while reading response header from upstream, client: 192.168.1.88, server: localhost, request: "GET /api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc HTTP/1.1", upstream: "https://unix:/tmp/gunicorn.sock:/api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc", host: "rom.mylocaldomain.io", referrer: "https://rom.mylocaldomain.io/platform/63"

[2024-08-27 21:01:23 +0000] [18] [ERROR] Worker (pid:750) was sent code 134!

[2024-08-27 21:01:23 +0000] [5221] [INFO] Booting worker with pid: 5221

INFO:	  [RomM][2024-08-27 21:01:24] Connecting to redis in /src/.venv/bin/gunicorn...

INFO:	  [RomM][2024-08-27 21:01:24] Redis connection established in /src/.venv/bin/gunicorn!

INFO:	  [RomM][2024-08-27 21:01:24] Connecting to redis in /src/.venv/bin/gunicorn...

INFO:	  [RomM][2024-08-27 21:01:24] Redis connection established in /src/.venv/bin/gunicorn!

[2024-08-27 21:01:25 +0000] [5221] [INFO] Started server process [5221]

[2024-08-27 21:01:25 +0000] [5221] [INFO] Waiting for application startup.

[2024-08-27 21:01:25 +0000] [5221] [INFO] Application startup complete.

EDIT: Meanwhile, at the DB container:

2024-08-27 21:01:23 12 [Warning] Aborted connection 12 to db: 'romm' user: 'romm-user' host: '172.18.0.3' (Got an error writing communication packets)

So from timing I'd say that's a 30 second timeout, if that could be increased by means of an env that might give the query enough time to cook. I don't know if pagination is feasible if you need to query all roms from the platform for stats.

The host machine is an older i5-6500T with 8GB of RAM available to ROMM if that has any bearing on this.

EDIT 2: The .88 client in the above logs is an NPMplus reverse proxy for SSL termination. To make sure it's not somehow related to the reverse proxy, I also ran the same query against unsecured HTTP directly via the romm host IP. TL;DR same result

[2024-08-27 21:22:14 +0000] [18] [CRITICAL] WORKER TIMEOUT (pid:5221)

INFO:	  [nginx][2024-08-27 21:22:14]	192.168.1.201 - - "GET /ws/socket.io/?EIO=4&transport=websocket HTTP/1.1" 101 116 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=24.415 uct="0.000" uht="0.022" urt="24.415"

2024/08/27 21:22:14 [error] 29#29: *388 upstream prematurely closed connection while reading response header from upstream, client: 192.168.1.201, server: localhost, request: "GET /api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc HTTP/1.1", upstream: "https://unix:/tmp/gunicorn.sock:/api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc", host: "192.168.1.87", referrer: "https://192.168.1.87/platform/63"

INFO:	  [nginx][2024-08-27 21:22:14]	192.168.1.201 - - "GET /api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc HTTP/1.1" 502 157 "https://192.168.1.87/platform/63" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=19.796 uct="0.000" uht="-" urt="19.796"

[2024-08-27 21:22:14 +0000] [18] [ERROR] Worker (pid:5221) was sent code 134!

[2024-08-27 21:22:14 +0000] [6976] [INFO] Booting worker with pid: 6976

[2024-08-27 21:22:16 +0000] [2997] [INFO] None - "WebSocket /ws/socket.io/?EIO=4&transport=websocket" [accepted]

[2024-08-27 21:22:16 +0000] [2997] [INFO] connection open

INFO:	  [RomM][2024-08-27 21:22:16] Connecting to redis in /src/.venv/bin/gunicorn...

INFO:	  [RomM][2024-08-27 21:22:16] Redis connection established in /src/.venv/bin/gunicorn!

INFO:	  [RomM][2024-08-27 21:22:16] Connecting to redis in /src/.venv/bin/gunicorn...

INFO:	  [RomM][2024-08-27 21:22:16] Redis connection established in /src/.venv/bin/gunicorn!

[2024-08-27 21:22:17 +0000] [6976] [INFO] Started server process [6976]

[2024-08-27 21:22:17 +0000] [6976] [INFO] Waiting for application startup.

[2024-08-27 21:22:17 +0000] [6976] [INFO] Application startup complete.

@gantoine
Copy link
Member

@Perrylicious Thanks to your very thorough comments I was able to track down the main issue, and we just merged a fix in! It'll be available in the next release. If you still see it after 3.5.0 is release please comment on this issue again.

@Perrylicious
Copy link
Author

Perrylicious commented Sep 1, 2024

Hi, just pulled 3.5.0 release and issues persist. Logs look exactly the same. Ran a quick scan after pulling 3.5.0, waited for it to be done, tried to open DOS platform, exact same behaviour.

Anything else I can do or produce logs of to help?

Also, this used to work until ~3.4.0? Maybe it stopped working earlier, but at some point I was editing DOS games out of the 8284 to match and provide additional local info. I'm not too fussed about losing that, but I can't access 2 platforms with the current version

INFO:	  [RomM][2024-09-01 09:04:14] ✔️ Scan completed 

09:04:14 high: Job OK (2f25f97e-fed1-4933-a4d0-835764370dff)

09:04:14 Result is kept for 500 seconds

INFO:	  [nginx][2024-09-01 09:04:14]	192.168.1.88 - - "GET /ws/socket.io/?EIO=4&transport=websocket HTTP/1.1" 101 3738 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=3105.108 uct="0.000" uht="0.014" urt="3105.108"

[2024-09-01 09:04:14 +0000] [21] [INFO] connection closed

09:04:14 Cleaning registries for queue: high

09:04:14 Cleaning registries for queue: default

09:04:14 Cleaning registries for queue: low

192.168.1.201:0 - "GET /api/platforms HTTP/1.0" 200

INFO:	  [nginx][2024-09-01 09:04:15]	192.168.1.88 - - "GET /api/platforms HTTP/1.1" 200 53818 "https://rom.xxxxxxxx.io/scan" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=0.051 uct="0.000" uht="0.051" urt="0.051"

2024/09/01 09:05:55 [error] 27#27: *377 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.1.88, server: localhost, request: "GET /api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc HTTP/1.1", upstream: "https://unix:/tmp/gunicorn.sock/api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc", host: "rom.xxxxxxxxxxxx.io", referrer: "https://rom.xxxxxxxxxxx.io/platform/63"

INFO:	  [nginx][2024-09-01 09:05:55]	192.168.1.88 - - "GET /api/roms?platform_id=63&search_term=&order_by=name&order_dir=asc HTTP/1.1" 504 167 "https://rom.xxxxxxxxxxxxxxx.io/platform/63" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0" rt=60.060 uct="0.000" uht="-" urt="60.060"

[2024-09-01 09:06:25 +0000] [18] [CRITICAL] WORKER TIMEOUT (pid:21)

[2024-09-01 09:06:26 +0000] [18] [ERROR] Worker (pid:21) was sent code 134!

[2024-09-01 09:06:26 +0000] [4596] [INFO] Booting worker with pid: 4596

INFO:	  [RomM][2024-09-01 09:06:27] Connecting to redis in /src/.venv/bin/gunicorn...

INFO:	  [RomM][2024-09-01 09:06:27] Redis connection established in /src/.venv/bin/gunicorn!

INFO:	  [RomM][2024-09-01 09:06:27] Connecting to redis in /src/.venv/bin/gunicorn...

INFO:	  [RomM][2024-09-01 09:06:27] Redis connection established in /src/.venv/bin/gunicorn!

[2024-09-01 09:06:28 +0000] [4596] [INFO] Started server process [4596]

[2024-09-01 09:06:28 +0000] [4596] [INFO] Waiting for application startup.

[2024-09-01 09:06:28 +0000] [4596] [INFO] Application startup complete.

@gantoine gantoine reopened this Sep 1, 2024
@gantoine gantoine self-assigned this Sep 1, 2024
@gantoine
Copy link
Member

gantoine commented Sep 1, 2024

Hmmm I was testing it with 10K games, but that was on my local machine (not the server), with pretty fast read speeds. What changed in 3.4.0 is that we moved the logic of detecting sibling roms to the backend, which needs to compare each rom with eachother (using a vritual table).

What I'm going to do is limit the returned roms to ~2500, and you need to find a specific one you can search using the 🔎 icon in the navbar. How does that sound?

@NicksGhost
Copy link

Hmmm I was testing it with 10K games, but that was on my local machine (not the server), with pretty fast read speeds. What changed in 3.4.0 is that we moved the logic of detecting sibling roms to the backend, which needs to compare each rom with eachother (using a vritual table).

What I'm going to do is limit the returned roms to ~2500, and you need to find a specific one you can search using the 🔎 icon in the navbar. How does that sound?

Hello, I have collected about 14K of arcade ROMs. I just upgraded to 3.5.0 but still can't open the arcade page. A 504 error code appears. Is there any solution at present?

@gantoine
Copy link
Member

gantoine commented Sep 1, 2024

@NicksGhost Wait for the next patch release in a couple days

@Perrylicious
Copy link
Author

Perrylicious commented Sep 2, 2024

I've been ruminating over this for a few hours and here's what I came up with:

While for the vast majority of platforms that number would suffice, 2500 is a very low number for some other platforms (especially those that enable homebrew) and if that is output alphabetically, the user would need to be able to pull the remainder when you reach the "bottom" of those 2500.

What I worry about is that if you filter to 2500 entries at time of pull, any UI-based filtering such as "show items without metadata" or "show games with missing covers" will also only filter through those 2500 entries from my understanding. My main reason to open the collection, apart from the joy of looking at the covers, is to make sure that I either match unmatched games - which is another can of worms, because for a (fictional) ROM name like "Bada-bang" the search isn't fuzzy enough to recognize "Badabang" or "Bada Bang" as viable title options so you have to correct that to match the IGDB/Mobygames DB - or add covers / other metadata.

Here's a few options where I don't know how feasible they are. Please note I'm not trying to do some backseat programming, I am not skilled enough to actually help in-depth, so these are very naive layman ideas where I don't know what they would take to realize.

If this is calculating groups and siblings every time at the time of a DB pull, that might increase the load time. Especially on systems where homebrew is encouraged or systems where the common ROM repositories also hold a lot of "DEMO" and "BETA" versions alongside the region/language version per game, there might be a lot of siblings and groups. Would it maybe be helpful to put the siblings or grouped IDs into a DB table or field attached to all grouped or sibling items so you don't have to run these relations or calculations on the fly? That way, on a pull, you only pull the main item with the "+3" overlay and only when you open it, the siblings are actually pulled. "Main item" can already be designated by the checkbox on the game or could be "first uploaded" of the siblings.

Otherwise, you can maybe sequence the pull, e.g. pull first 500 for fast display, then have a loading status while you load in the the rest or use a "load more results" button. Keeping the user informed instead of them looking at a throbber might also give them a better idea about system performance or loading times.

I don't think a user-side pre-filter on large collections would be a good UX (e.g. telling the user "this is a large collection. Sort by new?"). If you feel like this is just a volume issue (e.g. number of DB items to be pulled) and not something that runs into memory constraints or kills processes for sport, I'll still put up the idea of increasing timeout before the DB query dies by means of an env.

@gantoine
Copy link
Member

gantoine commented Sep 2, 2024

you reach the "bottom" of those 2500

that would take you a good 10 minutes of scrolling right now, if you need to find a specific game use the search button in the navbar

any UI-based filtering such as "show items without metadata" or "show games with missing covers" will also only filter through those 2500

that is correct, and the fix for this is to paginate the list of roms and do the filtering on the server. there is a plan to do this, but it'll take a while before we get around to it. once we do have it everything should just work. 👌🏼

helpful to put the siblings or grouped IDs into a DB table or field attached to all grouped or sibling items so you don't have to run these relations or calculations on the fly

the siblings are already in a virtual table, but joining that table on 10K+ roms takes a serious amount of time either way. we still need to calculate siblings when getting a list of roms to show the 3+ icons/merge covers in the UI.

@Perrylicious
Copy link
Author

Perrylicious commented Sep 8, 2024

Just tested this on 3.5.1 and while yes, the platform does open, the cutoff is a huge limitation that makes the platform practically unuseable. Having to know every rom you have after 2500 so you can jump to it by searching is not a good UX, because that's the opposite of what the platform is for. If I still have to wrangle the folder itself, I don't need to put in the ROMM layer.

that would take you a good 10 minutes of scrolling right now, if you need to find a specific game use the search button in the navbar

That's OK for a temporary fix to get platforms to load.

However if this is where development is headed, then ROMM can not be used for large platform rom collections, full stop. And I don't even have that large a collection for these platforms; the DOS one isn't even the complete repository of known DOS games and ROMM can't handle something like the NDS archivary set.

The person above with their 15.000 arcade games collection could only see a sixth of their collection of arcade games before they would have to switch to "find by search" which defies the aspect of browsing your collection and looking at the covers.

I open the DOS platform to fix missing metadata and covers but it cuts off at roms starting with the letter E. What other games do I have beyond E that need their data fixed? I don't know from just looking at the folder and ROMM won't tell me. The solution can't be to just manually search for every game past the first 2500 and then assess what fixes they need.

My usual workflow is to open the platform, filter for games with missing metadata - which from what I assume won't be filtering beyond the first 2500 entries - and then edit what's missing. I didn't need 10 minutes to scroll to the end, I just went to the last page of roms that ROMM is deciding to show me, and it ends there.

To make this functionally useable on large platform libraries for archivary purposes, there should be a way to load in the rest of the roms. Can you put in a "load more" button that performs the same DB request but with the equivalent of "limit=2500&startentry=2501"?

If I misunderstand the scope of the project and that's not the use case for ROMM, I apologize.

@gantoine
Copy link
Member

gantoine commented Sep 8, 2024

Bro chill, we're going to add pagination back at some point, but that requires moving filters to the server, as right now all filtering happens client side.

However if this is where development is headed, then ROMM can not be used for large platform rom collections, full stop.

This issue is currently limited to maybe two dozen users, so we're not going to scramble to make that change. It'll happen with it happens (or when someone contributes a PR to make it happen).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants