
Support native YUV 4:4:4 encoding (Windows-only for now) #2533

Open
wants to merge 3 commits into
base: master


ns6089
Contributor

@ns6089 ns6089 commented May 15, 2024

Description

Adds support for YUV 4:4:4 encoding. Requires matching changes on the Moonlight side. Windows-only for now.

moonlight-common-c pull request: moonlight-stream/moonlight-common-c#91 (merged)
moonlight-qt pull request: moonlight-stream/moonlight-qt#1282 (draft)

Current state

  • NVIDIA GPU support is implemented by using NVENC directly (Windows-only)
  • Intel GPU support was implemented blindly; no idea if it works at all (Windows-only)
  • AMD GPUs don't support YUV 4:4:4 at all
  • NVENC doesn't accept any Direct3D surfaces for 10-bit 4:4:4 encoding, so we have to use CUDA interop
    • the CUDA runtime can't be unloaded once loaded
      • it doesn't seem to affect the GPU idle power state, so we should be fine
    • NVENC in CUDA mode leaks CPU memory on encoder destruction (NVENC-mapped CUDA surfaces can't be unmapped and unregistered)
      • fixed by slightly adjusting the api calls
  • Linux support may be possible through FFmpeg, but it is not yet implemented or even investigated
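
For context on what 4:4:4 changes relative to the usual 4:2:0 stream, here is a minimal sketch that counts the samples in one frame for each chroma format. The function name and the 1920x1080 resolution are illustrative assumptions and are not taken from this PR's code.

```python
def samples_per_frame(width: int, height: int, chroma_format: str) -> int:
    """Count luma plus chroma samples in one frame (hypothetical helper)."""
    luma = width * height
    if chroma_format == "4:4:4":
        # Both chroma planes are stored at full resolution.
        chroma_w, chroma_h = width, height
    elif chroma_format == "4:2:0":
        # Both chroma planes are halved horizontally and vertically
        # (rounded up for odd dimensions).
        chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
    else:
        raise ValueError(f"unsupported chroma format: {chroma_format}")
    return luma + 2 * chroma_w * chroma_h


if __name__ == "__main__":
    for fmt in ("4:2:0", "4:4:4"):
        print(fmt, samples_per_frame(1920, 1080, fmt), "samples per frame")
```

At the same resolution, a 4:4:4 frame carries twice as many samples as a 4:2:0 frame, which is why colored text and thin lines keep their chroma detail.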

Screenshot

moonlight_yuv444

Issues Fixed or Closed

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch
must be updated before it can be merged. You must also
Allow edits from maintainers.

  • I want maintainers to keep my branch updated

@mirh

mirh commented May 28, 2024

Cool, didn't realize NVENC supported that (even though AV1 is still software only right?).
And I see an image is worth a thousand words. Would there also be benefits with the lossless profile? Or is that just a bandwidth (and/or latency?) monstrosity?

@ns6089
Contributor Author

ns6089 commented May 28, 2024

AV1 is still software only right?

Yes, there is no hardware encoding for AV1 4:4:4 on the current generation of GPUs.

Would there also be benefits with the lossless profile?

I'm not against the idea of supporting lossless encoding options (NVENC can do it for H.264 and HEVC, not AV1), but the current netcode imposes a hard limit on maximum video packet size, and this limit is very easy to hit with lossless encoding. So the netcode needs to be improved first.

@mirh

mirh commented May 28, 2024

Supposedly.. it may even be possible to dynamically switch between lossy and lossless?
Though maybe it's not really necessary if YUV 4:4:4 can already score an SSIM of 0.98.

@ns6089
Contributor Author

ns6089 commented May 28, 2024

Supposedly.. it may even be possible to dynamically switch between lossy and lossless?

You're describing near-lossless encoding, that is, folding both the DCT bypass and the quantization bypass into the rate control decision. As far as I know, NVENC is not capable of this (quantization can be dynamically lossless when rate control selects QP=4, but the DCT bypass is a static on/off switch).
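
The QP=4 remark can be illustrated with the commonly cited approximation for H.264/HEVC, in which the quantizer step size doubles every 6 QP and equals 1 at QP=4. The sketch below assumes that approximation; the function name is hypothetical, and actual codecs use integer scaling tables rather than this formula.

```python
def approx_qstep(qp: int) -> float:
    """Approximate H.264/HEVC quantizer step size for a given QP.

    Assumes the usual rule of thumb Qstep ~= 2 ** ((QP - 4) / 6);
    this is an approximation, not the codec's exact integer arithmetic.
    """
    return 2.0 ** ((qp - 4) / 6)


if __name__ == "__main__":
    for qp in (4, 10, 16, 22, 28):
        print(f"QP={qp:2d}  approx. step size = {approx_qstep(qp):.1f}")
```

Under this approximation a step size of 1 means quantization no longer discards coefficient precision, which is presumably why QP=4 is singled out in the comment above.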

@ns6089
Contributor Author

ns6089 commented Jun 25, 2024

Should be more or less done.
Still need to figure out how best to handle the NVENC/CUDA driver bug, and maybe adjust how 8-bit colors are mapped into 10-bit colors depending on the particular client renderer (the 8-bit maximum of 255 doesn't map evenly onto the 10-bit maximum of 1023).
But the code should be already complete and correct, as far as I can tell.
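
To make the 8-bit to 10-bit mapping concern concrete, here is a minimal sketch comparing three common ways to widen a full-range 8-bit value. The function names are hypothetical, and nothing here is taken from this PR; which mapping matches a given client renderer is exactly the open question.

```python
def to_10bit_shift(v: int) -> int:
    """Plain left shift: cheap, but 255 maps to 1020, not 1023."""
    return v << 2


def to_10bit_replicate(v: int) -> int:
    """Bit replication: copy the top two bits into the new low bits."""
    return (v << 2) | (v >> 6)


def to_10bit_scale(v: int) -> int:
    """Rounded rescale by 1023/255 using integer arithmetic."""
    return (v * 1023 + 127) // 255


if __name__ == "__main__":
    for v in (0, 1, 64, 128, 255):
        print(v, to_10bit_shift(v), to_10bit_replicate(v), to_10bit_scale(v))

    # 1023 / 255 is not an integer, so the rescaled steps cannot all be equal.
    steps = [to_10bit_scale(v + 1) - to_10bit_scale(v) for v in range(255)]
    print("step sizes used:", sorted(set(steps)))
```

The shift keeps every step at exactly 4 but never reaches full white, while replication and rescaling reach 1023 at the cost of a few uneven steps.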

@ns6089 ns6089 marked this pull request as ready for review June 25, 2024 16:49
@ns6089
Contributor Author

ns6089 commented Jun 25, 2024

Ah, and I still have no idea if the Intel encoder works correctly, since I don't have supported hardware at hand right now.

@ns6089 ns6089 changed the title from "Support YUV 4:4:4 encoding" to "Support native YUV 4:4:4 encoding (Windows-only for now)" Jun 26, 2024
@ns6089
Contributor Author

ns6089 commented Jun 28, 2024

Resolved the CUDA bug, only minor stuff is left.

@ReenigneArcher
Member

We have some new patterns for docs which help produce cleaner doxygen docs. https://docs.lizardbyte.dev/projects/sunshine/en/master/source_code/source_code.html

@ns6089
Contributor Author

ns6089 commented Jun 28, 2024

Alright, I will update the comments to whatever the codebase is using at the time of merge. Currently this PR is held back on the Moonlight side, and merging one without the other is pointless.

3 participants