Improve descrambler performance #105

Merged: 1 commit, Mar 1, 2016


glenvt18 (Contributor) commented Mar 1, 2016

Hi @manio.

I've come across the issue described here (about the TBS 5922). In short, some tuners ("good" ones) return from poll() when there are 8-10 packets available to read, while others ("bad" ones) return when there are only 1-3 packets. If the descrambler processes all the data available in the ring buffer ("good" tuner, fast CPU), the ring buffer sleeps for 100 ms, and a new (big) chunk of data is available on the next call. But if a (small) number of packets arrives during descrambling ("bad" tuner, weak CPU), the ring buffer never (or rarely) sleeps and keeps feeding the descrambler small chunks of data. This leaves the batch buffer of a parallel descrambler heavily underfilled (10-20%) and causes a huge CPU load (a 4x-20x increase). "Bad" tuners are not uncommon: 2 out of 3 tuners I've tested turned out to be "bad", and ARM and even Atom platforms are affected by this issue. I wouldn't say that those tuners are bad or that their drivers are broken. That's just the way they implement things.
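As a rough illustration of why an underfilled batch buffer is expensive, the sketch below uses a deliberately simple cost model. It assumes that one run of a parallel descrambler costs about the same whether the batch is full or nearly empty; that assumption, the function name and the batch size of 128 are hypothetical and are not taken from the patch.

```python
def relative_cost_per_packet(batch_size: int, packets_in_batch: int) -> float:
    """Cost per packet relative to a completely full batch.

    Assumption (not from the patch): a descrambler run has a fixed cost
    regardless of how many packets the batch actually holds.
    """
    if not 0 < packets_in_batch <= batch_size:
        raise ValueError("packets_in_batch must be in 1..batch_size")
    return batch_size / packets_in_batch


if __name__ == "__main__":
    batch_size = 128  # hypothetical batch size, for illustration only
    for packets in (128, 26, 16, 13):
        fill = packets / batch_size
        cost = relative_cost_per_packet(batch_size, packets)
        print(f"fill ratio {fill:.0%}: {cost:.1f}x cost per packet")
```

Under this model a fill ratio of 10-20% means roughly 5-10 times the per-packet cost, which is the same order of magnitude as the load increase reported above.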

The solution you proposed in #43 does not always work well. The right delay value depends on a number of factors, such as the tuner, the hardware, the descrambler implementation and the batch size, so it can only be found experimentally for a particular combination of them. Even then the fill ratio is not close to 100%. The delay does help to reduce the CPU load of the TS buffer thread, though. Also, it only works for a dvb device (not satip or iptv).

To address this issue I've come up with a simple algorithm which uses a low-water mark to keep the fill ratio high. Processing (descrambling or filtering) is only allowed when there are at least low-water-mark bytes in the input buffer. Otherwise, the thread sleeps for Timeout ms waiting for a bigger chunk. The water mark is then updated with the number of bytes received during the sleep. The water mark value is capped with the size of the device's ring buffer in mind. In other words, the algorithm tries to keep as much data as possible, but not more than the limit, and doesn't introduce a zapping lag of more than Timeout ms. Measuring the water mark in terms of time, not bytes, helps to handle several "major" streams (pids) whose bit rates differ a lot. The algorithm assumes that the device uses cRingBufferLinear (which is the case for dvb, satip and iptv devices). It should work with a simple (non-ring) buffer too, but without any performance improvement. The average fill ratio (= efficiency) I measured is 99.6% for streams with bit rates >= 2 Mbit/s with a "bad" tuner. With a "good" tuner this algorithm increases performance by 5-20%.
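The low-water-mark idea described above can be sketched as a small, deterministic simulation. This is not the code from the patch: the class and function names, the constant arrival rate and the fixed processing time per chunk are all assumptions made for illustration, and the real implementation works on cRingBufferLinear inside the plugin.

```python
class LowWaterMark:
    """Gate that allows processing only when enough bytes are buffered."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes  # cap chosen with the ring buffer size in mind
        self.mark = 0             # current low-water mark in bytes

    def should_process(self, available: int) -> bool:
        return available > 0 and available >= self.mark

    def update_after_sleep(self, received: int) -> None:
        # The mark follows what arrived during one timeout, up to the cap.
        self.mark = min(received, self.limit)


def simulate(rate_bytes_per_ms: int, timeout_ms: int, limit_bytes: int,
             work_ms: int, total_ms: int, use_gate: bool = True) -> list[int]:
    """Return the sizes of the chunks handed to the (simulated) descrambler."""
    gate = LowWaterMark(limit_bytes)
    buffered = 0
    now = 0
    chunks: list[int] = []
    while now < total_ms:
        if gate.should_process(buffered):
            chunks.append(buffered)
            # Data keeps arriving while the chunk is being processed.
            buffered = rate_bytes_per_ms * work_ms
            now += work_ms
        else:
            # Not enough data: sleep for the timeout and let the buffer fill.
            received = rate_bytes_per_ms * timeout_ms
            buffered += received
            now += timeout_ms
            if use_gate:
                gate.update_after_sleep(received)
    return chunks


if __name__ == "__main__":
    # 376 bytes/ms is two 188-byte TS packets per ms, about 3 Mbit/s.
    for use_gate in (False, True):
        sizes = simulate(376, 100, 1_000_000, 1, 1000, use_gate=use_gate)
        print(use_gate, len(sizes), min(sizes), max(sizes))
```

With `use_gate=False` the loop reproduces the "bad" tuner case: after the first sleep there is always a little data available, so the thread never sleeps again and every chunk is tiny. With the gate enabled, each chunk holds at least one timeout's worth of data, at the price of a delay bounded by the timeout.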

This hack simulates "bad" tuner behaviour with a "good" tuner. Tuned for 2-4 Mbit/s.

This patch shows what happens inside and measures the fill ratio (grep for 'Decrypt block').

Please review.

Use a simple algorithm to keep a parallel CSA descrambler batch
buffer fill ratio very close to 100%. This helps to achieve maximum
performance of the descrambler.

manio (Owner) commented Mar 1, 2016

Thank you very much for this! :)
Merging...

manio added a commit that referenced this pull request Mar 1, 2016
Improve descrambler performance
manio merged commit 9f55ebe into manio:master Mar 1, 2016