Improve descrambler performance #105

glenvt18 · 2016-03-01T02:16:45Z

I've come across the issue described here (about TBS 5922). In short, some tuners ("good" ones) return from poll() when there are 8-10 packets available to read, while others ("bad" ones) return when there are only 1-3 packets. If the descrambler processes all the data available in the ring buffer ("good" tuner, fast CPU), the ring buffer sleeps for 100 ms, and a new (big) chunk of data is available on the next call. But if a (small) number of packets arrives during descrambling ("bad" tuner, weak CPU), the ring buffer will never (or rarely) sleep, always feeding the descrambler small chunks of data. This keeps the batch buffer of a parallel descrambler heavily underfilled (10-20%), causing a huge CPU load (a 4x-20x increase). "Bad" tuners are not so uncommon: in fact, 2 out of 3 tuners I've tested turned out to be "bad", with ARM and even Atom platforms being affected by this issue. I wouldn't say that those tuners are bad or that their drivers are broken. That's just the way they implement things.
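
As a rough illustration of why an underfilled batch is expensive, here is a small cost model. It rests on an assumption that is not measured in this PR: that a parallel descrambler spends about the same CPU time on a batch no matter how many of its slots hold real packets. The function names and the batch size of 128 used in the note below are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Fraction of the batch slots that hold real packets (0.0 .. 1.0).
double FillRatio(std::size_t packetsInBatch, std::size_t batchSize) {
    if (batchSize == 0) return 0.0;
    std::size_t used = std::min(packetsInBatch, batchSize);
    return static_cast<double>(used) / static_cast<double>(batchSize);
}

// CPU cost per packet relative to a completely full batch.
// ASSUMPTION: the cost of processing one batch is fixed, so the cost per
// packet is inversely proportional to the fill ratio.
double RelativeCostPerPacket(std::size_t packetsInBatch, std::size_t batchSize) {
    double ratio = FillRatio(packetsInBatch, batchSize);
    return ratio > 0.0 ? 1.0 / ratio : 0.0;
}
```

Under this model, 16 packets in a 128-slot batch give a fill ratio of 12.5% and roughly 8 times the per-packet cost of a full batch.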

The solution you proposed in #43 does not always work well. The delay value depends on a number of factors, such as the tuner, the hardware, the descrambler implementation and the batch size. It can only be found experimentally for a particular combination of them, and even then the fill ratio is not close to 100%. That said, the delay does help to reduce the CPU load of the TS buffer thread. It also only works for a dvb device (not satip or iptv).

To address this issue I've come up with a simple algorithm which uses a low-water mark to keep the fill ratio high. Processing (descrambling or filtering) is only allowed when there are at least low-water mark bytes in the input buffer. Otherwise, the thread sleeps for Timeout ms, waiting for a bigger chunk. The water mark is then updated with the number of bytes received during the sleep. The water mark value is capped according to the size of the device's ring buffer. In other words, the algorithm tries to keep as much data buffered as possible, but not more than the limit, and doesn't introduce a zapping lag of more than Timeout ms. Measuring the water mark in terms of time, not bytes, helps to handle several "major" streams (pids) whose bit rates differ a lot. The algorithm assumes that the device uses cRingBufferLinear (which is the case for dvb, satip and iptv devices). It should work with a simple (non-ring) buffer too, but without any performance improvement. The average fill ratio (= efficiency) I measured is 99.6% for streams with bit rates >= 2 Mbit/s with a "bad" tuner. With a "good" tuner this algorithm increases performance by 5-20%.
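
The gating described above can be sketched roughly as follows. This is a minimal byte-based illustration, not the code from this pull request: the class `FillControl` and its method names are hypothetical, and the sketch leaves out both the time-based water mark measurement and the actual integration with cRingBufferLinear.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper illustrating a low-water mark gate.
class FillControl {
public:
    // maxWaterMark: upper limit for the water mark, chosen with the size of
    //               the device's ring buffer in mind (assumed to be given).
    // timeoutMs:    how long the caller sleeps when too little data is buffered.
    FillControl(std::size_t maxWaterMark, int timeoutMs)
        : maxWaterMark_(maxWaterMark), timeoutMs_(timeoutMs), lowWaterMark_(0) {}

    // True when the buffered data may be handed to the descrambler or filter.
    bool CanProcess(std::size_t available) const {
        return available > 0 && available >= lowWaterMark_;
    }

    // Call after sleeping for TimeoutMs(). 'before' and 'after' are the buffer
    // fill levels (in bytes) at the start and at the end of the sleep.
    void UpdateAfterSleep(std::size_t before, std::size_t after) {
        std::size_t received = after > before ? after - before : 0;
        // The new water mark is what arrived during the sleep, capped by the limit.
        lowWaterMark_ = std::min(received, maxWaterMark_);
    }

    int TimeoutMs() const { return timeoutMs_; }
    std::size_t LowWaterMark() const { return lowWaterMark_; }

private:
    std::size_t maxWaterMark_;
    int timeoutMs_;
    std::size_t lowWaterMark_;
};
```

A reader thread would call `CanProcess()` with the number of bytes currently buffered. When it returns false, the thread sleeps for `TimeoutMs()`, calls `UpdateAfterSleep()` with the buffer levels before and after the sleep, and then processes whatever has accumulated, so the added latency stays bounded by one timeout.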

This hack simulates "bad" tuner behaviour with a "good" tuner. Tuned for 2-4 Mbit/s.

This patch shows what happens inside and measures the fill ratio (grep for 'Decrypt block').

Please review.

Use a simple algorithm to keep a parallel CSA descrambler batch buffer fill ratio very close to 100%. This helps to achieve maximum performance of the descrambler.

manio · 2016-03-01T17:49:13Z

Thank you very much for this! :)
Merging...

Improve descrambler performance

bfdca9d

manio added a commit that referenced this pull request Mar 1, 2016

Merge pull request #105 from glenvt18/fill_control

9f55ebe

manio merged commit 9f55ebe into manio:master Mar 1, 2016
