Move all converters to starch-based implementations #97

Merged: 23 commits, Jan 21, 2021

mutability

Short version:

  • use starch to generate code to select from multiple converter implementations at runtime
  • build implementations specialized for AVX2 (gcc autovectorization), NEON (intrinsics)
  • provide default settings to select suitable implementations for armv6 (e.g. Pi 0W), armv7a with NEON (e.g. Pi 4), generic x86, and x86 with AVX2
  • add a --wisdom option to allow overriding the defaults
  • provide benchmark tools to generate a wisdom file specialized for the local machine
  • add support in the dump1090-fa packaging to generate and read machine-specific wisdom from /etc/dump1090-fa/wisdom.local
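
The runtime selection described in the list above can be pictured roughly as follows. This is a minimal Python sketch of the idea only, not starch's generated code: the registry, the flavor names, and the `select` function are all hypothetical, and Python is used purely for brevity.

```python
from typing import Callable, Dict, List, Set, Tuple

Converter = Callable[[bytes], List[float]]


def convert_generic(raw: bytes) -> List[float]:
    # Plain scalar conversion that every CPU can run.
    return [(b - 127.4) / 128.0 for b in raw]


def convert_wide(raw: bytes) -> List[float]:
    # Stand-in for a SIMD build; it must produce the same results.
    return convert_generic(raw)


# Hypothetical registry: flavor name -> (CPU features required, implementation).
REGISTRY: Dict[str, Tuple[Set[str], Converter]] = {
    "generic": (set(), convert_generic),
    "avx2": ({"avx2"}, convert_wide),
    "neon": ({"neon"}, convert_wide),
}


def select(wisdom: List[str], cpu_features: Set[str]) -> Tuple[str, Converter]:
    """Return the first flavor, in wisdom order, that this CPU can run."""
    for name in wisdom:
        entry = REGISTRY.get(name)
        if entry is None:
            continue  # unknown names in a wisdom list are skipped in this sketch
        needed, func = entry
        if needed <= cpu_features:
            return name, func
    # Nothing in the wisdom list was usable: fall back to the generic flavor.
    return "generic", REGISTRY["generic"][1]


if __name__ == "__main__":
    name, func = select(["avx2", "neon", "generic"], {"neon"})
    print(name, func(bytes([0, 127, 255])))
```

In this sketch a wisdom list only changes the order in which candidates are tried; anything the CPU cannot run is skipped, so a poor wisdom list degrades speed rather than correctness.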

Other stuff:

  • tweak the zero offset of UC8 implementations slightly to better match how the RTL2832 behaves (the processed IQ data appears to be centered at 127.4, not 127.5)
  • add converter tests to verify that they are producing correct output
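
How the converter tests are structured is not spelled out here, so the following is only a sketch of one plausible shape: compare a fast implementation against a slow reference for every possible input. It assumes a lookup-table UC8 magnitude converter with 16-bit output and a divisor of 128; the function names are hypothetical.

```python
import math

ZERO_OFFSET = 127.4  # zero point quoted in the description above
FULL_SCALE = 128.0   # assumed divisor mapping samples into roughly [-1, 1]
TOLERANCE = 0.5 / 65535 + 1e-12  # half of one uint16 step, plus float slack


def reference_magnitude(i: int, q: int, zero: float = ZERO_OFFSET) -> float:
    """Slow reference: magnitude of one UC8 I/Q pair, clamped to 1.0."""
    fi = (i - zero) / FULL_SCALE
    fq = (q - zero) / FULL_SCALE
    return min(1.0, math.sqrt(fi * fi + fq * fq))


def build_table(zero: float = ZERO_OFFSET) -> list:
    """Fast path under test: 65536 uint16 magnitudes indexed by (i << 8) | q."""
    return [
        round(reference_magnitude(i, q, zero) * 65535)
        for i in range(256)
        for q in range(256)
    ]


def worst_error(table: list) -> float:
    """Largest absolute difference between the table and the reference."""
    worst = 0.0
    for i in range(256):
        for q in range(256):
            got = table[(i << 8) | q] / 65535
            worst = max(worst, abs(got - reference_magnitude(i, q)))
    return worst


if __name__ == "__main__":
    print("127.4 table worst error:", worst_error(build_table()))
    print("127.5 table worst error:", worst_error(build_table(zero=127.5)))
```

Because UC8 has only 65536 distinct I/Q pairs, an exhaustive check is cheap. A table built with the old 127.5 zero point fails the same tolerance, which is the kind of regression such a test would catch.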

Things that go faster:

  • UC8 converters (e.g. rtlsdr path): x86 +10%, Pi 4 +100%, Pi 0W +50%
  • SC16/SC16Q11 converters (e.g. bladerf path): x86 +500%, Pi 4 +450%

Things that go slower:

  • Pi 0W SC16/SC16Q11 is about 8% slower. This figure includes the cost of switching from the old limited-table-size implementation to a higher-precision version that uses all bits (and we don't really expect anyone to run a 12/16-bit SDR on a 0W)

Things that are no longer supported:

  • --dcfilter option. This could be reimplemented if there's demand, but it was always an experiment, and I'm not sure there's any benefit to it unless you're using a zero-IF tuner (and the commonly available zero-IF tuners can't tune to 1090 MHz)

To simplify the build, the two dependencies (Google's cpu_features library and the starch infrastructure) have been copied directly into the dump1090 repo rather than being included by e.g. submodule reference.

main user-visible changes:

 * ensure you check out submodules ('git clone --recurse-submodules')
 * --version shows the CPU features and DSP implementations in use
 * --wisdom allows overriding of the built-in architecture wisdom
 * --dcfilter no longer supported
 * "starch-benchmark" binary will benchmark all options on the
   current machine and can produce a wisdom file to feed to
   the --wisdom option
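
The benchmarking idea can be sketched as follows: time every candidate on the same input and rank them fastest first. This is an illustration in Python, not the starch-benchmark tool itself; `rank_by_speed` and the candidate functions are hypothetical, and the real wisdom file format is not reproduced here.

```python
import timeit
from typing import Callable, Dict, List


def rank_by_speed(
    candidates: Dict[str, Callable[[bytes], object]],
    sample: bytes,
    repeats: int = 5,
    number: int = 20,
) -> List[str]:
    """Return candidate names ordered from fastest to slowest."""
    timings = []
    for name, func in candidates.items():
        # Take the best of several runs to reduce scheduling noise.
        best = min(
            timeit.repeat(lambda: func(sample), repeat=repeats, number=number)
        )
        timings.append((best, name))
    return [name for _, name in sorted(timings)]


def convert_once(raw: bytes) -> list:
    return [(b - 127.4) / 128.0 for b in raw]


def convert_wastefully(raw: bytes) -> list:
    # Deliberately redundant work so that the ranking is unambiguous.
    result = []
    for _ in range(50):
        result = convert_once(raw)
    return result


if __name__ == "__main__":
    order = rank_by_speed(
        {"wasteful": convert_wastefully, "once": convert_once},
        bytes(range(256)),
    )
    print(order)
```

The resulting order is what a machine-specific wisdom file would encode: which implementation to prefer on this particular hardware.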

If you have a use case for --dcfilter, please get in touch and
let me know. It's an edge case and for now there's no starch/DSP
support for it, but support can be written if needed.

In almost all cases the new conversion routines are slightly or
substantially faster than the old conversion routines. The only case
that is slower is SC16/SC16Q11 on a Pi 0, which is around 10% slower
due to changing from heavily approximated lookup tables to higher
quality results (but SC16 is probably already out of reach of a Pi 0).

(reads a UC8 capture; measures min/max/mean I and Q)
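
A measurement of this kind can be sketched in a few lines. This is not the tool from the commit; it is a minimal Python illustration that assumes the usual rtl_sdr layout of interleaved unsigned 8-bit samples (I first, then Q), and `iq_stats` is a hypothetical name.

```python
from typing import Dict, Tuple

Stats = Tuple[int, int, float]  # (min, max, mean)


def iq_stats(raw: bytes) -> Dict[str, Stats]:
    """Return min/max/mean for the I and Q streams of a UC8 capture."""
    if len(raw) < 2:
        raise ValueError("need at least one I/Q pair")
    usable = len(raw) - (len(raw) % 2)  # ignore a trailing unpaired byte
    i_vals = raw[0:usable:2]
    q_vals = raw[1:usable:2]
    return {
        "I": (min(i_vals), max(i_vals), sum(i_vals) / len(i_vals)),
        "Q": (min(q_vals), max(q_vals), sum(q_vals) / len(q_vals)),
    }


if __name__ == "__main__":
    # Tiny synthetic capture; a real one would be read from a file.
    capture = bytes([127, 128, 128, 127, 126, 129])
    print(iq_stats(capture))
```

On a real capture, the mean column is the number of interest, since it shows where the zero point of the data actually sits.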
Looking at actual UC8 captures from an RTL2832, the mean I and Q
values sit at about 127.4, so use that as the zero point.

This means that the resulting I/Q maximum values could be as large as
127.6 (255 - 127.4). Switch to 128 for simplicity.
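
The arithmetic behind those numbers is easy to check. The sketch below assumes that 128 is used as the divisor when mapping samples to floats; that reading, and the function names, are assumptions rather than something stated in the commit.

```python
ZERO_OFFSET = 127.4  # measured mean of I and Q
FULL_SCALE = 128.0   # assumed divisor


def uc8_to_float(sample: int) -> float:
    """Map one unsigned 8-bit sample to a float centred on zero."""
    if not 0 <= sample <= 255:
        raise ValueError("UC8 samples are unsigned 8-bit values")
    return (sample - ZERO_OFFSET) / FULL_SCALE


def peak_offset() -> float:
    """Largest distance of any UC8 sample from the zero point."""
    return max(abs(s - ZERO_OFFSET) for s in range(256))


if __name__ == "__main__":
    print("peak offset:", peak_offset())
    print("range:", uc8_to_float(0), "to", uc8_to_float(255))
```

With a zero point of 127.4 the range is no longer symmetric: sample 255 lies 127.6 above zero and sample 0 lies 127.4 below. Dividing by 128 keeps every converted value strictly inside (-1, 1), whereas a divisor of 127.5 would let the top sample exceed 1.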
 * add a u32->float exact path
 * ditch the approximation path
 * add a NEON VRSQRTE path
 * add a 12-bit table path (using the full signed I/Q value, not absolute value)
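
For background on the reciprocal-square-root approach: VRSQRTE produces a coarse estimate of 1/sqrt(x), and a magnitude can then be formed as x * (1/sqrt(x)). The Python sketch below only emulates that idea. The coarse estimate is simulated by truncating the true value to about 8 bits, and the Newton-Raphson refinement is shown as the standard way to improve such an estimate; whether the actual NEON path refines, and by how many steps, is not stated here.

```python
import math


def coarse_rsqrt(x: float, bits: int = 8) -> float:
    """Emulate a low-precision estimate of 1/sqrt(x) (not real hardware)."""
    y = 1.0 / math.sqrt(x)
    mantissa, exponent = math.frexp(y)  # y = mantissa * 2**exponent
    mantissa = math.floor(mantissa * (1 << bits)) / (1 << bits)
    return math.ldexp(mantissa, exponent)


def refine(x: float, y: float) -> float:
    """One Newton-Raphson step for y approximating 1/sqrt(x)."""
    return y * (3.0 - x * y * y) / 2.0


def magnitude(i: int, q: int, steps: int = 1) -> float:
    """sqrt(i*i + q*q) computed as x * rsqrt(x)."""
    x = float(i * i + q * q)
    if x == 0.0:
        return 0.0  # avoid dividing by zero in the estimate
    y = coarse_rsqrt(x)
    for _ in range(steps):
        y = refine(x, y)
    return x * y


if __name__ == "__main__":
    for steps in range(3):
        print(steps, magnitude(3, 4, steps))
```

Each refinement step roughly squares the relative error, which is why a coarse estimate plus one or two steps is often accurate enough for an inexact path.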
This runs sample input through the DSP functions that are
allowed to be inexact and dumps the results as a TSV suitable for
feeding to gnuplot to look at the actual errors.
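
A dump of this kind might look like the sketch below. It is not the tool from the commit: the inexact function is a simple stand-in (larger component plus half the smaller one), the column layout is an assumption, and the names are hypothetical.

```python
import csv
import io
import math
from typing import Iterable, TextIO, Tuple


def approx_magnitude(i: int, q: int) -> float:
    """Stand-in inexact magnitude: larger component plus half the smaller."""
    a, b = abs(i), abs(q)
    return max(a, b) + 0.5 * min(a, b)


def dump_errors_tsv(samples: Iterable[Tuple[int, int]], out: TextIO) -> None:
    """Write one TSV row per sample: i, q, exact, approx, error."""
    # gnuplot ignores lines starting with '#', so the header is a comment.
    out.write("# i\tq\texact\tapprox\terror\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, q in samples:
        exact = math.hypot(i, q)
        approx = approx_magnitude(i, q)
        writer.writerow(
            [i, q, f"{exact:.6f}", f"{approx:.6f}", f"{approx - exact:.6f}"]
        )


if __name__ == "__main__":
    buf = io.StringIO()
    dump_errors_tsv([(3, 4), (0, 0), (-5, 12)], buf)
    print(buf.getvalue(), end="")
```

Saved to a file such as errors.tsv, the output can be plotted with a gnuplot command like `plot "errors.tsv" using 3:5` to show error against exact magnitude.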
mutability merged commit bff71dc into dev on Jan 21, 2021