Move all converters to starch-based implementations #97

mutability · 2021-01-21T11:06:59Z

Short version:

* use starch to generate code to select from multiple converter implementations at runtime
* build implementations specialized for AVX2 (gcc autovectorization), NEON (intrinsics)
* provide default settings to select suitable implementations for armv6 (e.g. Pi 0W), armv7a with NEON (e.g. Pi 4), generic x86, and x86 with AVX2
* add a --wisdom option to allow overriding the defaults
* provide benchmark tools to generate a wisdom file specialized for the local machine
* add support in the dump1090-fa packaging to generate and read machine-specific wisdom from /etc/dump1090-fa/wisdom.local
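The runtime selection described above can be pictured with a small sketch. This is not starch's generated code and does not model its wisdom file format: the implementation names, the feature strings, and `select_impl` are all hypothetical, and "wisdom" is modelled simply as an ordered list of preferred implementation names.

```python
from typing import Callable, NamedTuple


class Impl(NamedTuple):
    name: str            # hypothetical implementation name
    requires: frozenset  # CPU features this implementation needs
    func: Callable       # the converter itself


def _convert_generic(samples):
    # Stand-in for a portable converter that runs on any CPU.
    return [float(s) for s in samples]


def _convert_specialized(samples):
    # Stand-in for a specialized build: same results, assumed to be faster.
    return [float(s) for s in samples]


# All implementations compiled into the binary (hypothetical names).
REGISTRY = [
    Impl("generic", frozenset(), _convert_generic),
    Impl("pretend_avx2", frozenset({"avx2"}), _convert_specialized),
    Impl("pretend_neon", frozenset({"neon"}), _convert_specialized),
]

# Built-in preference order, standing in for the default wisdom.
DEFAULT_WISDOM = ["pretend_avx2", "pretend_neon", "generic"]


def select_impl(cpu_features, wisdom=None):
    """Return the first preferred implementation that this CPU can run."""
    features = frozenset(cpu_features)
    by_name = {impl.name: impl for impl in REGISTRY}
    order = list(wisdom) if wisdom else DEFAULT_WISDOM
    for name in order:
        impl = by_name.get(name)
        if impl is not None and impl.requires <= features:
            return impl
    # Nothing in the wisdom list was usable: fall back to registry order.
    for impl in REGISTRY:
        if impl.requires <= features:
            return impl
    raise RuntimeError("no usable implementation")
```

Passing a different `wisdom` list plays the role of the --wisdom override: the preference order changes, but an implementation is still skipped when the CPU lacks a feature it requires.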

Other stuff:

* tweak the zero offset of UC8 implementations slightly to better match how the RTL2832 behaves (the processed IQ data appears to be centered at 127.4, not 127.5)
* add converter tests to verify that they are producing correct output
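As a rough illustration of the zero-offset change, the sketch below measures min/max/mean of the I and Q bytes in a UC8 capture and converts interleaved UC8 samples to magnitudes using the 127.4 center and 128 range from this PR. The function names and the float output are assumptions made for the example; the real converters are C code and may scale their output differently.

```python
import math

UC8_ZERO = 127.4   # measured center of RTL2832 I/Q data, per this PR
UC8_RANGE = 128.0  # divisor chosen for simplicity (the largest offset is 127.6)


def iq_stats(raw):
    """Return ((min, max, mean) of I bytes, (min, max, mean) of Q bytes)."""
    def stats(values):
        return (min(values), max(values), sum(values) / len(values))
    # UC8 is interleaved: even offsets are I, odd offsets are Q.
    return stats(raw[0::2]), stats(raw[1::2])


def uc8_to_magnitude(raw):
    """Convert interleaved unsigned 8-bit I/Q bytes to float magnitudes."""
    if len(raw) % 2:
        raise ValueError("UC8 data must contain whole I/Q pairs")
    out = []
    for k in range(0, len(raw), 2):
        i = (raw[k] - UC8_ZERO) / UC8_RANGE
        q = (raw[k + 1] - UC8_ZERO) / UC8_RANGE
        out.append(math.hypot(i, q))
    return out
```

A converter test in this style would feed the same bytes to an optimized implementation and check that its output stays within a small tolerance of this straightforward reference.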

Things that go faster:

* UC8 converters (e.g. rtlsdr path): x86 +10%, Pi 4 +100%, Pi 0W +50%
* SC16/SC16Q11 converters (e.g. bladerf path): x86 +500%, Pi 4 +450%

Things that go slower:

* Pi 0W SC16/SC16Q11 is about 8% slower. This figure includes the cost of switching from the old limited-table-size implementation to a higher-precision version that makes use of all bits (and we don't really expect to be running a 12/16-bit SDR on a 0W)
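The precision trade-off can be sketched as follows. This assumes SC16Q11 samples are signed values in [-2048, 2047] with 11 fractional bits; the table layout (indexing by truncated absolute values) and the table sizes are illustrative assumptions, not the old dump1090 code.

```python
import math

SCALE = 2048.0  # assumes 11 fractional bits (Q11)


def magnitude_exact(i, q):
    """Magnitude computed from every bit of the I and Q values."""
    return math.hypot(i, q) / SCALE


def build_table(bits):
    """Build a lookup table indexed by |I| and |Q| truncated to `bits` bits."""
    shift = 11 - bits
    size = 1 << bits
    table = [
        [math.hypot(a << shift, b << shift) / SCALE for b in range(size)]
        for a in range(size)
    ]
    return table, shift


def magnitude_table(i, q, table, shift):
    """Look up the magnitude, discarding the low `shift` bits of |I| and |Q|."""
    a = min(abs(i), 2047) >> shift
    b = min(abs(q), 2047) >> shift
    return table[a][b]


def worst_error(bits, step=37):
    """Largest |table - exact| difference over a coarse grid of samples."""
    table, shift = build_table(bits)
    worst = 0.0
    for i in range(-2048, 2048, step):
        for q in range(-2048, 2048, step):
            err = abs(magnitude_table(i, q, table, shift) - magnitude_exact(i, q))
            worst = max(worst, err)
    return worst
```

Comparing `worst_error(7)` with `worst_error(9)` shows the error shrinking as more bits are kept, which is the kind of gain the higher-precision version trades some Pi 0W speed for.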

Things that are no longer supported:

* The --dcfilter option. This could be reimplemented if there's demand, but it was always an experiment, and I'm not sure there's any benefit to using it unless you're using a zero-IF tuner (and the commonly available zero-IF tuners can't tune to 1090MHz).

To simplify the build, the two dependencies (Google's cpu_features library and the starch infrastructure) have been copied directly into the dump1090 repo rather than being included by e.g. submodule reference.

Main user-visible changes:

* ensure you check out submodules ("git clone --recurse-submodules")
* --version shows the CPU features and DSP implementations in use
* --wisdom allows overriding of the built-in architecture wisdom
* --dcfilter no longer supported
* "starch-benchmark" binary will benchmark all options on the current machine and can produce a wisdom file to feed to the --wisdom option

If you have a use case for --dcfilter, please get in touch and let me know. It's an edge case and for now there's no starch/DSP support for it, but support can be written if needed.

In almost all cases the new conversion routines are slightly or substantially faster than the old conversion routines. The only case that is slower is SC16/SC16Q11 on a Pi 0, which is around 10% slower due to changing from heavily approximated lookup tables to higher quality results (but SC16 is probably already out of reach of a Pi 0).

mutability added 23 commits January 14, 2021 19:43

No need to build with SC16Q11_TABLE_BITS any more

68c2683

Add oneoff/uc8_capture_stats

b8bd480

(reads a UC8 capture; measures min/max/mean I and Q)

Switch UC8 conversion to 127.4 center, 128 range.

decc035

Looking at actual UC8 captures from a RTL2832, the mean I and Q are actually at 127.4, so use that as the zero point. This means that the resulting I/Q maximum values could be as large as 127.6. Switch to 128 for simplicity.

Switch to the new UC8 zero offset in benchmarks, fix some bugs

97fcbc8

Fix some bugs in SC16/SC16Q11 validation, tighten the max error requirements

792b151

Ditch UC8 approximation path, add a NEON VRSQRTE path.

26c7e17

Tweak the SC16 exact path, add a new impl that uses a mix of u32 & floats.

5721831

SC16Q11 impl tweaks:

9230c2d

* add a u32->float exact path
* ditch the approximation path
* add a NEON VRSQRTE path
* add a 12-bit table path (using the full signed I/Q value, not absolute value)

Ditch SC16 approximation path, add NEON vrsqrte path

26aa539

Add oneoff/dsp_error_measurement

67b942c

This runs sample input through the DSP functions that are allowed to be inexact and dumps the results as a TSV suitable for feeding to gnuplot to look at the actual errors.

Update make clean, make wisdom targets

d27e314

Update wisdom based on benchmarking

975cb8c

Preserve the raw wisdom benchmark data

71d5abe

Update to latest starch

0a51daf

Update .gitignore for new wisdom files

e2f800e

Update starch generated code

222d42a

Build starch-benchmark as part of the 'all' target

46ea726

Use wisdom from /etc/dump1090-fa/wisdom.local if present

5d709f5

Package starch-benchmark and a helper script to generate local wisdom data

8b75120

Remove submodules in preparation for importing them directly

02fb902

Import cpu_features v0.6.0 from https://github.com/google/cpu_features/releases/tag/v0.6.0

b476e84

Import starch at commit a725c8491dc33a321565d451b385131e589d8490 from https://github.com/flightaware/starch

a5da49c

mutability merged commit bff71dc into dev Jan 21, 2021
