Reverse-engineering my speakers' API to get reasonable volume control (jamesbvaughan.com)
190 points by jamesbvaughan 14 hours ago | 90 comments





Hey OP - if you're thinking about a smart knob with haptic feedback etc - check out this open source project that you can build yourself - it's amazing:

https://github.com/scottbez1/smartknob

Video demo here: https://www.youtube.com/watch?v=ip641WmY4pA&t=1s


Wow!

I had dreamed of physical "analog" controls as a standard feature on keyboards and input devices for various applications -- it would be a great productivity booster for power users.

I hope this catches on and gets mass adoption.



Yeah - me too - I don't have the time/tools to build these myself - but would happily pay for them as a product.

"you can build yourself"

Technically yes, but that is a serious project.


Yeah - I actually asked on the Discord if anyone was selling them pre-made - I would buy some if available.

I don't have time/tools etc to do this.


If it’s $200 in parts then a premade item would be at least $400 and probably $500. There probably isn’t much of a market at those prices.

The bill of materials needs to come way down to sell it for much less than that. Buying the parts in hobbyist-level bulk (100 pcs) probably would only shave $100 from the final price.


Oh man, this has been on my to-build list for ages, right up there with the 6 axis “3D mouse” build. I love the attention to detail.

That seems like it could be perfect. Thanks!

For those considering buying speakers: (1) do it, (2) get passive ones and a separate amp. Honestly it’s such a mature market that buying these active speakers just creates e-waste. Keep the e-waste to the amp. You can get really solid speakers for $300 and a cheap amp with BT for $50-100, replacing them basically independently depending on your needs.

The quality of active speakers is really good these days. They have matched amps, and speakers from Neumann, Genelec etc. also have active crossovers, which are superior to any passive setup. Mature market, sure, but even companies like KEF, which didn't offer or focus much on active systems, have a growing range of options now.

Quality active speakers very rarely break, but one should buy them from studio gear (e)shops instead of the nearest mall. I totally agree on other gimmicks that would add single points of failure and should be kept out.

Which speakers would you recommend in that price range?

The most expensive boxes you can afford from Canton. Or buy second hand - they will sound like new.

But speakers are something very subjective, and while I’m still delighted with the Canton I got for a discount, your ear might prefer something different. This is something where comparing in a physical store with a passionate salesperson can be worth it.


I would recommend some older B&W (Bowers & Wilkins) from the 6xx or DM6xx series. You can find them cheap on eBay etc. I've used those for 15 years and cannot complain.

Seems like the author moved from a “speakers + networked amp” setup to one with active speakers where everything is built in.

When I was buying speakers for my apartment some time ago, I was similarly considering going for the all-in-one options like this, but I’m glad I didn’t. I prefer the “dumb passive speaker + networked amp”, as it allows you to pick / replace / upgrade the separate components. Went for the KEF LS50, and for the amp Lyngdorf TDAI-1120. And that’s despite KEF having the all-in-one active version of those speakers. Very happy with my choices!


I have the “wireless” powered KEF LS50 and the regular ones.

The wireless model has significantly better bass response and sounds much better to my ear.

I actually had a fault with them recently and they stopped working. I’d bought the speakers used on eBay, and even if I’d had a warranty, they were past 5 years old by the time the fault developed. Regardless, KEF repaired them entirely for free. 10/10 would buy again.


I don't even bother with a networked amp, you can plug a chromecast/airplay/bluetooth receiver into a dumb amp.

> Seems like the author moved from a “speakers + networked amp” setup to one with active speakers where everything is built in.

Close! I moved from a "speakers + non-networked amp + streamer" setup.

I'm still running separate components for nearer-field listening at my desk, where I've got KEF Q150s powered by a small Schiit pre-amp and amp.


Nice! I don't have any Schiit gear, but a few months ago I started reading the founder's book 'Schiit Happened'. I got halfway through (and then got distracted by other books, need to pick it back up), but can definitely recommend it for anyone interested in audio, and especially if you already have some Schiit!

In a similar vein, I was delighted to discover that my rather elderly Audiolab M-DAC (used as a pre-amp) exposes its master volume control over USB digital input when plugged into a RPi.

Although I don't use USB for audio (rather buggy) the control interface works perfectly.

I duly created a websockets API that allows me to remotely control the volume over wifi via a physical rotary controller. It allows me to conceal all the hifi equipment in a cupboard upstairs but remotely control it from a knob downstairs in the kitchen.

Like the OP I also implemented volume limits to prevent accidental damage to the speakers (primarily from twiddles by little fingers)....works a treat!
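A limit like this can be sketched in a few lines. This is only an illustration, not the commenter's or the OP's actual code: the function name, the 0-100 device scale, and the cap of 30 are all assumptions made up for the example.

```python
def knob_to_volume(knob_position: float, max_volume: int = 30, device_max: int = 100) -> int:
    """Map a knob position in [0.0, 1.0] onto the device range [0, max_volume].

    max_volume is a hypothetical safety cap; device_max is the assumed top of
    the device's own volume scale.
    """
    if not 0 <= max_volume <= device_max:
        raise ValueError("max_volume must lie within the device's range")
    # Clamp out-of-range input so a glitchy or over-rotated knob cannot exceed the cap.
    clamped = min(max(knob_position, 0.0), 1.0)
    # The whole travel of the knob is spread over the safe range only.
    return round(clamped * max_volume)
```

The point of scaling rather than merely clamping is that the full rotation of the knob stays useful, instead of everything past the cap being dead travel.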


> Those methods either give me a tiny slider that I can only use 10% of or about 15 steps where the jump from step 3 to step 4 takes the speakers from “a bit too quiet” to “definitely bothering the neighbors” levels.

Volume controls need to be logarithmic, not linear.

To a first degree approximation, everybody gets this wrong.
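For anyone wondering what "logarithmic" means in practice, here is a minimal sketch. It assumes a slider position between 0 and 1 and a 60 dB usable range; both numbers and the function name are made up for illustration, and real products pick their own curves.

```python
def slider_to_gain(position: float, min_db: float = -60.0) -> float:
    """Convert a slider position in [0.0, 1.0] to a linear amplitude gain.

    The slider moves linearly in decibels (min_db at the bottom, 0 dB at the
    top), so equal slider movements give roughly equal perceived changes.
    min_db is an assumed range, not a standard value.
    """
    if position <= 0.0:
        return 0.0  # treat the very bottom of the slider as mute
    position = min(position, 1.0)
    db = min_db * (1.0 - position)
    # Amplitude ratio for a level change in dB: 10^(dB / 20).
    return 10 ** (db / 20)
```

With these assumed values the halfway point of the slider is -30 dB, an amplitude gain of about 0.03, whereas a linear control would sit at 0.5 (only about 6 dB down) at the same position.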


Volume controls also shouldn't just be a flat wideband gain - they should respect how we actually perceive sound so the timbre doesn't change as the level changes (when you turn the volume down, you are typically left with just the stuff in the vocal frequency range, and lose all the bass etc).

Doing this stuff well is pretty hard (e.g. designing filters that can do that kind of volume adjustment is hard because you want to be constantly adjusting them, which means you need to be super careful with your filter state) but I have heard what it sounds like, and once you hear it you get angry at all other volume controls.


> Volume controls also shouldn't just be a flat wideband gain - they should respect how we actually perceive sound so the timbre doesn't change as the level changes (when you turn the volume down, you are typically left with just the stuff in the vocal frequency range, and lose all the bass etc).

The amp I'm upgrading from was interesting in this regard. In addition to the main volume knob, it had a loudness knob. The manual actually recommended keeping the volume knob fixed most of the time and using the loudness knob to set the listening level throughout the day.

From the manual:

> 1. Set the LOUDNESS control to the FLAT position.

> 2. Rotate the VOLUME control on the front panel (or press VOLUME +/– on the remote control) to set the sound output level to the loudest listening level that you would listen to.

> 3. Rotate the LOUDNESS control counterclockwise until the desired volume is obtained.


Amazing. This is probably the correct way to make amp controls. I'd say the volume should be a multi-turn trim potentiometer in the back of the device so you don't have to brief your guests on correct operation.

That's because most volume controls also affect the output impedance.

Yep. I was always using only the low end of the volume slider and having the same problem as OP that the steps were too coarse. So for Musium, I added a logarithmic volume control with a step size of 1 dB. That difference is on the border of being perceivable in the range I usually listen at, which is fine enough in practice.

https://docs.ruuda.nl/musium/loudness/


Even friggen Apple, who seem to have a rep for caring about such things. I'd say I'd need at least 5 more levels between off and the lowest volume on my iPhone. It's way too loud to use in a quiet room. I have to try to cover the speaker.

I know it is tangential, but this bit about his old system caught my attention:

  With that system, I could set the amplifier’s analog volume knob such that the max volume out of the streamer corresponded to my actual maximum preferred listening volume, giving me access to the full range of Spotify or AirPlay’s volume controls.
Assuming an analog input, this might result in a noticeable quality reduction at low volumes.

Tangential fun fact: amps have a fixed gain, because it's hard to make a variable gain without distortion [1]. The volume knob doesn't control amplification, in fact it controls an attenuation stage, because it's easier to make variable attenuation with low distortion.

[1] that's why there were so many different "classes" of amps, they're all making different tradeoffs about how they're doing the amplification.


That helps explain why the "volume" was represented on disk in the debug bundle as "attenuation" and measured in negative dB.

According to Claude, the attenuation stage is before the power amp stage. Does that mean worse SNR whether the volume is controlled using the volume control or via the input?

(Ignoring the additional quantization issue with a scaled digital input.)


Class D doesn't have any attenuation. This is a big factor in their greater efficiency.

You could put the attenuation stage after the power amp stage but it would require big beefy resistors that could absorb a lot of power. They'd get hot and the whole thing would be very inefficient.

But hey, very low distortion.


Reducing excessively before the DAC and high gain after the DAC is far more likely to result in quality reduction, due to quantization error. Having reasonable levels before the DAC and just the right amount of gain after the DAC (e.g., via an amplifier's attenuator setting) is the best possible scenario. So TFA's prior setup may have been superior in this regard, depending on how the digital volume control on the new speakers is implemented (i.e., before the DAC, or as a VCA after the DAC).

Where this breaks down is if the analog signal path (after the DAC) consists of something noisy after the attenuator. Passive attenuation (like built into an amp, or the master fader of a mixer, etc.) won't add noise, but something active like an outboard EQ would. The attenuation to set desired max level must be completely last (before power amp) to avoid noise.


We really don't want to be touching the digital signal. The state of the art is to change the DAC reference level, putting the DAC output at the sweet spot for the analog stage for any given ultimate output level.

In this case, I was using the optical out from a WiiM mini into a Yamaha amp. I don't know much about digital audio, but I know that I was able to control the volume of the WiiM's digital output with that setup.

On the other hand, I use a Schiit Asgard at my desk, where I have it connected to my Mac via USB-C. In that setup, I have no control over the volume level going in to the Asgard. MacOS just disables the software volume control when I'm using that audio output.


I think for a WiiM mini to control the volume on the digital output, it would need to scale down every sample. This is probably fine over some range (it has a 24 bit output, so putting the volume at two thirds, would still result in 16 bits, the same as CD). But I'm curious what would happen at very low volumes, e.g. if you're down to only 4 bits.

Each bit is ~6 dB, so 2/3 perceived volume is still 23 bits. 8 bits is 48 dB, which is less than 1% of the original volume, and that still leaves 16 bits.
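The arithmetic behind the "~6 dB per bit" rule can be checked quickly. This is a back-of-the-envelope sketch that assumes plain scaling of integer samples with no dither; the function names are made up for the example.

```python
import math

# Halving the amplitude costs one bit, and a factor of 2 in amplitude is 20*log10(2) dB.
DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB


def effective_bits(total_bits: float, attenuation_db: float) -> float:
    """Bits of resolution left after attenuating digitally by attenuation_db."""
    return max(total_bits - attenuation_db / DB_PER_BIT, 0.0)


def amplitude_ratio(attenuation_db: float) -> float:
    """Linear amplitude ratio corresponding to an attenuation given in dB."""
    return 10 ** (-attenuation_db / 20)
```

For a 24-bit output attenuated by 48 dB, effective_bits(24, 48) comes out at roughly 16 and amplitude_ratio(48) at roughly 0.004, which matches the figures in the comment above. Getting down to only 4 bits would take about 120 dB of digital attenuation.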

And this is why all computer audio should be 24 bit internally. A lot of the newer pro apps are actually using 32 bit floats

I struggled with volume control on my computer, it’s always too coarse. I found out that you can hold Option+Shift when pressing the volume button on Mac to do more fine grained adjustments.

While I’m all for physical controls, especially ones that self-adjust to reflect the state of the remote device at all times, I wonder if the author just doesn’t know you can finely adjust volume in iOS control center by force/long pressing and then dragging.

This is a great point! When I'm using AirPlay, that feature is really useful. I'm more often using Spotify Connect though, where I'm limited to either using the physical volume buttons on my phone, the small slider in the desktop app, or the slider that's many clicks in to the Spotify mobile app.

In reality though, this project is more about the fun of it than about it being a really pressing need.


It also works when using Spotify Connect on your iOS device. If you can use your volume buttons to control it you can also adjust it with the slider in the control centre.

You're correct, TIL!

That's really helpful to know. At this point though, I'm excited enough to build a volume knob that I'll probably still do it.

edit: After trying this out a bit, it's definitely an improvement over the small sliders and a huge improvement over the stepped volume changes from the buttons, but I'm still left wishing I could make use of more than ~10% of the slider's full range.


Spotify has so many user hostile practices that I am completely mystified why the majority of the population seems to prefer them in a world where YouTube Music exists.

The only competitor that I've given a fair shot is Apple Music. I'm not thrilled with either. Between those two, Spotify wins solely for Spotify Connect. I much prefer the way it works to AirPlay.

I haven't really tried YouTube Music, but I'll give it a go. I've been meaning to try out Tidal too but haven't yet.


If you’ve a computer to run it on, I highly recommend trying Roon out as a superior alternative to Spotify connect.

What do you prefer about Roon?

They’re not increasing my subscription to give me stuff I never asked for.

On an actual technical side, I can stream to multiple devices concurrently, the interface is cleaner and it supports a local music library.


Nice speakers! I'm guessing at the model; if I'm correct, here are some spins for those interested: https://www.erinsaudiocorner.com/loudspeakers/jbl_4329p/

Weird you can't limit digital output. I also listen mostly at low volumes and have the same issue. Part of the reason I chose Sonos is this is built in across all products, alongside an effective loudness toggle. This has been particularly helpful with some little Genelec 8010s I use as desktop speakers plugged into a Sonos Port.


How are you going with the new Sonos app upgrade?

Perfectly fine, I actually prefer it in almost every way.

The only issue I agree is truly serious is the latest release draining battery on iOS with or without background activity disabled. That’s a real bad bug to introduce and I’m surprised it hasn’t been fixed yet.


This begs the question, why buy oversized speakers of which you can only use 10% of the range instead of smaller speakers?

Is the sound quality better when not approaching maximum volume?


You can't buy quality sounding speakers without them being high powered.

Do you mean physically, or as in there's no market for high-quality low-powered ones?

Both. To fill a larger room with enough balanced sound with all types of music implies enough headroom that most of the time it will be overpowered. Also everybody would take a mostly overpowered speaker over a sometimes underpowered one.

The "small quality speakers" category is filled by decent bluetooth speakers and a few pc/desktop 2.0 models.


What the hell browser makers... Make it so that file:// URLs are extremely locked down and don't have enough rights to even fetch files in the same directory (or even themselves), yet grant localhost URLs full permissions...

There's a reason why local web applications aren't a collection of HTML and JS files, and are instead full copies of the Chromium browser.


The difference is that file:// URLs can be opened by your grandparent opening a .html file that they downloaded, whereas http://localhost requires you to actually set up a web server.

Imagine double-clicking a malicious page.html and suddenly your entire Documents folder can be fetched and manipulated by JavaScript. Yikes.

But to your latter point, yeah, there’s no reason sandboxed web apps couldn’t be given better file:// permissions.


What are you talking about? The OP made requests from his Bun server. CORS would obviously break any request made directly from js in the browser

You could check the m5stack dial for your next step.

A rotary knob with an integrated esp32.

https://shop.m5stack.com/products/m5stack-dial-esp32-s3-smar...


ooh, thanks for this. I will check that out!

My cheap bestbuy tv is the same way... 0-100, and anything above 10 is extremely loud...

Why do speakers even expose a web api in the first place? It’s just easily available without any security?

Hope this person segmented this device away from other devices. The lack of basic security in the IoT space is astounding to me.


It is concerning. On this particular model, it's available over plain HTTP, provides no auth settings, and provides an easy input for uploading new firmware.

https://jamesbvaughan.com/volume-controller-1/basic-web-inte...


I was looking for this comment. Basically he managed to get a sort of unauthenticated R/W access to the file system.

This is really concerning


“The S in IoT stands for secure.”

I love this type of post, and I’m amazed the speaker exposes its API like this.

The kid in me thinks there could also be a way perhaps to transmit audio through this (or another) API? (very low chance, but…)


On my todo list is to build my own set of network attached speakers. Superficially, it does not appear too hard, that is, seemingly possible for even my very limited mechanical integration skills.

BOM: a halfway nice powered speaker with an integral amp and a single board computer. Mount the SBC onto the speaker, then use an audio server to ship the sound around. I am a huge obsd fanboy so I would use sndiod, but the Linux ones (pipewire?) would probably be better for the task.

The main thing stopping me from doing it is that it turns out I dislike dumping sound into the atmosphere. Nothing wrong with it, I just don't enjoy blasting music. So I just stick to my headphones, and think about it once in a while.


Wow, I never thought speakers would connect to the network, let alone be running Linux and a webserver for a hidden interface. Honestly, I thought speakers still weren't any more complicated than "convert this digital signal into a signal playable by the driver" with a volume knob.

Even the digital-to-analog conversion is too complicated for a speaker. They just take the analog signal. It's really just an electromagnet that moves a cone that vibrates and pushes air around.

  // Yes, this is JavaScript embedded in HTML embedded in TypeScript.


  // I only recently learned that you can reference elements by ID this way.
  // It's kind of horrible but also I love it on tiny pages like this.
You have to kind of embrace the duality of every moment.

  -Spencer Dinwiddie

Fantastic, this could be iterated on by setting it up as a custom airplay speaker with re-mapped volume thresholds.

https://github.com/mikebrady/shairport-sync


Do the speakers require the Content-Type header to be set? If not, POST wouldn't require CORS permissions.

They do require the Content-Type header
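The reason Content-Type matters here: under the Fetch standard, a cross-origin POST only skips the preflight request if, among other conditions, its Content-Type is one of three safelisted values. Below is a simplified sketch of just that one check. The function name is made up, and the real rules also cover the method, other headers, and more.

```python
# Content-Type values that the Fetch standard allows on a request without a preflight.
SAFELISTED_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def content_type_needs_preflight(content_type: str) -> bool:
    """Return True if this Content-Type alone would force a CORS preflight.

    Only the MIME type essence is compared; parameters such as charset
    are ignored. This is a simplification of the browser's full algorithm.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence not in SAFELISTED_CONTENT_TYPES
```

So if the speakers insist on something like application/json (an assumption; the thread doesn't say which value they expect), a page on another origin would trigger a preflight before the POST is sent.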

So an 'upgrade', huh? Spending time and effort trying to fix the perceived 'upgrade'?

There's nothing as good as analogue connections / controls. I stay away from anything with an 'app' control.


I wish someone would solve the opposite problem in PCs and laptops—that of too little audio gain. Designers never leave any gain in reserve for when audio input levels are too low.

Why do they do this? The problem is so obvious that you'd reckon they're doing it to deliberately annoy users.

The problem doesn't stop there, the lack of gain with Bluetooth is notorious. Almost every Bluetooth device I own has insufficient gain, frankly it's a damn nuisance. The audio in the two sound bars that I own is so low on some audio material that I'm thinking of pulling them apart to see if I can find an op amp and increase its feedback resistor to obtain more gain. I should NOT have to do this.

Let me give you an example, the audio levels on many YouTube videos can be all over the place. Often the audio can be 6 to 10 dB below what it ought to be, thus it's impossible to listen on a laptop's speakers, especially so when one is listening in a location where the background noise is high.

What's wrong with the designers who design this digital stuff, don't they ever use the equipment themselves?

Haven't they ever seen a traditional radio or HiFi where the volume potentiometer is off at the 7 o'clock position, 12 noon is the maximum volume with a nominal one volt input signal or a radio station that's using normal levels of modulation, and the reserve gain is the range from the noon position to the 5PM one?

Do I have to say it again? The reserve gain is for when the input signal is lower than it ought to be. The world is not ideal, audio signals can be far from ideal, even from high tech companies like Google.

Occasionally help comes along, VLC has settings that allow the gain to be set to over 100% but I've often had situations where even VLC hasn't had the necessary reserve.

I've come to the conclusion the designers and programmers of this digital equipment haven't a clue about how ordinary amplifiers work. Or they have never taken the trouble to find out. They just assume a 16-bit input has 65536 levels and that's the range. Full stop! They never give consideration to what happens when the peak audio input covers perhaps less than one third that range of bits.

To get enough volume I've even had to use the audio equalizer, that's when one has been available, and often there is not. To get the extra gain I slide all sliders to maximum. Having to do this frequently is an ergonomic nightmare.

This is what happens when the arrogant digital world is too proud to take a leaf out of the analog world, the world that managed to get these issues right about a century or more ago.


> Why do they do this?

So they don't damage users' speakers, their hearing, or generally cause annoyance.

> They just assume a 16-bit input has 65536 levels and that's the range. Full stop! They never give consideration to what happens when the peak audio input covers perhaps less than one third that range of bits.

1/3 of 65536 is still +-11,000 voltage levels, or 14.4 bits of information. That's a really good place for a signal to be! It leaves a bit (literally) of room for the peaks without clipping.

Now if you meant 1/3 of 16 bits = 5.3 bits of information, that is indeed a poorly recorded signal. +-20 voltage levels. It's going to sound terrible whether you boost it digitally or analogly. (is that a word?)
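Both readings of "one third" are easy to verify. The sketch below just redoes the arithmetic from this comment, assuming signed 16-bit samples; the helper names are invented for the example.

```python
import math


def bits_for_levels(levels: float) -> float:
    """Bits of information needed to distinguish this many levels."""
    return math.log2(levels)


def peak_for_bits(bits: float) -> float:
    """Largest positive or negative swing available with this many bits (signed)."""
    return 2 ** bits / 2


# Reading 1: the signal spans one third of the 65536 available levels.
third_of_levels = 65536 / 3
bits_reading_1 = bits_for_levels(third_of_levels)  # about 14.4 bits
peak_reading_1 = third_of_levels / 2               # about +-10,900 levels

# Reading 2: the signal uses one third of the 16 bits.
bits_reading_2 = 16 / 3                            # about 5.3 bits
peak_reading_2 = peak_for_bits(bits_reading_2)     # about +-20 levels
```

The gap between the two readings is roughly 9 bits, or about 55 dB at ~6 dB per bit, which is why the first is a healthy recording level and the second is not.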


"So they don't damage users' speakers, their hearing, or generally cause annoyance."

Damage speakers? Simply, not an issue unless they're one step removed from rubbish. Also, have you ever heard of output compression and clipping that would protect them? That approach is 101 electronics.

Hearing is not an issue as they're driven by flea power (they're not headphones). Even hearing these pissy little speakers when running flat out is difficult enough. And my hearing is fine.

And where are the regulations that specify a maximum sound level rating for laptops?

By comparison, my 4.5 x 2.5" palm-size Sony transistor radio type ICF-510MK2 (which I'm currently holding in my hand) not only has stacks and stacks of gain on very low level audio (I've never needed to turn the volume up past halfway), it also simply blows my expensive Lenovo laptop away when it comes to maximum output level (I've no trouble hearing it several rooms away). There are no regulations covering how loud it sounds. OK, I've now given everyone a reference device for comparison. I'd put it up against any laptop I've heard in recent times and it'd win hands down every time. BTW, I only paid $9 for it but that was a few years back.

You're right about the bits, it was a throwaway figurative comment to make a point.

I cannot understand why so many people come to the defence of poor ratshit design. My expensive Lenovo laptop and my Dell laptop are not fit for purpose when it comes to the audio subsystem. If I can't hear it on a nominal range of audio signals such as those mentioned from Google then, by definition, they're not fit for purpose.

The same nonsense has been wheeled out in recent days in defense of Microsoft's BSOD/crash. As I said on another post that Dark Ages Windows OS ought to be ditched or rewritten (once running, BSODs should never occur unless there's a hardware fault no matter what's loaded into the kernel). If it goes belly-up then it's bad design, QED.

Why defend the indefensible? That people do, and don't complain, is why our lives are surrounded by so much shitty partially-functioning tech.


This is one area where I feel Apple did pretty well.

My M1 Air has great sound and a solid max output level.

Recently, I was given an old Sound Pop gen 2. That thing also delivers a lot of sound while also handling low input levels decently.


You're right. I'm not an Apple user but their equipment and performance is pretty much tops. From the outset Jobs was aware that ergonomics and usability were the key to success.

Apple is now about the richest company in history, so why don't other manufacturers realize this and copy their example?

It doesn't make sense why other manufacturers alienate users over unintelligible sound. The extra cost of getting it correct is negligible. Why can't they see that?


Just use Linux, you can set all sorts of gains and limits on Pulse Audio, for each app and speaker individually.

I normally do but using Windows is sometimes unavoidable.

I'd add that in addition to the two Windows laptops mentioned in my reply to the above post, I've also a Toshiba laptop that's running Linux, and its audio hardware is so shitty that it can't be rectified by Linux. Not only is its audio output low but its crummy little speakers sound terrible. Linux, which I love, sometimes is only part of the solution.


I think the article incorrectly states that KEF is owned by Harman; I can only find evidence to the contrary.

It seems that you’re correct! I’m not sure what led me to believe that. I’ll update the post when I get home later.

edit: fixed

I dug into the API similarities between the speakers more and it seems like they're both using this software called StreamSDK [1]. I hadn't heard of that and it's given me more to research on these.

[1] https://www.streamunlimited.com/stream-sdk/


This runs in your laptop? Very interesting.

For now, yes. I'd also like to make a standalone wireless physical volume knob and I may try to make an iPhone app for it. Ideally I'd be able to override the behavior of the volume buttons, similar to how Spotify is able to do that on iPhones when you're using Spotify Connect.

I love this. Thanks for posting.

Your Father and Mother must be very proud of you.


That position of the couch is infuriating! And yes I bet you moved that lovely looking couch back after the photo :)

Don't worry - that couch isn't for listening :)

I've got a very comfy and intentionally positioned chair on the other side of the room for that.



