Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rotary encoder unresponsive on slave half of split keyboard #7055

Closed
3 tasks
splitkb opened this issue Oct 16, 2019 · 33 comments · Fixed by #7325 or #7505
Closed
3 tasks

Rotary encoder unresponsive on slave half of split keyboard #7055

splitkb opened this issue Oct 16, 2019 · 33 comments · Fixed by #7325 or #7505

Comments

@splitkb
Copy link
Contributor

splitkb commented Oct 16, 2019

Describe the Bug

I am using a split keyboard I designed myself, which supports an encoder on both halves. While testing, I found that the master side of the encoder always works as expected, but the slave side encoder drops a lot of keypresses. See below for more test cases.

System Information

  • Keyboard: Kyria (Firmware not yet in master branch, it's in this fork until I've verified it works)
    • Revision (if applicable): 1.0
  • Operating system: Windows 10
  • AVR GCC version: 5.4.0
  • ARM GCC version: N/A
  • QMK Firmware version: 0.7.40
  • Any keyboard related software installed?
    • AutoHotKey
    • Karabiner
    • Other:

The used encoder was the ALPS EC11K1524406 which has a resolution of 2.

Additional Context

I went through the following tests, and made sure to flash both halves of the keyboard after each recompile:

  • make clean
  • Reverted commit 46c49ae4e639509ceb6e81c460855d2eb89086b8
  • Disabled Link Time Optimization
  • Added #define TAP_CODE_DELAY 10, also with values 50 and 100
  • Tried #define SELECT_SOFT_SERIAL_SPEED 2 and #define SELECT_SOFT_SERIAL_SPEED 0
  • Tried basic logging using https://docs.qmk.fm/#/newbs_testing_debugging?id=testing-and-debugging, and code like print("pgdn enc 1;");
  • Tried #define EE_HANDS and flashing right EEPROM to each half

When switching the USB cable to the other half, that halves encoder will work perfectly fine, and cause the slave half to exhibit the same behaviours as the other way around. The connected side will always work.

There is no predictability to when the slave encoder does or does not work. When rotating fast it'll be sure to send a few events, but when rotating slowly, tick by tick, it might work 10% of the time.

Other dual split keyboards are the Iris rev3 and rev4, and the Sol rev2.

I've discussed this issue with @zvecr on Discord.

@callebjorkell
Copy link

callebjorkell commented Oct 17, 2019

I can also see this behaviour on my own kyria.

I can add that while the master side encoder does work much more reliably for me as well, it's not perfect and does drop some keypresses when turning the encoder. It also seems like the passed clockwize boolean is inverted in this case. I guess that could be a secondary bug though, as it doesn't sound like it should be affected by the bug described above, but I thought I'd mention it in case it is connected somehow.

@XScorpion2
Copy link
Contributor

Got a link to the encoders you are using?

@zvecr
Copy link
Member

zvecr commented Oct 17, 2019

@XScorpion2 Lost somewhere in discord, it was mentioned as a EC11K1524406. Given if you swap master to the other hand it works fine, the initial guesses are around the split_common/transport.c code, rather than the encoder.c detection code.

@splitkb
Copy link
Contributor Author

splitkb commented Oct 17, 2019

@XScorpion2 It is indeed the EC11K1524406 (Datasheet).

I recall not having this issue before merging master to my branch about six weeks ago, but I haven't been able to spend the time to reproduce the older scenario yet. I will check that tomorrow.

@XScorpion2
Copy link
Contributor

@zvecr I suspect the error is probably lower than split_common/transport.c in the split_common/serial.c code and how it interacts with certain boards. I have heard whispers that using #define SERIAL_USE_MULTI_TRANSACTION in the keyboards config.h file has helped general serial communication issues, so I would be very interested to see if it helps here, though it requires compiling & re-flashing both halves.

@splitkb
Copy link
Contributor Author

splitkb commented Oct 18, 2019

I have just tried to apply the #define SERIAL_USE_MULTI_TRANSACTION fix, but it seems not to affect the rotary encoder. I performed make clean beforehand and flashed both halves.

@XScorpion2
Copy link
Contributor

Testing on my encoder here's what I noticed so far:

RGBKB Sol R2
#define SOFT_SERIAL_PIN D3
#define EE_HANDS
PEC12R-4217F-S0024 (24 Dents, 24 Pulses) - https://www.digikey.com/en/datasheets/bournsinc/bourns-inc-pec12r

Slave:
360 degree rotation at 1 tick a second > Vol 100% to 76%
Note: KC_VOLD on my PC changes vol by 2%, so this is 12 registered ticks (24% / 2)

Master:
360 degree rotation at 1 tick a second > Vol 100% to 52%
Note: KC_VOLD on my PC changes vol by 2%, so this is 24 registered ticks (48% / 2)

So it is dropping ever second tick. I think I may know why, but won't be able to debug/fix this weekend due to out of town wedding.

@XScorpion2
Copy link
Contributor

Was able to sneak in some debugging, the issue boils down to speed. Main hand board is doing more work and updates encoder state slower than off hand. So what's happening is that the main hand only reads off hand state at half speed causing it to miss encoder pulses. As soon as I disabled the oled, the main hand was running fast enough to read all the individual pulses from the encoder and work correctly. I have a few ideas on how to fix this.

@callebjorkell
Copy link

I can confirm this. I flashed QMK with OLED_DRIVER_ENABLE = no in my rules.mk (no other changes) and it works much better for me as well.

This solved my encoder problem on the master side (it now worked 60/60 "ticks") and the slave side improved from around a 50% success rate to a ~90% success rate. In other words, I see the same behaviour on both master and slave sides, it's just that it's been way more pronounced on the slave side. Disabling the OLED apparently frees up enough resources to make master work completely, and the slave side work "much better", but still not perfect.

@splitkb
Copy link
Contributor Author

splitkb commented Oct 20, 2019

Without having knowledge of the core code, I assumed the slave would be reading the encoder state and passing the results through to the master, so that only the turn events would have to be handled.

The way I understand it, currently there are several solutions:

  1. Revise the way the encoder state is handled
  2. Improve performance of driving the display
  3. Use a chip that has more performance as to work around the issue
  4. Recommend customers not to use an OLED on the master

As for 1, revising the encoder state change handling, I can imagine the slave reading the state changes and buffering the resulting events for them to be handled by the master could work. It might be undesirable as that could lead to heightened latency.

Option 2, improving the performance of driving the display would be most desirable as it would enable me to keep using the same hardware and keeping latency low. I'm not sure if the display needs to be written to as often as happens now for example, and perhaps I'm using the OLED API in a suboptimal way.

Option 3, using more performant hardware, could be an option but seems difficult. The first thing that comes to mind is using the Proton C, which poses several limitations (the Kyria is using serial for transport between halves, commercial availability of the controller). I'd prefer to go in the direction of option 2.

Option 4, not using an OLED on the master, would be another workaround. It might be necessary, but I hope we'll be able to improve performance.

How can we best move this issue forward?

@XScorpion2
Copy link
Contributor

XScorpion2 commented Oct 20, 2019

It's a bit bigger than just OLED vs Encoder. Encoders themselves are extremely timing sensitive, as rotating an encoder a single dent has 4 state changes (pulses) and the keyboard has to poll fast enough to catch those pulses for it to properly register 1:1. So anything that slows the keyboard down or fast spinning of the encoder will cause it to miss out on a pulse. For example, disabling rgb matrix which is 1-2ms per keyboard tick was enough to improve slave encoder behavior too.

I have some ideas on how to improve slave and master encoder pulse detection that will help (option 1 like) , but it's be impossible to fix it 100% as users still can spin the encoder faster than what is possible for us to register along side of all the other keyboard functionality.

@callebjorkell
Copy link

I'd assume that a user spinning the encoder quickly causing dropped events isn't really an issue, as the user would also not be able to keep track on the number of physical ticks that has taken place in those cases. If we can get it to a place where ~100% of the slow turning ticks get registered, at least I would be very happy.

Would a possible solution be to just poll the encoder multiple times per event loop? Do we even know how fast the polling needs to be in order to properly register a single tick on the encoder properly?

@splitkb
Copy link
Contributor Author

splitkb commented Oct 20, 2019

@callebjorkell came with a good question, since we're both using EC11K encoders instead of the more conventional EC11E or others: it seems as though the resulting waveform is slightly offset when compared to other encoders. Could this be the cause of the behaviour, or would you say it's still performance related?

EncoderWaveForms

I know that I had to use a resolution of 2 instead of the default of 4 and that seemed to work perfectly fine for me, but it's still a deviation to consider.

@XScorpion2
Copy link
Contributor

Not really, as the code just pills the pins individually. So while the on/off overlap is slightly different, it still comes down to how fast we poll the pins state. Polling multiple times a loop would work as long as there was some delay in between. Which is not something I'd like to do blocking the rest of the keyboard actions. I plan on looking into adding additional update calls for the encoder before, after and between the oled and rgb update calls as a test and also some changes to how we track encoder state. That should help most cases except the extreme fast spin case.

@callebjorkell
Copy link

Polling multiple times a loop would work as long as there was some delay in between. Which is not something I'd like to do blocking the rest of the keyboard actions.

I guess just polling multiple times wouldn't really solve anything regardless of delays, since you'd still have the problem with the rest of the loop taking too long a time. I suppose it would make the problem less frequent, but that's it, right?

I plan on looking into adding additional update calls for the encoder before, after and between the oled and rgb update calls

This was what I suggested. I did try this out, but I'm not sure I did it right since I didn't actually see a noticeable change. It's not at all unlikely though that I made the change in the wrong place, so don't take my word for it 😃

@XScorpion2
Copy link
Contributor

This was what I suggested. I did try this out, but I'm not sure I did it right since I didn't actually see a noticeable change. It's not at all unlikely though that I made the change in the wrong place, so don't take my word for it 😃

This is just to fix the master encoder behavior, slave encoder part is more to do with what I said "some changes to how we track encoder state".

@callebjorkell
Copy link

I'm up for helping out with testing as well if there's anything that needs to be checked/verified.

@XScorpion2
Copy link
Contributor

Will be poking at it this weekend. Was at a wedding last weekend.

@splitkb
Copy link
Contributor Author

splitkb commented Nov 3, 2019

Are there updates on this issue since the last time? I don't know how to approach this issue myself.

@XScorpion2
Copy link
Contributor

I've made some progress but nothing commit worthy yet. Dealing with obligations slowing me down. (work, wedding, family, etc)

@splitkb
Copy link
Contributor Author

splitkb commented Nov 3, 2019

@XScorpion2 not to worry, I very much appreciate your effort! If there's anything I can do to help, please let me know. If you need hardware to reproduce the issue, also let me know, I can ship that to you.

@XScorpion2
Copy link
Contributor

I've got hardware that reproduces the issue and a few code chunks that fix it, just not to the level of quality I want. Just need a few hours of quite time to wrap things up. Currently I'm having a very heated discussion about where to go for lunch with the in-laws and wife in the car in the middle of no where...

@splitkb
Copy link
Contributor Author

splitkb commented Nov 3, 2019

Oof. Better lay down the phone and talk it out I reckon! 😅 Do try to enjoy the food 🙂

@XScorpion2
Copy link
Contributor

Alright, so here's where I'm at:
Code works, but doesn't handle when the value wrapps correctly. IE: Counter 254 + 4 = 2

The gist I need to figure out is given 2 values, find the shortest delta between them and the direction (positive vs negative) while also taking into account wrapping. I've got a few complicated solutions for this, but I know I'm just missing the obvious solution.

@XScorpion2
Copy link
Contributor

You know what, I'm an idiot. Just figured it out. My type casting was wrong.

@XScorpion2
Copy link
Contributor

Try this:
#7325
Untested, sorry

@callebjorkell
Copy link

I'm away from my keyboard for three weeks as of yesterday, so I can't actually test this now 😞. I assume someone else will get around to it before that, but I will definitely do it when I'm back home.

Fix looks like it should work though, essentially you're trying to figure out direction and distance since last time, and then replaying all the pulses that should have happened for that to be the case, right?

@XScorpion2
Copy link
Contributor

@callebjorkell exactly right

@splitkb
Copy link
Contributor Author

splitkb commented Nov 15, 2019

The code in pull request #7325 fixes this issue when merged. Thank you very much!

@zvecr
Copy link
Member

zvecr commented Nov 28, 2019

Reopening now #7325 has been reverted.

@drashna
Copy link
Member

drashna commented Dec 1, 2019

Can confirm that this is fixed on the latest PR. At least for the kyria. I don't have an iris or what not to test on.

@skbolton
Copy link

skbolton commented Dec 1, 2019

@drashna i can test on my iris in the morning. By latest pr do you mean master?

@noroadsleft
Copy link
Member

@skbolton You should be able to test this if you have Git set up by running:

git fetch upstream pull/7505/head:pr/7505  # fetches the changes from PR 7505 to a new local branch; upstream should point to the QMK repository (git remote -v to check)
git checkout pr/7505
make keebio/iris/<revision>:<keymap>:flash  # make sure to substitute your Iris revision and keymap name here

This will leave your master branch intact so you can just check it back out and compile/flash again to restore your current Iris firmware.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants