This repository has been archived by the owner on Oct 27, 2020. It is now read-only.

UVC driver usually crashes Linux kernel if run multiple times in a row #681

Closed
Windwoes opened this issue Jan 12, 2019 · 18 comments

Comments

@Windwoes

Here's a folder with videos of all my tests

Title basically says it all.

Here's the syslog of the event:

[  967.760390] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  967.760452] pgd = e8e38000
[  967.760548] [00000000] *pgd=00000000
[  967.760656] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  967.760711] CPU: 0    Not tainted  (3.4.0-g853158b #1)
[  967.760810] PC is at xhci_free_segments_for_ring+0x2c/0x88
[  967.760866] LR is at xhci_free_segments_for_ring+0x54/0x88
[  967.760964] pc : [<c0623f24>]    lr : [<c0623f4c>]    psr: a0000093
[  967.760967] sp : e83e7c18  ip : 60000093  fp : e83e7c3c
[  967.761109] r10: 00000000  r9 : 00000001  r8 : 00000000
[  967.761162] r7 : e9942000  r6 : e845f940  r5 : 00000000  r4 : 00000000
[  967.761255] r3 : 00000001  r2 : c1131c8c  r1 : 012bf000  r0 : ee002180
[  967.761353] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  967.761409] Control: 10c5787d  Table: 3123806a  DAC: 00000015

It traces back to:

#18 pc 0001da97  /data/app/com.qualcomm.ftcrobotcontroller-2/lib/arm/libRobotCore.so (_Z22uvc_user_callback_mainP17uvc_stream_handle+274)
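For readers unfamiliar with mangled C++ names, here is a minimal sketch of how the symbol in that frame decodes. It handles only the tiny subset of the Itanium C++ ABI that this one symbol uses (a free function taking pointers to named types); the helper name `demangle_simple` is hypothetical, and a real tool such as `c++filt` should be preferred for anything else.

```python
import re

def demangle_simple(symbol: str) -> str:
    """Demangle a tiny subset of the Itanium C++ ABI: a free function whose
    parameters are (pointers to) named types. Anything else raises ValueError."""
    if not symbol.startswith("_Z"):
        raise ValueError("not an Itanium-mangled name")

    def read_name(pos):
        # A source name is encoded as <decimal length><identifier>.
        m = re.match(r"\d+", symbol[pos:])
        if not m:
            raise ValueError(f"unsupported encoding at offset {pos}")
        start = pos + len(m.group())
        end = start + int(m.group())
        if end > len(symbol):
            raise ValueError("truncated name")
        return symbol[start:end], end

    func, pos = read_name(2)
    params = []
    while pos < len(symbol):
        stars = 0
        while pos < len(symbol) and symbol[pos] == "P":  # 'P' means "pointer to"
            stars += 1
            pos += 1
        type_name, pos = read_name(pos)
        params.append(type_name + "*" * stars)
    return f"{func}({', '.join(params)})"

# The backtrace appends "+274" (offset into the function); split it off first.
frame = "_Z22uvc_user_callback_mainP17uvc_stream_handle+274"
mangled, _, offset = frame.partition("+")
print(demangle_simple(mangled), "at offset", offset)
# prints: uvc_user_callback_main(uvc_stream_handle*) at offset 274
```

So the frame points into `uvc_user_callback_main`, which takes a `uvc_stream_handle` pointer.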

Also, I did a hotplug recovery test (this time using ZTEs since I figured the Tech Team would have verified it worked with those) and it FAILED all 3 tests I did.

@cmacfarl
Collaborator

Can you please clarify what you mean by "run multiple times in a row"? -- Thanks.

@Windwoes
Author

@cmacfarl sure. So basically if I run the sample Vuforia webcam OpMode, stop it, and repeat 3 more times, the Linux kernel dies and the entire Android OS crashes, as you can see in this video.

@cmacfarl
Collaborator

Ack. Thanks.

@cmacfarl
Collaborator

This was tested over the weekend using two Moto E4s, an Anker hub, a USB battery pack, and a Logitech C920, and was not reproducible by following the procedures in the video.

It's unclear from your text above whether your test with ZTEs followed the procedures documented in your video or whether your hotplug description refers to something else. Please clarify and supply an inventory of the hardware devices under test. Thanks.

@Windwoes
Author

Windwoes commented Jan 15, 2019

@cmacfarl

Hardware used:

However, as you can see in the other videos, the crash occurs even without the meter and USB hub.

Also, to confirm that it is an issue with your UVC driver, I compiled and ran the sample UVCCamera app, which uses libuvc (the same library you use), and it ran multiple times with no issues at all.

Regarding the ZTE hotplug test, that is totally separate and unrelated; I was just trying to point out that there seems to be a general instability in your UVC driver.

@rgatkinson
Collaborator

Nit (just for posterity): the libuvc used in the FTC SDK is not bug-for-bug compatible with saki's. :-)

@qwertychouskie

This should be reported to Google as a Denial of Service (DoS) attack; a user-space app should NEVER be able to crash the kernel. BTW, we sometimes get reboots on our S5, which may be this issue. Will test soon.

@Windwoes
Author

@qwertychouskie I believe the Tech Team's UVC driver may be triggering a bug in the USB driver stack, as the syslog mentions PC is at xhci_free_segments_for_ring+0x2c/0x88, and xhci is the USB driver.
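As a rough illustration of how to read those oops lines, the sketch below parses the `symbol+offset/size` notation from the syslog excerpt at the top of this issue and cross-checks it against the raw `pc`/`lr` register values. The function name `parse_oops` is hypothetical; the input lines are copied from the log above.

```python
import re

OOPS = """\
[  967.760810] PC is at xhci_free_segments_for_ring+0x2c/0x88
[  967.760866] LR is at xhci_free_segments_for_ring+0x54/0x88
[  967.760964] pc : [<c0623f24>]    lr : [<c0623f4c>]    psr: a0000093
"""

def parse_oops(text):
    """Return {reg: (symbol, offset, size, address)} for the PC and LR registers."""
    # "sym+0x2c/0x88" means: 0x2c bytes into a function that is 0x88 bytes long.
    syms = {
        reg.lower(): (sym, int(off, 16), int(size, 16))
        for reg, sym, off, size in re.findall(
            r"(PC|LR) is at (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", text)
    }
    addrs = {
        reg: int(addr, 16)
        for reg, addr in re.findall(r"\b(pc|lr) : \[<([0-9a-f]+)>\]", text)
    }
    return {reg: syms[reg] + (addrs[reg],) for reg in syms}

info = parse_oops(OOPS)
for reg, (sym, off, size, addr) in info.items():
    # Subtracting the offset from the address gives the function's start address.
    print(f"{reg}: {sym} starts at {addr - off:#x}, offset {off:#x} of {size:#x}")
```

Both registers resolve to the same function start (0xc0623ef8), so the faulting instruction and its return address are both inside `xhci_free_segments_for_ring`, which is consistent with the crash happening in the xhci code rather than in user space.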

@qwertychouskie

qwertychouskie commented Jan 19, 2019

Confirmed to reproduce on the Galaxy S5 with our team's Auto program (https://github.com/FTCTeam10298/2018-19-code). The first time we reached a 4th run, we got an emergency stop, something about calling getCameraFocus (or something similar) on a null object. We did Restart Robot, then it kept saying that the camera could not be found (as when it is not plugged in). Unplugging the MicroUSB OTG cord from the phone and plugging it back in cleared this. The second time we reached a 4th run, the phone froze and rebooted. Once someone figures out what kernel code is crashing, this should be reported as a kernel bug.

@Windwoes
Author

@qwertychouskie any way you could grab the syslog for your kernel crash so we can verify that you're seeing the issue at

(_Z22uvc_user_callback_mainP17uvc_stream_handle+274)

as well?

@qwertychouskie

@FLOAT23 Not to be "that guy", but the Rev Expansion Hubs are way better than the Modern Robotics modules, especially when it comes to handling hot reconnects. I highly recommend the Rev hub. We use 2 Rev hubs and the Galaxy S5s and it's great.

@Windwoes
Author

@cmacfarl @rgatkinson any update on whether this will be fixed in the foreseeable future? If not then I'll probably work on integrating my own UVC driver.

@sbdevelops

sbdevelops commented Apr 11, 2019

My team switched to using a Logitech C920 recently, and I've been testing autonomous over the past couple days, noticing occasional crashes when starting autonomous (which immediately is supposed to start Vuforia/TFOD). From all of my research, this driver issue seems to be the cause. I've experienced the same crash symptoms as the video produced by @FROGbots-4634. I'll post syslogs next time a crash occurs. Samsung Galaxy S5 is being used as our RC phone.
@cmacfarl @rgatkinson With my team going to Houston in a few days, how do I ensure this does not occur during competition?

@Windwoes
Author

@sbdevelops you can ensure it won't happen during competition by never running it more than 3 times in a row without unplugging/replugging the phone.

@Windwoes
Author

@rgatkinson @cmacfarl will this be addressed in v5.x?

@Windwoes
Author

@cmacfarl @rgatkinson I have confirmed this issue still exists in SDK v5.2, and on another device: a 1st gen Pixel XL running 7.1.2. While the symptoms are not exactly the same as on the Nexus 5, the UVC driver can still cause the kernel to crash if run multiple times. I definitely think this warrants investigation.

@Windwoes
Author

Windwoes commented Oct 7, 2019

@cmacfarl @rgatkinson this issue also affects the Moto G5 Plus, again in a slightly different manner. On the G5 I was not able to get the Linux kernel to crash. Very occasionally I could get the OpMode to crash with "IllegalArgumentException - pointer must not be null", but more reliably I can get the camera to fail to initialize: it seemingly disappears from the USB bus and then reappears a second later (the message "Warning: unable to find Webcam 1" appears for a second).

@Windwoes
Author

Fixed in v5.5

5 participants