Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too long restart on rpi2, 3 #653

Closed
a-x- opened this issue Jan 27, 2018 · 81 comments
Closed

Too long restart on rpi2, 3 #653

a-x- opened this issue Jan 27, 2018 · 81 comments
Labels

Comments

@a-x-
Copy link

a-x- commented Jan 27, 2018

Apprx. 1 munite restart via systemctl

Happened between 1.1.1-23-gd65b8e8 (i'm not sure) and 3.1.7 versions

@a-x-
Copy link
Author

a-x- commented Jan 27, 2018

@a-x-
Copy link
Author

a-x- commented Jan 27, 2018

shairport-sync -V
3.1.7-OpenSSL-Avahi-ALSA-soxr-metadata-sysconfdir:/etc

@a-x-
Copy link
Author

a-x- commented Jan 27, 2018

I'm not sure how to show systemd logs

journalctl 
No journal files were found.

I reconfigure it right now:
/etc/systemd/journald.conf:

[Journal]
Storage=persistent

@mikebrady
Copy link
Owner

Thans for the post, but it's rather hard to work out what the issue is. What is 1.1.1-23-gd65b8e8?

BTW, you need to be privileged to read the journal, and you don't need to alter its setting. Try, e.g:

$sudo journalctl…

@Alphakilo
Copy link

Alphakilo commented Feb 4, 2018

@a-x- :
Try journalctl --system or sudo journalctl --system.
The output of systemd-analyze blame --no-pager --system would be interesting as well.

@davidhq
Copy link

davidhq commented May 20, 2018

I also noticed this on RaspberryPi, I'm on latest commit:

commit d2713a40a7833cf1ef74f6691561fc8c77534e07
Author: Mike Brady <[email protected]>
Date:   Sat May 19 09:39:25 2018 +0100

Also, when it works, works great, but lot of times, streaming from iPhone produces no sound and also forked-daapd cannot stream (checking the speaker immediately unchecks it).

The only solution is to restart the whole device since sudo systemctl restart shairport-sync hangs......

May 20 12:35:05 ela systemd[1]: shairport-sync.service: State 'stop-sigterm' timed out. Killing.
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Killing process 477 (shairport-sync) with signal SIGKILL.
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Main process exited, code=killed, status=9/KILL
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Unit entered failed state.
May 20 12:35:05 ela systemd[1]: shairport-sync.service: Failed with result 'timeout'.

@mikebrady
Copy link
Owner

mikebrady commented May 20, 2018

Thanks for the report. Would you be kind enough to set the log_verbosity to 2 in the settings file and try to capture a log while this problem is occurring? That should make it easier to track down. Also, if you could say what kind of Pi and what versions of the OS, if would be very helpful.

[Update] Also, the output of $ systemd-analyze blame --no-pager --system, as suggested by @Alphakilo, would be useful too! Thanks.

@davidhq
Copy link

davidhq commented May 21, 2018

Linux living-room 4.14.34-v7+ #1110 SMP Mon Apr 16 15:18:51 BST 2018 armv7l GNU/Linux
2018-04-18-raspbian-stretch-lite

Both RPi 3B and 3B+ instances

I set log level to 2 on one of them, since then I could restart shairport-sync (!!)... so no report from this one, it just happened on another, couldn't restart again, I then set verbosity to 2 and rebooted the machine (it also took some time)...

I think shairport gets stuck probably somehow because I noticed the issue again by trying to set the output of forked-daapd to this instance and it immediately turned off..

Something is not the best with shairport besides this restarting issue, this is probably connected...

So, again, I noticed: cannot send output from forked-daapd, cannot play from iPhone (seems to be playing and the song progresses, but nothing plays).

Report after reboot:

$ systemd-analyze blame --no-pager --system
          6.907s dhcpcd.service
          4.653s hciuart.service
          3.101s apt-daily.service
          2.769s dev-mmcblk0p2.device
          1.271s raspi-config.service
           769ms networking.service
           676ms lighttpd.service
           659ms keyboard-setup.service
           482ms ssh.service
           433ms dphys-swapfile.service
           338ms ntp.service
           313ms systemd-fsck@dev-disk-by\x2dpartuuid-c7cb7e34\x2d01.service
           301ms systemd-udev-trigger.service
           265ms systemd-fsck-root.service
           234ms systemd-logind.service
           234ms systemd-udevd.service
           198ms systemd-tmpfiles-setup-dev.service
           191ms rsyslog.service
           177ms fake-hwclock.service
           163ms systemd-journald.service
           149ms avahi-daemon.service
           148ms plymouth-start.service
           129ms sys-kernel-debug.mount
           125ms bluetooth.service
           111ms systemd-remount-fs.service
           108ms dev-mqueue.mount
           100ms run-rpc_pipefs.mount
            94ms wifi-country.service
            88ms systemd-tmpfiles-setup.service
            82ms systemd-sysctl.service
            79ms kmod-static-nodes.service
            70ms triggerhappy.service
            69ms systemd-update-utmp.service
            69ms systemd-modules-load.service
            69ms systemd-journal-flush.service
            67ms alsa-restore.service
            63ms rc-local.service
            57ms nfs-config.service
            55ms plymouth-read-write.service
            53ms console-setup.service
            47ms systemd-user-sessions.service
            47ms sys-kernel-config.mount
            39ms systemd-rfkill.service
            37ms systemd-random-seed.service
            35ms plymouth-quit.service
            32ms systemd-update-utmp-runlevel.service
            31ms plymouth-quit-wait.service
            22ms boot.mount

@mikebrady
Copy link
Owner

Thanks for this. All that seems fine. I have it running on a 3B here on Stretch Lite 4.14.41-v7+ #1113 without problems, so I wonder, what other applications do you have running on the Pi?

Something that might be worth trying is to delay automatic startup for a few seconds -- see here in the TROUBLESHOTING.md page.

Is it possible to get the output of $ sudo systemctl status shairport-sync?

@davidhq
Copy link

davidhq commented May 22, 2018

You are right, I forgot to turn off wifi power management, after 1 day of testing with power management turned off, for now no issues with shairtport-sync — Thank you!

I have an update for your document (I can make a pull request if you want), so this:

Alternatively, (also for the Raspberry Pi), add the following line:

wireless-power off
to the file /etc/network/interfaces.

was never working for me, tried again, power management=on after reboot again (so it didn't work)...

What works:

sudo nano /etc/network/if-up.d/off-power-manager

Type:

#!/bin/sh
/sbin/iwconfig wlan0 power off

sudo chmod +x /etc/network/if-up.d/off-power-manager

@davidhq
Copy link

davidhq commented May 24, 2018

UPDATE:

The problem still present but much less frequently... I managed to confuse shairport by trying to rewing a track streaming from iPhone Music app directly to shairport instance.. it became unresponsive, I sshed into pi, tried to stop the service, it hung... then I rebooted (it also took a minute or two). Power management is off, unfortunately logging was disabled on this instance.. I enabled it, waiting for this to happen again, then I'll send the logs

@davidhq
Copy link

davidhq commented May 27, 2018

UPDATE:

Problem is present with forked-daapd source as well (so not just streaming from iPhone)

I managed to capture the logs:

May 27 20:20:28 lab shairport-sync[494]: Already playing.
May 27 20:20:28 lab shairport-sync[494]: Connection 13: RTSP thread terminated.
May 27 20:31:58 lab shairport-sync[494]: New RTSP connection from 192.168.0.10:36868 to self at 192.168.0.92:5000 on conversation thread 14.
May 27 20:31:59 lab shairport-sync[494]: Connection 14: ANNOUNCE
May 27 20:31:59 lab shairport-sync[494]: RTSP Conversation thread 8 already playing when asked by thread 14.
May 27 20:31:59 lab shairport-sync[494]: ANNOUNCE failed to get the player
May 27 20:31:59 lab shairport-sync[494]: Already playing.
May 27 20:31:59 lab shairport-sync[494]: Connection 14: RTSP thread terminated.

I don't think "Already playing." part is true... because nothing was playing for a long time. Maybe what is different with my setup is that I have three different forked-daapd libraries as source and I play from different ones at different times... of course I have to stop the play from some forked-daapd source before starting from another.

After service stop issued:

May 27 20:36:32 lab shairport-sync[494]: Request to shut down all rtsp conversation threads
May 27 20:36:32 lab shairport-sync[494]: asking playing threads to stop
May 27 20:38:02 lab systemd[1]: shairport-sync.service: State 'stop-sigterm' timed out. Killing.
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Killing process 494 (shairport-sync) with signal SIGKILL.
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Main process exited, code=killed, status=9/KILL
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Unit entered failed state.
May 27 20:38:02 lab systemd[1]: shairport-sync.service: Failed with result 'timeout'.

So as you can see - 2 min delay before anything happens...

Hope something can be seen from this?

Tried attaching gdb but I already issued stop command and I didn't manage to find anything in 2 minutes because I don't really know how to use gdb correctly in this case (should I try with something else when it happens again?).

@mikebrady
Copy link
Owner

Many thanks for your great work and for your patience. It seems that, for some reason, Shairport Sync is hanging up in certain circumstances -- not correctly terminating a session.

As you'll have seen from recent comments in the development branch, some bugs have recently been fixed. The version you are using incorporates these fixes, so it would be very interesting for me to see if we could catch Shairport Sync [still] locking up. Of course, it may be unrelated.

So, if you could update to the latest development version, please, and set the log verbosity to 2, it would be a big help. The amount of output on log level 2 has been modified so that it will print RTSP messages coming from the sources (all except for one very common one), so we should be able to see the requests for a play session and for the termination of a session coming in, and get an idea what's happening. It shouldn't be too chatty. The idea is that if it happens again, you would have a history in the log of all the calls to set up and terminate the now-hung-up session and of the session that was refused because the hung-up session was already playing.

Using gdb for this would be very difficult -- effectively impossible IMHO -- because it would require manual intervention from time to time.

@mikebrady
Copy link
Owner

As a matter of interest, what output device are you using please?

@davidhq
Copy link

davidhq commented May 29, 2018

Hi,

Raspberry Pi B and B+ via mostly USB soundcard (hw1,0) and one RPi uses HifiBerry Digi+.

I have new report with development version:

May 29 10:01:03 lab shairport-sync[504]:   Type: "CSeq", content: "2"
May 29 10:01:03 lab shairport-sync[504]:   Type: "Server", content: "AirTunes/105.1"
May 29 10:01:03 lab shairport-sync[504]: Connection 7: RTSP thread terminated.
May 29 10:01:20 lab shairport-sync[504]: New RTSP connection from 192.168.0.60:58298 to self at 192.168.0.92:5000 on conversation thread 8.
May 29 10:01:20 lab shairport-sync[504]: RTSP thread 8 received an RTSP Packet of type "ANNOUNCE":
May 29 10:01:20 lab shairport-sync[504]:   Type: "CSeq", content: "2"
May 29 10:01:20 lab shairport-sync[504]:   Type: "User-Agent", content: "forked-daapd/26.0"
May 29 10:01:20 lab shairport-sync[504]:   Type: "Client-Instance", content: "5AFCFBF5BBCE89AD"
May 29 10:01:20 lab shairport-sync[504]:   Type: "DACP-ID", content: "5AFCFBF5BBCE89AD"
May 29 10:01:20 lab shairport-sync[504]:   Type: "Content-Type", content: "application/sdp"
May 29 10:01:20 lab shairport-sync[504]:   Type: "Apple-Challenge", content: "KXjVZVPPIWLWw5Ja841Rpw"
May 29 10:01:20 lab shairport-sync[504]:   Type: "Content-Length", content: "570"
May 29 10:01:20 lab shairport-sync[504]: Connection 8: ANNOUNCE
May 29 10:01:20 lab shairport-sync[504]: RTSP Conversation thread 6 already playing when asked by thread 8.
May 29 10:01:20 lab shairport-sync[504]: ANNOUNCE failed to get the player
May 29 10:01:20 lab shairport-sync[504]: Already playing.
May 29 10:01:20 lab shairport-sync[504]: RTSP thread 8: RTSP Response:
May 29 10:01:20 lab shairport-sync[504]:   Type: "Apple-Response", content: "26aZ0SBsS5PBTr1E75XX+cG6yScNT2pvI/G0CqGM5mmb6+/Pbx/iUvpcvPpKK0TBzVLn2L2+Ju72SGfVw4UAk13qx6k1K0JpIudUVAghOcjQmV6j680euJZTMGHH0f/mRBKV5cSmAgec6ll6DCJgo5azLrnJdeOeAFmh7jkt1KbpfDncftn1559jA3GrIDICpaJmIiMSgoaSrxhuZDyYAzUGfbW0f3SzOPRt1mkiZEOZkzUsxYVz9muW6Xg1SShv25vGHCCsoL9FYtK1pc5arryj6R5851qz8KaMrjkq38pr9Xx9kZtR0dw/2KoXyy4pH0G41eylAzJiYiZ0+xd5kw"
May 29 10:01:20 lab shairport-sync[504]:   Type: "CSeq", content: "2"
May 29 10:01:20 lab shairport-sync[504]:   Type: "Server", content: "AirTunes/105.1"
May 29 10:01:20 lab shairport-sync[504]: Connection 8: RTSP thread terminated.

So on one of the devices I cannot play again, It's loading the connection to speaker in mobile app (Remote), then fails... Again Already playing. ...

@mikebrady
Copy link
Owner

Thanks. Do you have the full "history" where thread 6 was initiated and where an attempt was make to tear it down please?

@davidhq
Copy link

davidhq commented May 29, 2018

@mikebrady
Copy link
Owner

Fantastic, thanks. Let me have a look. This is great.

@mikebrady
Copy link
Owner

So, it looks like a TEARDOWN request for thread 6 was received by not fully acted upon. Let me think a bit further about this. This is really good stuff.

@davidhq
Copy link

davidhq commented May 29, 2018

Nice, I'm happy the issue is found... today I'll dedicate some time for studying the source as well... I should probably get back to reading more C code. Then I'll try to understand how the fix is done.

@mikebrady
Copy link
Owner

Thanks. I'm almost certain that the problem is one of overlapping concurrent execution of critical sections of threads. It seems that forked-daapd can cause this (not it's fault), so I will play around with it can see if can induce it. I may push out an update that drops down to a greater level of detail for a TEARDOWN...

@mikebrady
Copy link
Owner

Hello again. I've just pushed an update which should give some more debug information during a TEARDOWN, so it might offer a little more precision on where the problem is arising. If you would update to it, I'd be most grateful. Then if the same scenario as you looked above happened, we could see how far the TEARDOWN got before it stalled... Meantime, I'd installing forked-daapd...

@davidhq
Copy link

davidhq commented May 29, 2018

Great, I updated on all 5 players and will send a log when the problem happens again in a day or two...... thank you

@davidhq
Copy link

davidhq commented May 30, 2018

New log:

https://gist.github.com/davidhq/1aecb28d409d2814bd862768fcb941a4

I noticed one of the players with the weakest wifi has the most issues... and the one on ethernet cable never had this problem occurring. Another one with a bit problematic wifi (when it works, it works, but sometimes the signal doesn't come through) also showed this problem.

Also you mentioned that forked-daapd can cause this, but as I mentioned also streaming from latest iOS as well... since I figured out this is not isolated to streaming from iOS, I'm only testing as I actually use it most of the time - via forked-daapd.

@mikebrady
Copy link
Owner

That's really good, thanks.

@davidhq
Copy link

davidhq commented May 30, 2018

OK, here's from another instance where it just happened:

journalctl:

https://gist.github.com/davidhq/58d70fe3e5861bc292977de52da8ff0c

/var/log/syslog:

https://gist.github.com/davidhq/7cb29f906ad6ce73a7dc3ed668833b37

@mikebrady
Copy link
Owner

It's the same hang-up in all these cases. I might be coming back with an update to get a bit more forensic at that point.

@mikebrady
Copy link
Owner

mikebrady commented May 30, 2018

Hello again. I am afraid I'm having difficulty figuring out what's happening (still), so I have pushed yet another update with more debug messages in the hope that I can zero-in on where the problem is. It's a "strong signal", since it seems to be rather readily reproducible, but I can't think what it is. Perhaps, whenever you get the opportunity, you would [yet again] update and see if you can capture another incident. BTW, if you could also turn on the "delta" timing, it might turn out to be useful -- do this by setting the configuration file's diagnostics section's log_show_time_since_last_message setting to "yes" .

Here is what you'd get if a TEARDOWN went fully to plan:

       10.915953597|RTSP thread 2 received an RTSP Packet of type "TEARDOWN":
         0.001060988|  Type: "CSeq", content: "10"
         0.000735991|  Type: "Session", content: "1"
         0.000774991|  Type: "User-Agent", content: "iTunes/12.7.5 (Macintosh; OS X 10.13.4) (dt:1)"
         0.000723992|  Type: "Client-Instance", content: "900B806C3A8FAA81"
         0.000710992|  Type: "DACP-ID", content: "900B806C3A8FAA81"
         0.000479994|  Type: "Active-Remote", content: "3634232794"
         0.000212998|Connection 2: TEARDOWN
         0.000488994|TEARDOWN: synchronously terminating the player thread of RTSP conversation thread 2 (2).
         0.000746991|player_stop
         0.000584993|player_thread_lock acquired
         0.000671993|player_thread exists
         0.000677992|player_thread signalled
         0.000212997|buffer_get_frame exiting due to thread stop request.
         0.000099999|Connection 2: player thread main loop exit.
         0.000081999|Requesting output device to stop.
         0.000069000|Cancelling timing, control and audio threads
         0.002349972|Timing Receiver Cleanup.
         0.003647958|Timing Receiver Cleanup Successful.
         0.003739957|Control Receiver Cleanup.
         0.000257997|Control Receiver Cleanup Successful.
         0.002294973|Audio Receiver Cleanup.
         0.001574982|Audio Receiver Cleanup Successful.
         0.001836979|Joining terminated threads.
         0.002498971|Timing thread terminated.
         0.001032988|Control thread terminated.
         0.000121999|Audio thread terminated.
         0.000073999|Freeing audio buffers and decoders.
         0.003806956|Connection 2: player thread terminated.
         0.001969977|pend
         0.001537982|TEARDOWN: successful termination of playing thread of RTSP conversation thread 2.
         0.001614982|RTSP thread 2: RTSP Response:
         0.000846990|  Type: "CSeq", content: "10"
         0.000199998|  Type: "Server", content: "AirTunes/105.1"
         0.000765991|  Type: "Connection", content: "close"
         2.016539702|Connection 2: RTSP thread terminated.

My slight guess is that it won't get past player_stop, but it would be nice to confirm.

@davidhq
Copy link

davidhq commented May 30, 2018

Ok, no problem. I'm in the mood for helping with this, so I can do as many updates as needed.. and as you say, the problem happens reliably each day so there is steady progress.

@davidhq
Copy link

davidhq commented May 31, 2018

May 31 19:50:05 lab shairport-sync[498]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7981.8,     1506.8,    26162.0.
May 31 19:50:25 lab shairport-sync[498]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7981.3,     1834.1,    30734.0.
May 31 19:50:45 lab shairport-sync[498]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7981.3,     1388.4,    22397.0.
May 31 19:51:03 lab shairport-sync[498]: RTSP thread 4 received an RTSP Packet of type "TEARDOWN":
May 31 19:51:03 lab shairport-sync[498]:   Type: "CSeq", content: "67"
May 31 19:51:03 lab shairport-sync[498]:   Type: "User-Agent", content: "forked-daapd/26.0"
May 31 19:51:03 lab shairport-sync[498]:   Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 19:51:03 lab shairport-sync[498]:   Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 19:51:03 lab shairport-sync[498]:   Type: "Session", content: "1"
May 31 19:51:03 lab shairport-sync[498]: Connection 4: TEARDOWN
May 31 19:51:03 lab shairport-sync[498]: TEARDOWN: synchronously terminating the player thread of RTSP conversation thread 4 (2).
May 31 19:51:03 lab shairport-sync[498]: player_stop
May 31 19:51:03 lab shairport-sync[498]: player_thread_lock acquired
May 31 19:51:03 lab shairport-sync[498]: player_thread exists
May 31 19:51:03 lab shairport-sync[498]: player_thread signalled
May 31 20:17:01 lab CRON[2324]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 31 20:47:23 lab shairport-sync[498]: New RTSP connection from 192.168.0.10:59820 to self at 192.168.0.93:5000 on conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: Successfully created RTSP receiver thread 5.
May 31 20:47:23 lab shairport-sync[498]:     CSeq: 1.
May 31 20:47:23 lab shairport-sync[498]:     User-Agent: forked-daapd/26.0.
May 31 20:47:23 lab shairport-sync[498]:     Client-Instance: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]:     DACP-ID: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5 received an RTSP Packet of type "OPTIONS":
May 31 20:47:23 lab shairport-sync[498]:   Type: "CSeq", content: "1"
May 31 20:47:23 lab shairport-sync[498]:   Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]:   Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]: Connection 5: OPTIONS
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5: RTSP Response:
May 31 20:47:23 lab shairport-sync[498]:   Type: "CSeq", content: "1"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Server", content: "AirTunes/105.1"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Public", content: "ANNOUNCE, SETUP, RECORD, PAUSE, FLUSH, TEARDOWN, OPTIONS, GET_PARAMETER, SET_PARAMETER"
May 31 20:47:23 lab shairport-sync[498]:     CSeq: 2.
May 31 20:47:23 lab shairport-sync[498]:     User-Agent: forked-daapd/26.0.
May 31 20:47:23 lab shairport-sync[498]:     Client-Instance: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]:     DACP-ID: 57F0E53515CB41F9.
May 31 20:47:23 lab shairport-sync[498]:     Content-Type: application/sdp.
May 31 20:47:23 lab shairport-sync[498]:     Apple-Challenge: vPuBnEiuIi/F09+PMKwGNQ.
May 31 20:47:23 lab shairport-sync[498]:     Content-Length: 570.
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5 received an RTSP Packet of type "ANNOUNCE":
May 31 20:47:23 lab shairport-sync[498]:   Type: "CSeq", content: "2"
May 31 20:47:23 lab shairport-sync[498]:   Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]:   Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Content-Type", content: "application/sdp"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Apple-Challenge", content: "vPuBnEiuIi/F09+PMKwGNQ"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Content-Length", content: "570"
May 31 20:47:23 lab shairport-sync[498]: Connection 5: ANNOUNCE
May 31 20:47:23 lab shairport-sync[498]: RTSP Conversation thread 4 already playing when asked by thread 5.
May 31 20:47:23 lab shairport-sync[498]: ANNOUNCE failed to get the player
May 31 20:47:23 lab shairport-sync[498]: Already playing.
May 31 20:47:23 lab shairport-sync[498]: RTSP thread 5: RTSP Response:
May 31 20:47:23 lab shairport-sync[498]:   Type: "Apple-Response", content: "2MHjszAIxTCCpb/oPou3wK3eftTO09IRhRZfeww6yvtpOZXPZseWDCqoCUNsPximsfLoi5z8BJZGyfYHGZ8oEUvqgxSZQeZusEdOEFX9uXODXc2M1OjQR9/p3GmoSkQjyJGjbKZciGkP/4JtwfAEJt1+9Ga8FIoMGToEkYCFZUKq5xSs4Zzcxj+I3UGXSXue2natVnp2F/bWxyh6boBZjbeoxz+JinuckVBmqW2aSyhZpm4Tw2QuvsdaxQxRQL7HANpds3aUS1ad4JDm+go9y7f0Xx139MJbMjrY60X75GAm2lxs77TlGSfXtcHwPHdEyPg4ib6Ig+0q6Ui4o4PXyA"
May 31 20:47:23 lab shairport-sync[498]:   Type: "CSeq", content: "2"
May 31 20:47:23 lab shairport-sync[498]:   Type: "Server", content: "AirTunes/105.1"
May 31 20:47:23 lab shairport-sync[498]: RTSP conversation thread 5 -- connection closed.
May 31 20:47:23 lab shairport-sync[498]: Synchronously terminate playing thread of RTSP conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: player_stop
May 31 20:47:23 lab shairport-sync[498]: player_thread_lock acquired
May 31 20:47:23 lab shairport-sync[498]: player thread of RTSP conversation 5 is already deleted.
May 31 20:47:23 lab shairport-sync[498]: Successful termination of playing thread of RTSP conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: Request termination of RTSP conversation thread 5.
May 31 20:47:23 lab shairport-sync[498]: Connection 5: RTSP thread terminated.
May 31 20:48:23 lab shairport-sync[498]: found RTSP connection thread 5 in a non-running state.
May 31 20:48:23 lab shairport-sync[498]: RTSP connection thread 5 deleted...
May 31 20:55:28 lab shairport-sync[498]: New RTSP connection from 192.168.0.10:38810 to self at 192.168.0.93:5000 on conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: Successfully created RTSP receiver thread 6.
May 31 20:55:28 lab shairport-sync[498]:     CSeq: 1.
May 31 20:55:28 lab shairport-sync[498]:     User-Agent: forked-daapd/26.0.
May 31 20:55:28 lab shairport-sync[498]:     Client-Instance: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]:     DACP-ID: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6 received an RTSP Packet of type "OPTIONS":
May 31 20:55:28 lab shairport-sync[498]:   Type: "CSeq", content: "1"
May 31 20:55:28 lab shairport-sync[498]:   Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]:   Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]: Connection 6: OPTIONS
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6: RTSP Response:
May 31 20:55:28 lab shairport-sync[498]:   Type: "CSeq", content: "1"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Server", content: "AirTunes/105.1"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Public", content: "ANNOUNCE, SETUP, RECORD, PAUSE, FLUSH, TEARDOWN, OPTIONS, GET_PARAMETER, SET_PARAMETER"
May 31 20:55:28 lab shairport-sync[498]:     CSeq: 2.
May 31 20:55:28 lab shairport-sync[498]:     User-Agent: forked-daapd/26.0.
May 31 20:55:28 lab shairport-sync[498]:     Client-Instance: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]:     DACP-ID: 57F0E53515CB41F9.
May 31 20:55:28 lab shairport-sync[498]:     Content-Type: application/sdp.
May 31 20:55:28 lab shairport-sync[498]:     Apple-Challenge: kSF8+vlBKk3ZcSrFo09maA.
May 31 20:55:28 lab shairport-sync[498]:     Content-Length: 570.
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6 received an RTSP Packet of type "ANNOUNCE":
May 31 20:55:28 lab shairport-sync[498]:   Type: "CSeq", content: "2"
May 31 20:55:28 lab shairport-sync[498]:   Type: "User-Agent", content: "forked-daapd/26.0"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Client-Instance", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]:   Type: "DACP-ID", content: "57F0E53515CB41F9"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Content-Type", content: "application/sdp"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Apple-Challenge", content: "kSF8+vlBKk3ZcSrFo09maA"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Content-Length", content: "570"
May 31 20:55:28 lab shairport-sync[498]: Connection 6: ANNOUNCE
May 31 20:55:28 lab shairport-sync[498]: RTSP Conversation thread 4 already playing when asked by thread 6.
May 31 20:55:28 lab shairport-sync[498]: ANNOUNCE failed to get the player
May 31 20:55:28 lab shairport-sync[498]: Already playing.
May 31 20:55:28 lab shairport-sync[498]: RTSP thread 6: RTSP Response:
May 31 20:55:28 lab shairport-sync[498]:   Type: "Apple-Response", content: "atyW1PGIXNYBwl+F1/FKenUibQ4nm0JZusbfMob9uZRG3l773KfbxeqFfISv8fKa0NTwqVqQg/GKFC+a8jnJnSplFffutHyG7znvDYZmlr1vcFeRkeQqpm3fTHwCQVQSyktcGchX/nypMsLKDCQTuQhPG6vJGLswwMX4Kfc8OyvoAoW+GJ6zmqDPjP5H+t+Qev/sgl7a4+Co2SajDh86uQWyqD5IsNV2JOKzbzLN1ArHl/EEOlNQU5DnQTOUTDnz+1fI8YZW3jaEQB7PZwbAdnMrbCD3R2QXLqvTXvbw74mBfVQ+dlliI9RYcGiD2O0spiQdHnyo4I4essjjmOBFJg"
May 31 20:55:28 lab shairport-sync[498]:   Type: "CSeq", content: "2"
May 31 20:55:28 lab shairport-sync[498]:   Type: "Server", content: "AirTunes/105.1"
May 31 20:55:28 lab shairport-sync[498]: RTSP conversation thread 6 -- connection closed.
May 31 20:55:28 lab shairport-sync[498]: Synchronously terminate playing thread of RTSP conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: player_stop
May 31 20:55:28 lab shairport-sync[498]: player_thread_lock acquired
May 31 20:55:28 lab shairport-sync[498]: player thread of RTSP conversation 6 is already deleted.
May 31 20:55:28 lab shairport-sync[498]: Successful termination of playing thread of RTSP conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: Request termination of RTSP conversation thread 6.
May 31 20:55:28 lab shairport-sync[498]: Connection 6: RTSP thread terminated.
May 31 20:56:28 lab shairport-sync[498]: found RTSP connection thread 6 in a non-running state.
May 31 20:56:28 lab shairport-sync[498]: RTSP connection thread 6 deleted...

@mikebrady
Copy link
Owner

mikebrady commented Jun 14, 2018

Hi David. Your logs do indicate that there are problems remaining with terminating a play session reliably under some circumstances. Sadly, I can't reproduce the conditions that cause them. So, two things: one, I think enough progress has been made to move the fixes into a Release Candidate and, two, it seems to me that we should be moving to a slightly different system for terminating a player thread. It will take some time -- time I won't have for another couple of weeks. So, if you could show us some output with statistics enabled, that would be great, and meantime, as I get time, I'll start transitioning to a new thread termination system.

@davidhq
Copy link

davidhq commented Jun 16, 2018

Hello !

Here logs with stats:

Kitchen: https://gist.github.com/davidhq/6eb2a516ae70929eb797921a178eca72

music stuttered about 1 minute before I saved the log..

not sure if waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802". debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.030508996 sec. is indicative of this?

Lab log:
https://gist.github.com/davidhq/6a84833f08778af0a78a90751d861cbb

For lab I now do suspect some strange intermittent issues... Kitchen it should not happen...
I moved RaspberryPi in Lab on top of the speaker, before it was in front so that speaker was between the RPi and the Access Point. Not sure if magnetic field from speaker magnet can do something to the signal... will have to learn more about this... or I simply have a blind spot on this spot.. there is another RPi in the same room which doesn't seem to drop speed to 5MB/s like this "lab" one does. So this seem to be a separate issue for me to figure out.

Regarding stuttering in general, I also tried setting audio_backend_buffer_desired_length_in_seconds = 0.25 from default 0.15 for kitchen, didn't help... maybe the CPU is fast enough and something happens on the wifi or with processing of the wifi data.

Please let me know what is visible in the logs and thank you

@mikebrady
Copy link
Owner

Thanks for these. I don't have time to study them in detail yet -- it'll have to be mid-week next week. However, the Kitchen log is showing statistics that show that the sender's clock is way off -- look at all the adjustment of around -800 ppm. This is way outside what would be normal now. I've seen Macs needing adjustment of 130 ppm, but I've only ever heard of this before with some old Dell laptops. The other thing is that somehow lots and lots of UDP packets are being lost -- it seems that almost 10% of UDP packets are being lost. For example, at the end of the log, it seems that 57,008 resend requests were made for a total number of packets of 674,016. That is extremely high. To eliminate the possibility that it's a faulty source, can you try another, e.g. an iPhone or an iTunes machine?

@davidhq
Copy link

davidhq commented Jun 16, 2018

Hmm interesting, however all clocks are in sync via ntp, I also checked now...

I use two instances of forked-daapd: on a beefy server on LAN and another on one of the Raspberries.

I also tried streaming from a Mac:

Jun 16 16:29:07 outside shairport-sync[14542]:        3.0,    -723.1,     723.1,        1003,      0,      0,      0,      0,   6140,  246,  263
Jun 16 16:29:15 outside shairport-sync[14542]:        1.4,   -1084.8,    1084.8,        2006,      0,      0,      0,      0,   6076,  258,  262
Jun 16 16:29:23 outside shairport-sync[14542]:        1.1,    -861.1,     861.1,        3009,      0,      0,      0,      0,   6109,  257,  262
Jun 16 16:29:31 outside shairport-sync[14542]:        1.0,    -914.9,     914.9,        4012,      0,      0,      0,      0,   6079,  257,  262
Jun 16 16:29:39 outside shairport-sync[14542]:        1.0,    -863.9,     863.9,        5015,      0,      0,      0,      0,   6072,  258,  263
Jun 16 16:29:47 outside shairport-sync[14542]:        1.0,    -917.7,     917.7,        6018,      0,      0,      0,      0,   6074,  258,  263
Jun 16 16:29:55 outside shairport-sync[14542]:        1.0,    -878.0,     878.0,        7021,      0,      0,      0,      0,   6074,  257,  262

net correction in ppm is similar as before (too high), but there is are no resend requests.

There is a lot of resend requests in all other cases (streaming from both instances of forked-daapd, also both are on a wire).

I checked for UDP packet loss with iperf3, most of the times there was 0% loss or 1-2% sometimes.

hmm indeed :S

@davidhq
Copy link

davidhq commented Jun 16, 2018

I think I have something:

I installed v3.1.1 (44fbe8b) on 3 of the devices:

Jun 16 17:09:00 kitchen shairport-sync[812]:        1.2,    -903.5,     903.5,        2006,      0,      0,      0,      0,   6124,  263,  265
Jun 16 17:09:08 kitchen shairport-sync[812]:        0.9,    -770.4,     781.7,        3009,      0,      0,      0,      0,   6127,  262,  265
Jun 16 17:09:16 kitchen shairport-sync[812]:        0.8,    -790.2,     790.2,        4012,      0,      0,      0,      0,   6117,  261,  265
Jun 16 17:09:24 kitchen shairport-sync[812]:        1.1,    -861.1,     861.1,        5015,      0,      0,      0,      0,   6082,  262,  265
Jun 16 17:09:32 kitchen shairport-sync[812]:        1.1,    -926.2,     926.2,        6018,      0,      0,      0,      0,   6107,  263,  265
Jun 16 17:09:40 kitchen shairport-sync[812]:        1.1,    -844.1,     844.1,        7021,      0,      0,      0,      0,   5966,  263,  265
Jun 16 17:09:48 kitchen shairport-sync[812]:        0.9,    -832.7,     832.7,        8024,      0,      0,      0,      0,   6118,  263,  265
Jun 16 17:09:56 kitchen shairport-sync[812]:        1.0,    -793.1,     793.1,        9027,      0,      0,      0,      0,   6094,  262,  265
Jun 16 17:10:04 kitchen shairport-sync[812]:        1.1,    -810.1,     810.1,       10030,      0,      0,      0,      0,   6182,  263,  265
Jun 16 17:08:55 outside shairport-sync[492]:        1.0,    -798.7,     798.7,       36108,      0,      0,      0,      0,   6243,  259,  265
Jun 16 17:09:03 outside shairport-sync[492]:        1.0,    -858.2,     858.2,       37111,      0,      0,      0,      0,   6118,  261,  265
Jun 16 17:09:11 outside shairport-sync[492]:        1.1,    -875.2,     875.2,       38114,      0,      0,      0,      0,   6128,  261,  265
Jun 16 17:09:19 outside shairport-sync[492]:        1.0,    -827.1,     827.1,       39117,      0,      0,      0,      0,   6262,  262,  265
Jun 16 17:09:27 outside shairport-sync[492]:        1.0,    -773.2,     773.2,       40120,      0,      0,      0,      0,   6226,  262,  265
Jun 16 17:09:35 outside shairport-sync[492]:        1.0,    -878.0,     878.0,       41123,      0,      0,      0,      0,   6209,  263,  265
Jun 16 17:09:43 outside shairport-sync[492]:        1.1,    -861.1,     861.1,       42126,      0,      0,      0,      0,   6173,  263,  265
Jun 16 17:09:51 outside shairport-sync[492]:        1.0,    -846.9,     846.9,       43129,      0,      0,      0,      0,   6257,  263,  265
Jun 16 17:09:59 outside shairport-sync[492]:        1.0,    -807.2,     807.2,       44132,      0,      0,      0,      0,   6278,  262,  265
Jun 16 17:10:07 outside shairport-sync[492]:        1.0,    -841.2,     841.2,       45135,      0,      0,      0,      0,   6183,  263,  265
Jun 16 17:10:15 outside shairport-sync[492]:        1.1,    -880.9,     880.9,       46138,      0,      0,      0,      0,   6087,  263,  265
Jun 16 17:10:23 outside shairport-sync[492]:        1.0,    -778.9,     778.9,       47141,      0,      0,      0,      0,   6198,  262,  265
Jun 16 17:10:31 outside shairport-sync[492]:        1.0,    -844.1,     844.1,       48144,      0,      0,      0,      0,   6155,  263,  265
Jun 16 17:10:39 outside shairport-sync[492]:        1.1,    -883.7,     883.7,       49147,      0,      0,      0,      0,   6177,  263,  265
Jun 16 17:10:47 outside shairport-sync[492]:        1.1,    -838.4,     838.4,       50150,      0,      0,      0,      0,   6104,  262,  265
Jun 16 17:09:32 midroom shairport-sync[361]:        0.1,      17.0,    1756.1,       54162,      0,      0,      0,      0,   6001,  263,  265
Jun 16 17:09:40 midroom shairport-sync[361]:        0.1,      -2.8,    1758.9,       55165,      0,      0,      0,      0,   5966,  263,  265
Jun 16 17:09:48 midroom shairport-sync[361]:        0.1,      48.2,    1764.6,       56168,      0,      0,      0,      0,   6095,  263,  265
Jun 16 17:09:56 midroom shairport-sync[361]:        0.1,      28.3,    1761.8,       57171,      0,      0,      0,      0,   6028,  262,  265
Jun 16 17:10:04 midroom shairport-sync[361]:        0.1,     -14.2,    1736.3,       58174,      0,      0,      0,      0,   5962,  253,  265
Jun 16 17:10:12 midroom shairport-sync[361]:       -0.0,      45.3,    1744.8,       59177,      0,      0,      0,      0,   6059,  263,  265
Jun 16 17:10:20 midroom shairport-sync[361]:        0.1,      17.0,    1790.1,       60180,      0,      0,      0,      0,   6018,  262,  265
Jun 16 17:10:28 midroom shairport-sync[361]:        0.0,      39.7,    1727.8,       61183,      0,      0,      0,      0,   6093,  263,  265
Jun 16 17:10:36 midroom shairport-sync[361]:        0.1,       2.8,    1741.9,       62186,      0,      0,      0,      0,   6093,  263,  265
Jun 16 17:10:44 midroom shairport-sync[361]:        0.0,      39.7,    1761.8,       63189,      0,      0,      0,      0,   6036,  262,  265
Jun 16 17:10:52 midroom shairport-sync[361]:       -0.0,      17.0,    1750.4,       64192,      0,      0,      0,      0,   6132,  262,  265
Jun 16 17:11:00 midroom shairport-sync[361]:       -0.0,      42.5,    1747.6,       65195,      0,      0,      0,      0,   5981,  254,  265

There is no more packet loss (resends).
I also added another Raspberry in perfect line of sight to the access point (3m from it). "net correction in ppm" is low on this one... so does this really have to do with clock or is wifi behind a wall the reason?

Latest development version still has lots of resends:

Jun 16 17:10:35 ela shairport-sync[512]:   Type: "Server", content: "AirTunes/105.1"
Jun 16 17:10:38 ela shairport-sync[512]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7620.9,     7061.5,    21934.0.
Jun 16 17:10:42 ela shairport-sync[512]:        0.8,    -787.4,     810.1,      227681,    415,  25021,    327,  31542,   5293,  117,  267
Jun 16 17:10:50 ela shairport-sync[512]:        0.9,    -781.7,     793.1,      228684,    415,  25309,    333,  31917,   5412,   88,  267
Jun 16 17:10:55 ela shairport-sync[512]: waiting for a mutex, maximum expected time of 30000 microseconds exceeded "player.c:802".
Jun 16 17:10:55 ela shairport-sync[512]: debug_mutex_lock at "player.c:802" expected max wait: 0.030000000, actual wait: 0.030251255 sec.
Jun 16 17:10:56 ela shairport-sync[512]: Packet reception interval stats: mean, standard deviation and max for the last 2,500 packets in microseconds:     7422.9,     8083.3,    23558.0.
Jun 16 17:10:58 ela shairport-sync[512]:        1.1,    -917.7,     917.7,      229687,    415,  25619,    354,  32305,   5447,   71,  267
Jun 16 17:11:06 ela shairport-sync[512]:        1.0,    -844.1,     844.1,      230690,    415,  25856,    354,  32542,   5500,  111,  267
Jun 16 17:11:14 ela shairport-sync[512]:        0.8,    -807.2,     829.9,      231693,    415,  25856,    354,  32542,   5664,  200,  267

Hopefully I didn't overlook something?

@mikebrady
Copy link
Owner

Thanks for the very interesting update. NTP only ensures the time-of-day is correct. If the system clock is running slow or fast, then the NTP protocol will make appropriate corrections to the time, but it doesn't actually speed up or slow down the clock itself, so I'm afraid that's not really relevant.

Given that the net correction in ppm for the Kitchen device is almost the same irrespective of the source, maybe the problem lies with the clock on the Kitchen device, that it's way out of whack. If that were so, then – unless you were very unlucky – the other Shairport Sync devices shouldn't register net correction in ppm value as large. If they were in the range ±150 ppm from the Mac, I think that would be okay. My recent experience with iOS on an iPhone 6 is that the correction will be less than 30 ppm on average, and will often be zero for long periods.

@mikebrady
Copy link
Owner

Our posts crossed. The midroom figures look more-or-less normal. The kitchen, outside and ela devices all have very high net correction figures and very high resend request levels.

If would be interesting indeed to put the latest version on the midroom and to put the version running on midroom on one of the other devices and see what happens. (It's very easy to get confused doing this -- believe me :))

It's hard to believe that you could have two Raspberry Pis with clocks that are so far out of whack. So, do you have something else running on them that might be loading up the USB or Etherenet subsystems? (It used to be that Pis would drop packets and other undesirable things if their USB ports were heavily loaded -- it seems the USB ports and the Ethernet port, and maybe WiFi, share the same output bus.)

@mikebrady
Copy link
Owner

I think I have something:

I installed v3.1.1 (44fbe8b) on 3 of the devices:

Jun 16 17:09:00 kitchen shairport-sync[812]: 1.2, -903.5, 903.5, 2006, 0, 0, 0, 0, 6124, 263, 265
Jun 16 17:09:08 kitchen shairport-sync[812]: 0.9, -770.4, 781.7, 3009, 0, 0, 0, 0, 6127, 262, 265
Jun 16 17:09:16 kitchen shairport-sync[812]: 0.8, -790.2, 790.2, 4012, 0, 0, 0, 0, 6117, 261, 265
Jun 16 17:09:24 kitchen shairport-sync[812]: 1.1, -861.1, 861.1, 5015, 0, 0, 0, 0, 6082, 262, 265
Jun 16 17:09:32 kitchen shairport-sync[812]: 1.1, -926.2, 926.2, 6018, 0, 0, 0, 0, 6107, 263, 265

What source were you using in this case?

@davidhq
Copy link

davidhq commented Jun 16, 2018

Actually midroom is on low values only when playing to default jack... I didn't take that into consideration before, so:

Jun 16 17:45:40 midroom shairport-sync[366]:   period_size = 256 frames (precisely).
Jun 16 17:45:40 midroom shairport-sync[366]:   buffer_time = 743038 us (>).
Jun 16 17:45:40 midroom shairport-sync[366]:   buffer_size = 32768 frames (>).
Jun 16 17:45:40 midroom shairport-sync[366]:   periods_per_buffer = 128 (precisely).
Jun 16 17:45:49 midroom shairport-sync[366]:       -2.9,     420.1,     677.4,        1003,      0,      0,      0,      0,   5862,  258,  264
Jun 16 17:45:57 midroom shairport-sync[366]:       -0.1,     104.8,    1685.3,        2006,      0,      0,      0,      0,   6084,  261,  264
Jun 16 17:46:05 midroom shairport-sync[366]:        0.0,      42.5,    1713.6,        3009,      0,      0,      0,      0,   6025,  262,  264
Jun 16 17:46:13 midroom shairport-sync[366]:        0.1,      22.7,    1722.1,        4012,      0,      0,      0,      0,   6070,  261,  264

when

defaults.ctl.card 0
defaults.pcm.card 0

and

Jun 16 18:00:30 midroom shairport-sync[352]:   periods_per_buffer = 341315 (>).
Jun 16 18:00:39 midroom shairport-sync[352]:        2.3,    -653.6,     653.6,        1003,      0,      0,      0,      0,   6267,  263,  264
Jun 16 18:00:47 midroom shairport-sync[352]:        1.2,    -900.7,     900.7,        2006,      0,      0,      0,      0,   6206,  262,  264
Jun 16 18:00:55 midroom shairport-sync[352]:        1.0,    -790.2,     790.2,        3009,      0,      0,      0,      0,   6175,  261,  264
Jun 16 18:01:03 midroom shairport-sync[352]:        1.1,    -863.9,     863.9,        4012,      0,      0,      0,      0,   6037,  252,  264
Jun 16 18:01:11 midroom shairport-sync[352]:        1.0,    -835.6,     835.6,        5015,      0,      0,      0,      0,   6124,  262,  264
Jun 16 18:01:27 midroom shairport-sync[352]:        1.1,    -827.1,     827.1,        6018,      0,      0,      0,      0,   6271,  262,  264
Jun 16 18:01:35 midroom shairport-sync[352]:        1.0,    -810.1,     810.1,        7021,      0,      0,      0,      0,   6281,  262,  264
Jun 16 18:01:43 midroom shairport-sync[352]:        1.1,    -810.1,     810.1,        8024,      0,      0,      0,      0,   6182,  261,  264

when

defaults.ctl.card 1
defaults.pcm.card 1

What really is different on 3.1.1 is no resends...

I couldn't read your two posts in detail right now (in hurry), I just managed to test what I'm reporting here... does it change anything?

@davidhq
Copy link

davidhq commented Jun 16, 2018

I always use forked-daapd

$ /usr/sbin/forked-daapd -v
Forked Media Server: Version 26.0
Copyright (C) 2009-2015 Julien BLACHE <[email protected]>
Based on mt-daapd, Copyright (C) 2003-2007 Ron Pedde <[email protected]>
Released under the GNU General Public License version 2 or later

@davidhq
Copy link

davidhq commented Jun 16, 2018

USB soundcard in all cases is:

https://www.aliexpress.com/item/Ugreen-External-USB-Audio-Sound-Card-Mic-Adapter-Speaker-3-5mm-Jack-Stereo-Audio-Cable-Headset/32802432756.html

Components:
Realtek ALC4040

http:https://www.wpgholdings.com/yosung/news_detail/zhtw/program/21388

ALC4040 Series
• Tensilica USB Audio Core inside
• Digital-to-Analog Converter with 100dBA SNR
• Analog-to-Digital Converter with 94dBA SNR
• Stereo digital microphone and analog microphone inputs
• Power management and enhanced power saving
• Single digital power supply from 1.6v to 3.6v.
• Small Package : QFN48 6mmx6mm, CSP28  4mmx3.5mm

(I broken-open one of them), probably not relevant, but still

@mikebrady
Copy link
Owner

Thanks for all this. It is possible to turn off resend requests in the development version. There is a diagnostics section in the configuration file and you can turn it off there. It may be that the resend requester is too aggressive.

@mikebrady
Copy link
Owner

Right, so it looks like the USB DACs might be the cause of the very large level of net correction. That's a (possible) discovery. If you could show some stats of a system that has been generating lots of resend requests but now with them turned off, I wonder how what fraction of packets are actually missed in the end... If it turns out to be small, then I could lower the action of the resend requester...

@davidhq
Copy link

davidhq commented Jun 16, 2018

ok, later today

@davidhq
Copy link

davidhq commented Jun 16, 2018

https://gist.github.com/davidhq/fadbadd64b93dad31b3c8d820a4d95c4

Two sessions, first disable_resend_requests: no, then yes (~40min)

@mikebrady
Copy link
Owner

Thanks David -- I can only see one session, the one with disable set to "yes"...

@davidhq
Copy link

davidhq commented Jun 16, 2018

Sorry: https://gist.github.com/davidhq/18dd87d006bcec62f5eb6567e12f53ad

Here are previous couple....

@davidhq
Copy link

davidhq commented Jun 16, 2018

Except that I think other than this 12 min session this device was testing 3.1.1 :) so I think there is one useful session from this only.. do you need more? I will set it to 'no' now and play on it...

@mikebrady
Copy link
Owner

Yes please — it Is fascinating. It does look like the resend requesting is too aggressive, but it would be great to get a picture of it with requesting back on, if it’s not too inconvenient.

@davidhq
Copy link

davidhq commented Jun 16, 2018

Yes, here is another 28min session in addition to 12min above: https://gist.github.com/davidhq/69ec73f7a1d319223bcb66ad89d6c91d

@davidhq
Copy link

davidhq commented Jun 16, 2018

Just let me know if I should do the same for other devices except 'lab' and for how long

@mikebrady
Copy link
Owner

Thanks again. I have two hard days work to get through before I can get back to this, so perhaps we can pause until then. I’ll be thinking about the very interesting evidence in the meantime.

@davidhq
Copy link

davidhq commented Jun 16, 2018

SURE! Enjoy the next days, until soon... I will now disable resend requests on all devices and let you know if there were any problems on the outside with this on the latest development version.

@davidhq
Copy link

davidhq commented Jun 25, 2018

Hello Mike, reporting my findings in last week...

Not sure if it's good news or not so much, but version 3.1.1 works almost without issues. There is no "stuttering", so far there is no thread stop / spawn issues (I think it only happened once but I'm not sure anymore, so if it happens it's very rare, not like 3.2* where it was regular occurrence).

There was one other problem only twice, seems not related to the two issues above and forked-daapd could be the source of it. What happened was songs were rewinding, restarting, skipping in a strange way - I thought my girlfriend was switching songs, and she thought I was :) So either ghosts, russian hackers, or some other third issue which for now doesn't matter but I will investigate later.

@mikebrady
Copy link
Owner

mikebrady commented Jun 29, 2018

Thanks for the update, David. I have just pushed a development update with a more gentle resend request algorithm. At your leisure, I'd be grateful if you'd try it out. It's interesting that 3.1.1 works well for you, but AFAIK it would have the same (or worse) thread stop logic in it. Thus, I'd be anxious to wring the bugs out of the development branch.

@davidhq
Copy link

davidhq commented Jul 1, 2018

Hello Mike,
I will test this week... so since last report there was not one thread stop problem with 3.1.1. So if the code is different, it's definitely not worse in my case... maybe it's just a lucky coincidence that constellation of code in 3.1.1 doesn't cause issues even if it's messy.

Regarding the other problem with dropping sound: also not a single occurrence since last report 6 days ago either.

I have a question: resending requests should be useful to maintain a perfect reproduction when packets are lost? And the right amount is determined by trial and error? I personally would rather not have this since 3.1.1 without resending works great. Could be that this feature is overdoing it a little? I'm willing to test as I said but now at leat according to my current understanding this is not something I'd use later when I'm not testing... it adds complexity and potential problems with no obvious benefit. I'm sure I don't understand it correctly but still, speaking from practical experience.

Until soon, thank you

@github-actions
Copy link

This issue has been inactive for 60 days so will be closed 7 days from now. To prevent this, please remove the "stale" label or post a comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants