playing tts/audio on VTO #177

Open
luzik opened this issue Mar 16, 2022 · 21 comments

@luzik

luzik commented Mar 16, 2022

It would be awesome to be able to send TTS or audio through the VTO speaker.

My personal use case is to connect face recognition with voice messages, something like "Hello MyName".

If there is no direct command for that, my VTO has a place where I can store MP3 audio for various events. Maybe rroller/dahua could generate an MP3, upload it to the VTO, and trigger an action for it?

@Saiyajin53

You can replace the original voice with your own MP3, but there is a size limit of only 20 KB :/

@itkfilelor
Contributor

itkfilelor commented Mar 22, 2022

I may have found the API command for the Amcrest AD110 doorbell; in theory it would be the same for the Dahua ones. Doing some tests and will report back.

UPDATE: OK, so apparently "we" have already known about the endpoint for some time. From what I have found, it is really sketchy for files: they need to be rather short and lower quality, or else the device gets overwhelmed. I plan to work on some premade TTS recordings and see where it leads.
MORE: I found this
So I took a Google TTS file I made in HA and converted it like they showed in the thread:
sox -v 0.8 audio_test.mp3 -r 8k -c 1 audio_test.al
Then I sent:
sleep 45 && curl -vvv \
  --limit-rate 8K \
  -F "file=@audio_test.al;type=Audio/G.711A" \
  -H "Content-Type: Audio/G.711A" \
  http://admin:password@<ip>/cgi-bin/audio.cgi\?action\=postAudio\&httptype\=singlepart\&channel\=1
I set a timer on my phone, ran my fat arse upstairs, and waited. I heard the TTS on my doorbell within 1.5 s of the timer expiring. There was a little garbage at the beginning and end, but the voice came over clear.
When I have a chance I will see about making it a media_player entity.
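
A note on the numbers in that command: G.711 A-law at 8 kHz mono is one byte per sample, so the audio plays at 8000 bytes per second, and curl's 8K limit (8192 bytes/s) is close to that real-time rate. Whether the rate limit is there to pace the upload is an assumption; the thread does not confirm it. A minimal Python sketch of the arithmetic and of the endpoint URL, with function names made up for illustration:

```python
from urllib.parse import urlencode

# G.711 A-law: 8000 samples per second, 1 byte per sample, mono.
ALAW_BYTES_PER_SECOND = 8000


def post_audio_url(host: str, channel: int = 1) -> str:
    """Build the postAudio URL used in the curl command above."""
    query = urlencode(
        {"action": "postAudio", "httptype": "singlepart", "channel": channel}
    )
    return f"http://{host}/cgi-bin/audio.cgi?{query}"


def playback_seconds(alaw_size_bytes: int) -> float:
    """Estimate how long a raw A-law file of the given size will play."""
    return alaw_size_bytes / ALAW_BYTES_PER_SECOND


# 192.0.2.10 is a placeholder address, not a real device.
print(post_audio_url("192.0.2.10"))
print(playback_seconds(16000))
```

This can help when deciding how short a clip has to be before sending it to the device.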

@luzik
Author

luzik commented Mar 24, 2022

My VTO

curl -vvv --user "admin:pass" --limit-rate 8K -F "file=@audio_test.al;type=Audio/G.711A" -H "Content-Type: Audio/G.711A" "http://192.168.1.30/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1"
*   Trying 192.168.1.30:80...
* Connected to 192.168.124.30 (192.168.124.30) port 80 (#0)
* Server auth using Basic with user 'admin'
> POST /cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1 HTTP/1.1
> Host: 192.168.124.30
> Authorization: Basic XXXXX
> User-Agent: curl/7.74.0
> Accept: */*
> Content-Length: 11138
> Content-Type: Audio/G.711A; boundary=------------------------a90a8721f68274a4
>
* We are completely uploaded and fine

...and then it hangs

@luzik
Author

luzik commented Mar 24, 2022

But it actually plays nicely on my VTO!!

There is just no response that ends the session.

@luzik
Author

luzik commented Mar 24, 2022

With the VTO2211G I need neither --limit-rate nor auth?!
To get the connection to close I added --speed-limit 1 --speed-time 1, which aborts the transfer once the speed stays below 1 byte/s for 1 second.

Can the Dahua be exposed as an HA MediaPlayer class device, or is that the wrong idea?
It would be awesome to include automatic audio conversion and a play function in https://github.com/rroller/dahua

@itkfilelor
Contributor

Yeah, I had the hang as well. I've never messed with any form of media streaming in Python, so I don't know how to handle that with the requests module that we are using here. In fact, most of my HTTP GET/POST experience in Python was with simple endpoints that closed automatically. This endpoint appears to be the one the app uses to open the stream, but the docs don't show how it ends. I'll have to dive into the requests module and see how it closes persistent connections.

@luzik
Author

luzik commented Mar 24, 2022

Maybe this ?

r = requests.get('https://github.com', timeout=(3.05, 5))

https://docs.python-requests.org/en/latest/user/advanced/#timeouts

3.05 - connection timeout
5 - read timeout
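
Applied to the postAudio endpoint, that timeout tuple might look like the sketch below. It is untested against a device and rests on several assumptions: the host, credentials and file path are placeholders, the body is sent as raw A-law bytes rather than the multipart form that curl -F produces, and Basic auth is used because that is what the curl log above shows. Since the device seems to keep the connection open after playback, a read timeout is treated as the expected outcome rather than as an error.

```python
def post_audio_url(host: str) -> str:
    """URL taken from the curl commands earlier in this thread."""
    return (
        f"http://{host}/cgi-bin/audio.cgi"
        "?action=postAudio&httptype=singlepart&channel=1"
    )


def play_alaw(host, user, password, path, timeout=(3.05, 5)):
    """POST a raw A-law file to the device.

    Returns the HTTP status code if the device answers, or None if the
    read timeout expires first (the hang described above).
    """
    # Third-party package; imported here so post_audio_url() has no dependencies.
    import requests

    with open(path, "rb") as audio:
        try:
            response = requests.post(
                post_audio_url(host),
                data=audio,  # streamed raw body (assumption: the device accepts this)
                headers={"Content-Type": "Audio/G.711A"},
                auth=(user, password),  # Basic auth, as in the curl log
                timeout=timeout,  # (connect timeout, read timeout) in seconds
            )
        except requests.exceptions.ReadTimeout:
            # The upload went out but the device never sent a response.
            return None
    return response.status_code


# Hypothetical usage:
# play_alaw("192.0.2.10", "admin", "pass", "audio_test.al")
```

The read timeout plays the same role here as --speed-limit 1 --speed-time 1 does for curl: it gives up on a connection that has gone quiet.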

@itkfilelor
Contributor

😂 😅
Never looked into it before, this is likely the way. When I have a moment to work on it, I'll submit a new PR. Can you confirm that on the Dahua device the endpoint is the same as on my Amcrest bell?

@luzik
Author

luzik commented Mar 24, 2022

Yes, it is.
Please also consider using FFmpeg instead of sox. The default Home Assistant Docker image contains only ffmpeg.

ffmpeg -i audio_test.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 audio_test.al is working for me. Later on I will test it with AAC (it should be supported in hardware, use less space, and be faster).
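
If the conversion ends up inside the Python integration, the same ffmpeg call could be wrapped roughly as below. This is only a sketch: the function names are hypothetical, ffmpeg is assumed to be on PATH, and the -y (overwrite) flag is an addition that is not in the command above.

```python
import subprocess


def ffmpeg_alaw_command(src: str, dst: str) -> list[str]:
    """Return the ffmpeg argument list for the conversion quoted above."""
    return [
        "ffmpeg",
        "-y",  # overwrite dst if it already exists (added here, not in the thread)
        "-i", src,
        "-c:a", "pcm_alaw",  # G.711 A-law encoder
        "-ac", "1",  # mono
        "-ar", "8000",  # 8 kHz sample rate
        "-sample_fmt", "s16",
        dst,
    ]


def convert_to_alaw(src: str, dst: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(ffmpeg_alaw_command(src, dst), check=True)


print(" ".join(ffmpeg_alaw_command("audio_test.mp3", "audio_test.al")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.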

@itkfilelor
Contributor

Got it, have some free time coming up, I'll look into it.

@luzik
Author

luzik commented Mar 25, 2022

I failed to play AAC format on my VTO. pcm_alaw is the way to go.

@calisro

calisro commented Jun 6, 2022

I've been playing around with this. The issue I am having, though, is that after sending a few streams of audio (which work very well with pcm_alaw, btw) it then refuses any more. It's almost like it needs an 'end conversation' to be sent to close the existing connections. I am at a loss tbh.

What I have noticed, though: it sends perfectly the first time and then fails the second. I believe the 'mic' needs to be turned off somehow. In the Amcrest app, you turn the mic on, speak, then turn it off.

If I test the first time, then go into the app and toggle the mic, it works again. I need to figure out how to 'turn off the mic' after sending. Any ideas?

EDIT:
Timeouts/keepalive fixed it.
#181 (comment)

@NickM-27

Would love to see this as a media player!

@luzik
Author

luzik commented Sep 7, 2022

While a media player would be a great feature, it will probably take some time to implement. In the meantime, did someone figure out how to automate/script this in HA?

@calisro

calisro commented Sep 7, 2022

Well. For any camera that supports ONVIF Profile T, you can now do 2-way audio with go2rtc. I'm using it with an AD410 perfectly.

@luzik
Author

luzik commented Sep 8, 2022

Yeah, I am very thrilled running go2rtc in 2-way mode, just struggling with SSL via Traefik under "network_mode: host".

Meanwhile I wrote an automation for playing TTS over the VTO. This is the main part:

shell_command:
   play_tts_on_vto: >-
     /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"messa  ge\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url |jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i   /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Aud  io/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"

@GaryOkie

For what it's worth - the techniques described here also work on the Amcrest AD110/AD410 doorbells to send custom sounds, including sirens.

@morpheus8888

morpheus8888 commented Dec 24, 2022

Would love to see this as a media player!

Any news? I'm very interested in this.

Can you explain the procedure better for a newbie like me? Thank you @luzik

@Pveska

Pveska commented Jun 4, 2023

So, any progress with that issue?

@Pveska

Pveska commented Jun 4, 2023

It would be nice if you could explain that code for us.

@baudneo

baudneo commented Dec 25, 2023

Would be nice if you explain that code for us

Launches bash and sets 2 local variables

  • VAR 1: 'name' = {{states('input_text.person_at_door')}} (Jinja template for HASS to process)

    • input_text.person_at_door - is out of scope here, but I am assuming that there is an external automation that runs face detection and recognition that sets input_text.person_at_door to a name like "George" or possibly "Unknown" for faces that aren't recognized.
  • VAR 2: 'x' = /usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url

    • x is set from a curl command that asks the HASS TTS endpoint to create a sound file for the text "Hi $name". jq extracts the URL from the response, and that URL becomes the value of 'x'. You can HTTP GET that URL to obtain the TTS audio file (in .mp3 format, I assume).
  • && /usr/bin/curl $x -o /tmp/audio_vto.mp3 - if the 'x' assignment succeeded (&& skips the next command when the previous one fails; here that is the exit status of the curl | jq pipeline), it runs curl and saves the HASS-generated TTS file to a temporary .mp3 file at /tmp/audio_vto.mp3

  • && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al - converts the .mp3 to pcm_alaw with proper flags and saves it to /tmp/audio_vto.al

  • && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3 - Issues the final command to send the pcm_alaw file to the VTO device for playback, and then deletes the 2 temp audio files (mp3 and alaw).

    • Change VTO_IP to the actual IP of your VTO device.

The original command has 2 stray spaces inside the last command's header: -H \"Content-Type: Aud  io/G.711A\"

Here is a reformatted command with the whitespace removed:

shell_command:
   play_tts_on_vto: >-
     /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"
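
For anyone who finds the nested shell quoting hard to follow, the first step (asking HASS for a TTS URL) could be sketched in Python as below. The endpoint, headers and JSON fields are the ones from the command above; the function names, the default timeout and the idea of running this from a script instead of shell_command are assumptions for illustration only.

```python
import json
import urllib.request


def build_tts_request(base_url: str, token: str, message: str,
                      platform: str = "google_translate") -> urllib.request.Request:
    """Build the POST to /api/tts_get_url that the curl call above performs."""
    body = json.dumps({"message": message, "platform": platform}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts_get_url",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_tts_url(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the 'url' field (what jq -r .url extracts)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["url"]


# TOKEN is a placeholder for a long-lived access token, as in the original command.
req = build_tts_request("http://localhost:8123", "TOKEN", 'Hi "George"')
print(req.data.decode())
```

Because json.dumps does the escaping, a name that contains quotes does not break the request body the way it could in the shell one-liner.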
