
Add support of speech recognition for microphone node #52

Merged
7 commits merged into node-red:master on Dec 28, 2020

Conversation

HiroyasuNishiyama (Member)

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Proposed changes

This PR adds a speech recognition feature to the ui_microphone node using the SpeechRecognition interface of the Web Speech API.

Checklist

  • I have read the contribution guidelines
  • For non-bugfix PRs, I have discussed this change on the forum/slack team.
  • I have run grunt to verify the unit tests pass
  • I have added suitable unit tests to cover the new/changed functionality

@dceejay (Member) left a comment:
Thank you very much for adding this mode and adding all the translations.

$("#node-input-mode").change(function (e) {
var mode = $("#node-input-mode").val();
if (mode === "recog") {
$("#div-audio").hide();
dceejay (Member):

Do you really want to hide the audio options? They affect the length of the recording and whether it's press-to-talk or click-to-start-then-click-to-stop, both of which would also make sense for speech-to-text, I think.

HiroyasuNishiyama (Member Author):

OK. I'll add support for these options in speech recognition mode as well.

dceejay (Member):

Thanks. As it was, you could set it either way in non-reco mode and then switch to reco mode; the button behaviour would be as you had set it, but it couldn't be changed.

HiroyasuNishiyama (Member Author):

... but couldn't be changed.

Does this mean that there is some issue on your side?

dceejay (Member):

Not now that the option is available in reco mode.

$("#div-audio").show();
}
});
}
if (this.press === undefined) {
dceejay (Member):

You also need an `if mode is undefined` check for existing nodes that don't have this set, or the select box will start out blank.
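For illustration only, a defaulting check along those lines might look like this (a sketch, not the code added in this PR; the default mode value is an assumption, and only the "recog" value and the #node-input-mode selector are taken from the snippet above):

    // Sketch: default the mode for nodes created before this option existed,
    // so the mode select box does not start out blank in the edit dialog.
    if (this.mode === undefined) {
        this.mode = "waveform";               // assumed default mode value (illustrative)
        $("#node-input-mode").val(this.mode); // keep the select box in sync
    }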

HiroyasuNishiyama (Member Author):

Thank you. I will add the check as you suggested.

In speech recognition mode, this node works in a more restricted set of browsers because it uses the SpeechRecognition API:

- IE, Safari: not supported
- Firefox : SpeechRecognition needs to be enabled (about:config -> `media.webspeech.recognition.enable`).
dceejay (Member):

Is there a simple way (without having to test every user agent, etc.) to detect whether SpeechRecognition is allowed/enabled, so that we could hide the option if it isn't? (Though I guess for Firefox hiding it would also hide the hint that it could be enabled if wanted...) Maybe we want to repeat this tip in the edit config?

HiroyasuNishiyama (Member Author):

The current implementation checks for the existence of the SpeechRecognition interface (though on the editor side) and disables speech recognition mode if it is not available.
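As an aside, a minimal sketch of that kind of feature detection (not the exact code in this PR) is to probe for the constructor, including Chrome's webkit-prefixed variant, and disable the select option if it is missing; the option value "recog" comes from the diff above, while the rest is illustrative:

    // Sketch: detect whether the browser exposes the SpeechRecognition interface.
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
        // Disable the speech recognition entry in the mode select box.
        $("#node-input-mode option[value='recog']").prop("disabled", true);
    }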

dceejay (Member):

Interesting thought... I suppose in practice the runtime browser may well be different from the browser used for editing. I have no good solution for that, apart from ensuring the developer is aware of the possibility.

HiroyasuNishiyama (Member Author):

I think the ui_audio node has a similar implementation to create its voice list.

dceejay (Member):

Yes, my next question was going to be about languages... is the language selected from the browser language or from user configuration?

HiroyasuNishiyama (Member Author):

The current implementation uses the default browser language.
The SpeechRecognition interface has a lang property that accepts the language to be recognized.
However, it does not provide a list of the languages that can be recognized.
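For reference, a minimal sketch of how the lang property is used (illustrative only; the language tag shown is an example, not this node's configuration):

    // Sketch: create a recognizer and pin it to a specific language.
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    var recognition = new SpeechRecognition();
    recognition.lang = "ja-JP";  // any BCP 47 tag; if unset, the browser default language is used
    recognition.start();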

@dceejay (Member) commented Dec 22, 2020:

Is it also worth looking at adding a flag (under speech mode only) for interim-results operation (https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/interimResults)? Or maybe that should possibly be the default when in click-to-start-then-click-to-stop mode, i.e. so you get streamed results while in that mode?

Thoughts ?

@HiroyasuNishiyama (Member Author) commented:

Is it also worth looking at adding a flag (under speech mode only) for interim-results operation (https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/interimResults)? Or maybe that should possibly be the default when in click-to-start-then-click-to-stop mode, i.e. so you get streamed results while in that mode?

In our original implementation, specifying the interim results option would display the intermediate recognition results on the dashboard.
I think it can be used for such purposes.
However, if the node outputs interim results, a way to distinguish between the final and intermediate results is needed.
I would like to suggest that the node output interim results on a second output port if the interim results option is specified.
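For reference, the Web Speech API already marks each result with an isFinal flag, so a sketch of distinguishing interim from final results (illustrative only, not this node's code) could look like:

    // Sketch: stream interim results and tell them apart from the final transcript.
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    var recognition = new SpeechRecognition();
    recognition.interimResults = true;
    recognition.onresult = function (event) {
        for (var i = event.resultIndex; i < event.results.length; i++) {
            var result = event.results[i];
            var transcript = result[0].transcript;
            console.log((result.isFinal ? "final: " : "interim: ") + transcript);
        }
    };
    recognition.start();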

@HiroyasuNishiyama (Member Author) commented:

Hi. I have updated speech recognition mode to use the same button modes as audio input, with the fixes you pointed out.
I have also added example flows for both modes.

@dceejay (Member) commented Dec 26, 2020:

Yes, I think showing intermediate results on the dashboard is one use case, but that could be done via the backend, i.e. not implemented in the node itself; if the user wants it they could feed the info back to another widget. I think there are other use cases where you may want ongoing data fed back. I don't think the node / reco engine will handle both modes at once, so I don't know how a second output could work. Though once the reco has stopped, presumably we can send a done, so you know the microphone is no longer expected to send more.

@HiroyasuNishiyama (Member Author) commented:

Though once the reco has stopped, presumably we can send a done, so you know the microphone is no longer expected to send more.

Changed it to set the done property to true instead of sending a message to a second port when the interim flag is set.
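As an illustration of how that could be consumed downstream (an assumption about the message shape, not code from this PR), a function node could route interim and final results by checking that property:

    // Sketch (Node-RED function node with two outputs): route final vs. interim results,
    // assuming the microphone node sets msg.done = true on the final message.
    if (msg.done === true) {
        return [msg, null];   // final transcript on output 1
    }
    return [null, msg];       // interim transcript on output 2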

@dceejay (Member) commented Dec 28, 2020:

Looks good now. Using done works nicely, thank you. The only wrinkle I can see is that if I drag on a new node and set it to reco mode, the default time is 0 seconds. It should default to some useful value (maybe 5, to be consistent with audio mode?).
I think leaving it as 0 doesn't make sense unless you have intermediate results on by default... which we don't want (I think).

@dceejay (Member) commented Dec 28, 2020:

I'm going to merge this as-is and we can work on it from there in master. I won't publish it just yet.

dceejay merged commit 31d45a1 into node-red:master on Dec 28, 2020
@HiroyasuNishiyama (Member Author) commented:

I think leaving it as 0 doesn't make sense unless you have intermediate results on by default... which we don't want (I think).

The SpeechRecognition interface outputs a final result for each phrase (at least for Japanese recognition). An interim result can change before it is fixed as a final result.
For this reason, I intentionally set the default value to 0 in speech recognition mode.
