Issue#13 transcribe #209

epg323 · 2020-06-11T02:53:08Z

closes issue#13


MrNanosh · 2020-06-15T18:09:16Z

app/models/transcribe.js

+
+class Transcribe extends MongoModels {
+ static create(obj) {
+ return new Prommise(async, (ok, ko) => {


JS Promise is spelled with one m. async followed by a comma appears to be a syntax error. ok and ko are parameter names, but they are vague.

Is it possible you are looking at an older commit? This file was deleted two weeks ago.

MrNanosh · 2020-06-15T18:10:33Z

app/models/transcribe.js

+ const result = await this.insertOne(doc)
+ if (result && result.length === 1) ok(result[0])
+ else {
+ const msg = ` unexpected number of results receivec ${results.length}`


received is misspelled.

ddfridley · 2020-06-15T19:34:12Z

(ok, ko) came from me; I learned it from someone else. I like it because it's much shorter than resolve and reject, the two are the reverse of each other, and if someone throws a punch (an error) you get knocked out (KO'd). I have Code Spell Checker installed in VS Code to help with spelling, but it has its disadvantages too; for example, it doesn't recognize variable names. It would put a little squiggly underline under Prommise and under receivec.
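For reference, the logic of the create() method under review could be reduced to something like this minimal sketch. It is written as a standalone function with a hypothetical injected insertOne, so it runs without mongo-models, and it assumes insertOne resolves to an array of inserted documents, as the original length check implies. Because an async function already returns a Promise, neither new Promise nor an ok/ko executor is needed. The original also wrapped obj in new Transcribe(obj) before inserting; that step is omitted here.

```javascript
// Minimal sketch of the reviewed create() logic, decoupled from mongo-models.
// Assumption: `insertOne` resolves to an array of inserted documents,
// which is what the original `result.length === 1` check expects.
async function createOne(insertOne, obj) {
  const result = await insertOne(obj)
  // Returning a value resolves the Promise that the async function returns.
  if (result && result.length === 1) return result[0]
  // Throwing inside an async function rejects that Promise instead.
  const count = result ? result.length : 0
  throw new Error(`unexpected number of results received ${count}`)
}
```

Note that the error message uses result, the variable actually in scope; the diff referenced results, which is undefined there.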


ddfridley · 2020-06-16T18:02:10Z

app/components/data-components/merge-children/merge-latest-transcription-into-parent.js

+ ) // .some to stop after finding the first one
+ if (transcribe) {
+ if (!parentIota.webComponent) parentIota.webComponent = {}
+ if (!parentIota.webComponent.metaTags) parentIota.webComponent.metaTags = []


The metaTags here are for sharing information about this page with Facebook. The transcription information should go into webComponent.participants[the participant]. It will be a little challenging to figure out which participant goes with which transcription, but I think you can use the userId field to make the association.

I admit the description of the structure of the parentIota here is very weak. The best description is in: https://github.com/EnCiv/undebate/blob/master/app/components/data-components/merge-participants.js

Go down to the part where it says "what we are trying to create".
Also, I see that, at least as documented, the userId is not there. We may need to find some other way to associate the transcription with the participant, or we may just need to have the mergeParticipants component add the userId.

The only alternative I can come up with is matching by URL, for example: "https://res.cloudinary.com/hf6mryjpf/video/upload/v1566510654/5d5b73c01e3b194174cd9b92-1-speaking.webm"

I can see this being highly inefficient. I think the best method will be, as you suggested, a userId.

I have been thinking about Hartford and how to have second and third rounds of undebates, meaning the first round is BP's questions and the second round is Hartford's questions. In that case I think userId will not be sufficient, because the same user could have a participant record in each round. I am thinking that when participants are merged, merge-participants-into-parent needs to include each participant iota's _id. Then, when merging a transcription, you can look for the participant whose _id corresponds to the one you transcribed. This also solves the problem of a participant re-recording their answers, which creates a new participant record that doesn't have a transcription yet. If you update the last forEach loop of merge-participants-into-parent.js like this, you'll get both participantId and userId:

limitedLatestParticipants.forEach(participantDoc => { parentIota.webComponent.participants[audience + nextIndex++] = { participantId: participantDoc._id.toString(), userId: participantDoc.userId, ...participantDoc.component.participant} })

This code is not tested, though.

Note that a quirk of Mongo is that _id is an object. In this project I have adopted a convention that _id is the only place where an id is an object, so parentId, userId, and now participantId are all strings. I've just had too much trouble with inconsistency about this in past projects. If you know of other conventions, I'd be excited to talk about it.
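A minimal sketch of the forEach loop above rewritten as a pure function, so the string-id convention can be checked without Mongo. participantEntries is a hypothetical name, the docs are assumed to have the shape used in that loop (_id, userId, component.participant), and the audience prefix and starting index are passed in rather than computed.

```javascript
// Sketch: build participant entries keyed like `audience + index`, following the
// convention that only `_id` is an object and every other id is a string.
// `participantEntries` is a hypothetical helper, not part of the codebase.
function participantEntries(participantDocs, audience, startIndex) {
  const entries = {}
  let nextIndex = startIndex
  for (const doc of participantDocs) {
    entries[audience + nextIndex++] = {
      participantId: doc._id.toString(), // ObjectId -> string, per the convention
      userId: doc.userId, // already stored as a string
      ...doc.component.participant,
    }
  }
  return entries
}
```

Keeping the conversion at this single boundary means later comparisons (for example participantId === someString) never have to guess whether they are looking at an ObjectId.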

Would the path field ("path": "/schoolboard-demo") be a better way to match?

ddfridley · 2020-06-16T18:11:47Z

app/components/data-components/merge-children/merge-latest-transcription-into-parent.js

+ // the list is sorted by date, find the first / youngest child with a socialpreview
+ let transcribe
+ childIotas.some(iota =>
+ iota.component && iota.component.component === 'Transcription' ? (transcribe = iota) : false


If there are multiple candidates in an undebate, there will be multiple transcription records, one for each user. This is different from smpreview, where there was only one preview for the whole page. I think you are going to have to do something like:

let transcribes = childIotas.reduce((transcribes, iota) => iota.component && iota.component.component === 'Transcription' ? (transcribes.push(iota), transcribes) : transcribes, [])

transcribes will be a list of the iotas that are transcriptions. The [] initial value matters: without it, reduce uses the first iota as the accumulator.
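Array.prototype.filter expresses the same selection a little more directly. The sketch below also attaches each transcription to its participant by participantId, as proposed earlier in this thread. It assumes, hypothetically, that a transcription iota stores the transcribed participant's id in component.participantId and its text in component.transcription; neither field name is confirmed here. Because childIotas is sorted newest first, find() picks the most recent transcription for each participant.

```javascript
// Sketch: select transcription iotas and attach each to its participant.
// Assumed (hypothetical) fields: iota.component.participantId and
// iota.component.transcription. childIotas is assumed sorted newest first.
function mergeTranscriptionsIntoParticipants(parentIota, childIotas) {
  const transcribes = childIotas.filter(
    iota => iota.component && iota.component.component === 'Transcription'
  )
  const participants = (parentIota.webComponent && parentIota.webComponent.participants) || {}
  Object.values(participants).forEach(participant => {
    // find() returns the first match, i.e. the newest transcription
    const match = transcribes.find(t => t.component.participantId === participant.participantId)
    if (match) participant.transcription = match.component.transcription
  })
  return parentIota
}
```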

ddfridley · 2020-06-16T21:17:54Z

app/server/events/transcribe.js

+ let convertedFile = speakingFile.replace('.mp4', '.wav')
+ let chunkedFile = fs.createWriteStream('chunkedFile.wav', 'base64')
+ let request = https.get(convertedFile, function(resp) {
+ logger.info('Status code is:' + Object.getOwnPropertyNames(resp))


Here is a challenge: I think that after you do https.get, resp.body will be the wav file data and all you need to do is audioString = resp.body.toString('base64'). But I'm not sure, and there may be another layer of structure within resp.body that you need to drill into. Either way, let's try to save the time of writing to a file and reading it back out. There may be format issues to resolve, but I bet we can figure them out by looking at what comes back in resp.body and comparing it to what we see in audioString if you run it now.
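One detail worth knowing here: in Node's https module, the resp passed to the https.get callback is an http.IncomingMessage, which is a readable stream with no body property. The bytes have to be collected from its 'data' events and joined with Buffer.concat before base64-encoding. A minimal sketch, with hypothetical helper names; it does not follow redirects and simply rejects non-200 responses.

```javascript
const https = require('https')

// Join the received Buffer chunks and base64-encode them, e.g. for the
// Speech-to-Text `audio.content` field.
function chunksToBase64(chunks) {
  return Buffer.concat(chunks).toString('base64')
}

// Download a URL into memory (no temp file) and resolve with base64 content.
// Sketch only: redirects are not followed and non-200 responses are rejected.
function fetchAsBase64(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, resp => {
        if (resp.statusCode !== 200) {
          resp.resume() // drain the response so the socket is released
          reject(new Error(`unexpected status code ${resp.statusCode}`))
          return
        }
        const chunks = []
        resp.on('data', chunk => chunks.push(chunk))
        resp.on('end', () => resolve(chunksToBase64(chunks)))
        resp.on('error', reject)
      })
      .on('error', reject)
  })
}
```

Usage would look like const audioBytes = await fetchAsBase64(convertedFile), replacing the write-then-read round trip through chunkedFile.wav.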

ddfridley · 2020-06-16T22:19:11Z

enciv-transcribe.json

@@ -0,0 +1,12 @@
+{


Danger! You should not check private_key and similar secrets into GitHub. I see you added the file to .gitignore, but that was probably after it had already been staged with git add .

You will need to move the file out of the directory, do another git add ., and then git commit -m "removed enciv-transcribe.json" to get rid of it.

I'm not sure how this code gets the keys into the API call. Are they set in environment variables somewhere? We'll need to document it somehow so people (like me) can set it up.

Also, you are going to need to get new keys; these are now public.

I was thinking of adding installation instructions to the README after we merge to master. To get a key you need to enter your credit card information; I hope this isn't a barrier for new developers.

ddfridley · 2020-06-16T22:46:33Z

app/server/events/transcribe.js

+ main(audioString).catch(console.error)
+ })
+ })
+ async function main(audioByte) {


I suggest that names of list/array variables end in s, like audioBytes.

MrNanosh · 2020-06-15T19:35:09Z

app/server/events/transcribe.js

+ languageCode: 'en-US',
+ enableWordTimeOffsets: true,
+ }
+ const request = {


Since main is called inside the callback of a function whose return value is assigned to a variable named request, it is better not to also name a const request here. It is confusing, and it might also cause errors.
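A minimal sketch of the same request under a non-conflicting name, using the long-running call the branch later switched to (commit "changed the api call for short audio file to long audio file"). It assumes a SpeechClient from @google-cloud/speech is passed in; the [operation] and [response] destructuring follows that library's published samples. buildRecognizeRequest and transcribeAudio are hypothetical names, and the config repeats only the two fields visible in the diff; encoding and sample rate are left out, which the API permits for WAV files whose header carries them.

```javascript
// Build the Speech-to-Text request under a name that does not shadow the
// outer `request` returned by https.get.
function buildRecognizeRequest(audioBytes) {
  return {
    config: {
      languageCode: 'en-US',
      enableWordTimeOffsets: true,
    },
    audio: { content: audioBytes }, // base64-encoded audio
  }
}

// `client` is assumed to be a SpeechClient from @google-cloud/speech.
async function transcribeAudio(client, audioBytes) {
  const [operation] = await client.longRunningRecognize(buildRecognizeRequest(audioBytes))
  const [response] = await operation.promise()
  // Join the top alternative of each result into one transcript string.
  return (response.results || []).map(result => result.alternatives[0].transcript).join('\n')
}
```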

MrNanosh · 2020-06-15T19:57:13Z

enciv-transcribe.json

+ "type": "service_account",
+ "project_id": "enciv-1583191497701",
+ "private_key_id": "21076fa78d8d13f853162eb2333f3f10d7b0664e",
+ "private_key": "-----BEGIN PRIVATE KEY-----\n[redacted]\n-----END PRIVATE KEY-----\n",


Private keys should go in some sort of env file that is untracked. This key should also be scrubbed from the git history.

In your .bashrc file, add:

export GOOGLE_TRANSCRIPTION_PROJECT_ID="..."
export GOOGLE_TRANSCRIPTION_KEY_ID="..."
export GOOGLE_TRANSCRIPTION_KEY="..."

Then you can run (the quotes keep the spaces inside the key from splitting it into separate arguments):

source .bashrc
heroku config:set GOOGLE_TRANSCRIPTION_PROJECT_ID="$GOOGLE_TRANSCRIPTION_PROJECT_ID"
heroku config:set GOOGLE_TRANSCRIPTION_KEY_ID="$GOOGLE_TRANSCRIPTION_KEY_ID"
heroku config:set GOOGLE_TRANSCRIPTION_KEY="$GOOGLE_TRANSCRIPTION_KEY"

If you have trouble: on Heroku I had to go through the web interface to my app, open Settings, click Reveal Config Vars, and then edit GOOGLE_TRANSCRIPTION_KEY to add a newline at the end (that is, go to the end of the value and hit Return so it looks like two lines). I have the same sort of key configuration for sending Gmail from the server, and I had to figure it out there too.
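A minimal sketch of turning these variables into client options, returning null when any are missing so the caller can skip transcription gracefully. The { projectId, credentials: { client_email, private_key } } shape is what the Google Cloud Node.js clients accept. GOOGLE_TRANSCRIPTION_CLIENT_EMAIL is a hypothetical extra variable, since service-account credentials also need the account's email; GOOGLE_TRANSCRIPTION_KEY_ID is not used in this sketch. The \n replacement is an assumption covering keys stored with literal backslash-n sequences instead of real newlines.

```javascript
// Build SpeechClient options from environment variables.
// Returns null if any required variable is missing so the caller can skip
// transcription. GOOGLE_TRANSCRIPTION_CLIENT_EMAIL is a hypothetical variable.
function transcriptionClientOptions(env = process.env) {
  const projectId = env.GOOGLE_TRANSCRIPTION_PROJECT_ID
  const clientEmail = env.GOOGLE_TRANSCRIPTION_CLIENT_EMAIL
  const key = env.GOOGLE_TRANSCRIPTION_KEY
  if (!projectId || !clientEmail || !key) return null
  return {
    projectId,
    credentials: {
      client_email: clientEmail,
      // Assumption: the key may contain literal "\n" sequences; turn them into newlines.
      private_key: key.replace(/\\n/g, '\n'),
    },
  }
}

// Usage (requires @google-cloud/speech):
//   const options = transcriptionClientOptions()
//   if (!options) logger.error('transcription env vars not set; skipping')
//   else client = new speech.SpeechClient(options)
```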


epg323 · 2020-07-16T02:22:33Z

Cloudinary process: we use the Cloudinary API to upload the video file to Cloudinary and have it sent to the Google Speech-to-Text API. We then take the video URL and replace the .mp4 extension with .transcript. Once we do that, we can request that URL and extract the contents of the .transcript file.
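A minimal sketch of that URL rewrite. It swaps whatever extension ends the URL, so the .webm URLs seen earlier in this thread would work too, and it assumes the URL has no query string; transcriptUrl is a hypothetical name. The .transcript file only exists once Cloudinary's asynchronous transcription has finished, so requesting it too early will not return the transcript.

```javascript
// Replace the trailing file extension of a Cloudinary delivery URL with
// `.transcript`. Assumes the URL has no query string or fragment.
function transcriptUrl(videoUrl) {
  const extension = /\.[A-Za-z0-9]+$/
  if (!extension.test(videoUrl)) throw new Error(`no file extension in ${videoUrl}`)
  return videoUrl.replace(extension, '.transcript')
}
```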

epg323 · 2020-07-17T02:53:25Z

I just reached out to Cloudinary; this is what they said: "The transcription gets queued in an async process, so you'll need to wait for that process to finish. I did notice that the documentation doesn't include those details, so I will have them add it but first let me test it."

I will try to get an estimate of the wait time. If the wait time is too long, we can just use Google's streaming transcription instead.

ddfridley · 2020-07-17T05:32:43Z

In https://cloudinary.com/documentation/google_ai_video_transcription_addon#:~:text=With%20the%20Google%20AI%20Video,best%20possible%20speech%20recognition%20results, it says:

"The google_speech parameter value activates a call to Google's Cloud Speech API, which is performed _asynchronously after your original method call is completed_. Thus your original method call response displays a pending status:

... "info": { "raw_convert": { "google_speech": { "status": "pending" } } } ...

When the google_speech request is complete (may take several seconds or minutes depending on the length of the video), a new raw file is created in your account with the same public ID as your video or audio file and with the .transcript file extension. If you also provided a notification_url in your method call, the specified URL then receives a notification when the process completes."

Here is the documentation on notifications: https://cloudinary.com/documentation/notifications

We are going to have to create a this.app.post(...) handler in server.js. We shouldn't keep putting feature-specific code in there, but let's get it working first and then clean it up. I've got time to talk this through on Friday if anyone's available.


epg323 and others added 21 commits May 19, 2020 06:01

made transcribe model

c35ec53

made schema for transcription

dcf46cf

merge-children will merge participants and the socialpreview. meta title taken from iota.subject

1186f85

transcribe prototype

7d00130

GOOGLE_ANALYTICS is taken from env, if not present not used

b526eca

don't fail if no iota

65aadb7

corrected google analytics, /canddidate-conversation uses MergeChildren

6ee23fd

cleaned up serverReactRender no functional changes

3013fef

google analytics fixed from before

9c0be5a

removed transcribe model, run transcribe on a single event, removed transcribe event , got rid of transcribe in create partcipant

b418bd3

inserts document to iota

6c9a255

cleaned up schema, and added key to .ignore

8284ea6

removed importing transcibe model

6e92fd2

testing path to .wav for production

2e75132

changed path

b51b48d

fixed path

03e7cd9

.

714bfd6

format

cd1ffb3

merged with smpreview

62a51ad

added transcribe into merge children

2360377

minor changes

1f9aad2

epg323 assigned ddfridley, MrNanosh and luiscmartinez Jun 11, 2020

cleaned up transcribe.js

28c37dc

MrNanosh reviewed Jun 15, 2020

ddfridley reviewed Jun 16, 2020

MrNanosh requested changes Jun 18, 2020

made var plural, got rid of enciv-transcribe.json

b0a3ecb

ddfridley and others added 14 commits June 25, 2020 14:11

merged with master - includes socialmediapreveiw and transcriptionIntoParent

9c4c731

transcribe merge example

5e6ed43

userId and participantId in merge-participants-into-parent

71979cb

refactored transcribe.js with useragent

3ed7ae1

fixed merging

24d6460

bare minnimum credentials

2331533

more relevant names given to gcloud keys

531af6b

merged with master had to resolve package-lock.json differences

f29f14a

error message and graceful handling if transcription environment variable are not defined

3c8429b

added check for ENV vars in notfiy-of-new-participant

3824a2f

added installation instructions for speech to text api

02a1fd8

Merge branch 'issue#13-transcribe' of https://github.com/EnCiv/undebate into issue#13-transcribe

f172eed

changed the api call for short audio file to long audio file

d62f25d

updated iota for transcript testing

a050f38
