Configure GoogleCloudSRModel phrases to use tfidf found n-grams #58

Closed
evamaxfield opened this issue May 24, 2021 · 1 comment
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@evamaxfield
Member

Use Case

Please provide a use case to help us understand your request in context

phrases lets us tune the SR model by supplying phrases that the model should recognize, which gives the model context about what is being discussed. We should improve this wherever possible, i.e. by choosing the phrases that most improve model performance and context.

Solution

Please describe your ideal solution

We currently provide the minutes items to the model as a plain list of strings, and the "clean_phrases" function simply returns the first 500 characters of each of the first 100 phrases in the full phrases list.
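For reference, the truncation described above can be sketched as follows. This is a minimal illustration, not the project's actual "clean_phrases" code: the function name `clean_phrases_sketch` and the parameter names are hypothetical, and only the two limits (100 phrases, 500 characters) come from the description.

```python
from typing import List


def clean_phrases_sketch(
    phrases: List[str],
    max_phrases: int = 100,  # limit taken from the description above
    max_chars: int = 500,  # limit taken from the description above
) -> List[str]:
    """Hypothetical stand-in: keep the first `max_phrases` phrases,
    each truncated to its first `max_chars` characters."""
    return [phrase[:max_chars] for phrase in phrases[:max_phrases]]
```

Note that this selection depends only on the order of the minutes items, not on how informative each phrase is.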

However, we could run TF-IDF or some other n-gram indexing to find the most valuable n-grams in the minutes items (i.e. those most specific to that meeting) and provide only those. This has a cold-start problem, but we can probably work around it.
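A rough sketch of the TF-IDF idea, using only the standard library. Everything here is an assumption made for illustration: the function names, the simple word tokenizer, the 1-to-3-word n-gram range, and the smoothed IDF formula. Each meeting is treated as one document made of its minutes items, and the current meeting's n-grams are scored against the other meetings.

```python
import math
import re
from collections import Counter
from typing import List


def ngrams(text: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Lowercased word n-grams of length n_min..n_max (assumed tokenizer)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [
        " ".join(words[i : i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(words) - n + 1)
    ]


def top_tfidf_ngrams(
    meeting_items: List[str],
    other_meetings: List[List[str]],
    top_k: int = 100,
) -> List[str]:
    """Rank one meeting's n-grams by TF-IDF against other meetings."""
    # Term frequency: n-gram counts across this meeting's minutes items.
    tf = Counter(g for item in meeting_items for g in ngrams(item))

    # Document frequency: how many other meetings contain each n-gram.
    df = Counter()
    for items in other_meetings:
        df.update({g for item in items for g in ngrams(item)})

    # Smoothed IDF. The current meeting counts as one document and
    # contains every n-gram being scored, hence the extra +1 on df.
    n_docs = len(other_meetings) + 1
    scores = {
        g: count * (math.log((1 + n_docs) / (2 + df[g])) + 1.0)
        for g, count in tf.items()
    }

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]
```

With no other meetings available, every IDF term equals 1, so the ranking falls back to plain term frequency. That is one simple way to tolerate the cold-start case, though not necessarily the best one.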

There may also be better methods for phrase selection to improve the model.

Alongside all of this, we should be tracking and reporting model performance. We could probably add ASV benchmarking as a way of monitoring it.
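As one possible shape for this, ASV supports "track" benchmarks: methods whose names start with `track_` and that return a number to record over time. The sketch below assumes word error rate as the metric and uses placeholder transcript strings; neither the metric choice, the class name, nor the sample text comes from this project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution
                )
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


class TrackTranscriptQuality:
    """Hypothetical ASV benchmark; ASV records values from track_* methods."""

    unit = "word error rate"

    def setup(self):
        # Placeholder strings; a real benchmark would load a reference
        # transcript and the SR model output for the same audio.
        self.reference = "the council approved the duwamish river cleanup"
        self.hypothesis = "the council approved the dwamish river clean up"

    def track_word_error_rate(self):
        return word_error_rate(self.reference, self.hypothesis)
```

Running the same benchmark with and without a given phrase selection strategy would give a comparable number per commit.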

@evamaxfield added the documentation and enhancement labels on May 24, 2021
@evamaxfield changed the title from "Configure GoogleCloudSRModel phrases to use tfidf phrases" to "Configure GoogleCloudSRModel phrases to use tfidf found n-grams" on May 24, 2021
@sarahjliu added this to To do in v3.0 on Jul 10, 2021
@evamaxfield
Member Author

Closing with #69 as I like the way I am doing phrase generation.

v3.0 automation moved this from Ready for Dev to Done Jul 25, 2021