Investigation: Add support for word vectors #25

Open
dpalmasan opened this issue Apr 11, 2021 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@dpalmasan
Owner

The implementation should be generic enough to allow plugging in any vector model, for example:

  • We'd like to use vectors obtained from a vector space model (raw counts, TF-IDF, LSA)
  • We also need to support other types of embeddings, such as GloVe, Word2Vec, or sentence embeddings obtained from BERT (for example, via https://huggingface.co/)

Our representation should allow us to implement metrics such as:

  • Average sentence similarity (e.g. based on cosine distance or Euclidean distance)
  • Other metrics based on sentence similarity (e.g. max distance between two sentences, average distance to the cluster center)
  • Givenness using semantic spaces
  • etc.
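
As a rough illustration of the first two kinds of metrics, here is a minimal sketch. It assumes the sentence vectors are already available as a 2-D NumPy array with one row per sentence; the function names are hypothetical and not part of any existing API in this project.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Avoid division by zero: all-zero rows stay all-zero after scaling.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = vectors / safe_norms
    return unit @ unit.T


def average_sentence_similarity(vectors: np.ndarray) -> float:
    """Mean cosine similarity over all distinct sentence pairs."""
    n = vectors.shape[0]
    if n < 2:
        return 0.0  # Assumption: fewer than two sentences yields 0.0.
    sims = cosine_similarity_matrix(vectors)
    upper = np.triu_indices(n, k=1)  # Each unordered pair counted once.
    return float(sims[upper].mean())


def max_pairwise_distance(vectors: np.ndarray) -> float:
    """Largest Euclidean distance between any two sentences."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).max())


def average_distance_to_centroid(vectors: np.ndarray) -> float:
    """Mean Euclidean distance from each sentence to the cluster center."""
    centroid = vectors.mean(axis=0)
    return float(np.linalg.norm(vectors - centroid, axis=1).mean())
```

Because these functions only see an array, they would work the same way whether the rows come from raw counts, TF-IDF, LSA, or a neural embedding model.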

One design idea is to have a callable that takes the text and returns the vectors as a NumPy array. NumPy should already be among our dependencies through spaCy, so this adds no new requirement.
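
A minimal sketch of what such a callable could look like follows. The `Vectorizer` alias and `make_raw_count_vectorizer` are hypothetical names used for illustration only, and the sketch assumes the text has already been split into sentences and that whitespace tokenization is good enough for a demo.

```python
from collections import Counter
from typing import Callable, List

import numpy as np

# Hypothetical contract: a vectorizer maps a list of sentences to a 2-D array
# of shape (n_sentences, n_dimensions).
Vectorizer = Callable[[List[str]], np.ndarray]


def make_raw_count_vectorizer(vocabulary: List[str]) -> Vectorizer:
    """Build a raw-count vectorizer over a fixed vocabulary."""
    index = {word: i for i, word in enumerate(vocabulary)}

    def vectorize(sentences: List[str]) -> np.ndarray:
        matrix = np.zeros((len(sentences), len(index)), dtype=float)
        for row, sentence in enumerate(sentences):
            # Naive lowercase whitespace tokenization (assumption for the demo).
            for word, count in Counter(sentence.lower().split()).items():
                if word in index:  # Out-of-vocabulary words are ignored.
                    matrix[row, index[word]] = count
        return matrix

    return vectorize


vectorize = make_raw_count_vectorizer(["the", "cat", "sat"])
vectors = vectorize(["The cat sat", "the the dog"])
```

Any other model (TF-IDF, GloVe, a BERT-based sentence encoder) could be wrapped in a function with the same signature, so metric code would only depend on the returned array.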

Then, we will need to file separate issues to implement the different metrics.

@dpalmasan dpalmasan self-assigned this Apr 11, 2021
@dpalmasan dpalmasan added the enhancement New feature or request label Apr 11, 2021