Extend CharSpanArray and TokenSpanArray to support multiple documents #73
Labels: enhancement (New feature or request)
frreiss changed the title from "ENH: Extend CharSpanArray and TokenSpanArray to support multiple documents" to "Extend CharSpanArray and TokenSpanArray to support multiple documents" on Aug 25, 2020.
WIP PR at #170.
Closed by #170.
The current implementation of `CharSpanArray` and `TokenSpanArray` only allows a single target text for all of the spans in a given array. This restriction is fine as long as all the spans in a given DataFrame come from a single document, but it complicates use cases that combine information from multiple documents in a single DataFrame. Currently, the only way to have spans from multiple documents in a Series is to convert `CharSpanArray`/`TokenSpanArray` arrays into arrays of type `object` containing individual `CharSpan` and `TokenSpan` objects.

We should extend our span array types to allow for multiple target texts per array. Key challenges to address:

- `text_extensions_for_pandas.spanner` with multiple target texts
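One possible storage layout for lifting the single-target-text restriction is to keep a table of distinct target texts and store, for each span, an integer index into that table alongside its begin and end offsets. The sketch below illustrates that idea in plain Python. `MultiDocSpanArray`, `append`, and `concat` are hypothetical names, not part of `text_extensions_for_pandas`. A real implementation would presumably be a pandas `ExtensionArray` backed by NumPy arrays rather than Python lists.

```python
from typing import Dict, List, Tuple


class MultiDocSpanArray:
    """Hypothetical character-span array whose spans may come from different documents.

    Instead of one shared target text, the array keeps a table of distinct
    target texts plus, for each span, an integer index into that table.
    Identical texts are stored only once.
    """

    def __init__(self) -> None:
        self.texts: List[str] = []        # distinct target texts
        self.text_ids: List[int] = []     # per-span index into self.texts
        self.begins: List[int] = []       # per-span begin offsets
        self.ends: List[int] = []         # per-span end offsets (exclusive)
        self._id_of: Dict[str, int] = {}  # text -> its index in self.texts

    def append(self, text: str, begin: int, end: int) -> None:
        """Add the span [begin, end) over `text`, registering `text` if it is new."""
        # Validate before mutating so a bad span leaves the array unchanged.
        if not 0 <= begin <= end <= len(text):
            raise ValueError(f"invalid span [{begin}, {end}) for text of length {len(text)}")
        text_id = self._id_of.get(text)
        if text_id is None:
            text_id = len(self.texts)
            self.texts.append(text)
            self._id_of[text] = text_id
        self.text_ids.append(text_id)
        self.begins.append(begin)
        self.ends.append(end)

    def __len__(self) -> int:
        return len(self.begins)

    def __getitem__(self, i: int) -> Tuple[int, int, str]:
        """Return (begin, end, covered_text) for the i-th span."""
        text = self.texts[self.text_ids[i]]
        return self.begins[i], self.ends[i], text[self.begins[i]:self.ends[i]]

    def concat(self, other: "MultiDocSpanArray") -> "MultiDocSpanArray":
        """Combine two arrays into a new one, merging their text tables."""
        result = MultiDocSpanArray()
        for arr in (self, other):
            for i in range(len(arr)):
                result.append(arr.texts[arr.text_ids[i]], arr.begins[i], arr.ends[i])
        return result


# Usage: spans from two documents live in one array.
spans = MultiDocSpanArray()
spans.append("Hello world", 0, 5)
spans.append("Goodbye moon", 8, 12)
spans.append("Hello world", 6, 11)
```

Storing per-span indices rather than per-span copies of the text keeps memory proportional to the number of distinct documents. Combining arrays (as in `concat`) then reduces to merging the text tables and remapping indices, which is the operation a multi-document DataFrame would need most often.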