Cython Classes
Doc cdef class
The Doc object holds an array of TokenC structs.
Attributes
Name | Description | Type |
---|---|---|
mem | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. | cymem.Pool |
vocab | A reference to the shared Vocab object. | Vocab |
c | A pointer to the first TokenC struct of the document's token array. | TokenC* |
length | The number of tokens in the document. | int |
max_length | The allocated capacity of the Doc.c array, which is always at least length. | int |
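The relationship between `c`, `length` and `max_length` can be modelled in pure Python. This is a simplified sketch, not spaCy's implementation: a Python list of hypothetical `TokenSlot` objects stands in for the `TokenC*` array, and Python's garbage collector stands in for the `cymem.Pool`. Only the first `length` slots hold real tokens; the remaining slots are allocated but unused.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class TokenSlot:
    # Hypothetical stand-in for a TokenC struct; only the text is modelled.
    text: str = ""


class DocBuffer:
    """Simplified model of Doc.c, Doc.length and Doc.max_length (assumption)."""

    def __init__(self, max_length: int) -> None:
        # Preallocate every slot up front, like a C array of max_length structs.
        self.c: List[TokenSlot] = [TokenSlot() for _ in range(max_length)]
        self.max_length = max_length
        self.length = 0  # number of slots that hold real tokens

    def __iter__(self) -> Iterator[TokenSlot]:
        # Readers must stop at length, never at max_length.
        for i in range(self.length):
            yield self.c[i]


buf = DocBuffer(max_length=4)
buf.c[0].text = "Hello"
buf.c[1].text = "world"
buf.length = 2
print([t.text for t in buf])       # ['Hello', 'world']
print(buf.length, buf.max_length)  # 2 4
```

Keeping spare capacity means appending a token usually needs no new allocation.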
Doc.push_back method
Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython’s fused types.
Name | Description | Type |
---|---|---|
lex_or_tok | The word to append to the Doc. | LexemeOrToken |
has_space | Whether the word has trailing whitespace. | bint |
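A pure-Python sketch of what appending a token involves, under stated assumptions: Cython's fused type is imitated with a `Union` and an `isinstance` check on hypothetical `Lexeme` and `Token` classes, the array is assumed to double its capacity when `length` reaches `max_length`, and each token's character offset (`idx`) is derived from the previous token's offset, text length and trailing space. The field names are illustrative, not a description of spaCy's struct layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Lexeme:
    # Hypothetical stand-in for LexemeC: the context-independent word entry.
    text: str


@dataclass
class Token:
    # Hypothetical stand-in for TokenC: a lexeme plus per-document data.
    lex: Lexeme
    idx: int = 0         # character offset of the token in the document
    spacy: bool = False  # whether the token is followed by whitespace


LexemeOrToken = Union[Lexeme, Token]  # plays the role of Cython's fused type


class Doc:
    def __init__(self, max_length: int = 2) -> None:
        self.c: List[Optional[Token]] = [None] * max_length
        self.length = 0
        self.max_length = max_length

    def push_back(self, lex_or_tok: LexemeOrToken, has_space: bool) -> None:
        if self.length == self.max_length:
            # Assumed growth policy: double the capacity when full.
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        if isinstance(lex_or_tok, Token):
            # A token is copied, so the Doc owns its own struct.
            t = Token(lex_or_tok.lex, lex_or_tok.idx, lex_or_tok.spacy)
        else:
            # A lexeme is wrapped in a fresh token.
            t = Token(lex_or_tok)
        if self.length == 0:
            t.idx = 0
        else:
            prev = self.c[self.length - 1]
            t.idx = prev.idx + len(prev.lex.text) + int(prev.spacy)
        t.spacy = has_space
        self.c[self.length] = t
        self.length += 1

    @property
    def text(self) -> str:
        return "".join(
            self.c[i].lex.text + (" " if self.c[i].spacy else "")
            for i in range(self.length)
        )


doc = Doc(max_length=2)
doc.push_back(Lexeme("Hello"), True)
doc.push_back(Token(Lexeme("big")), True)
doc.push_back(Lexeme("world"), False)
print(doc.text)        # Hello big world
print(doc.max_length)  # 4
print(doc.c[2].idx)    # 10
```

Doubling keeps the amortised cost of each append constant, at the price of some unused slots.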
Token cdef class
A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.
Attributes
Name | Description | Type |
---|---|---|
vocab | A reference to the shared Vocab object. | Vocab |
c | A pointer to a TokenC struct. | TokenC* |
i | The offset of the token within the document. | int |
doc | The parent document. | Doc |
Token.cinit method
Create a Token object from a TokenC* pointer.
Name | Description | Type |
---|---|---|
vocab | A reference to the shared Vocab object. | Vocab |
c | A pointer to a TokenC struct. | TokenC* |
offset | The offset of the token within the document. | int |
doc | The parent document. | Doc |
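Because a Token only holds a pointer, it behaves as a view: a change made to the struct through the Doc is visible through every Token pointing at it. The sketch below models this in pure Python under assumptions: a shared object reference stands in for the `TokenC*` pointer, the struct is a hypothetical `TokenStruct` with only a `text` field, the Vocab is a placeholder object, and `cinit` is a static factory that mirrors the parameters listed above (`vocab`, `c`, `offset`, `doc`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenStruct:
    # Hypothetical stand-in for a TokenC struct.
    text: str


class Doc:
    def __init__(self, words: List[str]) -> None:
        self.vocab = object()  # placeholder for the shared Vocab (assumption)
        # The Doc owns the structs.
        self.c = [TokenStruct(w) for w in words]
        self.length = len(words)


class Token:
    """View over a struct owned by a Doc; it never copies the struct."""

    @staticmethod
    def cinit(vocab: object, c: TokenStruct, offset: int, doc: Doc) -> "Token":
        token = Token.__new__(Token)  # bypass __init__, as a C-level factory would
        token.vocab = vocab
        token.c = c  # shared reference, analogous to a TokenC* pointer
        token.i = offset
        token.doc = doc
        return token

    @property
    def text(self) -> str:
        return self.c.text


doc = Doc(["New", "York", "rocks"])
a = Token.cinit(doc.vocab, doc.c[1], 1, doc)
b = Token.cinit(doc.vocab, doc.c[1], 1, doc)
doc.c[1].text = "Jersey"  # mutate the struct owned by the Doc
print(a.text, b.text)     # Jersey Jersey
print(a.c is b.c, a.i)    # True 1
```

Since the Doc owns the memory, a Token should keep a reference to its Doc (the `doc` attribute) so the struct outlives the view.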
Span cdef class
A Cython class providing access and methods for a slice of a Doc object.
Attributes
Name | Description | Type |
---|---|---|
doc | The parent document. | Doc |
start | The index of the first token of the span. | int |
end | The index of the first token after the span. | int |
start_char | The index of the first character of the span. | int |
end_char | The index of the first character after the span. | int |
label | A label to attach to the span, e.g. for named entities. | attr_t |
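Token and character offsets can be checked against each other with a small pure-Python sketch. It assumes, as in spaCy's Python API, that `end` and `end_char` are exclusive, so `text[start_char:end_char]` yields the span's text. The document is modelled as a list of (text, trailing-space) pairs, the helper `make_span` is hypothetical, and `label` is a plain string rather than an `attr_t` hash.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    # Simplified stand-in for the Span attributes listed above.
    start: int       # first token of the span
    end: int         # first token after the span (exclusive)
    start_char: int  # first character of the span
    end_char: int    # first character after the span (exclusive)
    label: str = ""  # a string here; an attr_t hash in spaCy


def make_span(words: List[Tuple[str, bool]], start: int, end: int, label: str = "") -> Span:
    """Hypothetical helper: derive character offsets from token offsets."""
    offsets = []
    pos = 0
    for text, space in words:
        offsets.append(pos)
        pos += len(text) + int(space)
    last_text, _ = words[end - 1]
    return Span(start, end, offsets[start], offsets[end - 1] + len(last_text), label)


words = [("I", True), ("live", True), ("in", True), ("New", True), ("York", False)]
text = "".join(w + (" " if s else "") for w, s in words)
span = make_span(words, 3, 5, label="GPE")
print(text[span.start_char:span.end_char])         # New York
print([w for w, _ in words[span.start:span.end]])  # ['New', 'York']
```

With exclusive ends, both slices use the offsets directly and need no `+ 1` adjustment.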
Lexeme cdef class
A Cython class providing access and methods for an entry in the vocabulary.
Attributes
Name | Description | Type |
---|---|---|
c | A pointer to a LexemeC struct. | LexemeC* |
vocab | A reference to the shared Vocab object. | Vocab |
orth | ID of the verbatim text content. | attr_t |
Vocab cdef class
A Cython class providing access and methods for a vocabulary and other data shared across a language.
Attributes
Name | Description | Type |
---|---|---|
mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool |
strings | A StringStore that maps strings to hash values and vice versa. | StringStore |
length | The number of entries in the vocabulary. | int |
Vocab.get method
Retrieve a LexemeC* pointer from the vocabulary, looked up by its string.
Name | Description | Type |
---|---|---|
mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool |
string | The string of the word to look up. | str |
RETURNS | The lexeme in the vocabulary. | const LexemeC* |
Vocab.get_by_orth method
Retrieve a LexemeC* pointer from the vocabulary, looked up by its orth ID.
Name | Description | Type |
---|---|---|
mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool |
orth | ID of the verbatim text content. | attr_t |
RETURNS | The lexeme in the vocabulary. | const LexemeC* |
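Both lookups return an entry shared across the vocabulary: asking for the same word twice yields the same lexeme, whether by string or by orth ID. The pure-Python sketch below rests on several assumptions: a dict keyed by orth ID stands in for the vocabulary's internal table, a plain dict stands in for the StringStore, the `mem` argument is left out because Python manages memory here, orth IDs come from a stand-in 64-bit BLAKE2b hash rather than spaCy's own hash function, and in this sketch `get` creates the entry on first lookup while `get_by_orth` raises `KeyError` for an ID whose string was never seen.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


def hash_string(text: str) -> int:
    # Stand-in 64-bit hash (BLAKE2b, 8-byte digest); not spaCy's hash function.
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


@dataclass(frozen=True)
class LexemeC:
    # Hypothetical stand-in for the LexemeC struct; frozen to mimic "const".
    orth: int
    length: int


class Vocab:
    def __init__(self) -> None:
        self.strings: Dict[int, str] = {}       # orth ID -> text
        self._by_orth: Dict[int, LexemeC] = {}  # orth ID -> shared lexeme

    @property
    def length(self) -> int:
        return len(self._by_orth)

    def get(self, string: str) -> LexemeC:
        orth = hash_string(string)
        self.strings[orth] = string  # make the text known before the lookup
        return self.get_by_orth(orth)

    def get_by_orth(self, orth: int) -> LexemeC:
        lex = self._by_orth.get(orth)
        if lex is None:
            # Create the entry on first lookup; the text must already be known.
            text = self.strings[orth]  # raises KeyError for unknown IDs
            lex = LexemeC(orth=orth, length=len(text))
            self._by_orth[orth] = lex
        return lex


vocab = Vocab()
apple = vocab.get("apple")
print(vocab.get("apple") is apple)             # True
print(vocab.get_by_orth(apple.orth) is apple)  # True
print(vocab.length)                            # 1
```

Sharing one entry per word is what lets many Tokens point at the same lexical data without copying it.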
StringStore cdef class
A lookup table to retrieve strings by 64-bit hashes.
Attributes
Name | Description | Type |
---|---|---|
mem | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. | cymem.Pool |
keys | A list of hash values in the StringStore. | vector[hash_t] |
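A StringStore works as a two-way table: adding a string returns its 64-bit hash, the hash retrieves the string again, and `keys` records the stored hashes. A minimal pure-Python sketch, assuming BLAKE2b truncated to 8 bytes as a stand-in hash (spaCy uses its own hash function, so real values differ) and a Python list in place of `vector[hash_t]`. The `add` method name is modelled on the Python-level `StringStore.add`; the internals are illustrative.

```python
import hashlib
from typing import Dict, List


def hash_string(text: str) -> int:
    # Stand-in 64-bit hash; spaCy's real hash function produces different values.
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class StringStore:
    def __init__(self) -> None:
        self.keys: List[int] = []       # insertion-ordered hashes, like vector[hash_t]
        self._map: Dict[int, str] = {}  # hash -> string

    def add(self, string: str) -> int:
        key = hash_string(string)
        if key not in self._map:
            self._map[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, key: int) -> str:
        return self._map[key]

    def __contains__(self, string: str) -> bool:
        return hash_string(string) in self._map


strings = StringStore()
h = strings.add("coffee")
print(strings[h])                  # coffee
print(strings.add("coffee") == h)  # True: adding again is a no-op
print(len(strings.keys))           # 1
print(0 <= h < 2**64)              # True
```

Because the hash is computed from the string alone, the same word maps to the same ID in every process, so IDs can be stored and compared without storing the text.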