forked from wjbmattingly/spacyex

spaCyEx lets you create spaCy Matcher patterns with a regex-like syntax.

spaCyEx

spaCyEx is an extension for spaCy that makes token pattern matching as flexible and easy to write as regular expressions. It builds on spaCy's Matcher and adds a compact string syntax for defining complex patterns, so that linguistic features such as parts of speech, lemmas, and entity types can be specified in a single readable line.

Installation

You can install spaCyEx via pip:

pip install spacyex

Features

  • Dynamic Pattern Creation: Create complex token matching patterns using a simple string-based syntax.
  • Integration with spaCy: Leverage spaCy's Matcher capabilities to find sequences in text that match defined patterns.
  • Customizable Matching Rules: Define token attributes including text characteristics, lexical attributes, and grammatical properties.

Creating Patterns

Patterns are written as strings in which each token is described by one parenthesized group. Inside a group, attributes are given as key=value pairs, and multiple attributes are separated by a pipe (|). Groups are separated by spaces and are matched in sequence.

Syntax Examples

  • Single Attribute: (pos=NOUN)
  • Multiple Attributes: (pos=NOUN|lemma=run)
  • Using List Values: (lemma=in[run,walk])
  • Using Operators: (ent_type=person|op={2,3})
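
One way to read the examples above is as a compact spelling of the token dictionaries that spaCy's Matcher accepts. The following is a minimal sketch of such a translation, not spaCyEx's actual implementation: parse_group and parse_pattern are hypothetical helpers, and uppercasing the keys while leaving the values untouched is an assumption made for illustration.

```python
import re

# Hypothetical helpers, not part of spaCyEx: they only illustrate how the
# string syntax could map onto spaCy-Matcher-style token dictionaries.
GROUP_RE = re.compile(r"\(([^()]*)\)")   # one parenthesized group per token
LIST_RE = re.compile(r"^in\[(.*)\]$")    # list values such as in[run,walk]


def parse_group(body):
    """Turn 'pos=NOUN|lemma=run' into {'POS': 'NOUN', 'LEMMA': 'run'}."""
    token = {}
    for pair in body.split("|"):
        key, _, value = pair.partition("=")
        key, value = key.strip().upper(), value.strip()
        list_match = LIST_RE.match(value)
        if list_match:
            # in[a,b] becomes the Matcher's {"IN": [...]} membership form.
            items = [item.strip() for item in list_match.group(1).split(",")]
            token[key] = {"IN": items}
        else:
            # Plain values, including op quantifiers, are passed through as-is.
            token[key] = value
    return token


def parse_pattern(pattern):
    """Return one token dictionary per parenthesized group, in order."""
    return [parse_group(body) for body in GROUP_RE.findall(pattern)]


for example in ("(pos=NOUN|lemma=run)",
                "(lemma=in[run,walk])",
                "(ent_type=person|op={2,3})"):
    print(example, "->", parse_pattern(example))
```

Under these assumptions, a list value becomes a membership test on that attribute, and the op value is carried over unchanged as a quantifier string for the Matcher to interpret.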

Pattern Matching

Once a pattern is defined, it can be used to search text for matches.

Usage

Here is a simple example to get started with spaCyEx:

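import spacy
import spacyex as se

# The pipeline must provide entity, lemma, and part-of-speech annotations.
nlp = spacy.load("en_core_web_sm")
text = "John Smith runs fast, but Jacob Smith walks slowly."

# One parenthesized group per token; op controls how often a group repeats.
pattern = "(ent_type=person|op={2}) (lemma=in[run,walk]) (pos=ADV)"

# Each result is indexed as (matched span, start, end).
results = se.search(pattern, text, nlp)
for match in results:
    print(match[0].text, "Start:", match[1], "End:", match[2])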

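The pattern describes two consecutive tokens labeled as a person entity, followed by a token whose lemma is run or walk, followed by an adverb. In the sample text it is intended to pick out phrases such as "John Smith runs fast" and "Jacob Smith walks slowly"; the exact results depend on the annotations the loaded model produces.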
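
For comparison, the same idea can be written directly against spaCy's Matcher. This is a sketch under stated assumptions rather than what spaCyEx does internally: the entity, lemma, and part-of-speech annotations are set by hand on a blank pipeline so that no model download is needed, the rule name PERSON_MOVES is made up, and the op={2} repetition is spelled out as two identical token dictionaries.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written annotations (an assumption for illustration, not model output).
words = ["John", "Smith", "runs", "fast", ",", "but",
         "Jacob", "Smith", "walks", "slowly", "."]
lemmas = ["John", "Smith", "run", "fast", ",", "but",
          "Jacob", "Smith", "walk", "slowly", "."]
pos = ["PROPN", "PROPN", "VERB", "ADV", "PUNCT", "CCONJ",
       "PROPN", "PROPN", "VERB", "ADV", "PUNCT"]
ents = ["B-PERSON", "I-PERSON", "O", "O", "O", "O",
        "B-PERSON", "I-PERSON", "O", "O", "O"]
doc = Doc(nlp.vocab, words=words, lemmas=lemmas, pos=pos, ents=ents)

# Native Matcher pattern: one dictionary per token in the sequence.
matcher = Matcher(nlp.vocab)
matcher.add("PERSON_MOVES", [[
    {"ENT_TYPE": "PERSON"},
    {"ENT_TYPE": "PERSON"},
    {"LEMMA": {"IN": ["run", "walk"]}},
    {"POS": "ADV"},
]])

# The Matcher yields (match_id, start, end) with token-index boundaries.
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

The native pattern takes four dictionaries to say what the string syntax expresses in one line, which is the verbosity the string form is meant to avoid.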

Roadmap

  • Support for all of the token attribute keys that spaCy Matcher pattern dictionaries accept.
  • Additional utilities and helper functions for more complex pattern scenarios.
