Features added: #44

Open
wants to merge 1 commit into
base: master

martinarrieta

Hello,

Great project. I was trying to build something similar until I found yours. To cover all my needs I had to add a few things. Let me know what you think.

Unfortunately, I couldn't write test cases for these changes because I don't have experience with pytest, but I'm including examples of how they work.

Things that I have added:

  1. A custom stop words option (custom_stop_words) in the remove_stop_words method.
  2. A new method, "replace_numbers", that removes standalone numbers from the text.
  3. A new method, "replace_custom_regex", that removes every match of a custom regex from the text.

Sample code to test my changes:

from cucco.cucco import Cucco

from cucco.config import Config
cucco_config = Config(language='en')
c = Cucco(config=cucco_config)

# Remove standalone numbers only; digits embedded in a word (e.g. "i3s") are kept.
c.replace_numbers("this 3333 i3s a text with the number 2")

# Removing custom regex, for example all #foo and @bar
import re
regex = re.compile(r"[#@]\w+", re.IGNORECASE)
c.replace_custom_regex(regex=regex, text="Test a string #foo to replace @bar")

# Removing stop words, optionally with a custom list

# Default behavior, using the full built-in stop word list:
c.remove_stop_words("Test to remove stop words")

# With a custom set of stop words (in case you want to supply your own):
c.remove_stop_words("Test to remove stop words", custom_stop_words=['test', 'to'])

Sample code with the console output:

In [1]: from cucco.cucco import Cucco
   ...:
   ...: from cucco.config import Config
   ...:
   ...: cucco_config = Config(language='en')
   ...:
   ...: c = Cucco(config=cucco_config)
   ...:

In [3]: # Remove standalone numbers only; digits embedded in a word (e.g. "i3s") are kept.
   ...: c.replace_numbers("this 3333 i3s a text with the number 2")
Out[3]: 'this i3s a text with the number'
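
For reviewers who want to see the intended behavior without installing this branch, here is a rough standalone sketch of the same idea. The helper name `strip_standalone_numbers` and the regex are my assumptions for illustration only; this is not the code in this pull request and it does not use cucco at all.

```python
import re

# Hypothetical helper, not part of cucco: matches runs of digits that are
# bounded by word boundaries, so "3333" matches but the "3" in "i3s" does not.
_STANDALONE_NUMBER = re.compile(r"\b\d+\b")


def strip_standalone_numbers(text):
    """Remove tokens made only of digits and tidy the leftover whitespace."""
    without_numbers = _STANDALONE_NUMBER.sub("", text)
    # Removing a token leaves doubled or trailing spaces; split/join collapses them.
    return " ".join(without_numbers.split())


result = strip_standalone_numbers("this 3333 i3s a text with the number 2")
print(result)
```

With the input from the session above, this sketch gives the same string as Out[3]. Note that a word-boundary pattern also splits decimals such as "3.5", so a real implementation may need a stricter rule.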

In [4]: # Removing custom regex, for example all #foo and @bar
   ...: import re
   ...: regex = re.compile(r"[#@]\w+", re.IGNORECASE)
   ...: c.replace_custom_regex(regex=regex, text= "Test a string #foo to replace @bar")
   ...:
Out[4]: 'Test a string  to replace '
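
The regex replacement can be pictured as a thin wrapper around `re.sub`. The sketch below is only an approximation under my own assumptions: the function name `replace_with_regex` and its `replacement` parameter are hypothetical and are not the signature added in this pull request.

```python
import re


def replace_with_regex(text, regex, replacement=""):
    """Hypothetical stand-in: substitute every match of a compiled regex."""
    return regex.sub(replacement, text)


# \w already matches both upper and lower case letters, so the IGNORECASE
# flag is not needed for this particular pattern.
hashtags_and_mentions = re.compile(r"[#@]\w+")

result = replace_with_regex("Test a string #foo to replace @bar", hashtags_and_mentions)
print(repr(result))
```

As in Out[4], the whitespace around each removed match is left untouched, which is why a double space and a trailing space remain. Callers who want clean output would need a separate whitespace normalization step.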


In [5]: # This is the default one, with all stop words:
   ...: c.remove_stop_words("Test to remove stop words")
   ...: 'test remove stop words'
   ...:
Out[5]: 'test remove stop words'

In [6]: # This is with a custom set of stop words (in case that you want to use your own set):
   ...: c.remove_stop_words("Test to remove stop words", custom_stop_words=['test', 'to'])
   ...: 'remove stop words'
   ...:
Out[6]: 'remove stop words'
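
The custom stop words option boils down to filtering tokens against a caller-supplied collection. Here is a minimal sketch of that idea, independent of cucco. The name `remove_words` is hypothetical, and the lowercasing and whitespace tokenization are assumptions based on the expected strings in the session above, not a description of the actual implementation.

```python
def remove_words(text, stop_words):
    """Hypothetical helper: drop every token found in stop_words.

    Assumes whitespace tokenization and case-insensitive matching, with the
    result returned in lower case.
    """
    blocked = {word.lower() for word in stop_words}
    kept = [token for token in text.lower().split() if token not in blocked]
    return " ".join(kept)


result = remove_words("Test to remove stop words", ["test", "to"])
print(result)
```

Using a set for the lookup keeps each membership check constant time, which matters when the custom list is long.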

@davidmogar
Owner

Thank you for your contribution. I really appreciate it.

It will take me some time to review it. As you can see, this is still a one-person project and I'm a bit busy these days. But I will definitely review it and try to add your great suggestions.

Cheers.
