Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let process_text() return a dict. #120

Merged
merged 5 commits into from
Apr 11, 2016

Conversation

mkcor
Copy link
Contributor

@mkcor mkcor commented Feb 3, 2016

Hi @amueller !

While playing around, I found myself frustrated with process_text() returning a object of class 'dict_items'... Indeed, what if I want to apply further processing, or edit manually my dictionary of words, before turning it into an actual word cloud? I believe I should have the flexibility to do so. Generally speaking, it seems preferable to use standard data structures for input and output values.

Thank you for your consideration.

@amueller
Copy link
Owner

amueller commented Feb 3, 2016

You could just call dict on it, right? That doesn't seem very frustrating.

I agree, returning a dict might have been nicer. When I wrote this, I was using python2, where this actually returns a tuple, as it is documented.
For python3, it seems the documented return type for process_text is currently wrong.

You also changed the input type for generate_from_frequencies, which means this PR does two breaking API changes. I'm not sure the very minor change in usability is worth breaking people's code. At the very least, you'd need to fix the docstrings for the return type for process_text and the input for generate_from_frequencies.

@amueller amueller mentioned this pull request Feb 3, 2016
@mkcor
Copy link
Contributor Author

mkcor commented Feb 6, 2016

Thanks, @amueller .

You could just call dict() on it, right? That doesn't seem very frustrating.

Oh, right! I didn't think about it. It's only frustrating conceptually, but not technically, I agree.

For python3, it seems the documented return type for process_text is currently wrong.

I'm happy to update it... But it's not very clear to me whether the module runs Python 3 or both Python 2 and 3. In a way, it's cool that it works with both (if I understand correctly), but it's worth mentioning explicitly (typically, in the README). I tend to push for porting to Python 3 and moving on to using Python 3 in general.

You also changed the input type for generate_from_frequencies(), which means this PR does two breaking API changes. I'm not sure the very minor change in usability is worth breaking people's code. At the very least, you'd need to fix the docstrings for the return type for process_text() and the input for generate_from_frequencies().

No, I have not changed the input type for generate_from_frequencies(), it's still a dict_items. The only change is that the input of generate_from_frequencies() is not the output of process_text() any more. I've had backwards-compatibility in mind when submitting this PR. And if it broke anything, it would/should/could be captured by tests. Well, at the moment there is no testing of generate_from_frequencies(). I'm happy to add a test for the input type of generate_from_frequencies(). What do you think?

@amueller
Copy link
Owner

amueller commented Feb 8, 2016

You're right, I misread the code. It thought it would be more natural if the output of one function would be the input of the next. I guess it would be possible to accept dicts or tuples / dict_items.

The codebase is Python2 and Python3 compatible. I would not like to drop Python2.7 any time soon. The CI tests 2.7 and 3.4 at the moment. Feel free to add to the readme.

Also feel free to add tests for just calling generate_from_frequency.

It is still a backward incompatible change, since the return type of process_text changed. I guess that is fine, though.

What I'd like for this to be merged is:

  • change the docstring
  • add a versionchanged sphinx string that says the return type of process_text changed in 1.1.4

It would be nice to have generate_from_frequency accept both dicts and tuples / dict_items, but you can also leave it as-is.

@mkcor
Copy link
Contributor Author

mkcor commented Apr 8, 2016

Hi @amueller !

Sorry for the delay on this. I have thus rebased the branch onto master.

I'm not sure where I can add the versionchanged Sphinx string... I looked into doc/ but there is no obvious .rst file I should edit... And the latest changes date back from 11 months ago. Could you please provide some guidance on this?

@@ -376,7 +376,7 @@ def process_text(self, text):

Returns
-------
words : list of tuples (string, float)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add

..versionchanged:: 1.2.2
    Changed return type from list of tuples to dict

to the docstring and similarly for generate_from_text.

@mkcor
Copy link
Contributor Author

mkcor commented Apr 9, 2016

@amueller Done! Thank you.

@amueller
Copy link
Owner

cool, thanks :)

@amueller amueller merged commit 1860f7d into amueller:master Apr 11, 2016
@mkcor mkcor deleted the process_text-return-dict branch April 11, 2016 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants