Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix removing unlikely candidate #74

Closed
wants to merge 2 commits into from

Conversation

dsuket
Copy link

@dsuket dsuket commented Jul 11, 2016

No description provided.

@luin
Copy link
Owner

luin commented Jul 13, 2016

Thank you for the pull request! Could you provide more details about this change?

@dsuket
Copy link
Author

dsuket commented Jul 14, 2016

If the main content is wrapped in unlikelyCandidates class or id, that was deleted. For example in https://uxmilk.jp/43386, the main content is wrapped in #menu-wrap element.
So, check the unlikelyCandidates with p elements.
please tell me a better way how get the contents above site.

@haroldtreen
Copy link
Collaborator

haroldtreen commented Jul 25, 2016

@dsuket

This has also been a problem for me. But I still think the general idea behind removing unlikely candidates is good (if something just has a class of ad and nothing else, it's unlikely to have content).

My approach for fixing these problems has been to update the okMaybeItsACandidate regex. When the node matches this, it will not be removed.

I've done this a few times and those changes have been since pulled in. I've added an integration test to verify this fixes UX-Milk, and that seems to be passing.

So that page should be fixed and this change shouldn't be needed :).

Does that fix your issue @dsuket ?
You can test with this commit to verify - #77

@dsuket
Copy link
Author

dsuket commented Jul 26, 2016

awesome! thank you @haroldtreen !

@dsuket dsuket closed this Jul 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants