Implemented the usage of google search for regular search, interactive search, and manual search #56

Merged
merged 10 commits into from
May 22, 2017

Conversation

aaxu
Contributor

@aaxu aaxu commented May 21, 2017

This can be changed back to using Stack Overflow by simply setting the global variable "google_search" to False. This also fixes some errors in the old implementation, which arose because the links extracted from a Google search come in different formats at different times. I added a method, fixGoogleURL, to normalize these different types of links. I'm not sure how the randomizing of the user agents works, as @gautamkrishnar described in issue #49, but you shouldn't get blocked by Google if you don't repeatedly spam commands. Searching like a normal person (at most one search every few seconds) shouldn't cause Google to block your IP.
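
For readers unfamiliar with the link formats mentioned above, here is a minimal sketch of that kind of normalization. It is not the `fixGoogleURL` from this pull request: the name `fix_google_url` and the three link shapes it handles (a `/url?q=...` redirect, a protocol-relative link, and an already absolute URL) are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs


def fix_google_url(href):
    """Return a plain http(s) URL for a search-result href, or None.

    Hypothetical helper; the handled link shapes are assumptions.
    """
    parsed = urlparse(href)
    # Redirect-style link: the real target sits in the "q" (or "url") parameter.
    if parsed.path == "/url":
        params = parse_qs(parsed.query)
        for key in ("q", "url"):
            if key in params:
                return params[key][0]
        return None
    # Protocol-relative link such as //stackoverflow.com/...
    if href.startswith("//"):
        return "https:" + href
    # Already an absolute URL: return it unchanged.
    if parsed.scheme in ("http", "https"):
        return href
    # Anything else (anchors, relative paths) is not a usable result link.
    return None
```

Returning `None` for anything unrecognized lets the caller skip a result instead of requesting a malformed URL.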

My merge also contains the updated README by accident. I'm not sure if this is okay; I'm new to git, so sorry about that! Let me know if you want that changed.

@aaxu
Contributor Author

aaxu commented May 21, 2017

I didn't implement Google search for the -tag option because I'm not really sure how you want it to work. I tested it, and it's very simple to implement and very similar to the Stack Overflow version, but it doesn't really make sense to search Google with tags. Should we just keep it so that -tag defaults to a Stack Overflow search even if "google_search" is set to True?
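
The fallback proposed in that question could look roughly like this. It is only a sketch: `pick_backend` and its return values are hypothetical names, not part of socli.

```python
def pick_backend(google_search, tag=None):
    """Decide which search backend to use (hypothetical helper).

    Tag queries always go to Stack Overflow; otherwise the
    google_search flag decides.
    """
    if tag:
        return "stackoverflow"
    return "google" if google_search else "stackoverflow"
```

Keeping the decision in one function means the -tag behaviour stays the same whichever way the global flag is set.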

@gautamkrishnar
Owner

@aaxu thanks a lot ❤️. I will surely review this and get it merged 👍

@gautamkrishnar gautamkrishnar merged commit b268a93 into gautamkrishnar:develop May 22, 2017
@gautamkrishnar
Owner

@aaxu thanks for your contribution. This needs more testing: it sometimes causes an error when running socli for loop java:

Traceback (most recent call last):
  File "C:\Users\Gautam krishna R\PycharmProjects\socli\socli\socli.py", line 149, in socli
    dispres(res_url)
  File "C:\Users\Gautam krishna R\PycharmProjects\socli\socli\socli.py", line 887, in dispres
    question_title, question_desc, question_stats = get_stats(soup)
  File "C:\Users\Gautam krishna R\PycharmProjects\socli\socli\socli.py", line 840, in get_stats
    question_title = (soup.find_all("a", class_="question-hyperlink")[0].get_text())
IndexError: list index out of range

@aaxu
Contributor Author

aaxu commented May 22, 2017

Did you receive this error while repeatedly testing?
I think this happens because of a captcha triggered by accessing Stack Overflow too often, too quickly. I'm actually quite confused as to why this program sometimes extracts different URLs and produces different results, but my guess for this one is a Stack Overflow captcha. One thing I noticed is that even though SO serves the captcha, redoing the command usually works the second or third time, so it doesn't seem to block your IP. Do you have any insight into what exactly is happening? I will test more today.
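
Since a second or third attempt usually succeeds, one option is a small bounded retry with a pause between attempts. This is a sketch, not socli code: `retry`, its defaults, and the convention that `fetch` returns `None` on failure are all assumptions.

```python
import time


def retry(fetch, attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None result or attempts run out.

    `sleep` is injectable so the pause can be skipped in tests.
    """
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Pause before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

The attempt count is kept low on purpose, since hammering a site that has just served a captcha is likely to make things worse.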

@gautamkrishnar
Owner

Thanks. I will do more testing and let you know. Please don't forget to use the up-to-date develop branch while testing.

@aaxu
Contributor Author

aaxu commented May 23, 2017

Hey @gautamkrishnar, I saw in the code that you wrote we still have to implement the captcha check for Google searches. I tested it, and apparently when you receive a captcha from your terminal, extracting the link and solving it in your browser doesn't unblock your terminal. The terminal and the browser seem to be independent of each other as far as Google's system is concerned. This also works the other way around: even if you receive a captcha in your browser, you can still perform Google searches through SoCLI. I'm not really sure how to handle this without hacky workarounds or reCAPTCHA solvers, but normal users shouldn't encounter this issue. I had to queue 50-100 socli searches before I got blocked by Google, so it should be a rare problem. How do you want to handle this?
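
Even without solving the captcha, the tool could at least detect the block and tell the user. The sketch below assumes a blocked request shows up as HTTP status 429 or as a redirect to a path starting with `/sorry`; that assumption is not confirmed anywhere in this thread, and `looks_like_google_block` is a hypothetical name.

```python
from urllib.parse import urlparse


def looks_like_google_block(status_code, final_url):
    """Guess whether a search response is a captcha/block page.

    Assumption: blocks appear as HTTP 429 or a redirect to /sorry...
    """
    path = urlparse(final_url).path
    return status_code == 429 or path.startswith("/sorry")
```

A caller could check the response status and final URL with this and then print a hint to wait, or fall back to a Stack Overflow search.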

@gautamkrishnar
Owner

@aaxu yeah, since we are using randomized user agents it won't work either. Let's leave the problem caused by Google unfixed since it is rare.
