Specify User-Agent to avoid banning from some websites #52

Open
9bow opened this issue Apr 20, 2024 · 2 comments

@9bow

9bow commented Apr 20, 2024

Thanks to all the maintainers. go-readability is really helpful for me. I use it every day as part of my GPT workflow. :D

However, sadly, some websites (such as theregister.com) block the readability client based on the User-Agent value in the HTTP request header. To avoid this, I'd like to suggest an option for specifying the User-Agent value. (If there's a workaround that I haven't found, please let me know.)

Here's how I've personally solved this issue:

  • Changed FromURL() in the readability package to optionally accept a User-Agent string.
  • Used client.Do() instead of client.Get() so that request headers can be set.

You may check the full changes I made here: 9bow@db2a1fa

If this looks good to you, may I create a pull request for it?
Or, please suggest some ideas for improving it. 😄

Thanks!

@fmartingr
Member

Hey @9bow, thank you for your nice words!

Regarding the change, absolutely! I can see it's useful to let consumers modify the user agent, but I would go a bit further and implement something like the "options" we use in the HTTP testing utils in shiori: https://github.com/go-shiori/shiori/blob/master/internal/testutil/http.go#L33

Would look something like this:

type requestWith func(*http.Request) 

...

func FromURL(pageURL string, timeout time.Duration, requestModifiers ...requestWith) (Article, error) {
	....
	req, err := http.NewRequest("GET", pageURL, nil)
	if err != nil {
		return Article{}, fmt.Errorf("failed to create request: %v", err)
	}

	for _, modifier := range requestModifiers {
		modifier(req)
	}

	...
}

...

// WithUserAgent returns a modifier that sets the User-Agent header.
func WithUserAgent(userAgent string) requestWith {
	return func(r *http.Request) {
		r.Header.Set("User-Agent", userAgent)
	}
}

This way we would not only support the user agent: consumers could also make other modifications to the request by writing their own modifiers (adding a header that the site needs, cookies, etc.).
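
To make the "own modifiers" idea concrete, here is a small self-contained sketch of how consumer-defined modifiers might compose. It is an illustration under assumptions, not the library's API: buildRequest is a hypothetical stand-in for the request-building part of FromURL, and withHeader and withCookie are made-up examples of modifiers a consumer could write.

```go
package main

import (
	"fmt"
	"net/http"
)

// requestWith mirrors the modifier type proposed in this comment.
type requestWith func(*http.Request)

// WithUserAgent sets the User-Agent header on the outgoing request.
func WithUserAgent(userAgent string) requestWith {
	return func(r *http.Request) {
		r.Header.Set("User-Agent", userAgent)
	}
}

// withHeader is a hypothetical consumer-defined modifier for any header.
func withHeader(key, value string) requestWith {
	return func(r *http.Request) {
		r.Header.Set(key, value)
	}
}

// withCookie is a hypothetical consumer-defined modifier that attaches a cookie.
func withCookie(name, value string) requestWith {
	return func(r *http.Request) {
		r.AddCookie(&http.Cookie{Name: name, Value: value})
	}
}

// buildRequest is a hypothetical stand-in for the part of FromURL that
// creates the request and applies every modifier in order.
func buildRequest(pageURL string, modifiers ...requestWith) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, pageURL, nil)
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %w", err)
	}
	for _, modifier := range modifiers {
		modifier(req)
	}
	return req, nil
}

func main() {
	req, err := buildRequest(
		"https://example.com/article",
		WithUserAgent("my-reader/1.0"),
		withHeader("Accept-Language", "en"),
		withCookie("session", "abc"),
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("User-Agent"))      // my-reader/1.0
	fmt.Println(req.Header.Get("Accept-Language")) // en
	fmt.Println(req.Header.Get("Cookie"))          // session=abc
}
```

Because the modifiers run in the order they are passed, a later modifier that sets the same header would overwrite an earlier one.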

What do you think?

@9bow
Author

9bow commented May 2, 2024

That's a great idea! Thank you, @fmartingr.
As you said, I was thinking about having such an option, but I only added the functionality for the immediate need. 😅

For an HTTP client, control over headers seems essential, and I'll think about improving it with your suggestions and code. I'll take a look and open a pull request when it's ready. (However, it's been a while since I've used Go, so it may take some time.)
