Specify User-Agent to avoid being banned by some websites #52
Hey @9bow, thank you for your kind words! Regarding the change: absolutely! I can see it's useful to let consumers modify the user agent, but I would go a bit further and implement something like the "options" we use in the HTTP testing utils in shiori: https://github.com/go-shiori/shiori/blob/master/internal/testutil/http.go#L33

It would look something like this:

```go
type requestWith func(*http.Request)

// ...

func FromURL(pageURL string, timeout time.Duration, requestModifiers ...requestWith) (Article, error) {
	// ...
	req, err := http.NewRequest("GET", pageURL, nil)
	if err != nil {
		return Article{}, fmt.Errorf("failed to create request: %w", err)
	}

	// Apply every caller-supplied modifier before the request is sent.
	for _, modifier := range requestModifiers {
		modifier(req)
	}
	// ...
}

// ...

// WithUserAgent returns a modifier that sets the User-Agent header.
func WithUserAgent(userAgent string) requestWith {
	return func(r *http.Request) {
		r.Header.Set("User-Agent", userAgent)
	}
}
```

This way, consumers could not only set the user agent but also make other modifications to the request by writing their own modifiers (adding a header the site needs, cookies, etc.). What do you think?
That's a great idea! Thank you, @fmartingr. For an HTTP client, control over headers seems essential, and I'll look into improving it with your suggestions and code. I'll make a pull request when it's ready. (However, it's been a while since I've used Go, so it may take some time.)
Thanks to all the maintainers.
go-readability is really helpful for me. I use it every day as part of my GPT workflow. :D However, some websites (such as theregister.com) ban the readability client based on the User-Agent value in the HTTP header. To avoid this, I'd like to suggest an option to specify the User-Agent value. (If there's a way around this that I haven't found, please let me know.)
Here's how I've personally solved this issue:

- Changed `FromURL()` in the readability package to optionally accept a User-Agent string.
- Used `client.Do()` instead of `client.Get()` so that the request header can be specified.

You can see the full changes I made here: 9bow@db2a1fa
If this looks good to you, may I open a pull request for it?
Otherwise, please suggest ways to improve it. 😄
Thanks!