
Error 403, ScraperException: 4 requests #848

Closed
anastasita308 opened this issue Apr 21, 2023 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments

anastasita308 commented Apr 21, 2023

I have been scraping Twitter, and it worked fine until yesterday. My snscrape installation is up to date, but it still fails with the error below after about 8 seconds.
I know this has been raised before, but I could not find a working solution in the earlier issues.

ScraperException: 4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=%28plastic+OR+environment+OR+pollution+OR+packaging+OR+waste+OR+climate+OR+sustainability%29+%28%40Unilever%29+until%3A2019-10-07+since%3A2019-09-29&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up.

This is my code:

import snscrape.modules.twitter as sntwitter
import pandas as pd
import re
import string
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
import time

# define the search query
query = "(plastic OR environment OR pollution OR packaging OR waste OR climate OR sustainability) (@Unilever) until:2019-10-07 since:2019-09-29"

# define a list of stopwords
stop_words = set(stopwords.words('english'))

# define the list of tweets
tweets = []
limit = 500

# loop through the search results and clean the text of each tweet
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i == limit:
        break
    else:
        # clean the text of the tweet
        text = tweet.content.lower()
        text = re.sub(r'http\S+', '', text) # remove URLs
        text = re.sub(r'@\w+', '', text) # remove mentions
        text = re.sub(r'#(\w+)', r'\1', text) # strip the '#' but keep the hashtag word
        text = text.translate(str.maketrans('', '', string.punctuation)) # remove punctuation
        words = [word for word in text.split() if word not in stop_words] # remove stop words
        cleaned_text = ' '.join(words)

        # check if the cleaned text is empty, and skip the tweet if it is
        if not cleaned_text:
            continue
        
        # add the cleaned text and other tweet data to the list
        tweets.append([tweet.date, tweet.username, cleaned_text])

        # pause for 3 seconds before processing the next tweet
        time.sleep(3)

# create a dataframe from the list of tweets and save to CSV
df = pd.DataFrame(tweets, columns=['Date', 'User', 'Tweet'])
df.to_csv('unilever_tweets.csv', index=False)
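For what it's worth, the text-cleaning steps in the loop can be exercised in isolation with only the standard library, independent of whether the scraper itself works. This is a minimal sketch: the sample tweet text and the tiny stop-word set are made up for illustration, and hashtags are handled in a single `re.sub` step (stripping the `#` but keeping the word) rather than via a separate hashtag list:

```python
import re
import string

def clean_tweet(text, stop_words):
    """Apply the same cleaning pipeline as the scraping loop above."""
    text = text.lower()
    text = re.sub(r'http\S+', '', text)    # remove URLs
    text = re.sub(r'@\w+', '', text)       # remove mentions
    text = re.sub(r'#(\w+)', r'\1', text)  # strip the '#' but keep the hashtag word
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    return ' '.join(w for w in text.split() if w not in stop_words)   # drop stop words

sample = "Check out this #plastic pledge from @Unilever https://example.com #Sustainability"
print(clean_tweet(sample, {'out', 'this', 'from'}))
# → check plastic pledge sustainability
```

Testing the cleaning on its own like this makes it easier to confirm that an empty result (and the `continue` in the loop) comes from the cleaning itself, not from the scraper.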

Wouze commented Apr 21, 2023

This was already discussed in #846.

@JustAnotherArchivist added the duplicate label Apr 21, 2023
@JustAnotherArchivist closed this as not planned (duplicate) Apr 21, 2023