'utf-8' codec can't decode byte 0x?? in position ?: invalid continuation byte #169

Open
d-Rickyy-b opened this issue Feb 4, 2020 · 2 comments
Labels: bug (Something isn't working), Difficulty: Medium (This issue is not easy and not hard to resolve)

Comments

@d-Rickyy-b
Owner

When a paste contains bytes that are not valid UTF-8, decoding fails and the download of that paste is aborted.

Errors logged at:

    try:
        response_data = r.get(api_url)
    except Exception as e:
        self.logger.error(e)
        raise e

and:

    try:
        body = self._get_paste_content(paste.key)
    except PasteNotReadyException:
        self.logger.debug("Paste '{0}' is not ready for downloading yet. Enqueuing it again.".format(paste.key))
        # Make sure to wait a certain time. If only one element in the queue, this can lead to loops
        self._rate_limit_sleep(last_body_download_time)
        self._tmp_paste_queue.put(paste)
        continue
    except PasteDeletedException:
        # We don't add a sleep here, because this can't lead to loops
        self.logger.info("Paste '{0}' has been deleted before we could download it! Skipping paste.".format(paste.key))
        continue
    except PasteEmptyException:
        self.logger.info("Paste '{0}' is set to None! Skipping paste.".format(paste.key))
        continue
    except Exception as e:
        self.logger.error("An exception occurred while downloading the paste '{0}'. Skipping this paste! Exception is: {1}".format(paste.key, e))
        continue

Example pastes:

2020-02-04 01:17:22,211 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
2020-02-04 01:17:22,213 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - An exception occurred while downloading the paste 'nDPF9r5b'. Skipping this paste! Exception is: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
2020-02-04 01:19:19,130 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
2020-02-04 01:19:19,132 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - An exception occurred while downloading the paste 'aeC9BS25'. Skipping this paste! Exception is: 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
2020-02-04 08:17:52,570 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
2020-02-04 08:17:52,636 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - An exception occurred while downloading the paste '0Cq4CYCH'. Skipping this paste! Exception is: 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
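
Both error variants in the log can be reproduced without the scraper. The snippet below is only a minimal sketch: the helper name and the sample byte strings are made up for illustration and are not the contents of the pastes above. A byte like 0x8b can never start a UTF-8 sequence, and 0xe1 announces a three-byte sequence, so it fails when the next byte is plain ASCII.

```python
def describe_decode_error(raw: bytes) -> str:
    """Return the UnicodeDecodeError message for raw, or 'ok' if it is valid UTF-8.

    Hypothetical helper for illustration; it is not part of pastepwn.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as e:
        return str(e)
    return "ok"


# 0x8b is a continuation byte, so it is rejected as the start of a sequence.
print(describe_decode_error(b"\x1f\x8b\x08\x00"))
# 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

# 0xe1 is "á" in Latin-1, but in UTF-8 it must be followed by two continuation bytes.
print(describe_decode_error(b"caf\xe1 con leche"))
# 'utf-8' codec can't decode byte 0xe1 in position 3: invalid continuation byte
```

The first sample is binary-looking data and the second is text in a legacy single-byte encoding. Whether the pastes in the log fall into either group is a guess.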

d-Rickyy-b added the bug and Difficulty: Medium labels on Feb 4, 2020
@d-Rickyy-b
Owner Author

To fix this bug, we need to work with the raw bytes of the response instead of decoding them as UTF-8 up front. That might break many analyzers.
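
One way this could look, as a rough sketch only: keep the raw bytes on the paste object and derive the text from them on demand, so code that reads a string body keeps working. `RawPaste`, `body_raw` and `body` are hypothetical names here and do not describe the actual pastepwn classes.

```python
from dataclasses import dataclass


@dataclass
class RawPaste:
    """Hypothetical paste container; not the real pastepwn Paste class."""

    key: str
    body_raw: bytes  # exactly what the server sent, never decoded up front

    @property
    def body(self) -> str:
        # Lossy but never raises: invalid bytes become U+FFFD (the replacement character).
        return self.body_raw.decode("utf-8", errors="replace")


paste = RawPaste(key="example", body_raw=b"caf\xe1 con leche")
print(paste.body)      # text view for string-based consumers
print(paste.body_raw)  # untouched bytes for anything that needs them
```

The trade-off is that the text view silently loses the invalid bytes, so anything that matches on exact content would have to look at `body_raw` instead.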

@issamansur

issamansur commented Jun 4, 2023

Hi, I think the problem is here:
https://github.com/d-Rickyy-b/pastepwn/blob/1d9b82efa53d948f790b663a54d609150e65b32e/pastepwn/util/request.py#LL37C1-L42C22
You only catch TimeoutError there and do not handle any other errors.
You also need a regex in this part before decoding as UTF-8.
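
A minimal sketch of what handling the decode error next to the timeout might look like. This is not the code in request.py: `get_text` and the `fetch_bytes` callable are assumptions made up for this example, standing in for whatever performs the HTTP request and returns the raw body.

```python
import logging

logger = logging.getLogger("request_sketch")


def get_text(fetch_bytes, url):
    """Fetch url and return its body as str, or None on timeout.

    fetch_bytes is a hypothetical callable that takes a URL and returns bytes.
    """
    try:
        raw = fetch_bytes(url)
    except TimeoutError:
        logger.warning("Timeout while requesting %s", url)
        return None

    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as e:
        # Handle the decode failure explicitly instead of letting it propagate.
        logger.warning("Body of %s is not valid UTF-8 (%s); replacing invalid bytes", url, e)
        return raw.decode("utf-8", errors="replace")


# Usage with a stand-in fetcher that returns an invalid byte sequence.
print(get_text(lambda url: b"caf\xe1", "https://example.invalid/raw"))
```

With this shape the caller still gets a string for every paste, and the log shows which pastes had bytes replaced.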
