Strip invalid UTF8 $_SERVER values #35

Merged
merged 1 commit into from
Dec 16, 2013


mrardon
Contributor

@mrardon mrardon commented Dec 11, 2013

This introduces an mbstring library requirement, so it may not be a good merge candidate, but I wanted to detail the issue. (utf8_decode may also handle it; I'm not sure whether it introduces the mbstring requirement.)

A little more in-depth on what is going on in my use case.

We use MaxMind's GeoIP to add server variables for country, city, etc. Sao Paulo has an accent over the a (as in São Paulo). I'm not sure whether MaxMind is the issue, but thousands of errors are showing up in Raygun because our uptime monitor is located in São Paulo, and every time the request data is passed to json_encode it fails with "invalid UTF-8 sequence".

This is what MaxMind GeoIP shows: S\xe3o Paulo.
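A minimal sketch of the failure described above, assuming the GeoIP value arrives as ISO-8859-1 (Latin-1) bytes, which is what the `\xe3` byte suggests. The `GEOIP_CITY` key is only an illustrative name, and the behaviour shown (json_encode returning false) is that of PHP 5.5 and later; older versions raised a warning instead.

```php
<?php
// Assumption: the city name is Latin-1 encoded, so "ã" is the single byte 0xE3.
$city = "S\xe3o Paulo";

// A lone 0xE3 byte is not a valid UTF-8 sequence, so mbstring rejects it.
$isValidUtf8 = mb_check_encoding($city, 'UTF-8');

// json_encode() requires UTF-8 input; on PHP 5.5+ it returns false here
// and json_last_error() reports JSON_ERROR_UTF8.
$encoded = json_encode(array('GEOIP_CITY' => $city));
$errorCode = json_last_error();

// Converting from the assumed source encoding yields valid UTF-8,
// which json_encode() accepts (escaping "ã" as \u00e3 by default).
$fixedCity = mb_convert_encoding($city, 'UTF-8', 'ISO-8859-1');
$fixedEncoded = json_encode(array('GEOIP_CITY' => $fixedCity));
```

The conversion only gives the right result if the source encoding guess is correct; bytes from a different legacy encoding would be converted to the wrong characters.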

Let's discuss how we can handle this so these false positive errors go away and I can clean up my Raygun dash. Another alternative may be to allow me to ignore those variables the same way that you could white/blacklist per my other feature request #32.

Thanks!

@fundead
Contributor

fundead commented Dec 12, 2013

Thanks for bringing up this issue, this is indeed desirable to fix. We've had a similar issue in the past where users were passing in invalid input in $_POST that couldn't be serialized, so the utf8_convert function was created. I've created a new branch here with changes that transliterate it to ASCII, and on my local machine the input 'São Paulo' converts to 'S~ao Paulo'.

Obviously this isn't ideal, so I'm not merging it into master yet, but if you'd like you could check out the pr35 branch and see if it improves things a bit. At the very least it'll get rid of the spurious 'invalid UTF-8 sequence' errors: https://github.com/MindscapeHQ/raygun4php/tree/pr/35

The function does have the ability to correctly convert to escaped UTF-8 codes, but there appears to be a backend issue in parsing it which we're still investigating. Once this is resolved though I can complete the branch and you will be able to see 'São Paulo' in the Server Variables as desired.

@fundead
Contributor

fundead commented Dec 15, 2013

I've managed to get a better implementation working here, which doesn't need to transliterate to ASCII:

https://github.com/MindscapeHQ/raygun4php/tree/requestdata_to_utf8_json

This will correctly convert $_SERVER variables into UTF-8, then strip the escaped Unicode characters when encoding them into JSON. The result is that in the Raygun dashboard I now see "São Paulo" (same for form data, query string, raw data etc). Give the above branch a try and if it fixes the issues you were seeing I'll merge it into master.

Regards,

Callum
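For readers following along, the approach described in the comment above (convert the values to UTF-8, then emit JSON without `\uXXXX` escapes) could look roughly like the sketch below. This is not the code from the linked branch: `toUtf8` is a hypothetical helper, the ISO-8859-1 source encoding is an assumption, and `JSON_UNESCAPED_UNICODE` (PHP 5.4+) stands in for whatever the branch does to remove the escapes.

```php
<?php
// Hypothetical helper: recursively convert string values that are not valid
// UTF-8 from an assumed source encoding. Array keys are left untouched.
function toUtf8($value, $sourceEncoding = 'ISO-8859-1')
{
    if (is_array($value)) {
        $converted = array();
        foreach ($value as $key => $item) {
            $converted[$key] = toUtf8($item, $sourceEncoding);
        }
        return $converted;
    }
    if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
        return mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
    }
    return $value; // already valid UTF-8, or not a string
}

// Illustrative stand-in for $_SERVER; the key names are assumptions.
$server = array(
    'GEOIP_CITY' => "S\xe3o Paulo", // Latin-1 bytes, as in the report above
    'GEOIP_COUNTRY_CODE' => 'BR',
);

// JSON_UNESCAPED_UNICODE keeps "ã" as a literal UTF-8 character
// instead of the default \u00e3 escape.
$json = json_encode(toUtf8($server), JSON_UNESCAPED_UNICODE);
```

Checking validity before converting avoids double-encoding values that are already UTF-8, at the cost of still requiring mbstring.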

@mrardon
Contributor Author

mrardon commented Dec 16, 2013

Great. I'll give it a shot tomorrow and let ya know. Thanks

@mrardon
Contributor Author

mrardon commented Dec 16, 2013

At first glance it looks like all is well in our staging environment. No encoding errors coming across. 👍

fundead pushed a commit that referenced this pull request Dec 16, 2013
Strip invalid UTF8 $_SERVER values
@fundead fundead merged commit d3ceb8d into MindscapeHQ:master Dec 16, 2013