Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json_encode() trouble with unicode #11

Closed
adamthehutt opened this issue Oct 7, 2013 · 8 comments · Fixed by #16
Closed

json_encode() trouble with unicode #11

adamthehutt opened this issue Oct 7, 2013 · 8 comments · Fixed by #16
Assignees

Comments

@adamthehutt
Copy link
Contributor

The json_encode() function has trouble when passed non-unicode data. This sometimes happens due to malformed user input. This results in the RaygunClient#send() method throwing an additional error (json_encode(): Invalid UTF-8 sequence in argument).

This can be fixed by changing line 143 of RaygunClient.php to:

curl_setopt($httpData, CURLOPT_POSTFIELDS, json_encode(iconv('UTF-8', 'UTF-8//IGNORE', $message)));
@ghost ghost assigned fundead Oct 10, 2013
@CmdrKeen
Copy link
Contributor

Thanks for this Adam -- we'll get this rolled in ASAP 👍

@fundead
Copy link
Contributor

fundead commented Oct 11, 2013

Hi Adam,

I've been looking into this, and the fix above throws another warning: iconv() expects parameter 3 to be string, object given in '../RaygunClient.php'. This is as expected as iconv()'s choking on the fact that $message is a RaygunMessage object.

What fields are the malformed input appearing in - assuming it's one of the Request data fields? We could sanitize it manually when parsing the request object.. alternatively, would it be possible to ensure that the user data is encoded in UTF-8 (assuming it's coming in from a POST)?

@adamthehutt
Copy link
Contributor Author

Hi Callum,

Right, of course. Sorry about that.

This is really only a concern with the request data (post/get) as you note. So probably what needs to happen is in RaygunRequestMessage#__construct().

Something like:

$this->form = array_map(
    function($str) {
        return iconv('UTF-8', 'UTF-8//IGNORE', $str);
    }, 
    $_POST
);

and later:

$this->rawData = iconv('UTF-8', 'UTF-8//IGNORE', file_get_contents('php:https://input'));

As you note, it would be possible to do this in the application code before it's handled by Raygun (at least with respect to $_POST), but that would require modifying the superglobal itself, which isn't a good idea.

Hope that helps!

Adam

@fundead
Copy link
Contributor

fundead commented Oct 13, 2013

Cheers for that! I've pushed a branch with those changes and after testing it appears to work on my end. When you have a chance can you grab the latest version and verify that it's fixed the error message you were seeing.

Callum

fundead pushed a commit that referenced this issue Oct 13, 2013
Ensure user-submitted data is encoded UTF-8 ref #11
@adamthehutt
Copy link
Contributor Author

Thanks, Callum. So far so good! I'll let you know if I run into trouble.

@fundead fundead closed this as completed Oct 15, 2013
@adamthehutt
Copy link
Contributor Author

Hey Callum,

Sorry, but I just realized there's a pretty serious bug with the code I gave you above. The array_map for the $_POST data assumes that every element of $_POST is a string. But of course, with PHP you can submit form values like, e.g., name="field[]" which causes $_POST to include a nested array. This will break the above code and cause an additional error to be thrown.

A possible fix (which I haven't had a chance to test):

$utf8_convert = function($value) use (&$utf8_convert) {
    return is_array($value) ? 
        array_map($utf8_convert, $value) : 
        iconv('UTF-8', 'UTF-8//IGNORE', $value);
};
$this->form = array_map($utf8_convert, $_POST);

@fundead
Copy link
Contributor

fundead commented Oct 16, 2013

Ah excellent, thanks for the code. I've tested it locally and it appears to do what is expected - when $_POST contains data from a form with

<input type="text" name="eg[0]" value="1" />
<input type="text" name="eg[1]" value="2" />

no error will be thrown (as opposed to before). The Form Values section in the Raygun dashboard will correctly display eg = "1, 2". I've pushed the change to the request-fix branch, if you could verify that the fix works and doesn't throw any further errors I'll merge it into master.

@adamthehutt
Copy link
Contributor Author

Thanks, Callum. Works for me.

@fundead fundead mentioned this issue Oct 17, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants