Alter the AppExitInfoService to encode NDK protobuf info in escaped chars #22
Conversation
logDebug("AEI - No info trace collected") | ||
return null | ||
} | ||
val trace = encoder.encodeToString(bytes) |
i talked to @noonat about this and he thought there was a way to do this without resorting to base64 encoding. i'll let him chime in with his suggested approach
base64 compresses poorly, so it would be good if we can avoid using it.
My thought was that JSON strings do support arbitrary binary values with \uXXXX escape sequences, and that if the binary data could instead be converted into a valid Java string, then it might be encodable as JSON using the escape sequences where necessary. The \uXXXX sequences in JSON strings are 6 bytes each, so this would only work out in practice if the string is mostly made up of valid ASCII characters, as @fnewberg was suspecting.
I haven't been able to find any way to do this in standard library Java/Kotlin, as all the encoding/decoding functions seem to expect the byte sequences to conform to a codec of some sort. But you could do something like this:
fun bytesToUTF8String(bytes: ByteArray): String {
    // Worst case output: every input byte becomes a two-byte sequence.
    val encoded = ByteArray(bytes.size * 2)
    var i = 0
    for (b in bytes) {
        val u = b.toInt() and 0xFF
        if (u < 128) {
            // ASCII values pass through as a single byte.
            encoded[i++] = u.toByte()
            continue
        }
        // Values 128..255 are written as the two-byte UTF-8 encoding of the
        // code point with the same value (i.e. the byte is treated as a
        // Latin-1 character).
        encoded[i++] = (0xC0 or (u shr 6)).toByte()
        encoded[i++] = (0x80 or (u and 0x3F)).toByte()
    }
    return String(encoded.copyOf(i), Charsets.UTF_8)
}
If you take the resulting string and dump it to JSON with Gson you get something with \uXXXX escape sequences in it. I attempted encoding a byte array of the values 0 to 255 using this method and JSON decoding it on the Go side, and it does seem to work fine (although you have to iterate over the string and convert the runes to bytes manually).
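For illustration, the inverse transform (which the Go side performs by iterating over runes) might look like this in Kotlin; utf8StringToBytes is a hypothetical helper, a minimal sketch assuming every code point in the decoded string fits in one byte:

// Hypothetical inverse of bytesToUTF8String: each char of the decoded
// JSON string is a code point in 0..255 that maps back to a single byte.
fun utf8StringToBytes(s: String): ByteArray {
    val out = ByteArray(s.length)
    var i = 0
    for (c in s) {
        require(c.code < 256) { "unexpected code point: ${c.code}" }
        out[i++] = c.code.toByte()
    }
    return out
}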
@noonat do you think this is the preferred approach over base64? should we produce a payload both ways then try compressing it with gzip to see how the sizes compare?
pretty easy to get a basic comparison with python
>>> import gzip
>>> with open('/Users/fredric/Downloads/aei-trace', 'rb') as fd:
... data = fd.read()
...
>>> len(data)
341238
>>> compressed = gzip.compress(data)
>>> len(compressed)
59755
Sure, I will do that and reply with the results here.
Okay, I wrote a Go script to encode as base64 and utf8, marshal both to JSON, then compress with gzip. These are the sizes:
unencoded: 341238
base64: 104184
utf8: 69129
It also decoded and compared the resulting buffers to the original buffer to make sure nothing weird is happening.
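The Go script itself isn't included here; a rough Kotlin equivalent of the comparison (reusing bytesToUTF8String from above, with the input file path assumed) might look like:

import com.google.gson.Gson
import java.io.ByteArrayOutputStream
import java.io.File
import java.util.Base64
import java.util.zip.GZIPOutputStream

fun gzipSize(data: ByteArray): Int {
    val buf = ByteArrayOutputStream()
    GZIPOutputStream(buf).use { it.write(data) }
    return buf.size()
}

fun main() {
    val bytes = File("aei-trace").readBytes() // assumed local copy of the blob
    val gson = Gson()
    // Marshal each encoding to a JSON string, then gzip the UTF-8 bytes.
    val base64Json = gson.toJson(Base64.getEncoder().encodeToString(bytes))
    val utf8Json = gson.toJson(bytesToUTF8String(bytes))
    println("unencoded: ${gzipSize(bytes)}")
    println("base64: ${gzipSize(base64Json.toByteArray())}")
    println("utf8: ${gzipSize(utf8Json.toByteArray())}")
}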
You can toString() a ByteArray: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/to-string.html
I am away from a computer, but let me verify that toString actually works properly. When I was trying some of those techniques before, I ended up with Unicode replacement characters in the output (as the conversion functions expected the byte stream to already be a valid UTF-8 stream). But I can't remember if I tried toString.
toString() takes a Charset, so I'd be surprised if it didn't work. Now, how well it performs... that I don't know 😅
Right -- but it takes a charset that is expected to express the format of the incoming bytes. e.g. if you have a stream of UTF-8 bytes and you want to convert that in memory into a Java string representation, you specify the charset of the byte stream as UTF-8 so that toString knows how to interpret it. If it encounters bytes that don't conform to the encoding, it replaces them with Unicode replacement characters in the resulting string.
You can think of the charset on toString and the String constructor as instructing what format to use when parsing the raw bytes into a string. It's not expressing what format to convert the byte stream into, which is what we need.
As an example, if I create a byte array containing bytes 247 through 255:
var b = byteArrayOf()
for (i in 247..255) {
    b += i.toByte()
}
If I convert them with b.toString(Charsets.UTF_8) and print out the JSON string and the result of .toByteArray().contentToString(), I get this:
"���������"
[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]
Note that all the characters have been replaced with the same Unicode replacement character. If I instead use this custom encoding function, I get this:
"÷øùúûüýþÿ"
[-61, -73, -61, -72, -61, -71, -61, -70, -61, -69, -61, -68, -61, -67, -61, -66, -61, -65]
...d-sdk/src/test/java/io/embrace/android/embracesdk/capture/aei/AeiNdkCrashProtobufSendTest.kt
@noonat @fnewberg I've updated the changeset to encode the AEI blob with the suggested approach of escaping characters. It's worth noting that currently this applies to both the ANR + NDK blobs. I assume we'd need to update the backend to handle this encoding change before we ship this as part of an SDK release?
I'll let Fredric/Nathan finish up the review regarding the encoding stuff, but this looks good to me. And I'd probably just use the toString() method on the ByteArray unless this implementation is more efficient?
@@ -153,7 +153,7 @@ class InternalEmbracePlugin : Plugin<Project> {
     testOptions {
         unitTests {
             all { test ->
-                test.maxParallelForks = (Runtime.getRuntime().availableProcessors() / 2) + 1
+                test.maxParallelForks = (Runtime.getRuntime().availableProcessors() / 3) + 1
Was it deadlocking that caused your test to run for 2 days? ;-)
The protobuf file added as part of the tests in this changeset seems to be failing CI consistently due to OOMs, and it failed locally for me once or twice. I think reducing the number of parallel forks should help mitigate this, or perhaps using just one test process.
Weird. Leaks? Or maybe the heap size was too small? I'm surprised the file was so big that it runs into problems if there were no leaks or if the heap is the right size.
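If heap size turns out to be the culprit, the per-fork test heap could be raised in the same block that sets maxParallelForks; a sketch with an assumed value:

testOptions {
    unitTests {
        all { test ->
            test.maxParallelForks = (Runtime.getRuntime().availableProcessors() / 3) + 1
            test.maxHeapSize = "2g" // assumed value; the Gradle default is much smaller
        }
    }
}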
    logDebug("AEI - No info trace collected")
    return null
}
val trace = encoder.encodeToString(bytes)
You can toString() a ByteArray: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/to-string.html
Yes, although I suspect it will be backwards compatible for the ANR blobs, since those are already text and should not normally be affected by the encoding step.
I will update this so that the encoding only applies to AEIs with a native crash reason.
@noonat i spoke with @fractalwrench about this and we discussed a couple of things:

@fractalwrench LMK if i missed something there. i don't think we will need to change any keys in the payload, e.g. change
...src/main/java/io/embrace/android/embracesdk/capture/aei/EmbraceApplicationExitInfoService.kt
Force-pushed from 3b14b1f to 64e9e98
I've updated this changeset so that AEI traces are only encoded for native crashes on API >=31, and added some extra test coverage. I will see what happens with the CI run & possibly look into increasing the runner size, or decreasing the parallel forks used for running tests on CI only.
Codecov Report

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   72.66%   72.64%   -0.02%
==========================================
  Files         305      305
  Lines        9986    10012      +26
  Branches     1440     1446       +6
==========================================
+ Hits         7256     7273      +17
- Misses       2161     2166       +5
- Partials      569      573       +4
Looks good!
Goal
This alters the AppExitInfoService to encode AEI blobs as Base64 in the payload.

When I attempted to read the NDK protobuf file it became apparent that there was an issue with how data is recorded in the blob payload. The ANR trace file is UTF-8 text so is unaffected, but the NDK crash protobuf file is not.

The SDK was reading the file as UTF-8 text, using BufferedReader::readText. I think this resulted in unreadable characters being replaced with \uFFFD by the CharsetDecoder, which destroyed any possibility of us being able to read the NDK crash protobuf that was sent to the backend.

There are two approaches we could take to fix this:
1. Encode the blob as Base64.
2. Convert the bytes into a valid Java string so that non-ASCII values are serialized as \uXXXX escape sequences in the JSON.
I have not altered the payload keys etc, which we would probably want to do as part of this.
Testing
I generated a Java class using protoc from the tombstone schema, and grabbed an NDK crash that had been saved as part of AEI. A unit test was then added to simulate parsing the protobuf at different stages of its journey through the system:
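A minimal sketch of what such a round-trip test might look like; the resource path and test body are hypothetical, the encode/decode helpers are the ones from the review thread above, and the Tombstone parser is assumed to come from the protoc output just mentioned:

import org.junit.Assert.assertArrayEquals
import org.junit.Test

class AeiNdkCrashProtobufSendTest {

    @Test
    fun protobufSurvivesEncodeAndDecode() {
        // Raw bytes as captured from the ApplicationExitInfo trace stream.
        val original = javaClass.getResourceAsStream("/aei-trace")!!.readBytes()

        // Simulate the SDK encoding the blob for the JSON payload,
        // then the backend decoding it back to bytes.
        val encoded = bytesToUTF8String(original)
        val decoded = utf8StringToBytes(encoded)

        assertArrayEquals(original, decoded)
        // Tombstone.parseFrom(decoded) // generated protoc class, assumed
    }
}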