The hprof library allows convenient access to Java .hprof files, generated by e.g. OpenJDK's JVM or Android's ART. It was built to enable scripted or interactive analysis of heap dumps.
- Java classes and objects are modeled 1:1 as Python classes and objects
- Fields are exposed as Python attributes
- Arrays are indexable and iterable
- isinstance() and issubclass() work as expected
- Tab completion works, enabling convenient exploration
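To make the 1:1 modeling concrete, here is a minimal sketch of how a small Java hierarchy could be mirrored with dynamically created Python classes. It is an illustration only, not the library's implementation; `make_class` and the attribute values are hypothetical.

```python
# Hypothetical sketch: mirror a tiny Java hierarchy with Python's type().
def make_class(name, base=object):
    """Create a Python class standing in for a Java class of the same name."""
    return type(name, (base,), {})

Vehicle = make_class('com.example.cars.Vehicle')
Car = make_class('com.example.cars.Car', Vehicle)
Bike = make_class('com.example.cars.Bike', Vehicle)

car = Car()
car.make = 'Lolvo'  # a Java field becomes a plain Python attribute

# Because these are real Python classes, the built-ins need no special help.
print(isinstance(car, Vehicle))  # instanceof a superclass
print(issubclass(Car, Vehicle))  # subclass relationship
print(isinstance(car, Bike))     # unrelated sibling class
```

Since the stand-ins are ordinary Python classes and instances, tools such as tab completion in an interactive shell work on them like on any other object.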
Class lookups by name:
>>> car_cls, = heap.classes['com.example.cars.Car']
>>> car_cls
<JavaClass 'com.example.cars.Car'>
Finding all instances of a class, field access, instanceof:
>>> vehicles = heap.all_instances('com.example.cars.Vehicle')
>>> vehicles = sorted(vehicles, key=lambda v: str(v.make))
>>> for v in vehicles:
... print(type(v), v.make, isinstance(v, car_cls))
com.example.cars.Bike Axes False
com.example.cars.Bike Fånark False
com.example.cars.Car Lolvo True
com.example.cars.Limo Stretch True
com.example.cars.Car Toy Yoda True
Reading object arrays:
>>> carex, = heap.all_instances('com.example.Cars')
>>> carex
<com.example.Cars 0x...>
>>> carex.vehicles
<com.example.cars.Vehicle[5] 0x...>
>>> print(carex.vehicles)
Vehicle[5] {<com.example.cars.Car 0x...>, <com.example...}
>>> carex.vehicles[0]
<com.example.cars.Car 0x...>
>>> print(carex.vehicles[0])
Car@...
Accessing a field, printing a java.lang.String:
>>> carex.vehicles[0].make
<java.lang.String 0x...>
>>> print(carex.vehicles[0].make)
Lolvo
The library has some limitations, at least for now.
.hprof files are quite versatile, with many different record types. The hprof library currently supports only the records that were needed to support heap dump analysis.
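For readers unfamiliar with the record structure, the following sketch walks the top-level records of a .hprof stream. It assumes the commonly documented layout (a NUL-terminated version string, a 4-byte ID size, an 8-byte timestamp, then records made of a 1-byte tag, a 4-byte time delta, a 4-byte body length and the body, all big-endian). The function names are hypothetical and this is not the library's parser.

```python
import io
import struct

def read_header(f):
    """Read the file header: version string, ID size, timestamp in ms."""
    version = bytearray()
    while True:
        b = f.read(1)
        if not b:
            raise EOFError('unterminated version string')
        if b == b'\0':
            break
        version += b
    idsize, timestamp_ms = struct.unpack('>IQ', f.read(12))
    return version.decode('ascii'), idsize, timestamp_ms

def iter_records(f):
    """Yield (tag, body) for each top-level record until end of file."""
    while True:
        head = f.read(9)
        if not head:
            return
        if len(head) < 9:
            raise EOFError('truncated record header')
        tag, _time_delta, length = struct.unpack('>BII', head)
        body = f.read(length)
        if len(body) < length:
            raise EOFError('truncated record body')
        yield tag, body

# Synthetic input: one UTF8 string record (tag 0x01) and one
# HEAP DUMP END record (tag 0x2C), with 4-byte IDs.
data = (b'JAVA PROFILE 1.0.2\0' + struct.pack('>IQ', 4, 0)
        + struct.pack('>BII', 0x01, 0, 9) + struct.pack('>I', 1) + b'hello'
        + struct.pack('>BII', 0x2C, 0, 0))

f = io.BytesIO(data)
header = read_header(f)
records = list(iter_records(f))
```

Because every record carries its own length, a parser can skip record types it does not understand without decoding their bodies.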
.hprof files don't contain information about the declared type of object references. Hence, there is no way to tell the difference between a and b in this case:
String a = "hello";
Object b = a;
In Java, we would see only Object fields when accessing b; any fields declared in String, including possible overloads, would only be accessible through a, or by casting b to String.
Since the hprof library cannot know the declared type, both a and b will behave as if they were references to the exact type of the object -- String, in this case.
The cast() function allows emulating the Java behavior by explicitly supplying the type of the reference.
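The difference between the exact type and a declared type can be sketched with a toy model in which field lookup starts at a chosen class in the inheritance chain. `JavaObj` and `get_field` are hypothetical names used only for illustration; they are not the library's API and do not describe how cast() is implemented.

```python
class JavaObj:
    """Toy object: fields are stored per declaring class."""
    def __init__(self, class_chain, fields):
        self.class_chain = class_chain  # most derived class first, root last
        self.fields = fields            # {declaring class: {name: value}}

def get_field(obj, name, declared=None):
    """Look up a field as seen through a reference of type `declared`.

    With declared=None the lookup starts at the object's exact type,
    which mirrors the default behavior described above.
    """
    chain = obj.class_chain
    start = 0 if declared is None else chain.index(declared)
    for cls in chain[start:]:
        if name in obj.fields.get(cls, {}):
            return obj.fields[cls][name]
    raise AttributeError(name)

obj = JavaObj(
    ['Child', 'Parent', 'java.lang.Object'],
    {'Parent': {'name': 'from Parent'},
     'Child': {'name': 'from Child', 'extra': 1}},
)

exact_view = get_field(obj, 'name')             # sees Child's field
parent_view = get_field(obj, 'name', 'Parent')  # sees Parent's field
```

In this toy model, asking for `extra` through a `Parent` reference raises AttributeError, just as the field would not be visible in Java without a cast.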
Since Java objects and classes are modeled as Python objects and classes, there may be name collisions between values from the .hprof file and the library's Python code.
The library prefixes all its internal fields with _hprof_ to minimize this risk, but this provides no guarantees. In particular, a Java application could pick its names maliciously to maximize collisions. On top of collisions with library-internal fields, there may also be name conflicts with Python special methods like __dir__ or __str__.
It will not be possible to avoid all these situations, but we still welcome reports of bugs caused by such collisions; many may be fixable.
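The following sketch shows how such a collision can hide a Java field. It assumes a simplified model in which Java fields are served through `__getattr__`; the class `Modeled` and its `_hprof_fields` attribute are hypothetical and not taken from the library.

```python
class Modeled:
    """Hypothetical stand-in for a modeled Java object."""
    def __init__(self, fields):
        self._hprof_fields = dict(fields)  # internal state, prefixed

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so anything
        # Python already defines takes precedence over Java fields.
        fields = self.__dict__.get('_hprof_fields', {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name) from None

# Java identifiers may contain underscores, so all of these are legal names.
obj = Modeled({
    'make': 'Lolvo',
    '_hprof_fields': 'java value',  # collides with internal state
    '__str__': 'java value',        # collides with a special method
})

ordinary = obj.make          # served from the Java fields
internal = obj._hprof_fields  # the internal dict, not the Java value
special = obj.__str__        # Python's bound method, not the Java value
```

In this model, the two colliding Java fields are still stored but cannot be reached through plain attribute access.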
Our model of the Java class hierarchy breaks down at the top where java.lang.Object and java.lang.Class meet. This is because Class is an Object, and Object is an instance of Class, which proved too difficult to model.
There are some special cases in functions like isinstance() that try to patch this up, but there will probably be holes remaining.
This should not be a problem for most use cases, but may be good to know if you're doing really advanced stuff.
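For comparison, Python's own object model contains an analogous loop between `object` and `type`. The snippet below only demonstrates that loop in plain Python; it says nothing about how the library maps java.lang.Object and java.lang.Class.

```python
# `type` is a subclass of `object`, much like Class extends Object in Java...
type_is_an_object = issubclass(type, object)

# ...and `object` is itself an instance of `type`, much like Object.class
# is an instance of Class.
object_is_a_type = isinstance(object, type)

# `type` is even an instance of itself.
type_is_a_type = isinstance(type, type)

print(type_is_an_object, object_is_a_type, type_is_a_type)
```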
The hprof library instantiates all heap objects when you open your file. This makes the implementation a little simpler, and explicitly ensures that all heap references are valid at load time, so that you won't have nasty surprises later.
This can be quite memory intensive, especially when working with large .hprof files.
.hprof files may contain various callstacks, perhaps most interestingly allocation callstacks. The hprof library currently skips over them while parsing.
The hprof library does not expose all information available in the heap dumps. If your tool needs something, feel free to contribute to the library!
Contributions of all kinds are welcome!
Please see CONTRIBUTING.md for details.
Originally written by Snild Dolkow at Sony Mobile Communications Inc.
Please see AUTHORS and the git log for other contributors and detailed history.