> The simple approach is to just add a children() method to the interface and implement it in every kind of node. Very tedious, and boilerplatey as hell.
The best approach without boilerplate would be to generate this at compile time, not runtime. You could use an approach like in http://hannesdorfmann.com/annotation-processing/annotationpr... to create at compile-time, using reflection, a ChildWalker class that accesses the relevant properties for every class annotated with @Node. At runtime, the JIT should get this down to a vtable lookup and field lookups - just about as optimized as you can get.
EDIT: Always remember https://xkcd.com/1205/ - if it would take 5 seconds to write each children() method, and you write no more than 5 Node classes per day (amortized), then you shouldn't spend more than 12 hours writing the code for this blog post ;) Unless, of course, that was the point all along!
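For illustration, here's a sketch of the kind of code such a processor might emit, assuming a hypothetical IfNode with two child fields (all class and field names here are made up, not from the article):

```java
import java.util.List;

interface Node {}

class IfNode implements Node {
    final Node condition;
    final Node body;
    IfNode(Node condition, Node body) { this.condition = condition; this.body = body; }
}

class Leaf implements Node {}

// The kind of class the processor would emit per @Node-annotated type:
// plain field accesses, so the JIT can inline everything.
final class IfNodeChildWalker {
    static List<Node> children(IfNode n) {
        return List.of(n.condition, n.body);
    }
}

public class Demo {
    public static void main(String[] args) {
        IfNode n = new IfNode(new Leaf(), new Leaf());
        System.out.println(IfNodeChildWalker.children(n).size()); // prints 2
    }
}
```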
Seeing all these comments about code generation at compile time makes me wonder if there's an overlap between the people who use/write these kinds of tools for complex languages and those who post 'what's so great about (lisp) macros?'.
An annotation processor does not seem to be a good fit here. Annotation processors can only generate new classes, not modify existing ones. So you would have to make all classes abstract and then generate non-abstract ones that have a #children() method and refer to the generated classes everywhere in your code.
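To make that workaround concrete, it might look something like this (the class and field names are hypothetical): the hand-written class is abstract, and the processor generates the concrete subclass carrying children().

```java
import java.util.List;

// Hand-written and forced to be abstract, since the processor
// cannot modify it, only generate a new class alongside it.
abstract class WhileNode {
    final Object condition = "cond";
    final Object body = "body";
    abstract List<Object> children();
}

// What the processor would generate (written by hand here for illustration).
class WhileNodeImpl extends WhileNode {
    @Override
    List<Object> children() { return List.of(condition, body); }
}

public class AbstractPatternDemo {
    public static void main(String[] args) {
        // All user code now has to refer to the generated WhileNodeImpl.
        System.out.println(new WhileNodeImpl().children()); // prints [cond, body]
    }
}
```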
If you use Kotlin instead of Java it's way better, because there are no nullability issues and its standard library is super useful.
And once you've figured out how to hook it into gradle, IDEA etc., which admittedly cost me an hour or so, you can reuse that knowledge for every annotation-processor to come.
I made it a habit to set myself a time-limit EVERY time I start touching the build-process. After the limit has passed it must be faster, more automated, or in any other way more useful, or I will revert the changes.
It really depends on the annotation processor and what kind of code you are trying to generate. In the author's case it shouldn't take that long nor would it be hard to develop and debug.
Also, I've worked at places with this kind of magic and no induction, documentation, or time for anyone to explain anything. The more behind-the-scenes, hard-to-discover stuff you implement, the better the docs and onboarding need to be. It can't just be done to fulfill an architect-astronaut fantasy.
I don't care so much about the documentation, as long as it's part of the build script (its own documentation) and the output isn't checked into source control. By not checking generated code into source control it won't be hand-edited, and there won't be a mix of generated and custom code. By being part of the build process, you avoid the generation tool and scripts being lost to the sands of time.
Or, heck, just hack together a little script to generate the code and hook it into your build system. You don't need anything as fancy as an annotation processor. A handful of lines of Python could accomplish the same thing just as well.
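As a sketch of that idea (staying in Java; the node names and field lists are invented): a throwaway generator that prints children() method bodies from a table, which a build step could splice into the source.

```java
import java.util.List;
import java.util.Map;

// Hypothetical: emit a children() method per node class from a
// simple table of class name -> child field names.
public class GenChildren {
    public static void main(String[] args) {
        Map<String, List<String>> nodes = Map.of(
            "IfNode", List.of("condition", "thenBranch", "elseBranch"),
            "WhileNode", List.of("condition", "body"));
        nodes.forEach((cls, fields) -> {
            String method = "public List<Node> children() { return List.of("
                    + String.join(", ", fields) + "); }";
            System.out.println(cls + ": " + method);
        });
    }
}
```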
Indeed, code generation was going to be my permanent solution, until this turned out fast enough not to bother!
Seems like I'm doing fine by xkcd, at least if you remove the time to make a simpler version for the blog post =) And the learning has some value too anyhow.
Seems to me this is approaching the problem totally the wrong way. First off, the current implementation just loops through all the methods of the implementing class, and the first one that matches a Node or [Node] return type is supposedly its children? That seems extremely brittle.
The fact that you have 100 classes implementing this is a good sign you should turn the design on its head. Why not just have a single tree structure built using generics?
edit: also I would point out that you're not really making reflection any faster, just your code
On top of that, he acknowledges that there is a much, much simpler solution, known since the dawn of Java, which is to have a polymorphic children() method.
But that would (gasp!) require adding those implementations. Which is tedious (it is not with a bit of inheritance or some other technique but ok). OK, so, it requires a bit of work. But those methods would take a few (or a dozen) nanoseconds to dispatch; you can't go faster than that. What's the alternative? To build your brittle infrastructure? Hmmm, I must be missing something...
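The boring polymorphic version described above might look like this (Num and Add are hypothetical node types, not from the article):

```java
import java.util.List;

// The tedious but fast option: every node implements children() by hand,
// so walking the tree costs one virtual dispatch per node.
interface Node {
    List<Node> children();
}

class Num implements Node {
    public List<Node> children() { return List.of(); }
}

class Add implements Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public List<Node> children() { return List.of(left, right); }
}

public class PolymorphicDemo {
    static int count(Node n) {
        int total = 1;
        for (Node c : n.children()) total += count(c);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(new Add(new Num(), new Num()))); // prints 3
    }
}
```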
This is assuming you have full control over all the implementations. I see this frequently in large Java and Scala code bases in enterprise: you have a generic library you built a while back that now has all sorts of users in all different places, and you may not even have source code access, only jar access through Nexus or Artifactory. Then you need to do some operation like OP notes, but you can't expect everyone to commit your code immediately, or even at all. Thus OP is left with the only reasonable solution, as detailed.
So your complaint is that it is hard for users if you make a breaking change in your library, big surprise. This is why breaking changes in libraries should only be done with utmost consideration, preferably after first deprecating the usage.
And if you make a breaking change to the interface, you better be damn well breaking it at compile time and not hiding it with reflection shenanigans.
In the example of the OP, it's easy to think of an example that would totally be broken by the author's reflection approach, while the user has no warning at compile time and no reasonable way of knowing that what he's doing isn't allowed:
class MyWeirdNode extends Node {
    private final int i = 1;

    public IntNode asIntNode() {
        return new IntNode(i);
    }
}
I think you're confused about the "heterogeneous tree" part. All implementations of Node have different children, held in different fields with different names and varying in number. This can't be simplified by inheritance; you need to write an implementation per class.
> The fact that you have 100 classes implementing this is a good sign you should turn the design on its head. Why not just have a single tree structure built using generics?
Does anyone know why GUI toolkits never do this? Every one seems to follow an OO model where each widget owns its own children. Even in GTK they hacked in an OO system to do it instead of having a separate tree.
Because almost every GUI component is responsible for sizing and positioning its own children. A separate tree might be useful for some other optimisations I can't think of, but it would definitely be a pain for components, even those with a fixed number of children (as opposed to "container" components).
I'm not seeing how I can use some generic tree class without adding code for each Node subclass, which is exactly what I'm trying to avoid.
At the least, I'd have to populate the children list of that class (which is more or less what children() would have done). The identity of each child (what it represents), such as "condition of a while statement" or "body of the else part of an if statement" must be preserved.
Also, the issue with methods really isn't one, because these classes are really POJOs. Getters and methods inherited from Object are pretty much all they have, so I'm not worried about accidentally capturing a method that is not a getter.
And well, this makes reflection calls faster by replacing them (probably) with the equivalent plain code.
If you are using reflection and encountering performance problems, the correct approach is pretty much always to write an annotation processor and generate that code instead.
When I needed something more custom, I've done the bytecode strategy a few times, using ASM library or equiv. But the cost of change (brittleness, aggravation) is pretty high.
Now I just write Java source and use the JDK's built-in javac to generate the bytecodes.
Some people like StringTemplate. http://www.stringtemplate.org For whatever reason, I still prefer Velocity. Template languages are mostly evil, and variations of terrible, so I just stick to what I know.
Annotations are never the correct answer, for any use case or purpose.
Edit: Whoops. May have spoken too soon. JDK 8's MethodHandle and some other features (inlining simple getters) may have mooted ReflectASM. https://github.com/EsotericSoftware/reflectasm/issues/23 I may have to retool some of my older code.
I'm just a simple bear. I can't keep all the details straight.
One strategy I use to cope is round tripping, as a way to confirm I get the expected results.
My current approach to code generation: painstakingly hand craft the desired source code output. Then turn that into a template (progressive refinement). Then compare (diff) generated output to manual. Iterate. Many times that means adjusting target to make it easier to generate.
Using bytecodes (class files) as the target output adds complexity and more steps. I've been reversing bytecodes for a long time, manually using javap, the decompilers (which are now quite good), and ASM's code generator plugin for Eclipse (super clever). But it's too fussy compared to the Java source code strategy.
Didn't look too deep, but this might be due to the fact that call sites are memoized for invoke calls as part of the invokedynamic improvements (though really MethodHandle::invoke and MethodHandle::invokeExact are where the magic is, as mentioned in the JVM spec IIRC).
Regardless, MethodHandle invocations are always going to be faster than reflection Method invocations. A primary reason is that the parameters for invoke and invokeExact are not subject to the same boxing and runtime-validation requirements as reflection.
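A minimal comparison of the two APIs, assuming a hypothetical Point class: the reflective Method boxes the int result and revalidates access on every call, while invokeExact on a MethodHandle with the exact signature does neither.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class HandleDemo {
    public static class Point {
        private final int x;
        public Point(int x) { this.x = x; }
        public int getX() { return x; }
    }

    public static void main(String[] args) throws Throwable {
        Point p = new Point(42);

        // Core reflection: arguments and return value are boxed,
        // and access checks run per call.
        Method m = Point.class.getMethod("getX");
        System.out.println(m.invoke(p)); // prints 42

        // MethodHandle with the exact signature: no boxing with invokeExact.
        MethodHandle mh = MethodHandles.lookup()
                .findVirtual(Point.class, "getX", MethodType.methodType(int.class));
        int x = (int) mh.invokeExact(p);
        System.out.println(x); // prints 42
    }
}
```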
Why? I see why it might be a good idea depending on what problem you are trying to solve, but I don't see why anything has to have any references to its children in order to be a tree node.
I assume the JIT isn't optimizing until the functions execute a certain number of times. Given the speed difference maybe it isn't even natively compiling at first and using the interpreter.
I don't know about Java, but in .NET there's a call you can make to force the dynamically generated lambda expression to be compiled immediately.
Of course if you're being really adventurous you can use IL emit to build a function up from IL opcodes.
Although sometimes the performance when interpreted before warm-up is important.
Testing with CompilerControl.Mode.EXCLUDE in JMH is still more reliable than testing manually, when you don't know if it got compiled or not.
Interpreted code is a case where method handles should be much faster than the generated accessor classes used by reflection - lambdas and method references are faster than an inner class implementing a functional interface before jit compilation. Afterwards, performance tends to be sameish.
That's the kind of solution I've used before when performing reflection in a tight loop. The important thing there is to cache the result of the compiled expression or the dynamic IL code because the compile time/emit is slower than a single use of reflection.
> The important thing there is to cache the result of the compiled expression or the dynamic IL code because the compile time/emit is slower than a single use of reflection
I came across some code a few weeks ago that skipped this step. It wasn't called remotely often enough to justify the complication in the first place (low volume internal web app) but it didn't cache the result so it ended up making things slower. Thanks to the wonders of RenderAction in MVC it compiled the dynamic IL 20 times per page view as well.
In the past, I've used Introspector API [1] for property reflection. It's part of the JDK, and provides a lot of the caching this article is talking about. Doubt this would provide the same level of improvement as adding the direct children() method, or even ASM based approaches, but it's certainly faster than naked reflection.
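A small sketch of the Introspector approach (the Person bean is hypothetical): Introspector caches the BeanInfo per class, so repeated property lookups avoid re-scanning methods.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class IntrospectorDemo {
    public static class Person {
        private String name = "Ada";
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        // Stop at Object so inherited pseudo-properties like "class" are skipped.
        BeanInfo info = Introspector.getBeanInfo(Person.class, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Object value = pd.getReadMethod().invoke(new Person());
            System.out.println(pd.getName() + " = " + value);
        }
    }
}
```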
There's another, and IMO better, way to make Java reflection faster: disable the access checks on reflective calls. Just call setAccessible(true) and you will notice a substantial improvement.
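For example (the Holder class is hypothetical):

```java
import java.lang.reflect.Field;

class Holder {
    private final int value = 7;
}

public class AccessibleDemo {
    public static void main(String[] args) throws Exception {
        Field f = Holder.class.getDeclaredField("value");
        // Skips the per-call access check (and makes the private field
        // reachable from outside the class at all).
        f.setAccessible(true);
        System.out.println(f.getInt(new Holder())); // prints 7
    }
}
```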
I wonder if this could be used by JVM languages that sometimes fall back to reflection, like Clojure. Transparently profile the reflection calls and recompile the hot ones?
Clojure provides a var called *warn-on-reflection* that can be set to true during dev; all (or a vast majority of) reflective calls can then be optimized away by adding type hints wherever it warns. This is to say that Clojure devs get rid of reflection instead of trying to make it faster.
Sounds like doing it automatically, instead of requiring code changes, would save a lot of work and speed up many programs where the programmers haven't gone through the trouble of doing this.
C# reflection is also slow, and even slower in Unity3D with the IL2CPP ahead-of-time compiler, where there is no just-in-time compiler or System.Reflection.Emit to dynamically generate code at runtime (which is impossible and prohibited on iOS, for example).
However, there are ways of compiling the little snippets of the code you need ahead of time, then plugging them together with delegates at runtime, which avoids much of the overhead.
Much of the cost is the overhead of converting from unknown types to specific types. So you can pre-compile loosely typed adaptor thunks that take generic "object" arguments, convert them to the required type, and directly call strongly typed delegates (which you can plug into the adaptor, so adaptors can be used for any number of compatible delegates).
Check out how the following code uses FastInvoke, ToOpenDelegate, Delegate.CreateDelegate, Expression.GetActionType, and Expression.GetFuncType.
It may be that some of these general techniques could be applied to Java.
Faster Invoke for reflected property access and method invocation with AOT compilation:
The bane of the iOS programmer's life, when working with reflection in Mono, is that you can't go around making up new generic types to ensure that your reflected properties and methods get called at decent speed. This is because Mono on iOS is fully ahead-of-time compiled and simply can't make up new stuff as you go along. That, coupled with the dire performance of Invoke when using reflected properties, led me to construct a helper class.
This works by registering a series of method signatures with the compiler, so that they are available to code running on the device. In my tests, property access was 4.5x faster and method access with one parameter was 2.4x faster. Not earth shattering, but every little helps. If you knew what you wanted ahead of time, then you could probably do a lot better. See here for info.
You have to register signatures inside each class, I'm afraid. Nothing I can do about that.
So to register a signature you use:
static MyClass()
{
//All methods returning string can be accelerated
DelegateSupport.RegisterFunctionType<MyClass, string>();
//All methods returning string and taking an int can be accelerated
DelegateSupport.RegisterFunctionType<MyClass, int, string>();
//All methods returning void and taking a bool can be accelerated
DelegateSupport.RegisterActionType<MyClass, bool>();
}
Then when you have a MethodInfo you use the extension method FastInvoke(object target, params object[] parameters) to call it. FastInvoke will default to using normal Invoke if you haven't accelerated a particular type.
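A rough Java analog of the loosely typed adaptor idea, using MethodHandle.asType to build a generic (Object)Object thunk once up front (Point is a hypothetical class; this is not the FastInvoke code itself):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class AdaptorDemo {
    public static class Point {
        private final int x;
        public Point(int x) { this.x = x; }
        public int getX() { return x; }
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle exact = MethodHandles.lookup()
                .findVirtual(Point.class, "getX", MethodType.methodType(int.class));

        // Adapt the strongly typed handle to a generic (Object)Object shape.
        // The adaptor performs the cast/boxing, analogous to the C# thunks above,
        // and can be cached and reused for any compatible target.
        MethodHandle generic = exact.asType(
                MethodType.methodType(Object.class, Object.class));

        Object result = generic.invokeExact((Object) new Point(5));
        System.out.println(result); // prints 5
    }
}
```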