
support adding fewer than numItems via auto-trim() #44

Open · wants to merge 8 commits into base: main

Conversation

leeoniya commented Aug 30, 2022

Fixes #43

hey @mourner,

this diff was the path of least resistance. i think this._levelBounds init could potentially be deferred until finish(), but that's likely minor, though every little bit counts when you're rebuilding the index during a canvas resize/drag :)

i'm not sure there's much reason to require trim() to be invoked manually, since that would just add noise to user code to avoid an automatically-avoidable error. you may also be passing the Flatbush instance outside the original context, in which case consumers need to know about trim() and also how to read back the expected length to do the pre-check before invoking .finish()... feels awkward.
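in case it helps frame the discussion, the auto-trim itself boils down to one typed-array copy before the tree build. a minimal standalone sketch with made-up names (not the actual diff):

```javascript
// If fewer items were added than the declared numItems, copy the
// coordinate data into a right-sized array before building the tree.
function trimBoxes(boxes, addedItems) {
    return boxes.slice(0, addedItems * 4); // 4 coords per item
}

const boxes = new Float64Array(100 * 4); // room for 100 items
// ... suppose only 60 items were actually added ...
const trimmed = trimBoxes(boxes, 60);
console.log(trimmed.length); // 240
```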

leeoniya (author) commented Sep 7, 2022

ping @mourner 😄

leeoniya (author) commented

re-ping @mourner 🎃

mourner (owner) commented Nov 5, 2022

@leeoniya sorry, having a lot on my plate here in Kyiv — not sure when I'll get to this. In the meantime, you could simply rewrite the trim so that it could be added on your app side rather than merged in.

leeoniya (author) commented Nov 9, 2022

sorry! no problem, stay safe! 🇺🇦

mourner (owner) commented Apr 21, 2023

> i'm not sure there is much reason to require trim() be manually invoked as it would just add noise to user code simply to avoid an automatically-avoidable error?

@leeoniya Getting back to the PR and thinking about this more, I guess the reason why it doesn't feel right to trim automatically is because it makes it easy to shoot yourself in the foot. If we change the docs for the numItems to really mean maxNumItems, people will start using Flatbush this way, not realizing that there's a significant difference in performance & memory footprint between e.g. adding numItems and adding numItems - 1 and trimming.

In the latter case, the code will duplicate the allocated memory and then dump the old one all under the hood. This is why I like things being explicit, so that users realize there's a cost to calling trim(). The behavior is less magic and more transparent, just the way I like it.
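Concretely (a standalone illustration, not Flatbush code): shrinking a typed array is a copy into a fresh buffer, so both allocations are briefly live at once:

```javascript
// slice() allocates a new backing ArrayBuffer and copies into it;
// the old buffer only goes away at some later GC pass.
const full = new Float64Array(100_000);
const trimmed = full.slice(0, 50_000);

console.log(full.buffer === trimmed.buffer);       // false: two separate buffers
console.log(full.byteLength + trimmed.byteLength); // 1200000 bytes live at once
```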

leeoniya (author) commented Apr 23, 2023

> not realizing that there's a significant difference in performance & memory footprint

i ran a non-scientific test of this: the cost of trimming from an estimated 100k down to an actual 50k–70k is ~0.5ms, while the rest of that .finish() call takes 16ms when using new Flatbush(100_000, 512, Int16Array). so the CPU overhead is ~3%, and likely proportionally lower when item counts are higher. in my case i'm way over a single frame budget anyway, and i don't foresee people re-indexing 100k+ objects every 16ms while expecting smooth sailing.
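a rough reproduction of that kind of measurement, using only a bare typed array rather than Flatbush itself (the 100k capacity and 60k count are placeholders; absolute numbers will vary by machine):

```javascript
// Time just the trim step: copying a 100k-capacity box array
// (4 coords per item) down to 60k actually-added items.
const boxes = new Float64Array(100_000 * 4);
const t0 = performance.now();
const trimmed = boxes.slice(0, 60_000 * 4); // the copy an auto-trim would do
const t1 = performance.now();
console.log(`trim: ${(t1 - t0).toFixed(2)}ms for ${trimmed.length} floats`);
```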

keeping it explicit is fine. in my order of preference:

  • add trim: boolean arg to the Flatbush constructor, but it already has a lot of opts :(
  • index.finish(trim: false)
  • index.trim() + index.finish()

only the first option lets you be explicit without having to worry about calling finish() with insufficient items. the other two have worse ergonomics: calling .finish() without remembering to indicate trim risks throwing an error that is completely avoidable.
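option 2 could look roughly like this. IndexSketch, _pos, and _boxes are invented stand-ins for Flatbush's internals, just to show the ergonomics of an explicit flag:

```javascript
// Toy index with a finish(trim) flag (option 2 above). Not Flatbush's
// real implementation; the internals are simplified stand-ins.
class IndexSketch {
    constructor(numItems) {
        this.numItems = numItems; // declared capacity
        this._pos = 0;            // write cursor into _boxes
        this._boxes = new Float64Array(numItems * 4);
    }
    add(minX, minY, maxX, maxY) {
        this._boxes[this._pos++] = minX;
        this._boxes[this._pos++] = minY;
        this._boxes[this._pos++] = maxX;
        this._boxes[this._pos++] = maxY;
    }
    finish(trim = false) {
        const added = this._pos >> 2;
        if (added < this.numItems) {
            if (!trim) {
                throw new Error(`added ${added} items, expected ` +
                    `${this.numItems}; call finish(true) to trim`);
            }
            this._boxes = this._boxes.slice(0, this._pos); // explicit copy
            this.numItems = added;
        }
        // ... build the tree over this._boxes ...
    }
}
```

with this shape, forgetting the flag still throws, but the cost of the copy is at least opted into at the call site rather than hidden.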

mourner (owner) commented Apr 24, 2023

@leeoniya the profiler won't show the full picture when it's about memory allocation. When a bunch of memory is no longer used, it will be marked for garbage collection, but not necessarily be freed at the time — it will batch those unused memory chunks until a GC pass, which may happen much later and cause a micro-stutter. So it may have an impact even if you don't see it in the profiler. In your case 0.5ms is likely just for the 50k allocation / memory copy, not for the GC (50k is a pretty small amount anyway).

Nowadays developers aren't used to paying attention to memory allocation, but it's something that feels important to me. Even if this doesn't make a difference in your particular application, it still doesn't feel right to do the trimming implicitly.

> calling .finish() without remembering to indicate trim runs the risk of throwing an error that is completely avoidable.

I don't see any problem with an error if it's easy to fix and makes the code more intentional.

leeoniya (author) commented

> Nowadays developers aren't used to paying attention to memory allocation, but it's something that feels important to me.

i spend a lot of time in mem and cpu profilers and have written some pretty fast libs, so i definitely do pay attention. in this case, however, .finish() will always dwarf any .trim() + follow-up GC costs if you have to re-index in a tight loop. GC of shallow/simple structures, like a contiguous typed array, will be very cheap.

i'm on a phone now, but happy to do a more thorough cpu & mem profile with a setInterval at ~50ms and 10x larger datasets like 500k.

in either case, i'll update the pr to not do auto-trim during finish() without explicit opt-in.

mourner (owner) commented Apr 24, 2023

> i definitely do pay attention

Sorry, I was referring to library users. Maybe you're right and I'm being irrational, but it still doesn't sit well with me that an index can suddenly allocate twice as much memory as expected based on non-obvious logic. Maybe that's some exaggerated sense of "purity" and subjective taste that makes me defensive about one of my favorite libraries, but I'd rather go with a gut feeling than endlessly argue.

@@ -44,6 +44,18 @@ export default class Flatbush {
     * @param {ArrayBuffer | SharedArrayBuffer} [data] (Only used internally)
     */
    constructor(numItems, nodeSize = 16, ArrayType = Float64Array, ArrayBufferType = ArrayBuffer, data) {
        this.init(numItems, nodeSize, ArrayType, ArrayBufferType, data);
leeoniya (author) commented Apr 25, 2023


this now confuses TS about the initialization of the various this.* properties it expects to see in the constructor; not sure how to fix it.

honestly, i think having Flatbush and FlatQueue be ES6 classes doesn't help here. there's really no reason for either to be a class instead of a plain closure (which can still be used just fine with or without new). having constructor be special is problematic because it cannot be invoked without new the way a plain init function can. i don't think the polymorphism afforded by classes adds anything to this lib.

related discussion about "resetting" already-allocated class instances: https://stackoverflow.com/questions/47917721/pooling-with-javascript-classes

if you're interested, i can port Flatbush to TS and a closure fn instead of a class.
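a hedged sketch of what the closure style might look like, using a tiny invented index rather than Flatbush itself (all names here are made up):

```javascript
// Factory instead of a class: works with or without `new`, and the
// same init() used for construction can be called again to reset a
// pooled instance (cf. the SO discussion linked above).
function makeIndex(capacity) {
    let boxes, pos;
    function init() {
        boxes = new Float64Array(capacity * 4);
        pos = 0;
    }
    function add(minX, minY, maxX, maxY) {
        boxes[pos++] = minX;
        boxes[pos++] = minY;
        boxes[pos++] = maxX;
        boxes[pos++] = maxY;
    }
    function count() { return pos >> 2; }
    init();
    return { init, add, count };
}

const idx = makeIndex(10);  // no `new` required
idx.add(0, 0, 1, 1);
idx.init();                 // "reset" without re-creating the wrapper
console.log(idx.count());   // 0
```

since the factory returns an object literal, `new makeIndex(10)` also works unchanged, so existing call sites wouldn't break.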

leeoniya (author) commented

pushed some updates. WDYT?


Successfully merging this pull request may close: resize (shrink) during finish()?