-
Notifications
You must be signed in to change notification settings - Fork 943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change output type of clusters-kmeans #850
Conversation
@morganherlocker These changes might help understand the other modules ( |
I love the colors in the test output, so festive! 🎉 😄 |
@DenisCarriere although the module's output might still be acceptable, there is an issue I have been thinking about in the last few days. The above to say that I believe we should at least use the default random selection of initial centroids (provided by We might define valid an output that:
which is basically all you can/should expect from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat! 👍
if (numberOfClusters > count) numberOfClusters = count; | ||
|
||
// Clone points to prevent any mutations | ||
points = clone(points, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we let the user decide if the input should change or (by default) not, like we did for the transform
modules? I believe it's likely the user wants the input to be added the clustering properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could always add a mutate
parameter (to allow input mutations).
It will significantly increase the performance if the user doesn't mind having the input be added.
We might be able to do handle this without the use of clone, but as a safe measure it was added.
Benchmark results
without clone
fiji x 260,987 ops/sec ±1.03% (87 runs sampled)
many-points x 173 ops/sec ±7.86% (77 runs sampled)
points-with-properties x 245,562 ops/sec ±1.54% (86 runs sampled)
points1 x 71,572 ops/sec ±0.86% (88 runs sampled)
points2 x 40,784 ops/sec ±1.06% (86 runs sampled)
with clone
(~6 times slower)
fiji x 41,566 ops/sec ±1.18% (89 runs sampled)
many-points x 145 ops/sec ±2.42% (79 runs sampled)
points-with-properties x 38,453 ops/sec ±1.22% (83 runs sampled)
points1 x 12,598 ops/sec ±1.09% (91 runs sampled)
points2 x 8,124 ops/sec ±0.97% (89 runs sampled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I had in mind :+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 yep, I didn't realize how "slow" it was after adding clone
, that JSON.parse + JSON.stringify
really takes a toll on the performance benchmarks.
points: points, | ||
centroids: featureCollection(centroids) | ||
}; | ||
return points; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the reasons why we want to output GeoJSON objects, however I believe the centroids in this particular clustering method are relevant and would be beneficial to the user having them already available in the output (as it's a common pattern in ALL the kmeans
library available).
Couldn't we add them as a centroids
property of the output? It would still be a valid GeoJSON format, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 How about... we add [number, number]
as a new centroid
property, that way it doesn't deviate too much from the GeoJSON specification.
@stebogit Added the centroid back to the test cases using the (new) |
@DenisCarriere did you read my comment? Just curious about your point of view. |
That was a big paragraph... :) You might of not noticed it, but I started added bullet points of To-Do's in the first comment #850 (comment) Things I might of missed:
It would be interesting to see the difference between K-means++, I personally don't know the difference, I'd have to dig deeper into this. 👍 We can include the improved K-means++
We should use the default random selection of centroids if it's recommended by K-means to do so. As for the test cases, I'm sure we can figure out a clever way to test this, we can still output the results using
And the ones you mentioned:
Those should be enough tests to make sure the module is working correctly. Try to summarize them in a concise "to-do" list and we can tackle them in a new PR. |
Sorry @DenisCarriere I didn't noticed your changes. |
Updates to
@turf/clusters-kmeans
cluster
)numberOfClusters can\'t be greater than the number of points
(number of Clusters fallbacks to the count instead of throwing an error since the user might be passing multiple featureCollections and the total number of features is unknown/variable)index.js
clone
to prevent any input mutationclusterEach
(visual/testing purposes only)mutate
param to allow input to be mutatedJSDocs
Example