[performance] Single row predictions speedup. #2935

Closed
AlbertoEAF opened this issue Mar 22, 2020 · 5 comments


AlbertoEAF commented Mar 22, 2020

Hello,

Context

I know you're looking for ways to improve performance throughout the codebase.

I'm working on a production scenario with LightGBM for real-time prediction systems, where throughput and latency are both important.

Because of those and other constraints, and since I receive individual events that must be scored as quickly as possible, I'm using the method LGBM_BoosterPredictForMatSingleRow, as the documentation states that it partly reuses internal structures to speed up computation.

Change proposal

I looked at the code to see if there were any easy wins we could pull off, and saw that for every single prediction we create a new Config object from scratch. This object has roughly 200 members, and building it also requires parsing the configuration string into the different properties.

I split that LGBM_BoosterPredictForMatSingleRow call into a "configuration/init" call that creates the config and a "scoring" call that uses that config. With some small tweaks, in a very basic case I got almost 2x the throughput. This requires adding 2 functions to the C API (without touching existing code at all):

  • LGBM_BoosterPredictForMatSingleRowFastInit (creates the config before scoring lots of events)
  • LGBM_BoosterPredictForMatSingleRowFast (score using the pre-built config - as we're not changing parameters)

Here is the call graph for the current (non-patched) code, from a binary-classifier test with 1 thread, 7 features, and 100 trees (an extremely simple model):
[profiler call graph image]
Roughly 1/3 of the time (27.40% + 6.40%) is spent recreating the same Config over and over.
Notice that only the left branch is doing "meaningful" work when you maintain the same config properties - i.e., score lots of events with the same configuration.

In the implementation I described above, that cost no longer exists.

You can find a prototype code in here: https://github.com/AlbertoEAF/LightGBM/blob/ft-af-PULSEDEV-30690-optimize-single-row-predict-fast/src/c_api.cpp.

Can we add those 2 functions to the C API?
This is just a prototype, and I could work on it a bit more to improve it further.

@imatiach-msft
Contributor

@AlbertoEAF that looks very promising! I could definitely reuse that in mmlspark's version as well to improve performance.

@StrikerRUS
Collaborator

Also ping @eisber as the original author; this method is only used in the SWIG wrapper, which in turn is used by MMLSpark.

@guolinke
Collaborator

@AlbertoEAF good catch! Contribution is very welcome.

@StrikerRUS
Collaborator

Closed via #2992.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023