[performance] Single row predictions speedup. #2935

Closed
AlbertoEAF opened this issue Mar 22, 2020 · 5 comments


AlbertoEAF commented Mar 22, 2020

Hello,

Context

I know you're looking for ways to improve performance throughout the codebase.

I'm working on a production scenario with LightGBM for real-time prediction systems, where throughput and latency are both important.

Because of those and other constraints, and since I receive individual events that must be scored as quickly as possible, I'm using the method LGBM_BoosterPredictForMatSingleRow, as the documentation states that it partly reuses internal structures to speed up computation.

Change proposal

I looked at the code to see if there were any easy wins we could pull off, and saw that for every single prediction we create a new Config object from scratch. This object has roughly 200 members, and building it also requires parsing the configuration string into the different properties.

I split that LGBM_BoosterPredictForMatSingleRow call into a "configuration/init" call that creates the config and a "scoring" call that uses that config. With some small tweaks, in a very basic case I got almost 2x the throughput. This requires adding 2 functions to the C API (without touching existing code at all):

  • LGBM_BoosterPredictForMatSingleRowFastInit (creates the config before scoring lots of events)
  • LGBM_BoosterPredictForMatSingleRowFast (score using the pre-built config - as we're not changing parameters)

Here is the call graph for the current (non-patched) code, from a binary-classifier test with 1 thread, 7 features, and 100 trees (an extremely simple model):
[profiler call graph image]
Roughly 1/3 of the time (27.40% + 6.40%) is spent recreating the same Config over and over.
Notice that only the left branch is doing "meaningful" work when you maintain the same config properties - i.e., score lots of events with the same configuration.

In the implementation I described above, that cost no longer exists.

You can find a prototype code in here: https://github.com/AlbertoEAF/LightGBM/blob/ft-af-PULSEDEV-30690-optimize-single-row-predict-fast/src/c_api.cpp.

Can we add those 2 functions to the C API?
This is just a prototype, and I could work on it a bit more to improve it further.

@imatiach-msft
Contributor

@AlbertoEAF that looks very promising! I could definitely reuse that in mmlspark's version as well to improve performance.

@StrikerRUS
Collaborator

Also ping @eisber as the original author; this method is only used in the SWIG wrapper, which in turn is used by MMLSpark.

@guolinke
Collaborator

@AlbertoEAF good catch! Contribution is very welcome.

@StrikerRUS
Collaborator

Closed via #2992.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023