Low GPU utilization during Random Forest predict #843

Open
StargazerAlex opened this issue May 6, 2020 · 0 comments

Environment (for bugs)

  • OS platform, distribution and version (e.g. Linux Ubuntu 16.04): Ubuntu 16.04.6 LTS
  • Installed from (source or binary): pip
  • Version: 0.4.0
  • Python version (optional): 3.6
  • CUDA/cuDNN version: 10.0
  • GPU model (optional): Nvidia T4
  • CPU model: Intel Xeon, 32 cores
  • RAM available: 200 GB

Description

I want to use the Random Forest Classifier for predictions on a large amount of data, but the prediction phase takes an oddly long time and shows very low GPU utilization. Here are the parameters I used for training:

import h2o4gpu

# Train on the GPU: tree_method="gpu_hist" builds the trees on the GPU and
# predictor="gpu_predictor" is supposed to run scoring on the GPU as well.
model = h2o4gpu.RandomForestClassifier(
    n_estimators = 100, criterion = "gini",
    max_depth = 8, min_samples_split = 2, min_samples_leaf = 1,
    min_weight_fraction_leaf = 0, max_features = "auto",
    max_leaf_nodes = None, min_impurity_decrease = 0,
    min_impurity_split = None, bootstrap = True, oob_score = False,
    n_jobs = -1, random_state = None, verbose = 0, warm_start = False,
    class_weight = None, subsample = 1, colsample_bytree = 1,
    num_parallel_tree = 1, tree_method = "gpu_hist", n_gpus = -1,
    predictor = "gpu_predictor", backend = "h2o4gpu")

model.fit(x_train, y_train)

The training works pretty well. It is comparatively fast and consistently uses around 80% of the GPU (measured with nvidia-smi).
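
For reference, this is roughly how I watched the utilization while fit() was running; the small polling helper below is only a sketch (it assumes nvidia-smi is on the PATH and is not part of my actual pipeline):

import subprocess
import time

def sample_gpu_utilization(seconds=30, interval=1.0):
    # Poll nvidia-smi once per interval and print the GPU utilization in percent.
    for _ in range(int(seconds / interval)):
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            universal_newlines=True)
        print(out.strip())  # one line per GPU, e.g. "80"
        time.sleep(interval)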

y_pred = model.predict(x_test)

Prediction, however, only utilizes about 4% of the GPU, and only for a fraction of the time one iteration (across 10 samples) takes; most of the work seems to happen on the CPU, with one core constantly at 100%. For 2 classes a prediction takes around 0.4 seconds; for 10 classes it takes 3.4 seconds. Running the same prediction with purely CPU-based scikit-learn is faster, at only 0.1 seconds.
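
For what it's worth, this is roughly how I measured the timings; the scikit-learn baseline and the timing helper below are only a sketch (x_batch is a placeholder for my real 10-sample batches, and the hyperparameters mirror the ones above; the times in the comments are the numbers reported above):

import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier as SkRF

def mean_predict_time(clf, x, repeats=10):
    # Average wall-clock time of clf.predict(x) over several runs.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        clf.predict(x)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Placeholder batch shaped like my 10-sample prediction batches.
x_batch = np.random.rand(10, x_train.shape[1]).astype(np.float32)

sk_model = SkRF(n_estimators=100, max_depth=8, n_jobs=-1).fit(x_train, y_train)

print("h2o4gpu predict:", mean_predict_time(model, x_batch))     # ~0.4 s with 2 classes
print("sklearn predict:", mean_predict_time(sk_model, x_batch))  # ~0.1 s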

Is this a general limitation of GPU prediction for tree-based models, or am I doing something wrong?

Thanks a lot in advance!
