Initial Neural Net - DO NOT MERGE #349
base: master
…ataframes to NumPy arrays to make Keras happy
* Added a `self.columns` attribute to save the DataFrame column names
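The conversion described in that commit can be sketched as follows: keep the DataFrame's column names on the object and hand Keras a plain NumPy array. `DataFrameToArray` and its methods are hypothetical names for illustration, not the code in this PR. The sketch assumes all columns are already numeric (for example, after dummy encoding).

```python
import numpy as np
import pandas as pd


class DataFrameToArray:
    """Hypothetical helper: remember column names, give Keras a NumPy array."""

    def __init__(self):
        self.columns = None  # set on first fit so later frames can be aligned

    def fit_transform(self, df: pd.DataFrame) -> np.ndarray:
        # Save the training-time column names before dropping them.
        self.columns = list(df.columns)
        return df.to_numpy(dtype=np.float32)

    def transform(self, df: pd.DataFrame) -> np.ndarray:
        # Reorder to the saved columns so array positions match training.
        return df[self.columns].to_numpy(dtype=np.float32)
```

Storing the names matters because a NumPy array is positional only: without `self.columns`, a prediction frame with reordered columns would silently feed features to the wrong inputs.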
`KerasClassifier` is not a `BaseEstimator`. This raises an error in the neural network.
Don't need it now.
# Conflicts: # healthcareai/advanced_supvervised_model_trainer.py
This dataset is used for multiclass classification
Changed `roc_auc` to `accuracy` to accommodate multiclass classification
Added a `binary` check in `calculate_binary_classification_metrics()` and renamed it to `calculate_classification_metrics()`. Performance metrics are now chosen based on the number of classes.
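Choosing metrics by class count, as described above, might look like the following. This is a hypothetical sketch, not the actual `calculate_classification_metrics()`: `choose_metrics` is an illustrative name, labels are assumed to be integers with `1` as the positive class in the binary case, and ROC AUC is computed from its pairwise (Mann-Whitney) definition to stay dependency-free. In practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` would do this work.

```python
from typing import Dict, Optional, Sequence


def pairwise_roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t != 1]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def choose_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   y_score: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Hypothetical: accuracy for every target; ROC AUC only for binary targets."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    metrics = {"accuracy": accuracy}
    is_binary = len(set(y_true)) == 2  # the "binary" check
    if is_binary and y_score is not None:
        metrics["roc_auc"] = pairwise_roc_auc(y_true, y_score)
    return metrics
```

For a three-class target, `choose_metrics([0, 1, 2, 2], [0, 2, 2, 2])` returns only `{'accuracy': 0.75}`, which mirrors the switch from `roc_auc` to `accuracy` for multiclass problems.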
Using the diabetes dataset
Dataset is from the healthcareai-r package
Although you removed the ctg dataset in the end, I can still see it at ab5b130. I don't know if it matters, though. create_nn:
Just my two cents.
@mxlei01 I agree with your comments. Thank you for checking it out! It is important to note that this is only the first step toward getting neural nets into healthcare.ai. I may pull pieces of this in slowly (for example, the multiclass support) as we decide how we want to handle nets.
@Aylr Regarding the neural network that Healthcare-AI would use: would you rather use TensorFlow, or a high-level tool like Keras? I have researched Keras vs. TensorFlow a little. I'm not sure what your deep learning training flow looks like, but TensorFlow's input pipeline used to be built on multithreading plus queues, and now the recommended way is to use Datasets. With Keras batch training using a Python generator, you would get only a fraction of the throughput of pure TensorFlow (plus the overhead of switching between the underlying C++ code and Python). However, Keras with the MXNet backend seems to be a good alternative with high training throughput, although I'm not sure how its performance compares with pure TensorFlow. Good GPUs are expensive and training times are long, so we might want to squeeze every bit of performance we can out of a GPU.

With deep learning, we would also want to batch data for training, but right now we read the whole dataset into memory for training. Should we make a scalable version of our data pipeline? It wouldn't need to replace the whole thing; it could be enabled through user settings. For example, I could invest some time experimenting with TensorFlow and see how we could integrate it into Healthcareai-py.

Finally, I might be wrong, so I'm throwing this out there to see if anyone corrects my statements.
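The batching idea in the comment above can be sketched with the standard library: instead of reading the whole dataset into memory, stream rows from a CSV and yield fixed-size batches. This is a minimal sketch, not healthcareai-py's pipeline; `iter_batches` and its `label_col` parameter are hypothetical, and it assumes every non-label column is numeric.

```python
import csv
from typing import Iterable, Iterator, List, Tuple


def iter_batches(lines: Iterable[str], label_col: str, batch_size: int = 32
                 ) -> Iterator[Tuple[List[List[float]], List[float]]]:
    """Yield (features, labels) batches without loading the whole file.

    `lines` can be an open CSV file or any iterable of CSV lines with a header.
    """
    reader = csv.DictReader(lines)
    features: List[List[float]] = []
    labels: List[float] = []
    for row in reader:
        labels.append(float(row.pop(label_col)))  # remove the target column
        features.append([float(v) for v in row.values()])  # remaining columns
        if len(labels) == batch_size:
            yield features, labels
            features, labels = [], []
    if labels:  # final partial batch
        yield features, labels
```

Only one batch is held in memory at a time. A Keras-style generator is expected to loop over its data indefinitely, so a caller would reopen the file inside a `while True:` loop and convert each batch with `numpy.asarray` before passing it to the model.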
This requires substantial review and discussion before merging.