Initial Neural Net - DO NOT MERGE #349

Aylr · 2017-08-29T19:24:25Z

This requires substantial review and discussion before merging.

mxlei01 · 2017-08-29T19:33:57Z

Although you removed the ctg dataset in the end, I can still see it at ab5b130. Dunno if it matters though.

create_nn:

Maybe give the user an option to create a deeper network? Right now it looks like a fixed-size network.
Last layer activation: maybe offer sigmoid as a choice for multi-label classification?
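To make the two suggestions concrete, here is a minimal sketch of how depth and output activation could be parameterised. `plan_layers` and its arguments are hypothetical names, not part of healthcareai or of `create_nn` in this PR; the function only returns `(units, activation)` pairs, which one could then pass to something like Keras `Dense(units, activation=...)` layers. The single-sigmoid-unit choice for the binary case is also an assumption.

```python
def plan_layers(n_classes, hidden_units=(32, 32), multi_label=False):
    """Return (units, activation) pairs for a feed-forward classifier.

    Hypothetical helper for illustration only.
    n_classes is the number of classes (or labels when multi_label=True).
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    # One entry per hidden layer, so depth is controlled by len(hidden_units).
    layers = [(units, "relu") for units in hidden_units]

    if multi_label:
        # Each label is an independent yes/no decision: one sigmoid per label.
        layers.append((n_classes, "sigmoid"))
    elif n_classes == 2:
        # Binary problem: a single sigmoid unit (an assumption of this sketch).
        layers.append((1, "sigmoid"))
    else:
        # Mutually exclusive classes: softmax over all classes.
        layers.append((n_classes, "softmax"))
    return layers


# A deeper network is just a longer hidden_units tuple.
deep_multiclass = plan_layers(3, hidden_units=(64, 32, 16))
multi_label_head = plan_layers(4, multi_label=True)
```

Keeping the layer plan separate from the framework calls would also make it easy to unit test without a GPU or a Keras install.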

Just my two cents.

Aylr · 2017-09-04T12:24:57Z

@mxlei01 I agree with your comments - thank you for checking it out! It is important to note that this is only the first step toward getting neural nets into healthcare.ai. I may pull pieces of this in gradually (for example, the multiclass support) as we decide how we want to handle nets.

mxlei01 · 2017-09-06T05:32:50Z

@Aylr Regarding the neural network framework that Healthcare-AI would use: would you rather use TensorFlow directly, or a high-level tool like Keras? I have researched Keras vs. TensorFlow a little. I'm not sure what your deep learning training flow looks like, but the recommended TensorFlow input approach used to be multi-threading plus queues, and is now the Dataset API. With Keras batch training using a Python generator, you would only get a portion of the throughput of pure TensorFlow (plus the overhead of switching between the underlying C++ code and Python). However, Keras with an MXNet backend seems to be a good alternative with high training throughput, although I'm not sure how its performance compares with pure TensorFlow.

Good GPUs are expensive and training times are long, so we would want to squeeze every bit of performance we can out of a GPU.

With deep learning, we would also want to batch data for training, but right now we actually read in the whole dataset for training.
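As a rough illustration of what batching could look like, here is a minimal sketch that pulls fixed-size batches from any iterable of rows without materialising the whole dataset first. `iter_batches` is a hypothetical name, not an existing healthcareai function, and this is plain Python rather than a TensorFlow or Keras input pipeline.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable.

    Hypothetical helper for illustration; only one batch is held in
    memory at a time, so `rows` can be a lazy source such as a file reader.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


# Example with an in-memory range standing in for a lazy row source.
batches = list(iter_batches(range(5), batch_size=2))
```

The same generator would work over `csv.reader(open_file)`, so rows are read from disk only as each batch is requested. It does not shuffle, which a real training loop would likely need.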

Would we want to make a scalable version of our data pipeline? We wouldn't need to replace the whole thing; it could be selected through a user setting, for example `pipeline='TensorFlow'`.
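One way such a setting could be wired up is a small registry keyed by pipeline name, so the existing in-memory path stays the default. Everything below is a hypothetical sketch: `PIPELINES`, `register_pipeline`, `get_pipeline`, and the pipeline names are made up for illustration, and the "batched" entry is a plain-Python placeholder where a TensorFlow-backed loader could be registered instead.

```python
PIPELINES = {}


def register_pipeline(name):
    """Decorator that stores a pipeline function under a lowercase name."""
    def decorator(func):
        PIPELINES[name.lower()] = func
        return func
    return decorator


@register_pipeline("in_memory")
def in_memory_pipeline(rows, batch_size=None):
    # Current behaviour: materialise everything as a single batch.
    yield list(rows)


@register_pipeline("batched")
def batched_pipeline(rows, batch_size=32):
    # Placeholder for a scalable backend: emit fixed-size batches lazily.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def get_pipeline(name="in_memory"):
    """Look up a pipeline by its user-facing setting (case-insensitive)."""
    try:
        return PIPELINES[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown pipeline {name!r}; choose from {sorted(PIPELINES)}"
        ) from None


# The user setting selects the backend; the calling code stays the same.
for batch in get_pipeline("batched")(range(5), batch_size=2):
    pass  # a training step would consume each batch here
```

With this shape, adding a new backend would not require touching the trainer code, only registering another function.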

I could invest some time playing with TensorFlow to see how we could integrate it into Healthcareai-py.

Finally, I might be wrong, so I'm throwing this out there to see if anyone corrects my statements.

Shufang Ci and others added 30 commits July 17, 2017 13:32

Add ctg dataset specifically for building neural network

ab5b130

Add a function to load ctg data

f3d0e58

For testing, print roc score

aca1bdd

Add get_algorithm_neural_network

f8fc7df

Add neural_network_classifier

63122de

Run an example to test neural network

922755b

* using pandas DataFrame.as_matrix() in train_test_split to convert d…

2067e50

…ataframes to numpy arrays to make keras happy * added a self.columns attribute to save the dataframe column names

Remove issubclass(type(model), sklearn.base.BaseEstimator)

23cd59a

KerasClassifier is not a BaseEstimator. This raises an error in neural network.

Remove print(roc)

4a8c3ea

Don't need it now.

Create an estimator for KerasClassifier

c16d7b5

Add feature scaling, adjusted _create_trained_supervised_model

d6f477e

Add TestNeuralNetworkClassificaton

e907623

Neural network example using ctg data

d18dd46

Neural network example using diabetes data

f055c6a

Merge remote-tracking branch 'origin/sc_neuralnet' into sc_neuralnet

5befb94

# Conflicts: # healthcareai/advanced_supvervised_model_trainer.py

Add compute_confusion_matrix(), add metrics choices for multiclass

16011f5

Add print/plot confusionMat; check if is binary classification

e2cb790

Calculate number of output neurons in neural network

7e8356e

Add calculated output dimension to neural network

adb398c

Add confusion matrix tests

86af2dd

Add annotations for confusion matrix

bdb4368

Add annotations for get_algorithm_neural_network()

181644f

Adjusted transformations for target variable

ba7c39d

Add a function to load the dermatology dataset

de78588

This dataset is used for multi class classification

Add dermatology data for multi class classification

9b1fdcf

Add annotations for print/plot confusion matrix

0875b3a

Change default scoring_metric to accuracy

b90eebe

Changed `roc_auc` to `accuracy` to accommodate multi class classification

Add a check on binary classification

835c5e6

Add calculate_classification_metrics()

2266fbe

Add a `binary` check in `calculate_binary_classification_metrics()`, and changed the name to `calculate_classification_metrics()`. Choose performance metrics based on the number of classes.

Add annotations

347f6ff

Shufang Ci added 7 commits July 21, 2017 14:51

Some minor adjustments

8d5cc23

Add tests for multi class classification

516da02

Deleted ctg related examples

6ceaba7

Deleted ctg dataset

753d08d

Remove ctg data related function

e4fd448

A binary classification example with neural network

1d16316

Using the diabetes dataset

Add a multiclass example with neural network

d26ef88

Dataset is from healthcareai-r package

Merge branch 'master' into sc_neuralnet

6cfff51

# Conflicts: # healthcareai/advanced_supvervised_model_trainer.py
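For readers skimming the commit list, the idea behind commit 2266fbe (choose performance metrics based on the number of classes) can be sketched roughly as follows. This is not the code from the PR: `classification_metrics_sketch` is a hypothetical function, and the specific metrics reported per case (accuracy and a confusion matrix always, precision and recall only for binary problems) are assumptions made for illustration.

```python
def classification_metrics_sketch(y_true, y_pred):
    """Return a metrics dict whose contents depend on the number of classes.

    Hypothetical illustration, not the healthcareai implementation.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    classes = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(classes)}

    # Rows are true classes, columns are predicted classes.
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1

    correct = sum(matrix[i][i] for i in range(len(classes)))
    metrics = {
        "accuracy": correct / len(y_true),
        "confusion_matrix": matrix,
    }

    if len(classes) == 2:
        # Binary check: treat the larger label as the positive class.
        tp, fp, fn = matrix[1][1], matrix[0][1], matrix[1][0]
        metrics["precision"] = tp / (tp + fp) if tp + fp else 0.0
        metrics["recall"] = tp / (tp + fn) if tp + fn else 0.0
    return metrics


binary = classification_metrics_sketch([0, 1, 1, 0], [0, 1, 0, 0])
multiclass = classification_metrics_sketch(["a", "b", "c"], ["a", "c", "c"])
```

Accuracy and the confusion matrix are defined for any number of classes, which is consistent with the switch of the default `scoring_metric` from `roc_auc` to `accuracy` in commit b90eebe.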
