UPSG is a standard methodology, an interchange format, and a Python library for writing machine learning pipelines.
It is designed primarily to give teams working on different machine learning problems a way to share code across languages and environments.
Install with:
```shell
pip install git+https://github.com/dssg/UPSG.git
```
The UPSG Python library depends on several third-party packages. In most environments, pip installs them for you automatically.
The following example implements the scikit-learn "Getting started" pipeline (an SVM classifier trained on the digits dataset) with UPSG:
```python
from sklearn import datasets
from sklearn.svm import SVC

from upsg.fetch.np import NumpyRead
from upsg.wrap.wrap_sklearn import wrap_and_make_instance
from upsg.export.csv import CSVWrite
from upsg.transform.split import SplitTrainTest
from upsg.pipeline import Pipeline

digits = datasets.load_digits()
digits_data = digits.data
# for now, we need a column vector rather than an array
digits_target = digits.target

p = Pipeline()

# load data from a numpy dataset
stage_data = NumpyRead(digits_data)
stage_target = NumpyRead(digits_target)

# train/test split
stage_split_data = SplitTrainTest(2, test_size=1, random_state=0)

# build a classifier
stage_clf = wrap_and_make_instance(SVC, gamma=0.001, C=100.)

# output to a csv
stage_csv = CSVWrite('out.csv')

node_data, node_target, node_split, node_clf, node_csv = map(
    p.add, [
        stage_data,
        stage_target,
        stage_split_data,
        stage_clf,
        stage_csv])

# connect the pipeline stages together
node_data['output'] > node_split['input0']
node_target['output'] > node_split['input1']
node_split['train0'] > node_clf['X_train']
node_split['train1'] > node_clf['y_train']
node_split['test0'] > node_clf['X_test']
node_clf['y_pred'] > node_csv['input']

p.run()

# results are now in out.csv
```
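Conceptually, the `>` connections in the example describe a directed graph: each stage exposes named output and input ports, and running the pipeline presumably executes each stage once its upstream stages have produced their outputs. The toy sketch below illustrates that idea in plain Python so the connection syntax is easier to reason about. It is not UPSG's implementation: `ToyPipeline`, `Node`, and `Port` are hypothetical names, and stages here are assumed to be ordinary functions that take keyword inputs and return a dict of named outputs.

```python
from collections import defaultdict, deque


class Port:
    """A named input or output slot on a node (hypothetical, for illustration)."""

    def __init__(self, node, name):
        self.node = node
        self.name = name

    def __gt__(self, other):
        # `out_port > in_port` records an edge from an output port to an input port.
        self.node.pipeline.connect(self, other)
        return True


class Node:
    """Wraps a stage function; indexing a node returns one of its ports."""

    def __init__(self, pipeline, func):
        self.pipeline = pipeline
        self.func = func  # takes keyword inputs, returns a dict of outputs

    def __getitem__(self, name):
        return Port(self, name)


class ToyPipeline:
    def __init__(self):
        self.nodes = []
        self.edges = []  # (src_node, src_port, dst_node, dst_port)

    def add(self, func):
        node = Node(self, func)
        self.nodes.append(node)
        return node

    def connect(self, src, dst):
        self.edges.append((src.node, src.name, dst.node, dst.name))

    def run(self):
        # Kahn's algorithm: a node runs once every upstream node has run.
        indegree = {n: 0 for n in self.nodes}
        downstream = defaultdict(list)
        for src, _, dst, _ in self.edges:
            indegree[dst] += 1
            downstream[src].append(dst)

        ready = deque(n for n in self.nodes if indegree[n] == 0)
        inputs = defaultdict(dict)  # node -> {input_port: value}
        outputs = {}
        while ready:
            node = ready.popleft()
            outputs[node] = node.func(**inputs[node])
            # Route this node's outputs to the input ports they connect to.
            for src, src_port, dst, dst_port in self.edges:
                if src is node:
                    inputs[dst][dst_port] = outputs[node][src_port]
            for dst in downstream[node]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)

        if len(outputs) != len(self.nodes):
            raise ValueError("pipeline graph contains a cycle")
        return outputs


# Usage: source -> sort -> sum, wired with the same `>` style as the example.
p = ToyPipeline()
source = p.add(lambda: {"output": [3, 1, 2]})
sorter = p.add(lambda values: {"output": sorted(values)})
summer = p.add(lambda values: {"output": sum(values)})

source["output"] > sorter["values"]
sorter["output"] > summer["values"]

results = p.run()
print(results[summer]["output"])  # 6
```

Recording edges and deferring all work to `run()` keeps graph construction separate from execution, which is what lets the same connections be declared in any order before anything is computed.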
For more details, check out the documentation.