Github Wiki migration #79

Merged
merged 24 commits into from
Aug 22, 2018
move all files
tovbinm committed Aug 22, 2018
commit e9c288fe77651df25b2f7d5e7ff4e4703346f75f
3 changes: 0 additions & 3 deletions docs/Motivation.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/_Footer.md

This file was deleted.

70 changes: 0 additions & 70 deletions docs/_Sidebar.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/Abstractions.md → docs/abstractions/index.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Abstractions

TransmogrifAI is designed to simplify the creation of machine learning workflows. To this end we have created an abstraction for creating and running machine learning workflows. The abstraction is made up of Features, Stages, Workflows and Readers which interact as shown in the diagram below.

![TransmogrifAI Abstractions](https://github.com/salesforce/TransmogrifAI/raw/master/resources/AbstractionDiagram-cropped.png)
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# AutoML Capabilities

## Vectorizers and Transmogrification

Expand Down
2 changes: 1 addition & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -169,4 +169,4 @@
(master_doc, 'TransmogrifAI', 'TransmogrifAI Documentation',
author, 'TransmogrifAI', 'One line description of project.',
'Miscellaneous'),
]
]
12 changes: 7 additions & 5 deletions docs/Contributing.md → docs/contributing/index.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
# Contributing

This page lists recommendations and requirements for how to best contribute to TransmogrifAI. We strive to obey these as best as possible. As always, thanks for contributing – we hope these guidelines make it easier and shed some light on our approach and processes.

# Issues, requests & ideas
## Issues, requests & ideas

Use GitHub [Issues](https://github.com/salesforce/TransmogrifAI/issues) page to submit issues, enhancement requests and discuss ideas.

# Contributing
## Contributing

1. **Ensure the bug/feature was not already reported** by searching on GitHub under [Issues](https://github.com/salesforce/TransmogrifAI/issues). If none exists, create a new issue so that other contributors can keep track of what you are trying to add/fix and offer suggestions (or let you know if there is already an effort in progress).
3. **Clone** the forked repo to your machine.
Expand All @@ -14,7 +16,7 @@ Use GitHub [Issues](https://github.com/salesforce/TransmogrifAI/issues) page to

> **NOTE**: Be sure to [sync your fork](https://help.github.com/articles/syncing-a-fork/) before making a pull request.

# Contribution Checklist
## Contribution Checklist

- [x] Clean, simple, well styled code
- [x] Comments
Expand All @@ -28,8 +30,8 @@ Use GitHub [Issues](https://github.com/salesforce/TransmogrifAI/issues) page to
- Minimize number of dependencies.
- Prefer BSD, Apache 2.0, MIT, ISC and MPL licenses.

# Code of Conduct
## Code of Conduct
Follow the [Apache Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).

# License
## License
By contributing your code, you agree to license your contribution under the terms of the [BSD 3-Clause](License).
51 changes: 5 additions & 46 deletions docs/Developer-Guide.md → docs/developer-guide/index.md
Original file line number Diff line number Diff line change
@@ -1,49 +1,4 @@
## Table Of Contents
* [Features](Developer-Guide#features)
* [Type Hierarchy and Automatic Feature Engineering](Developer-Guide#type-hierarchy-and-automatic-feature-engineering)
* [Feature Creation](Developer-Guide#feature-creation)
* [FeatureBuilders](Developer-Guide#featurebuilders)
* [Stages](Developer-Guide#stages)
* [Transformers](Developer-Guide#transformers)
* [TransmogrifAI Transformers](Developer-Guide#transmogrifai-transformers)
* [Writing your own transformer](Developer-Guide#writing-your-own-transformer)
* [Wrapping a SparkML transformer](Developer-Guide#wrapping-a-sparkml-transformer)
* [Wrapping a non serializable external library](Developer-Guide#wrapping-a-non-serializable-external-library)
* [Estimators](Developer-Guide#estimators)
* [TransmogrifAI Estimators](Developer-Guide#transmogrifai-estimators)
* [Writing your own estimator](Developer-Guide#writing-your-own-estimator)
* [Wrapping a SparkML estimator](Developer-Guide#wrapping-a-sparkml-estimator)
* [Creating Shortcuts for Transformers and Estimators](Developer-Guide#creating-shortcuts-for-transformers-and-estimators)
* [Shortcuts Naming Convention](Developer-Guide#shortcuts-naming-convention)
* [Customizing AutoML Stages](Developer-Guide#customizing-automl-stages)
* [Transmogrification](Developer-Guide#transmogrification)
* [SanityChecker](Developer-Guide#sanitychecker)
* [RawFeatureFilter](Developer-Guide#rawfeaturefilter)
* [Model Selector](Developer-Guide#modelselector)
* [Interoperability with SparkML](Developer-Guide#interoperability-with-sparkml)
* [Workflows](Developer-Guide#workflows)
* [Creating A Workflow](Developer-Guide#creating-a-workflow)
* [Fitting a Workflow](Developer-Guide#fitting-a-workflow)
* [Fitted Workflows](Developer-Guide#fitted-workflows)
* [Saving Workflows](Developer-Guide#saving-workflows)
* [Loading saved Workflows](Developer-Guide#loading-saved-workflows)
* [Removing problematic features](Developer-Guide#removing-problematic-features)
* [Extracting ModelInsights from a Fitted Workflow](Developer-Guide#extracting-modelinsights-from-a-fitted-workflow)
* [Extracting a Particular Stage from a Fitted Workflow](Developer-Guide#extracting-a-particular-stage-from-a-fitted-workflow)
* [Adding new features to a fitted workflow](Developer-Guide#adding-new-features-to-a-fitted-workflow)
* [Metadata](Developer-Guide#metadata)
* [DataReaders](Developer-Guide#datareaders)
* [Aggregate Data Readers](Developer-Guide#aggregate-data-readers)
* [Conditional Data Readers](Developer-Guide#conditional-data-readers)
* [Joined Data Readers](Developer-Guide#joined-data-readers)
* [Evaluators](Developer-Guide#evaluators)
* [Evaluators Factory](Developer-Guide#evaluators-factory)
* [Single Evaluation](Developer-Guide#single-evaluation)
* [Multiple Evaluation](Developer-Guide#multiple-evaluation)
* [Creating a custom evaluator](Developer-Guide#creating-a-custom-evaluator)
* [TransmogrifAI App and Runner](Developer-Guide#transmogrifai-app-and-runner)
* [Parameter Injection Into Workflows and Workflow Runners](Developer-Guide#parameter-injection-into-workflows-and-workflow-runners)

# Developer Guide

## Features

Expand Down Expand Up @@ -1083,3 +1038,7 @@ Here we are resetting the “TopK” parameter of a stage with class name “MyT


***


.. toctree::
:maxdepth: 2
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Bootstrap Your First Project

We provide a convenient way to bootstrap your first project with TransmogrifAI using the TransmogrifAI CLI.
As an illustration, let's generate a binary classification model with the Titanic passenger data.

Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Boston Regression

The following code illustrates how TransmogrifAI can be used to do linear regression. We use the Boston dataset to predict housing prices.
The code for this example can be found [here](https://github.com/salesforce/TransmogrifAI/tree/master/helloworld/src/main/scala/com/salesforce/hw/boston), and the data over [here](https://github.com/salesforce/op/tree/master/helloworld/src/main/resources/BostonDataset).

Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Conditional Aggregation

In this example, we demonstrate use of TransmogrifAI's conditional readers to, once again, simplify complex data preparation. Code for this example can be found [here](https://github.com/salesforce/TransmogrifAI/tree/master/helloworld/src/main/scala/com/salesforce/hw/dataprep/ConditionalAggregation.scala), and the data can be found [here](https://github.com/salesforce/op/tree/master/helloworld/src/main/resources/WebVisitsDataset/WebVisits.csv).

In the previous [example](Example%3A-Time-Series-Aggregates-and-Joins), we showed how TransmogrifAI FeatureBuilders and Aggregate Readers could be used to aggregate predictors and response variables with respect to a reference point in time. However, sometimes, aggregations need to be computed with respect to the time of occurrence of a particular event, and this time may vary from key to key. In particular, let's consider a situation where we are analyzing website visit data, and would like to build a model that predicts the number of purchases a user makes on the website within a day of visiting a particular landing page. In this scenario, we need to construct a training dataset that, for each user, identifies the time when they visited the landing page, and then creates a response which is the number of times the user made a purchase within a day of that time. The predictors for the user would be aggregated from the web visit behavior of the user up until that point in time.
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Iris MultiClass Classification

The following code illustrates how TransmogrifAI can be used to classify multiple classes over the Iris dataset.
The code for this example can be found [here](https://github.com/salesforce/TransmogrifAI/tree/master/helloworld/src/main/scala/com/salesforce/hw/iris), and the data over [here](https://github.com/salesforce/op/tree/master/helloworld/src/main/resources/IrisDataset).

Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Running from Spark Shell

Start up your spark shell:

```bash
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Time Series Aggregates and Joins

In this example, we will walk you through some of the powerful tools TransmogrifAI has for data preparation, in particular for time series aggregates and joins. The code for this example can be found [here](https://github.com/salesforce/TransmogrifAI/tree/master/helloworld/src/main/scala/com/salesforce/hw/dataprep/JoinsAndAggregates.scala), and the data over [here](https://github.com/salesforce/op/tree/master/helloworld/src/main/resources/EmailDataset).

In this example, we would like to build a training data set from two different tables -- a table of Email Sends, and a table of Email Clicks. The following case classes describe the schemas of the two tables:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Titanic Binary Classification

Here we describe a very simple TransmogrifAI workflow for predicting survivors in the often-cited Titanic dataset. The code for building and applying the Titanic model can be found here: [Titanic Code](https://github.com/salesforce/TransmogrifAI/blob/master/helloworld/src/main/scala/com/salesforce/hw/OpTitanicSimple.scala), and the data can be found here: [Titanic Data](https://github.com/salesforce/op/blob/master/helloworld/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv).

You can run this code as follows:
Expand Down
15 changes: 15 additions & 0 deletions docs/examples/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
.. _examples:

Examples
============

.. toctree::
   :maxdepth: 1

   Titanic-Binary-Classification
   Iris-MultiClass-Classification
   Boston-Regression
   Time-Series-Aggregates-and-Joins
   Conditional-Aggregation
   Running-from-Spark-Shell
   Bootstrap-Your-First-Project
16 changes: 9 additions & 7 deletions docs/FAQ.md → docs/faq/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
## 1) What is TransmogrifAI?
# FAQ

## What is TransmogrifAI?

TransmogrifAI is an AutoML library written in Scala that runs on top of Spark. It was developed with a focus on enhancing machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity and reuse.

Expand All @@ -7,13 +9,13 @@ Use TransmogrifAI if you need a machine learning library to:
* Rapidly train good quality machine learnt models with minimal hand tuning
* Build modular, reusable, strongly typed machine learning workflows

## 2) I am used to working in Python. Why should I care about type safety?
## I am used to working in Python. Why should I care about type safety?

The flexibility of Salesforce Objects allows customers to modify even standard object schemas. This means that when writing models for a multi-tenant environment, the only information about the contents of a column that we can really count on is the Salesforce type (e.g. Phone, Email, Multipicklist, Percent, etc.). Working in a strictly typed environment allows us to leverage this information to perform sensible automatic feature engineering.

In addition, type safety means fewer unexpected data issues in production.

## 3) What does automatic feature engineering based on types look like?
## What does automatic feature engineering based on types look like?

To take advantage of automatic type-based feature engineering in TransmogrifAI, one simply defines the features that will be used in the model and relies on TransmogrifAI to do the feature engineering. The code for this would look like:

Expand All @@ -25,11 +27,11 @@ The transmogrify shortcut will sort the features by type and apply appropriate t

Of course, if you want to manually perform these or other transformations, you can simply specify the steps for each feature and use the VectorsCombiner Transformer to manually combine your final features. However, the transmogrify shortcut gives developers the option of using default type-specific feature engineering.

## 4) What other AutoML functionality does TransmogrifAI provide?
## What other AutoML functionality does TransmogrifAI provide?

Look at the [AutoML Capabilities](AutoML-Capabilities) section for a complete list of the powerful AutoML estimators that TransmogrifAI provides. In a nutshell, they are Transmogrifier for automatic feature engineering, SanityChecker and RawFeatureFilter for data cleaning and automatic feature selection, and ModelSelectors for different classes of problems for automatic model selection.

## 5) What imports do I need for TransmogrifAI to work?
## What imports do I need for TransmogrifAI to work?

```scala
// TransmogrifAI functionality: feature types, feature builders, feature dsl, readers, aggregators etc.
Expand All @@ -47,9 +49,9 @@ import com.salesforce.op.utils.spark.RichMetadata._
import com.salesforce.op.utils.spark.RichStructType._
```

## 6) I don't need joins or aggregations in my data preparation. Why can't I just use Spark to load my data and pass it into a Workflow?
## I don't need joins or aggregations in my data preparation. Why can't I just use Spark to load my data and pass it into a Workflow?
You can! Simply use the `.setInputRDD(myRDD)` or `.setInputDataSet(myDataSet)` methods on Workflow to pass in your data.

## 7) How do I examine intermediate data when trying to debug my ML workflow?
## How do I examine intermediate data when trying to debug my ML workflow?
You can generate data up to any particular point in the Workflow using the method `.computeDataUpTo(myFeature)`. Calling this method on your Workflow or WorkflowModel will compute a DataFrame which contains all of the rows for features created up to that point in your flow.
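
The last two answers can be combined into one rough sketch: load data with plain Spark, hand it to a workflow, then inspect the intermediate DataFrame. This is an illustration under stated assumptions, not code from the project. The `Visit` case class, its fields, and the object name are hypothetical. The method spelling `setInputDataset`, its key-function argument, and the implicit `SparkSession` taken by `computeDataUpTo` are assumptions about the library API (the answer above writes `setInputDataSet`), so check them against the version you use.

```scala
import com.salesforce.op._
import com.salesforce.op.features.FeatureBuilder
import com.salesforce.op.features.types._
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for this illustration
case class Visit(id: String, pagesViewed: Double)

object FaqDebugSketch {
  def main(args: Array[String]): Unit = {
    // computeDataUpTo is assumed to pick up the session implicitly
    implicit val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("faq-debug-sketch").getOrCreate()
    import spark.implicits._

    // Data loaded with plain Spark, no TransmogrifAI reader involved
    val visits = Seq(Visit("a", 3.0), Visit("b", 1.0)).toDS()

    // A single raw predictor extracted from the record
    val pagesViewed = FeatureBuilder.Real[Visit].extract(_.pagesViewed.toReal).asPredictor

    // Default type-specific feature engineering
    val featureVector = Seq(pagesViewed).transmogrify()

    // Assumed signature: a Dataset plus a function that returns the row key
    val workflow = new OpWorkflow()
      .setResultFeatures(featureVector)
      .setInputDataset(visits, (v: Visit) => v.id)

    // Materialize everything computed up to the feature vector and inspect it
    val intermediate = workflow.computeDataUpTo(featureVector)
    intermediate.show(truncate = false)

    spark.stop()
  }
}
```

Passing a different feature to `computeDataUpTo` moves the inspection point earlier or later in the flow, which helps narrow down the stage where the data stops looking right.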

22 changes: 18 additions & 4 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,6 @@
TransmogrifAI
=========================================

.. toctree::
:maxdepth: 2
:caption: Contents:

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an **AutoML** library written in Scala that runs on top of Spark. It was developed with a focus on enhancing machine learning **developer productivity** through **machine learning automation**, and an API that enforces **compile-time type-safety**, **modularity** and **reuse**.

Use TransmogrifAI if you need a machine learning library to:
Expand Down Expand Up @@ -41,3 +37,21 @@ Motivation
*Building real life machine learning applications needs a fair amount of tribal knowledge and intuition. Coupled with the explosion of ML use cases in the world that need to be addressed, there is a need for tools that enable rapid prototyping and development of machine learning pipelines. We believe that automation is the key to making machine learning development truly scalable and accessible.*

For more information, read our `blogpost <https://engineering.salesforce.com/open-sourcing-transmogrifai-4e5d0e098da2/>`_!

Documentation
#############
.. toctree::
   :maxdepth: 4

   installation/index
   examples/index
   abstractions/index
   automl-capabilities/index
   faq/index
   talks/index
   contributing/index
   developer-guide/index
   license/index


