Merge branch 'livedoc' into develop
Justin Yip committed Mar 11, 2015
2 parents 7572698 + 66ab232 commit 5b3164c
Showing 9 changed files with 322 additions and 38 deletions.
4 changes: 2 additions & 2 deletions docs/manual/source/appintegration/index.html.md
@@ -20,7 +20,7 @@ Event Server listens to port 7070 by default. You can change the port with the [
For further information, please read:

* [Event Server Overview](/datacollection/)
* [Collecting Data with REST/SDKs](datacollection/eventapi)
* [Collecting Data with REST/SDKs](/datacollection/eventapi)

## Sending Query

@@ -29,4 +29,4 @@ it will wait for queries from your application and return predicted results in J

For further information, please read:

* [Deploying an Engine as a Web Service](/deploy/)
* [Deploying an Engine as a Web Service](/deploy/)
2 changes: 1 addition & 1 deletion docs/manual/source/evaluation/index.html.md
@@ -2,4 +2,4 @@
title: ML Tuning and Evaluation
---

(coming soon)
While the overview is coming soon, check out [Hyperparameter Tuning](/evaluation/paramtuning) and the subsequent pages to get going.
139 changes: 105 additions & 34 deletions docs/manual/source/evaluation/paramtuning.html.md
@@ -2,6 +2,105 @@
title: Hyperparameter Tuning
---


A PredictionIO engine is governed by a set of parameters. These parameters
determine which algorithm is used, as well as the parameters for the algorithm
itself. This naturally raises the question of how to choose the best set of
parameters. The evaluation module lets you *tune* the engine to obtain the
best parameter set.

## Quick Start

We demonstrate the evaluation with
[the classification template](/templates/classification/quickstart/).
The classification template uses the naive Bayes algorithm, which has a
smoothing parameter. We evaluate the prediction quality against different
values of this parameter to find the best one.

### Edit the AppId

Edit MyClassification/src/main/scala/***Evaluation.scala*** to specify the
*appId* you used to import the data.

```scala
object EngineParamsList extends EngineParamsGenerator {
  ...
  private[this] val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(appId = <YOUR_APP_ID>, evalK = Some(5)))
  ...
}
```
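
If you do not remember which *appId* you used, listing the applications
registered with the Event Server shows it (this assumes the `pio app list`
subcommand of your PredictionIO release):

```
$ pio app list
```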

### Build and run the evaluation

Evaluations are run with the command `pio eval`. It takes two mandatory
parameters:

1. the `Evaluation` object, which tells PredictionIO the engine and metric we
   use for the evaluation; and
2. the `EngineParamsGenerator`, which contains the list of engine params to
   test against.

The following command kickstarts the evaluation workflow for the
classification template.

```
$ pio build
...
$ pio eval org.template.classification.PrecisionEvaluation \
    org.template.classification.EngineParamsList
```

You will see the following output:

```
...
[INFO] [CoreWorkflow$] CoreWorkflow.runEvaluation
...
[INFO] [EvaluationWorkflow$] Iteration 0
[INFO] [EvaluationWorkflow$] EngineParams: {"dataSourceParams":{"":{"appId":18,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":10.0}}],"servingParams":{"":{}}}
[INFO] [EvaluationWorkflow$] Result: 0.9281045751633987
[INFO] [EvaluationWorkflow$] Iteration 1
[INFO] [EvaluationWorkflow$] EngineParams: {"dataSourceParams":{"":{"appId":18,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":100.0}}],"servingParams":{"":{}}}
[INFO] [EvaluationWorkflow$] Result: 0.9150326797385621
[INFO] [EvaluationWorkflow$] Iteration 2
[INFO] [EvaluationWorkflow$] EngineParams: {"dataSourceParams":{"":{"appId":18,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":1000.0}}],"servingParams":{"":{}}}
[INFO] [EvaluationWorkflow$] Result: 0.4444444444444444
...
[INFO] [CoreWorkflow$] Stop spark context
[INFO] [CoreWorkflow$] Optimal score: 0.9281045751633987
[INFO] [CoreWorkflow$] Optimal engine params: {
"dataSourceParams":{
"":{
"appId":18,
"evalK":5
}
},
"preparatorParams":{
"":{
}
},
"algorithmParamsList":[
{
"naive":{
"lambda":10.0
}
}
],
"servingParams":{
"":{
}
}
}
...
```

The console prints out the evaluation metric score of each set of engine
params, and finally pretty-prints the optimal engine params. Among the 3
engine params we evaluate, *lambda = 10.0* yields the highest precision score
of 0.9281.
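
The `PrecisionEvaluation` object passed to `pio eval` pairs the engine with
the metric being optimized. A minimal sketch of what it may look like,
following the classification template's conventions (`Precision`, `label`,
and `ClassificationEngine` are the template's names; the exact metric
definition lives in ***Evaluation.scala***):

```scala
// Sketch only: binds the engine factory to the metric that the evaluation
// maximizes. Precision(label = 1.0) scores how often the engine is correct
// when it predicts the class labeled 1.0, and skips other predictions.
object PrecisionEvaluation extends Evaluation {
  engineMetric = (ClassificationEngine(), new Precision(label = 1.0))
}
```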

## Detailed Explanation

An engine often depends on a number of parameters. For example, the naive
Bayes classification algorithm has a smoothing parameter that makes the model
more adaptive to unseen data. Compared with parameters which are *learnt* by
the
@@ -216,42 +315,20 @@
construct the list of engine params we want to evaluate by adding or
replacing the controller parameters. Lines 13 to 16 generate 3 engine params,
each with a different smoothing parameter, as in the sketch below.
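
The list the paragraph refers to may look like the following sketch, matching
the three lambda values seen in the output above (the exact code lives in the
template's ***Evaluation.scala***; `AlgorithmParams` is the template's name
for the naive Bayes parameter class):

```scala
// Sketch only: start from a base EngineParams and vary the algorithm's
// smoothing parameter. Each baseEP.copy(...) yields one engine params set
// for `pio eval` to train and score.
object EngineParamsList extends EngineParamsGenerator {
  private[this] val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(appId = <YOUR_APP_ID>, evalK = Some(5)))

  engineParamsList = Seq(
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(10.0)))),
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(100.0)))),
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(1000.0)))))
}
```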

## Running the Evaluation

It remains to run the evaluation. Recapping the quick start section above:
the `pio eval` command kickstarts the evaluation, and the result can be seen
on the console.

```
$ pio build
...
$ pio eval org.template.classification.PrecisionEvaluation \
    org.template.classification.EngineParamsList
```

You will see the following output:

```
...
[INFO] [CoreWorkflow$] CoreWorkflow.runEvaluation
...
[INFO] [EvaluationWorkflow$] Iteration 0
[INFO] [EvaluationWorkflow$] EngineParams: {"dataSourceParams":{"":{"appId":18,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":10.0}}],"servingParams":{"":{}}}
[INFO] [EvaluationWorkflow$] Result: 0.9281045751633987
[INFO] [EvaluationWorkflow$] Iteration 1
[INFO] [EvaluationWorkflow$] EngineParams: {"dataSourceParams":{"":{"appId":18,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":100.0}}],"servingParams":{"":{}}}
[INFO] [EvaluationWorkflow$] Result: 0.9150326797385621
[INFO] [EvaluationWorkflow$] Iteration 2
[INFO] [EvaluationWorkflow$] EngineParams: {"dataSourceParams":{"":{"appId":18,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":1000.0}}],"servingParams":{"":{}}}
[INFO] [EvaluationWorkflow$] Result: 0.4444444444444444
...
[INFO] [CoreWorkflow$] Stop spark context
[INFO] [CoreWorkflow$] Optimal score: 0.9281045751633987
[INFO] [CoreWorkflow$] Optimal engine params: {
"dataSourceParams":{
"":{
"appId":18,
"evalK":5
}
},
"preparatorParams":{
"":{
}
},
"algorithmParamsList":[
{
"naive":{
"lambda":10.0
}
}
],
"servingParams":{
"":{
}
}
}
...
```

The console prints out the evaluation metric score of each set of engine
params, and finally pretty-prints the optimal engine params. Among the 3
engine params we evaluate, *lambda = 10.0* yields the highest precision score
of 0.9281.


## Notes

2 changes: 1 addition & 1 deletion docs/manual/source/install/install-linux.html.md.erb
@@ -20,7 +20,7 @@ below to setup PredictionIO and its dependencies.
Simply download PredictionIO's binary distribution and extract it.

```
$ wget http://download.prediction.io/PredictionIO-<%= data.versions.pio %>.tar.gz
$ wget https://d8k1yxp8elc6b.cloudfront.net/PredictionIO-<%= data.versions.pio %>.tar.gz
$ tar zxvf PredictionIO-<%= data.versions.pio %>.tar.gz
$ cd PredictionIO-<%= data.versions.pio %>
```
@@ -16,6 +16,9 @@ Once you have completed the installation process, please make sure all the
components (PredictionIO Event Server, Elasticsearch, and HBase) are up and
running.

NOTE: You can skip `pio-start-all` if you have launched from the AWS
Marketplace. All components should have been started automatically.

```
$ pio-start-all
```
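
A quick way to verify that the components are up is `pio status` (assuming
the subcommand is available in your PredictionIO version; it checks the
configured storage backends and reports anything it cannot reach):

```
$ pio status
```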
@@ -0,0 +1,7 @@
---
title: How-To (E-Commerce Recommendation)
---

Here are the pages that show you how you can customize the E-Commerce Recommendation engine template.

- [Train with Rate Event](/templates/ecommercerecommendation/train-with-rate-event/)
