Commit
dhruvildave committed Nov 17, 2019
2 parents a17c21d + b0de09e commit 1a91c62
Showing 4 changed files with 163 additions and 3 deletions.
Binary file added other sources/SVM.pdf
Binary file not shown.
166 changes: 163 additions & 3 deletions src/02_Training_Models.ipynb
@@ -110,12 +110,172 @@
"* This parameter has many options like, “linear”, “rbf”,”poly” and others (default value is “rbf”). ‘linear’ is used for linear hyper-plane whereas “rbf” and “poly” are used for non-linear hyper-plane.\n"
]
},
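{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustrative sketch (the toy dataset and hyperparameter values here are arbitrary), this is how the kernel hyperparameter is passed to SVC:\n",
"\n",
"```python\n",
"from sklearn.datasets import make_moons\n",
"from sklearn.svm import SVC\n",
"\n",
"X_toy, y_toy = make_moons(n_samples=100, noise=0.15)\n",
"\n",
"# Same estimator, different hyperplane types, selected via the kernel hyperparameter\n",
"linear_clf = SVC(kernel=\"linear\", C=1).fit(X_toy, y_toy)\n",
"rbf_clf = SVC(kernel=\"rbf\", C=1).fit(X_toy, y_toy)\n",
"poly_clf = SVC(kernel=\"poly\", degree=3, C=1).fit(X_toy, y_toy)\n",
"```"
]
},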
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Soft Margin Classification\n",
"\n",
"If we strictly impose that all instances be off the street and on the right side, this is\n",
"called hard margin classification. There are two main issues with hard margin classification. \n",
"\n",
"- it only works if the data is linearly separable.\n",
"- it is quite sensitive to outliers.\n",
"\n",
"To avoid these issues it is preferable to use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e., instances that end up in the middle of the street or even on the wrong side). This is called ___soft margin classification.___\n",
"\n",
"![SVM-SoftMargin](img/svm3.png)\n",
"\n",
"In Scikit-Learn’s SVM classes, you can control this balance using the C hyperparameter: a smaller C value leads to a wider street but more margin violations. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Pipeline(memory=None,\n",
" steps=[('scaler',\n",
" StandardScaler(copy=True, with_mean=True, with_std=True)),\n",
" ('linear_svc',\n",
" LinearSVC(C=1, class_weight=None, dual=True,\n",
" fit_intercept=True, intercept_scaling=1,\n",
" loss='hinge', max_iter=1000, multi_class='ovr',\n",
" penalty='l2', random_state=None, tol=0.0001,\n",
" verbose=0))],\n",
" verbose=False)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.svm import LinearSVC\n",
"iris = datasets.load_iris()\n",
"X = iris[\"data\"][:, (2, 3)] # petal length, petal width\n",
"y = (iris[\"target\"] == 2).astype(np.float64) # Iris-Virginica\n",
"svm_clf = Pipeline((\n",
" (\"scaler\", StandardScaler()),\n",
" (\"linear_svc\", LinearSVC(C=1, loss=\"hinge\")),\n",
" ))\n",
"svm_clf.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1.])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"svm_clf.predict([[5.5, 1.7]])\n"
]
},
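{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the effect of C (the two C values below are arbitrary; X, y and the imports come from the cells above), we can count how many training instances end up on or inside the margin for a small versus a large C:\n",
"\n",
"```python\n",
"for C in (0.01, 100):\n",
"    clf = Pipeline([\n",
"        (\"scaler\", StandardScaler()),\n",
"        (\"linear_svc\", LinearSVC(C=C, loss=\"hinge\", max_iter=10_000)),\n",
"    ])\n",
"    clf.fit(X, y)\n",
"    t = y * 2 - 1                      # map the {0, 1} labels to {-1, +1}\n",
"    scores = clf.decision_function(X)  # signed decision score; the margin sits at ±1\n",
"    print(C, (t * scores < 1).sum())   # instances on or inside the margin (margin violations)\n",
"```\n",
"\n",
"A smaller C should report more such instances, i.e., a wider street with more margin violations."
]
},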
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- using SVC(kernel=\"linear\", C=1) is also possible, but it is much slower, especially with large training sets, so it is not recommended.\n",
"- The SGDClassifier class, with SGDClassifier(loss=\"hinge\",alpha=1/(m*C)). This applies regular Stochastic Gradient Descent to train a linear SVM classifier. It does not converge as fast as the LinearSVC class, but it can be useful to handle huge datasets that do not fit in memory (out-of-core training), or to handle online classification tasks.\n",
"\n",
"> The LinearSVC class regularizes the bias term, so you should center\n",
"the training set first by subtracting its mean. This is automatic if\n",
"you scale the data using the StandardScaler. Moreover, make sure\n",
"you set the loss hyperparameter to \"hinge\", as it is not the default\n",
"value. Finally, for better performance you should set the dual\n",
"hyperparameter to False, unless there are more features than\n",
"training instances"
]
},
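{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the SGDClassifier alternative mentioned above, reusing X, y and the pipeline imports from the Iris example (m and C simply follow the alpha = 1/(m*C) rule of thumb):\n",
"\n",
"```python\n",
"from sklearn.linear_model import SGDClassifier\n",
"\n",
"m = len(X)  # number of training instances\n",
"C = 1\n",
"sgd_clf = Pipeline([\n",
"    (\"scaler\", StandardScaler()),\n",
"    (\"sgd\", SGDClassifier(loss=\"hinge\", alpha=1/(m*C))),\n",
"])\n",
"sgd_clf.fit(X, y)\n",
"sgd_clf.predict([[5.5, 1.7]])\n",
"```"
]
},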
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nonlinear SVM Classification\n",
"\n",
"Although linear SVM classifiers are efficient and work surprisingly well in many\n",
"cases, many datasets are not even close to being linearly separable. One approach to handling nonlinear datasets is to add more features, such as polynomial features.\n",
"\n",
"Consider the left plot in Figure it represents a simple dataset with just one feature x1. This dataset is not linearly separable, as you can see. But if you add a second feature $x_2 = (x_1)^2$, the resulting 2D dataset is perfectly linearly separable.\n",
"\n",
"![SVM-NonLinear](img/svm4.png)\n",
"\n",
"\n",
"```python\n",
"from sklearn.datasets import make_moons\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"polynomial_svm_clf = Pipeline((\n",
" (\"poly_features\", PolynomialFeatures(degree=3)),\n",
" (\"scaler\", StandardScaler()),\n",
" (\"svm_clf\", LinearSVC(C=10, loss=\"hinge\"))\n",
" ))\n",
"polynomial_svm_clf.fit(X, y)\n",
"```"
]
},
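{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the idea in the figure above (the 1-D toy dataset below is made up for illustration): the two classes are not separable along $x_1$ alone, but adding $x_2 = (x_1)^2$ makes them linearly separable.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.svm import LinearSVC\n",
"\n",
"x1 = np.linspace(-4, 4, 9)\n",
"y_toy = (np.abs(x1) <= 2).astype(np.float64)  # positive class sits between the negatives\n",
"\n",
"X_toy = np.c_[x1, x1 ** 2]  # add x2 = x1**2 as a second feature\n",
"toy_clf = LinearSVC(C=10, loss=\"hinge\", max_iter=10_000).fit(X_toy, y_toy)\n",
"print(toy_clf.score(X_toy, y_toy))  # expect 1.0: the 2-D version is linearly separable\n",
"```"
]
},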
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Polynomial Kernel \n",
"\n",
"Adding polynomial features is simple to implement and can work great with all sorts of Machine Learning algorithms (not just SVMs), but at a low polynomial degree it cannot deal with very complex datasets, and with a high polynomial degree it creates a huge number of features, making the model too slow.\n",
"Fortunately, when using SVMs you can apply an almost miraculous mathematical\n",
"technique called the kernel trick (it is explained in a moment). It makes it possible to get the same result as if you added many polynomial features, even with very highdegree polynomials, without actually having to add them. So there is no combinatorial explosion of the number of features since you don’t actually add any features. This trick is implemented by the SVC class. \n",
"\n",
"```python\n",
"from sklearn.svm import SVC\n",
"poly_kernel_svm_clf = Pipeline((\n",
" (\"scaler\", StandardScaler()),\n",
" (\"svm_clf\", SVC(kernel=\"poly\", degree=3, coef0=1, C=5))\n",
" ))\n",
"poly_kernel_svm_clf.fit(X, y)\n",
"```\n",
"\n",
"> A common approach to find the right hyperparameter values is to\n",
"use grid search. It is often faster to first do a very\n",
"coarse grid search, then a finer grid search around the best values\n",
"found. Having a good sense of what each hyperparameter actually\n",
"does can also help you search in the right part of the hyperparameter space.\n"
]
},
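{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch of the coarse grid search suggested above, applied to the poly_kernel_svm_clf pipeline (the grid values are arbitrary; X and y are the moons dataset from the previous section):\n",
"\n",
"```python\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"param_grid = {\n",
"    \"svm_clf__degree\": [2, 3, 4],\n",
"    \"svm_clf__coef0\": [0, 1, 10],\n",
"    \"svm_clf__C\": [0.1, 1, 10],\n",
"}\n",
"grid_search = GridSearchCV(poly_kernel_svm_clf, param_grid, cv=3)\n",
"grid_search.fit(X, y)\n",
"print(grid_search.best_params_)\n",
"# A finer grid could then be searched around the best values found here\n",
"```"
]
},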
{
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
"source": [
"## Adding Similarity Features\n",
"\n",
"Another technique to tackle nonlinear problems is to add features computed using a\n",
"similarity function that measures how much each instance resembles a particular\n",
"landmark.\n",
"\n",
"let’s define the similarity function to be the Gaussian Radial Basis Function (RBF)\n",
"with γ = 0.3\n",
"\n",
"$$ Gaussian RBF $$\n",
"\n",
"$$\\phi y(x,l)=exp(-y|x-l|^2)$$"
]
}
],
"metadata": {
Expand Down
Binary file added src/img/svm3.PNG
Binary file not shown.
Binary file added src/img/svm4.PNG
Binary file not shown.
