Commit
dhruvildave committed Nov 17, 2019
2 parents a17c21d + b0de09e commit 1a91c62
Showing 4 changed files with 163 additions and 3 deletions.
Binary file added other sources/SVM.pdf
Binary file not shown.
166 changes: 163 additions & 3 deletions src/02_Training_Models.ipynb
@@ -110,12 +110,172 @@
"* This parameter has many options like, “linear”, “rbf”,”poly” and others (default value is “rbf”). ‘linear’ is used for linear hyper-plane whereas “rbf” and “poly” are used for non-linear hyper-plane.\n"
]
},
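{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustrative sketch (the toy dataset and hyperparameter values here are arbitrary), this is how the kernel hyperparameter is passed to SVC:\n",
"\n",
"```python\n",
"from sklearn.datasets import make_moons\n",
"from sklearn.svm import SVC\n",
"\n",
"X_toy, y_toy = make_moons(n_samples=100, noise=0.15)\n",
"\n",
"# Same estimator, different hyperplane types, selected via the kernel hyperparameter\n",
"linear_clf = SVC(kernel=\"linear\", C=1).fit(X_toy, y_toy)\n",
"rbf_clf = SVC(kernel=\"rbf\", C=1).fit(X_toy, y_toy)\n",
"poly_clf = SVC(kernel=\"poly\", degree=3, C=1).fit(X_toy, y_toy)\n",
"```"
]
},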
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Soft Margin Classification\n",
"\n",
"If we strictly impose that all instances be off the street and on the right side, this is\n",
"called hard margin classification. There are two main issues with hard margin classification. \n",
"\n",
"- it only works if the data is linearly separable.\n",
"- it is quite sensitive to outliers.\n",
"\n",
"To avoid these issues it is preferable to use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e., instances that end up in the middle of the street or even on the wrong side). This is called ___soft margin classification.___\n",
"\n",
"![SVM-SoftMargin](img/svm3.png)\n",
"\n",
"In Scikit-Learn’s SVM classes, you can control this balance using the C hyperparameter: a smaller C value leads to a wider street but more margin violations. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Pipeline(memory=None,\n",
" steps=[('scaler',\n",
" StandardScaler(copy=True, with_mean=True, with_std=True)),\n",
" ('linear_svc',\n",
" LinearSVC(C=1, class_weight=None, dual=True,\n",
" fit_intercept=True, intercept_scaling=1,\n",
" loss='hinge', max_iter=1000, multi_class='ovr',\n",
" penalty='l2', random_state=None, tol=0.0001,\n",
" verbose=0))],\n",
" verbose=False)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.svm import LinearSVC\n",
"iris = datasets.load_iris()\n",
"X = iris[\"data\"][:, (2, 3)] # petal length, petal width\n",
"y = (iris[\"target\"] == 2).astype(np.float64) # Iris-Virginica\n",
"svm_clf = Pipeline((\n",
" (\"scaler\", StandardScaler()),\n",
" (\"linear_svc\", LinearSVC(C=1, loss=\"hinge\")),\n",
" ))\n",
"svm_clf.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1.])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"svm_clf.predict([[5.5, 1.7]])\n"
]
},
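{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the effect of C (the two C values below are arbitrary; X, y and the imports come from the cells above), we can count how many training instances end up on or inside the margin for a small versus a large C:\n",
"\n",
"```python\n",
"for C in (0.01, 100):\n",
"    clf = Pipeline([\n",
"        (\"scaler\", StandardScaler()),\n",
"        (\"linear_svc\", LinearSVC(C=C, loss=\"hinge\", max_iter=10_000)),\n",
"    ])\n",
"    clf.fit(X, y)\n",
"    t = y * 2 - 1                      # map the {0, 1} labels to {-1, +1}\n",
"    scores = clf.decision_function(X)  # signed decision score; the margin sits at ±1\n",
"    print(C, (t * scores < 1).sum())   # instances on or inside the margin (margin violations)\n",
"```\n",
"\n",
"A smaller C should report more such instances, i.e., a wider street with more margin violations."
]
},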
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- using SVC(kernel=\"linear\", C=1) is also possible, but it is much slower, especially with large training sets, so it is not recommended.\n",
"- The SGDClassifier class, with SGDClassifier(loss=\"hinge\",alpha=1/(m*C)). This applies regular Stochastic Gradient Descent to train a linear SVM classifier. It does not converge as fast as the LinearSVC class, but it can be useful to handle huge datasets that do not fit in memory (out-of-core training), or to handle online classification tasks.\n",
"\n",
"> The LinearSVC class regularizes the bias term, so you should center\n",
"the training set first by subtracting its mean. This is automatic if\n",
"you scale the data using the StandardScaler. Moreover, make sure\n",
"you set the loss hyperparameter to \"hinge\", as it is not the default\n",
"value. Finally, for better performance you should set the dual\n",
"hyperparameter to False, unless there are more features than\n",
"training instances"
]
},
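{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the SGDClassifier alternative mentioned above, reusing X, y and the pipeline imports from the Iris example (m and C simply follow the alpha = 1/(m*C) rule of thumb):\n",
"\n",
"```python\n",
"from sklearn.linear_model import SGDClassifier\n",
"\n",
"m = len(X)  # number of training instances\n",
"C = 1\n",
"sgd_clf = Pipeline([\n",
"    (\"scaler\", StandardScaler()),\n",
"    (\"sgd\", SGDClassifier(loss=\"hinge\", alpha=1/(m*C))),\n",
"])\n",
"sgd_clf.fit(X, y)\n",
"sgd_clf.predict([[5.5, 1.7]])\n",
"```"
]
},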
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nonlinear SVM Classification\n",
"\n",
"Although linear SVM classifiers are efficient and work surprisingly well in many\n",
"cases, many datasets are not even close to being linearly separable. One approach to handling nonlinear datasets is to add more features, such as polynomial features.\n",
"\n",
"Consider the left plot in Figure it represents a simple dataset with just one feature x1. This dataset is not linearly separable, as you can see. But if you add a second feature $x_2 = (x_1)^2$, the resulting 2D dataset is perfectly linearly separable.\n",
"\n",
"![SVM-NonLinear](img/svm4.png)\n",
"\n",
"\n",
"```python\n",
"from sklearn.datasets import make_moons\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"polynomial_svm_clf = Pipeline((\n",
" (\"poly_features\", PolynomialFeatures(degree=3)),\n",
" (\"scaler\", StandardScaler()),\n",
" (\"svm_clf\", LinearSVC(C=10, loss=\"hinge\"))\n",
" ))\n",
"polynomial_svm_clf.fit(X, y)\n",
"```"
]
},
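{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the idea in the figure above (the 1-D toy dataset below is made up for illustration): the two classes are not separable along $x_1$ alone, but adding $x_2 = (x_1)^2$ makes them linearly separable.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.svm import LinearSVC\n",
"\n",
"x1 = np.linspace(-4, 4, 9)\n",
"y_toy = (np.abs(x1) <= 2).astype(np.float64)  # positive class sits between the negatives\n",
"\n",
"X_toy = np.c_[x1, x1 ** 2]  # add x2 = x1**2 as a second feature\n",
"toy_clf = LinearSVC(C=10, loss=\"hinge\", max_iter=10_000).fit(X_toy, y_toy)\n",
"print(toy_clf.score(X_toy, y_toy))  # expect 1.0: the 2-D version is linearly separable\n",
"```"
]
},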
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Polynomial Kernel \n",
"\n",
"Adding polynomial features is simple to implement and can work great with all sorts of Machine Learning algorithms (not just SVMs), but at a low polynomial degree it cannot deal with very complex datasets, and with a high polynomial degree it creates a huge number of features, making the model too slow.\n",
"Fortunately, when using SVMs you can apply an almost miraculous mathematical\n",
"technique called the kernel trick (it is explained in a moment). It makes it possible to get the same result as if you added many polynomial features, even with very highdegree polynomials, without actually having to add them. So there is no combinatorial explosion of the number of features since you don’t actually add any features. This trick is implemented by the SVC class. \n",
"\n",
"```python\n",
"from sklearn.svm import SVC\n",
"poly_kernel_svm_clf = Pipeline((\n",
" (\"scaler\", StandardScaler()),\n",
" (\"svm_clf\", SVC(kernel=\"poly\", degree=3, coef0=1, C=5))\n",
" ))\n",
"poly_kernel_svm_clf.fit(X, y)\n",
"```\n",
"\n",
"> A common approach to find the right hyperparameter values is to\n",
"use grid search. It is often faster to first do a very\n",
"coarse grid search, then a finer grid search around the best values\n",
"found. Having a good sense of what each hyperparameter actually\n",
"does can also help you search in the right part of the hyperparameter space.\n"
]
},
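{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch of the coarse grid search suggested above, applied to the poly_kernel_svm_clf pipeline (the grid values are arbitrary; X and y are the moons dataset from the previous section):\n",
"\n",
"```python\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"param_grid = {\n",
"    \"svm_clf__degree\": [2, 3, 4],\n",
"    \"svm_clf__coef0\": [0, 1, 10],\n",
"    \"svm_clf__C\": [0.1, 1, 10],\n",
"}\n",
"grid_search = GridSearchCV(poly_kernel_svm_clf, param_grid, cv=3)\n",
"grid_search.fit(X, y)\n",
"print(grid_search.best_params_)\n",
"# A finer grid could then be searched around the best values found here\n",
"```"
]
},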
{
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
"source": [
"## Adding Similarity Features\n",
"\n",
"Another technique to tackle nonlinear problems is to add features computed using a\n",
"similarity function that measures how much each instance resembles a particular\n",
"landmark.\n",
"\n",
"let’s define the similarity function to be the Gaussian Radial Basis Function (RBF)\n",
"with γ = 0.3\n",
"\n",
"$$ Gaussian RBF $$\n",
"\n",
"$$\\phi y(x,l)=exp(-y|x-l|^2)$$"
]
}
],
"metadata": {
Expand Down
Binary file added src/img/svm3.PNG
Binary file not shown.
Binary file added src/img/svm4.PNG
Binary file not shown.
