through 4/1 audio
JanetMatsen committed Mar 14, 2016
1 parent b6cd77c commit 4a4470b
Showing 6 changed files with 58 additions and 1 deletion.
4 changes: 4 additions & 0 deletions ML_cheatsheet.tex
@@ -174,6 +174,10 @@

\input{./tex/vocab.tex}

\input{./tex/clustering.tex}

\input{./tex/expectation_maximization.tex}

% Reference: \includegraphics[width=2.5in]{figures/example_kernel_separation.pdf} \hfill \\


Binary file added figures/k-means_gets_stuck.pdf
Binary file not shown.
Binary file added figures/kmeans_algorithm_example.pdf
Binary file not shown.
1 change: 0 additions & 1 deletion tex/boosting.tex
@@ -91,7 +91,6 @@ \subsection{Boosting}
\end{itemize}
\item boosting with a weak classifier is better than using a fancy classifier.
A boosted version will always do better than the vanilla one.
\
\end{itemize}


52 changes: 52 additions & 0 deletions tex/clustering.tex
@@ -0,0 +1,52 @@
\section{Clustering}
\smallskip \hrule height 2pt \smallskip

\begin{itemize}
\item Unsupervised learning: detect patterns in unlabeled data.
Sometimes labels are too expensive or too unclear to obtain.
Examples:
\begin{itemize}
\item group e-mails or search results
\item find categories of customers
\item detect anomalous program executions
\end{itemize}
\item Useful when you don't know what you are looking for.
\item Requires a definition of ``similar''. One option: small (squared) Euclidean distance.
\item You can label the clusters and then use them, or pass the clusters to the next level of analysis.
\end{itemize}

\subsection{K-Means}
An iterative clustering algorithm. \hfill \\
Pick $k$ random points as the initial cluster means: $c^1, \dots, c^k$. \hfill \\
Alternate:
\begin{itemize}
\item Assign each example $x^j$ to the mean $c^i$ that is closest to it.
\item Set each mean $c^i$ to the average of its assigned points.
\end{itemize}
Stop when no points' assignments change. \hfill \\
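
A small worked example of the two alternating steps (illustrative numbers, not from the lecture): take the 1-D points $1, 2, 9, 10$ with $k = 2$ and initial means $c^1 = 1$, $c^2 = 2$. \hfill \\
\begin{itemize}
\item Assign: $\{1\} \to c^1$ and $\{2, 9, 10\} \to c^2$. Update: $c^1 = 1$, $c^2 = 7$.
\item Assign: $\{1, 2\} \to c^1$ and $\{9, 10\} \to c^2$. Update: $c^1 = 1.5$, $c^2 = 9.5$.
\item Assign: no assignment changes, so the algorithm stops.
\end{itemize}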

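K-means minimizes a loss that is a function of the points, assignments, and means:
$$ L( \{ x^i \}, \{ a^j \}, \{ c^k \}) = \sum_i dist(x^i, c^{a^i}) $$
This is coordinate descent on $L$: each step minimizes $L$ over one block of variables (assignments or means) while the other is held fixed. \hfill \\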

More formally:
\begin{itemize}
\item Data: $\{ x^j | j = 1 \dots n \}$
\item For $t = 1 \dots T$ (or stop when assignments no longer change):
\begin{itemize}
\item Fix the means ($c$) and update the assignments ($a$): \hfill \\
for $j = 1 \dots n$ (recompute cluster assignments):
$$ a^j = \argmin_i dist(x^j, c^i) $$
\item Fix the assignments ($a$) and update the means ($c$): \hfill \\
for $j = 1 \dots k$ (recompute cluster centers):
$$ c^j = \frac{1}{|\{ i | a^i = j \}|} \sum_{\{ i | a^i = j \}} x^i$$
\end{itemize}
\end{itemize}
Note: the point $y$ with minimum total squared Euclidean distance to a set of points $\{x\}$ is their mean.
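
Why the mean: a short derivation, assuming $dist$ is squared Euclidean distance. For points $x^1, \dots, x^m$ assigned to one cluster (the count $m$ is notation introduced here), set the gradient with respect to the candidate center $y$ to zero:
$$ \nabla_y \sum_{i=1}^{m} \| x^i - y \|^2 = -2 \sum_{i=1}^{m} (x^i - y) = 0 \quad \Rightarrow \quad y = \frac{1}{m} \sum_{i=1}^{m} x^i $$
The objective is convex in $y$, so this stationary point is the minimizer. This is why the mean update never increases $L$.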

\includegraphics[width=3.3in]{figures/kmeans_algorithm_example.pdf}

\subsection{K-Means can get stuck in local optima}

\includegraphics[width=1.8in]{figures/k-means_gets_stuck.pdf}
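
One illustrative case (made-up numbers, not necessarily the configuration in the figure), assuming squared Euclidean distance: points $(0,0), (0,1), (4,0), (4,1)$ with $k = 2$. \hfill \\
\begin{itemize}
\item Start at $c^1 = (2, 0)$, $c^2 = (2, 1)$: the bottom pair goes to $c^1$ and the top pair to $c^2$, the means do not move, and K-means stops with $L = 4 \cdot 4 = 16$.
\item Splitting left/right instead gives means $(0, 0.5)$ and $(4, 0.5)$ with $L = 4 \cdot 0.25 = 1$.
\end{itemize}
The result depends on the initialization, so a common remedy is to run K-means from several random starts and keep the lowest-loss solution.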
2 changes: 2 additions & 0 deletions tex/expectation_maximization.tex
@@ -0,0 +1,2 @@
\section{Expectation Maximization}
\smallskip \hrule height 2pt \smallskip
