From e29de382df79811b2bbac3a9d5faf0c659223d6a Mon Sep 17 00:00:00 2001
From: Carlos Scheidegger
Date: Thu, 7 Feb 2019 13:42:17 -0700
Subject: [PATCH] typos

---
 book/perc.tex | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/book/perc.tex b/book/perc.tex
index 0f85682..b3388f0 100644
--- a/book/perc.tex
+++ b/book/perc.tex
@@ -47,7 +47,7 @@ \section{Bio-inspired Learning}
 \concept{activations}). Based on how much these incoming neurons are
 firing, and how ``strong'' the neural connections are, our main
 neuron will ``decide'' how strongly it wants to fire. And so on through the
-whole brain. Learning in the brain happens by neurons becomming
+whole brain. Learning in the brain happens by neurons becoming
 connected to other neurons, and the strengths of connections adapting
 over time.
 
@@ -66,7 +66,7 @@ \section{Bio-inspired Learning}
 being a positive example and not firing is interpreted as being a
 negative example. In particular, if the weighted sum is positive, it
 ``fires'' and otherwise it doesn't fire. This is shown
-diagramatically in Figure~\ref{fig:perc:example}.
+diagrammatically in Figure~\ref{fig:perc:example}.
 
 Mathematically, an input vector $\vx = \langle x_1, x_2, \dots, x_D
 \rangle$ arrives. The neuron stores $D$-many weights, $w_1, w_2,
@@ -74,7 +74,7 @@ \section{Bio-inspired Learning}
 \begin{equation} \label{eq:perc:sum}
   a = \sum_{d=1}^D w_d x_d
 \end{equation}
-to determine it's amount of ``activation.'' If this activiation is
+to determine its amount of ``activation.'' If this activation is
 positive (i.e., $a > 0$) it predicts that this example is a positive
 example. Otherwise it predicts a negative example.
 
@@ -84,7 +84,7 @@ \section{Bio-inspired Learning}
 this feature. So features with zero weight are ignored. Features with
 positive weights are indicative of positive examples because they
 cause the activation to increase. Features with negative weights are
-indicative of negative examples because they cause the activiation to
+indicative of negative examples because they cause the activation to
 decrease.
 
 \thinkaboutit{What would happen if we encoded binary features like
@@ -264,7 +264,7 @@ \section{Error-Driven Updating: The Perceptron Algorithm}
 between $20\%$ and $50\%$ of your time, are there any cases in which
 you might \emph{not} want to permute the data every iteration?}
 
-\section{Geometric Intrepretation}
+\section{Geometric Interpretation}
 
 \begin{mathreview}{Dot Products}
   \parpic[r][t]{\includegraphics[width=1.5in]{figs/perc_dotprojection}}
@@ -343,7 +343,7 @@ \section{Geometric Intrepretation}
 projected onto $\vw$. Below, we can think of this as a
 one-dimensional version of the data, where each data point is placed
 according to its projection along $\vw$. This distance along $\vw$ is
-exactly the \emph{activiation} of that example, with no bias.
+exactly the \emph{activation} of that example, with no bias.
 
 From here, you can start thinking about the role of the bias term.
 Previously, the threshold would be at zero. Any example with a
@@ -545,7 +545,7 @@ \section{Perceptron Convergence and Linear Separability}
   after the \emph{first update}, and $\vw\kth$ the weight vector after
   the $k$th update. (We are essentially ignoring data points on which
   the perceptron doesn't update itself.) First, we will show that
-  $\dotp{\vw^*}{\vw\kth}$ grows quicky as a function of $k$. Second,
+  $\dotp{\vw^*}{\vw\kth}$ grows quickly as a function of $k$. Second,
   we will show that $\norm{\vw\kth}$ does not grow quickly.
 
   First, suppose that the $k$th update happens on example $(\vx,y)$.
@@ -698,7 +698,7 @@ \section{Improved Generalization: Voting and Averaging}
 \end{equation}
 The only difference between the voted prediction,
 Eq~\eqref{eq:perc:vote}, and the averaged prediction,
-Eq~\eqref{eq:perc:avg}, is the presense of the interior $\sign$
+Eq~\eqref{eq:perc:avg}, is the presence of the interior $\sign$
 operator. With a little bit of algebra, we can rewrite the test-time
 prediction as:
 \begin{equation}
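
Note (not part of the patch itself): the passages touched above describe the perceptron's activation, `a = sum_d w_d x_d` (eq:perc:sum), a sign-based prediction with a bias term, and error-driven updates on mistakes. A minimal sketch of that rule, for readers following along; all function names and the toy data here are illustrative, not from the book's code:

```python
def activation(w, x):
    """Weighted sum of inputs: a = sum_d w_d * x_d (eq:perc:sum, no bias)."""
    return sum(wd * xd for wd, xd in zip(w, x))

def predict(w, b, x):
    """Predict positive iff activation plus bias exceeds zero."""
    return 1 if activation(w, x) + b > 0 else -1

def perceptron_train(data, max_iter=10):
    """Error-driven updating: on a mistake (y * (a + b) <= 0),
    update w <- w + y*x and b <- b + y; otherwise do nothing."""
    D = len(data[0][0])
    w, b = [0.0] * D, 0.0
    for _ in range(max_iter):
        for x, y in data:
            if y * (activation(w, x) + b) <= 0:  # mistake on this example
                w = [wd + y * xd for wd, xd in zip(w, x)]
                b += y
    return w, b

# Linearly separable toy data: the label matches the sign of the first feature.
data = [([1.0, 0.5], 1), ([2.0, -1.0], 1),
        ([-1.5, 0.3], -1), ([-0.5, -0.2], -1)]
w, b = perceptron_train(data)
print([predict(w, b, x) for x, _ in data])  # → [1, 1, -1, -1]
```

On separable data like this, the updates stop once every example satisfies `y * (a + b) > 0`, which is the situation the convergence proof in the patched section analyzes.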