Regresa is a Python package where I implemented my own versions of the linear and logistic regression algorithms presented by Andrew Ng in his course: Supervised Machine Learning: Regression and Classification.
My motivations were:
- to have a reusable implementation of the algorithms for learning purposes,
- to write the algorithms without nested loops, for readability,
- to add tests to the implementations so I could experiment with refactoring them.
Regresa is managed with Poetry. The following instructions should be enough to get you started.
git clone https://github.com/elcapo/regresa.git
cd regresa
poetry install
Note that you'll need Git, Python and Poetry installed for this to work.
Once installed, use Poetry's shell to interact with the package.
poetry shell
The linear module offers functions to compute a linear regression given a set of examples with one or more features:
- predict: apply a given set of coefficients to the input to predict an output
- loss: compute the individual loss for a set of examples
- cost: compute the total cost for a set of examples
- cost_gradient: compute the gradient of the cost of a given set of coefficients
- gradient_descent: compute a gradient descent
These functions can be imported one by one:
from regresa.linear import predict
# ... and then use them directly by their names
predict([[0], [1]], [2], .5) # [0.5, 2.5]
... or all at once:
from regresa import linear
# ... and then use them directly by their names prefixed with linear
linear.predict([[0], [1]], [2], .5) # [0.5, 2.5]
from regresa.linear import predict
help(predict)
Apply a given set of coefficients to the input to predict an output.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
Return:
f_wb (ndarray (m, )): evaluation of the linear regression for each value of x
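As a rough illustration of the loop-free style mentioned in the motivations, a linear prediction can be computed with a single matrix-vector product. This is a minimal sketch assuming NumPy; it mirrors the documented signature of `predict` but is not Regresa's actual source.

```python
import numpy as np

def predict(X, w, b):
    """Vectorized linear prediction f_wb = X @ w + b (a sketch, not Regresa's source)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    w = np.asarray(w, dtype=float)  # shape (n,)
    # One matrix-vector product replaces nested loops over examples and features.
    return X @ w + b  # shape (m,)

print(predict([[0], [1]], [2], .5))  # [0.5 2.5]
```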
from regresa.linear import loss
help(loss)
Compute the loss of a set of examples.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
Returns:
(ndarray (m, )): loss for each of the given examples
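A minimal sketch of a per-example loss, assuming the squared error $(f_{w,b}(x^{(i)}) - y^{(i)})^2$ from the course. Regresa may scale it differently (for example by a factor of 1/2), so treat this as an illustration rather than its exact definition.

```python
import numpy as np

def loss(X, y, w, b):
    """Squared-error loss for each example (assumed form, not Regresa's source)."""
    f_wb = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b  # predictions, shape (m,)
    return (f_wb - np.asarray(y, dtype=float)) ** 2  # one loss value per example, shape (m,)

print(loss([[0], [1]], [1, 2], [2], .5))  # [0.25 0.25]
```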
from regresa.linear import cost
help(cost)
Compute the cost for a given set of examples.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
lambde (scalar): factor of regularization
Returns:
(scalar): total cost for the given set of weights
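The cost can be sketched with the course's regularized squared-error formula, $J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}(f_{w,b}(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$, where $\lambda$ is the `lambde` argument. This is an assumption about Regresa's internals rather than a copy of them; in particular, the default value of `lambde` below is hypothetical.

```python
import numpy as np

def cost(X, y, w, b, lambde=0.0):
    """Regularized squared-error cost (a sketch; the lambde default is hypothetical)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    y = np.asarray(y, dtype=float)  # shape (m,)
    w = np.asarray(w, dtype=float)  # shape (n,)
    m = X.shape[0]
    errors = X @ w + b - y  # f_wb - y for every example
    squared_error_cost = np.sum(errors ** 2) / (2 * m)
    regularization = lambde * np.sum(w ** 2) / (2 * m)  # b is not regularized
    return float(squared_error_cost + regularization)

print(cost([[0], [1]], [1, 2], [2], .5))            # 0.125
print(cost([[0], [1]], [1, 2], [2], .5, lambde=1))  # 1.125
```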
from regresa.linear import cost_gradient
help(cost_gradient)
Compute the gradient of the cost for a given set of examples.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
lambde (scalar): factor of regularization
Returns:
(ndarray (n, )): gradient of the cost for the given set of weights w
(scalar): gradient of the cost for the given weight b
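Assuming the regularized cost from the course, its partial derivatives are $\frac{\partial J}{\partial w} = \frac{1}{m} X^T (f_{w,b} - y) + \frac{\lambda}{m} w$ and $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(f_{w,b}(x^{(i)}) - y^{(i)})$. The sketch below computes both without loops; it is an illustration of the technique, not Regresa's source, and the `lambde` default is hypothetical.

```python
import numpy as np

def cost_gradient(X, y, w, b, lambde=0.0):
    """Gradient of the regularized squared-error cost (a sketch, not Regresa's source)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    y = np.asarray(y, dtype=float)  # shape (m,)
    w = np.asarray(w, dtype=float)  # shape (n,)
    m = X.shape[0]
    errors = X @ w + b - y  # f_wb - y for every example, shape (m,)
    dj_dw = X.T @ errors / m + lambde * w / m  # shape (n,)
    dj_db = float(np.sum(errors) / m)  # b is not regularized
    return dj_dw, dj_db

print(cost_gradient([[0], [1]], [1, 2], [2], .5))  # (array([0.25]), 0.0)
```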
from regresa.linear import gradient_descent
help(gradient_descent)
Compute a gradient descent.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
alpha (scalar): learning rate
iterations (scalar): number of iterations to run
Returns:
(ndarray (n, )): weights for each feature after the iterations
(scalar): additional scalar weight
Note that the subscripts in $w_n$ and $b_n$ represent a given iteration, while $w_{n-1}$ and $b_{n-1}$ represent the previous one.
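A minimal sketch of the loop, assuming the standard simultaneous update $w_n = w_{n-1} - \alpha \frac{\partial J}{\partial w}(w_{n-1}, b_{n-1})$ and $b_n = b_{n-1} - \alpha \frac{\partial J}{\partial b}(w_{n-1}, b_{n-1})$ with no regularization (the documented `gradient_descent` has no `lambde` argument). It is not Regresa's source.

```python
import numpy as np

def gradient_descent(X, y, w, b, alpha, iterations):
    """Batch gradient descent for linear regression (a sketch, not Regresa's source)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    y = np.asarray(y, dtype=float)  # shape (m,)
    w = np.asarray(w, dtype=float).copy()  # shape (n,), copied so the caller's array is untouched
    m = X.shape[0]
    for _ in range(iterations):
        errors = X @ w + b - y  # computed with w_{n-1} and b_{n-1}
        dj_dw = X.T @ errors / m
        dj_db = np.sum(errors) / m
        # Simultaneous update: both gradients are computed before either parameter changes.
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, float(b)

# Toy data generated by y = 2x + 0.5, so the result should approach w = [2] and b = 0.5.
w, b = gradient_descent([[0], [1], [2]], [0.5, 2.5, 4.5], [0], 0, alpha=0.1, iterations=5000)
print(w, b)
```

Because the toy data lies exactly on a line, the learned weight and bias should end up very close to the ones used to generate it.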
The logistic module offers functions to compute a binary classification given a set of examples with one or more features:
- sigmoid: compute the sigmoid of a vector
- predict: apply a given set of coefficients to the input to predict an output
- loss: compute the individual loss for a set of examples
- cost: compute the total cost for a set of examples
- cost_gradient: compute the gradient of the cost of a given set of coefficients
- gradient_descent: compute a gradient descent
These functions can be imported one by one:
from regresa.logistic import sigmoid
# ... and then use them directly by their names
sigmoid(.5) # .6224593312018546
... or all at once:
from regresa import logistic
# ... and then use them directly by their names prefixed with logistic
logistic.sigmoid(.5) # .6224593312018546
from regresa.logistic import sigmoid
help(sigmoid)
Compute the sigmoid of z. In other words, compute 1 / (1 + e**(-z)).
Arguments:
z (ndarray (m, )): one dimensional vector with the input values
Returns:
(ndarray (m, )): vector with the dimension of z and the result of the computation
This function accepts scalars as input. If a scalar is given, a scalar is also returned.
sigmoid(0) # 0.5
sigmoid(9**9) # 1.0
The function also accepts lists of numbers and NumPy arrays as input. In those cases, a NumPy array with the same shape as the input is returned.
sigmoid([0, 9**9]) # array([0.5, 1. ])
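One way to get this scalar-in, scalar-out behavior while still accepting lists and arrays is to convert the input with `np.asarray` and unwrap the result when it is zero-dimensional. This is a minimal sketch of that idea, not Regresa's actual implementation.

```python
import numpy as np

def sigmoid(z):
    """Compute 1 / (1 + e**(-z)) for scalars, lists or arrays (a sketch)."""
    z = np.asarray(z, dtype=float)  # scalars become 0-d arrays, lists become n-d arrays
    g = 1 / (1 + np.exp(-z))  # element-wise; very negative exponents underflow to 0.0
    # A 0-d result means the input was a scalar, so return a plain float.
    return float(g) if g.ndim == 0 else g

print(sigmoid(0))          # 0.5
print(sigmoid([0, 9**9]))  # [0.5 1. ]
```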
Combined with the `plot` function from the `plotter` module, you can quickly get a glimpse of what the function looks like.
from regresa.logistic import sigmoid
from regresa.plotter import plot
x = [x for x in range(-10, 10 + 1)]
y = sigmoid(x)
plot(x, y)
from regresa.logistic import predict
help(predict)
Apply a given set of coefficients to the input to predict an output.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
Return:
f_wb (ndarray (m, )): evaluation of the logistic regression for each value of x
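The logistic prediction is the linear prediction passed through the sigmoid, which maps it into the interval (0, 1). The sketch below assumes that form; it is not Regresa's source, and the 0.5 threshold used to turn probabilities into class labels is a common convention that Regresa does not document.

```python
import numpy as np

def predict(X, w, b):
    """Logistic prediction sigmoid(X @ w + b) (a sketch, not Regresa's source)."""
    z = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b  # linear part, shape (m,)
    return 1 / (1 + np.exp(-z))  # squashed into (0, 1)

probabilities = predict([[0], [1]], [2], .5)
labels = (probabilities >= 0.5).astype(int)  # assumed 0.5 threshold for binary classes
print(probabilities, labels)
```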
Combined with the `over_plot` function from the `plotter` module, you can check how the logistic regression curve changes with different weights.
from regresa import plotter, logistic
x = [[x/10] for x in range(-100, 110, 1)]
multiple_y = [logistic.predict(x, [d/10], 0) for d in range(0, 12, 2)]
labels = ['w = {}'.format(d/10) for d in range(0, 12, 2)]
plotter.over_plot(x, multiple_y, labels)
from regresa.logistic import loss
help(loss)
Compute the loss of a set of examples.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
Returns:
(ndarray (m, )): loss for each of the given examples
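For binary classification the course uses the logistic (cross-entropy) loss $-y^{(i)}\log(f_{w,b}(x^{(i)})) - (1 - y^{(i)})\log(1 - f_{w,b}(x^{(i)}))$. The sketch below assumes Regresa follows that definition; it is an illustration, not its source.

```python
import numpy as np

def loss(X, y, w, b):
    """Logistic loss for each example (assumed form, not Regresa's source)."""
    z = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b
    f_wb = 1 / (1 + np.exp(-z))  # predicted probabilities, shape (m,)
    y = np.asarray(y, dtype=float)
    # Since each y is 0 or 1, only one of the two terms is non-zero per example.
    return -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)

print(loss([[0], [0]], [1, 0], [1], 0))  # [0.69314718 0.69314718], i.e. ln 2 each
```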
from regresa.logistic import cost
help(cost)
Compute the cost for a given set of examples.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
lambde (scalar): factor of regularization
Returns:
(scalar): total cost for the given set of weights
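The logistic cost can be sketched as the mean of the per-example losses plus the same L2 term used in the course, $\frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$, with $\lambda$ given by `lambde`. This assumes the course's formula; the `lambde` default below is hypothetical and the code is not Regresa's source.

```python
import numpy as np

def cost(X, y, w, b, lambde=0.0):
    """Regularized logistic cost (a sketch; the lambde default is hypothetical)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    y = np.asarray(y, dtype=float)  # shape (m,)
    w = np.asarray(w, dtype=float)  # shape (n,)
    m = X.shape[0]
    f_wb = 1 / (1 + np.exp(-(X @ w + b)))  # predicted probabilities
    losses = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)
    regularization = lambde * np.sum(w ** 2) / (2 * m)  # b is not regularized
    return float(np.mean(losses) + regularization)

print(cost([[0], [0]], [1, 0], [1], 0))  # 0.6931471805599453 (ln 2)
```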
from regresa.logistic import cost_gradient
help(cost_gradient)
Compute the gradient of the cost for a given set of examples.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
lambde (scalar): factor of regularization
Returns:
(ndarray (n, )): gradient of the cost for the given set of weights w
(scalar): gradient of the cost for the given weight b
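In the course, the gradient of the logistic cost has exactly the same form as the linear one; only $f_{w,b}$ changes, because it now goes through the sigmoid. A sketch assuming those regularized formulas (not Regresa's source; the `lambde` default is hypothetical):

```python
import numpy as np

def cost_gradient(X, y, w, b, lambde=0.0):
    """Gradient of the regularized logistic cost (a sketch, not Regresa's source)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    y = np.asarray(y, dtype=float)  # shape (m,)
    w = np.asarray(w, dtype=float)  # shape (n,)
    m = X.shape[0]
    f_wb = 1 / (1 + np.exp(-(X @ w + b)))  # sigmoid of the linear part
    errors = f_wb - y  # shape (m,)
    dj_dw = X.T @ errors / m + lambde * w / m  # shape (n,)
    dj_db = float(np.sum(errors) / m)  # b is not regularized
    return dj_dw, dj_db

print(cost_gradient([[1], [-1]], [1, 0], [0], 0))  # (array([-0.5]), 0.0)
```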
from regresa.logistic import gradient_descent
help(gradient_descent)
Compute a gradient descent.
Arguments:
X (ndarray (m, n)): input values where the regression will be computed
y (ndarray (m, )): vector with boolean tags for each example
w (ndarray (n, )): weights for each of the features
b (scalar): biased weight for the regression
alpha (scalar): learning rate
iterations (scalar): number of iterations to run
Returns:
(ndarray (n, )): weights for each feature after the iterations
(scalar): additional scalar weight
Note that the superscript in $w_j^i$ does not represent a power. Instead, it indicates that this is the value of $w_j$ at iteration $i$.
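To make the $w_j^i$ notation concrete, the sketch below runs gradient descent for logistic regression and keeps a copy of the weights after every iteration, so that `history[i][j]` holds $w_j^i$. The `history` return value is a hypothetical addition for illustration: the documented `gradient_descent` only returns the final weights and bias, and this is not Regresa's source.

```python
import numpy as np

def gradient_descent(X, y, w, b, alpha, iterations):
    """Logistic gradient descent that also records w^i (a sketch, not Regresa's source)."""
    X = np.asarray(X, dtype=float)  # shape (m, n)
    y = np.asarray(y, dtype=float)  # shape (m,)
    w = np.asarray(w, dtype=float).copy()  # shape (n,)
    m = X.shape[0]
    history = [w.copy()]  # hypothetical: history[i][j] is w_j^i, history[0] is the start
    for _ in range(iterations):
        f_wb = 1 / (1 + np.exp(-(X @ w + b)))  # uses w^{i-1} and b^{i-1}
        errors = f_wb - y
        dj_dw = X.T @ errors / m
        dj_db = np.sum(errors) / m
        w = w - alpha * dj_dw  # simultaneous update of w and b
        b = b - alpha * dj_db
        history.append(w.copy())
    return w, float(b), history

X = [[-2], [-1], [1], [2]]
y = [0, 0, 1, 1]
w, b, history = gradient_descent(X, y, [0], 0, alpha=0.1, iterations=1000)
print(history[1])  # w^1, the weights after the first iteration
```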
To run the tests, use pytest from your shell.
pytest -v
To keep the documentation of each function up to date, this README uses a template that prints the help text of every function in the `linear` and `logistic` modules.
This means that, rather than editing this document directly, you should make changes in `docs/README.template` instead.
After the template is updated, the main README.md file can be regenerated by running:
python docs/refresh_readme.py