MahtabEK/Maximum-Likelihood

Maximum-Likelihood

The Poisson distribution (https://en.wikipedia.org/wiki/Poisson_distribution) is a discrete probability distribution often used to describe count-based data, like how many snowflakes fall in a day. If we have count data 𝑦 that are influenced by a covariate or feature 𝑥, we can use the maximum likelihood principle to develop a regression model relating 𝑥 to 𝑦.

Part 1: In this part, I wrote a function called poissonNegLogLikelihood that takes a count y and a parameter lam and returns the negative log likelihood of y, assuming that it was generated by a Poisson distribution with parameter lam. I used scipy.special.gammaln to compute the log of a factorial. The Gamma function, Γ(𝑥), is a sort of generalized factorial, and gammaln efficiently computes the natural log of the Gamma function.

It is worth noting that Γ(𝑥) ≠ 𝑥!. Instead, Γ(𝑥 + 1) = 𝑥! for non-negative integers 𝑥, so log(𝑦!) is computed as gammaln(y + 1).
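As a rough illustration of Part 1, here is a minimal sketch of what such a function could look like. It is not necessarily the code in the notebook: the argument order (lam first, then y) and the choice to sum over an array of counts are assumptions made for this example.

```python
import numpy as np
from scipy.special import gammaln


def poissonNegLogLikelihood(lam, y):
    """Negative log likelihood of count(s) y under Poisson(lam).

    ASSUMPTION: argument order (lam, y) is chosen for this sketch only.
    """
    y = np.asarray(y, dtype=float)
    # log P(y | lam) = y*log(lam) - lam - log(y!), with log(y!) = gammaln(y + 1)
    log_lik = y * np.log(lam) - lam - gammaln(y + 1)
    # Summing lets the same function handle a single count or an array of counts
    return -np.sum(log_lik)


# Example: a single count of 3 with rate 2
nll = poissonNegLogLikelihood(2.0, 3)
```

Working on the log scale with gammaln avoids computing large factorials directly, which would overflow for big counts.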

Part 2: In this part, I wrote a function called poissonMLE which accepts as its first argument an array of data and returns the maximum likelihood estimate of the Poisson parameter 𝜆. I used scipy.optimize.minimize for this function.
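A sketch of how Part 2 might be done with scipy.optimize.minimize is shown below. The optimization over log(𝜆) rather than 𝜆 is an assumption of this example (it keeps the rate positive without needing bounds), and the helper's argument order is likewise illustrative rather than taken from the notebook.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def poissonNegLogLikelihood(lam, y):
    # Same quantity as in Part 1; argument order is an assumption of this sketch
    y = np.asarray(y, dtype=float)
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))


def poissonMLE(data):
    """Maximum likelihood estimate of the Poisson rate for an array of counts."""
    data = np.asarray(data, dtype=float)
    # ASSUMPTION: optimize t = log(lam) so that lam = exp(t) is always positive
    objective = lambda t: poissonNegLogLikelihood(np.exp(t[0]), data)
    result = minimize(objective, x0=[0.0])
    return float(np.exp(result.x[0]))


lam_hat = poissonMLE([2, 3, 4, 1, 5, 3])
```

For the Poisson distribution the maximum likelihood estimate has a closed form, the sample mean, so comparing the optimizer's answer with np.mean(data) is a convenient sanity check.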

Part 3: Here, I wrote a function called poissonRegressionNegLogLikelihood that takes a vector 𝐲 of counts, a design matrix 𝐗 of features for each count (including a column of 1s for the intercept), and a vector 𝛃 of parameters. The function computes the negative log likelihood of this dataset, assuming that each 𝑦 is independently Poisson distributed with parameter 𝜆 = exp(𝐗𝛃). That is to say, my function works in the general case of 𝑛 observations and 𝑝 parameters.
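The regression likelihood can be sketched as follows, written as the negative log likelihood that Part 4 refers to. The argument order (parameters first, so the function can be passed straight to an optimizer) is an assumption of this example, not something stated in the README.

```python
import numpy as np
from scipy.special import gammaln


def poissonRegressionNegLogLikelihood(b, X, y):
    """Negative log likelihood of counts y under lam = exp(X @ b).

    ASSUMPTION: argument order (b, X, y) is chosen for this sketch only.
    X has shape (n, p) and already contains the intercept column of 1s.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    eta = X @ np.asarray(b, dtype=float)  # linear predictor, shape (n,)
    lam = np.exp(eta)                     # Poisson rate for each observation
    # y*log(lam) simplifies to y*eta because lam = exp(eta)
    return -np.sum(y * eta - lam - gammaln(y + 1))


# Tiny example: two observations, intercept plus one feature
X_demo = np.array([[1.0, 0.0], [1.0, 1.0]])
y_demo = np.array([0, 1])
nll_demo = poissonRegressionNegLogLikelihood(np.zeros(2), X_demo, y_demo)
```

Because everything is expressed with matrix products, the same function works for any number of observations 𝑛 and parameters 𝑝.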

Part 4: In poissonRegressionNegLogLikelihood, why do you think I applied the exponential function to the linear predictor? What might have happened if I had just passed 𝜆 = 𝐗𝛃? You can check out the answer in the Python notebook file :)
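One way to explore the question numerically is sketched below. The coefficient values are hypothetical and chosen only for illustration; the notebook's own answer may be framed differently.

```python
import numpy as np

# Five feature values spanning [-2, 2], plus an intercept column
x = np.linspace(-2, 2, 5)
X = np.column_stack([np.ones_like(x), x])

# HYPOTHETICAL coefficients, picked only to illustrate the point
beta = np.array([0.5, 1.0])

eta = X @ beta               # linear predictor: can take any real value
lam_identity = eta           # using the linear predictor directly as the rate
lam_log_link = np.exp(eta)   # exponentiating first

print("identity:", lam_identity)
print("exp     :", lam_log_link)
```

A Poisson rate must be strictly positive, and the log likelihood involves log(𝜆), so any negative entry in the first array would make the likelihood undefined, while the exponentiated values are positive for every 𝛃.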

Part 5: Here, I wrote a function called fitPoissonRegression which takes as its first argument data x and as its second argument outcomes y and returns the coefficients for a Poisson regression.
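A minimal sketch of the fitting step might look like this. It assumes that the first argument is a design matrix that already contains the intercept column, and it supplies an analytic gradient to the optimizer; both choices are assumptions of this example rather than details taken from the notebook. The simulated data exist only to exercise the function.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def poissonRegressionNegLogLikelihood(b, X, y):
    eta = X @ b
    return -np.sum(y * eta - np.exp(eta) - gammaln(y + 1))


def _gradient(b, X, y):
    # Gradient of the negative log likelihood: X^T (exp(X b) - y)
    return X.T @ (np.exp(X @ b) - y)


def fitPoissonRegression(X, y):
    """Return the coefficients minimizing the Poisson negative log likelihood.

    ASSUMPTION: X is an (n, p) design matrix that includes the intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta0 = np.zeros(X.shape[1])  # start from all-zero coefficients (lam = 1)
    result = minimize(poissonRegressionNegLogLikelihood, beta0,
                      args=(X, y), jac=_gradient, method="BFGS")
    return result.x


# Simulated data (illustrative only) with known coefficients [1.0, 0.5]
rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, 2000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))
beta_hat = fitPoissonRegression(X, y)
```

With enough data the recovered coefficients should land close to the values used to simulate the counts.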

Part 6: Here, I wrote a function called makePoissonRegressionPlot which loads in the data from poisson_regression_data.csv, plots a scatterplot of the data, fits a Poisson regression to this data, plots the model predictions over 𝑥 ∈ [−2, 2], and then saves the plot under the file name poisson_regression.png.
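The plotting steps above can be sketched as follows. The layout of poisson_regression_data.csv is not described in this README, so the sketch assumes one header row followed by 𝑥 in the first column and 𝑦 in the second. The csv_path and out_path parameters, and returning the fitted coefficients, are also additions made for this example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: save to file without a display
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.special import gammaln


def poissonRegressionNegLogLikelihood(b, X, y):
    eta = X @ b
    return -np.sum(y * eta - np.exp(eta) - gammaln(y + 1))


def fitPoissonRegression(X, y):
    start = np.zeros(X.shape[1])
    return minimize(poissonRegressionNegLogLikelihood, start, args=(X, y)).x


def makePoissonRegressionPlot(csv_path="poisson_regression_data.csv",
                              out_path="poisson_regression.png"):
    # ASSUMPTION: one header row, then x in column 0 and y in column 1
    data = np.loadtxt(csv_path, delimiter=",", skiprows=1)
    x, y = data[:, 0], data[:, 1]

    # Design matrix with an intercept column, then fit
    X = np.column_stack([np.ones_like(x), x])
    beta = fitPoissonRegression(X, y)

    # Predicted mean counts over x in [-2, 2]
    x_grid = np.linspace(-2, 2, 200)
    X_grid = np.column_stack([np.ones_like(x_grid), x_grid])
    y_hat = np.exp(X_grid @ beta)

    fig, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.5, label="data")
    ax.plot(x_grid, y_hat, color="red", label="Poisson regression")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return beta  # returned for convenience in this sketch
```

Calling makePoissonRegressionPlot() with no arguments uses the two file names mentioned in Part 6.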

Part 7: Here, I wrote a function called makeLinearRegressionPlot which loads in the data from poisson_regression_data.csv, plots a scatterplot of the data, fits a linear regression to this data, plots the model predictions over 𝑥 ∈ [−2, 2], and then saves the plot under the file name linear_regression.png.
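The README does not say which routine was used for the linear fit, so the sketch below shows only the fitting and prediction steps, using ordinary least squares via np.linalg.lstsq. The function name fitLinearRegression and the toy data are hypothetical and exist only for this example.

```python
import numpy as np


def fitLinearRegression(X, y):
    """Ordinary least squares coefficients for design matrix X and outcomes y.

    HYPOTHETICAL helper name; X is assumed to include the intercept column.
    """
    coef, *_ = np.linalg.lstsq(np.asarray(X, dtype=float),
                               np.asarray(y, dtype=float), rcond=None)
    return coef


# Toy data lying exactly on the line y = 2 + 3x (illustrative only)
x = np.linspace(-2, 2, 9)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
coef = fitLinearRegression(X, y)

# Predictions over x in [-2, 2]; unlike Part 6, no exponential is applied
x_grid = np.linspace(-2, 2, 200)
y_hat = np.column_stack([np.ones_like(x_grid), x_grid]) @ coef
```

The scatterplot and savefig steps would follow the same pattern as in Part 6, with linear_regression.png as the output file name.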

Part 8:

  1. Why do you think the coefficients from OLS are different from those from Poisson regression?
  2. Why do you think the predicted mean counts are different? Do you see any major problems with the predictions from OLS?

You can check out the answers to these questions in the Python notebook provided.
