Statistics and Probability

Basic

Descriptive Analytics

Probability

Counting Permutations and Combinations

Random Variables

For any random variables $X$ and $Y$: $E[X+Y] = E[X] + E[Y]$, $E[X-Y] = E[X] - E[Y]$
For independent $X$ and $Y$: $Var[X+Y] = Var[X] + Var[Y]$, $Var[X-Y] = Var[X] + Var[Y]$
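
A quick simulation can make these identities concrete. The sketch below is illustrative only: the two distributions, the seed, and the sample size are assumptions chosen for the example, not values taken from the notebooks. The variance identities rely on X and Y being independent, which holds here because the two samples are drawn separately.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000

# Two independent samples (illustrative distributions, chosen for this sketch)
x = [random.gauss(2.0, 3.0) for _ in range(n)]    # mean 2, sd 3, so variance 9
y = [random.uniform(0.0, 6.0) for _ in range(n)]  # mean 3, variance 36/12 = 3

sums = [a + b for a, b in zip(x, y)]
diffs = [a - b for a, b in zip(x, y)]

mean_sum = statistics.fmean(sums)
mean_diff = statistics.fmean(diffs)
var_sum = statistics.pvariance(sums)
var_diff = statistics.pvariance(diffs)

print(f"E[X+Y]   ~ {mean_sum:.2f}  (theory: 2 + 3 = 5)")
print(f"E[X-Y]   ~ {mean_diff:.2f}  (theory: 2 - 3 = -1)")
print(f"Var[X+Y] ~ {var_sum:.2f}  (theory: 9 + 3 = 12)")
print(f"Var[X-Y] ~ {var_diff:.2f}  (theory: 9 + 3 = 12)")
```

Note that the variance of the difference is still a sum: subtracting an independent variable adds its spread rather than cancelling it.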

Binomial Random Variables

Also include:

Normal Random Variables

Continuous Probability Distributions for Machine Learning

PDF: Probability Density Function, returns the probability density at a given continuous outcome (a relative likelihood, not a probability).
CDF: Cumulative Distribution Function, returns the probability of a value less than or equal to a given outcome.
PPF: Percent-Point Function, the inverse of the CDF; returns the value whose cumulative probability equals the given probability.
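
The three functions can be tried out on a standard normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, where the PPF is called `inv_cdf`; the evaluation points 0, 1.96, and 0.975 are illustrative choices. `scipy.stats` distributions expose the same ideas as `pdf`, `cdf`, and `ppf` methods.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# PDF: height of the density curve at x = 0 (a density, not a probability)
density_at_0 = std_normal.pdf(0.0)

# CDF: P(X <= 1.96)
prob_below_196 = std_normal.cdf(1.96)

# PPF (inverse CDF): the x for which P(X <= x) = 0.975
x_at_975 = std_normal.inv_cdf(0.975)

print(f"pdf(0)         = {density_at_0:.4f}")
print(f"cdf(1.96)      = {prob_below_196:.4f}")
print(f"inv_cdf(0.975) = {x_at_975:.4f}")
```

The CDF and PPF undo each other: feeding the result of `inv_cdf(p)` back into `cdf` returns `p`.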

Sampling Distributions

Statistical Study

Study Design

Confidence Intervals

The margin of error is a statistic expressing the amount of random sampling error in the results of a survey.

One-proportion z interval (One-sample z interval for a proportion)
One-sample t interval (One-sample t interval for a mean)
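
As a rough illustration of how the margin of error enters a one-proportion z interval, the sketch below computes p_hat ± z* · sqrt(p_hat(1 − p_hat)/n). The function name `one_proportion_z_interval` and the survey counts are hypothetical, and the usual large-sample conditions for the normal approximation are assumed rather than checked.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high, margin) for a one-proportion z interval.

    Hypothetical helper for illustration; assumes n is large enough
    for the normal approximation to be reasonable.
    """
    p_hat = successes / n
    # Critical value z* leaving (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Made-up survey: 520 of 1000 respondents answer "yes"
low, high, margin = one_proportion_z_interval(successes=520, n=1000)
print(f"95% interval: ({low:.3f}, {high:.3f}), margin of error {margin:.3f}")
```

A one-sample t interval for a mean follows the same pattern, with the sample standard deviation in the standard error and a t critical value in place of z*.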

Hypothesis Testing

Common test statistics

One-sample tests are appropriate when a sample is being compared to the population from a hypothesis. The population characteristics are known from theory or are calculated from the population.

Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.

Paired tests are appropriate for comparing two samples where it is impossible to control important variables. Rather than comparing two sets, members are paired between samples so the difference between the members becomes the sample. Typically the mean of the differences is then compared to zero. The common example scenario for when a paired difference test is appropriate is when a single set of test subjects has something applied to them and the test is intended to check for an effect.
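
The pairing step can be sketched in a few lines: the differences between paired members become the sample, and their mean is compared to zero with a one-sample t statistic. The before/after measurements below are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up measurements on the same six subjects before and after a treatment
before = [72, 68, 75, 80, 66, 71]
after = [70, 65, 74, 76, 66, 68]

# The paired differences become the sample
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_diff = mean(differences)
sd_diff = stdev(differences)  # sample standard deviation (n - 1 denominator)

# One-sample t statistic for H0: mean difference = 0
t_stat = mean_diff / (sd_diff / sqrt(n))
dof = n - 1

print(f"differences = {differences}")
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The statistic is then compared against a t distribution with n − 1 degrees of freedom to obtain a p-value.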

Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation.
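
A minimal sketch of a one-sample z-test follows, assuming the population standard deviation is known and the sampling distribution of the mean is normal. The helper name `one_sample_z_test`, the sample values, and the hypothesized mean are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean = mu0, with sigma known.

    Hypothetical helper for illustration.
    """
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Two-sided p-value: probability of a statistic at least this extreme
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up sample; mu0 = 100 and sigma = 6 are assumed known
sample = [102, 98, 105, 110, 99, 104, 101, 107, 103]
z, p_value = one_sample_z_test(sample, mu0=100, sigma=6)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```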

A t-test is appropriate for comparing means under relaxed conditions (less is assumed).

Tests of proportions are analogous to tests of means (the 50% proportion).

Chi-squared tests use the same calculations and the same probability distribution for different applications:

Chi-squared tests for variance are used to determine whether a normal population has a specified variance. The null hypothesis is that it does.
Chi-squared tests of independence are used for deciding whether two variables are associated or are independent. The variables are categorical rather than numeric. It can be used to decide whether left-handedness is correlated with height (or not). The null hypothesis is that the variables are independent. The numbers used in the calculation are the observed and expected frequencies of occurrence (from contingency tables).
Chi-squared goodness of fit tests are used to determine the adequacy of curves fit to data. The null hypothesis is that the curve fit is adequate. It is common to determine curve shapes to minimize the mean square error, so it is appropriate that the goodness-of-fit calculation sums the squared errors.
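
For the test of independence, the calculation from observed and expected frequencies can be sketched as follows. The function name `chi_square_independence` and the 2×2 table are hypothetical. No continuity correction is applied, so results can differ from library routines that apply one by default for 2×2 tables (for example `scipy.stats.chi2_contingency`).

```python
from math import sqrt
from statistics import NormalDist


def chi_square_independence(table):
    """Return (statistic, dof, expected) for a contingency table.

    Hypothetical helper for illustration; no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Made-up 2x2 table of observed counts
observed = [[30, 20],
            [20, 30]]
statistic, dof, expected = chi_square_independence(observed)

# With 1 degree of freedom, chi-square is the square of a standard normal,
# so P(chi2 > s) = 2 * (1 - Phi(sqrt(s))). This shortcut is valid only for dof == 1.
p_value = 2 * (1 - NormalDist().cdf(sqrt(statistic)))

print(f"chi-square = {statistic:.3f}, dof = {dof}, p-value = {p_value:.4f}")
```

For tables with more than one degree of freedom, the p-value comes from the chi-square distribution with the matching degrees of freedom.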

F-tests (analysis of variance, ANOVA) are commonly used when deciding whether groupings of data by category are meaningful. If the variance of test scores of the left-handed in a class is much smaller than the variance of the whole class, then it may be useful to study lefties as a group. The null hypothesis is that two variances are the same – so the proposed grouping is not meaningful.
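
One common form of this test is the one-way ANOVA F statistic, the ratio of between-group to within-group mean squares. The sketch below assumes that form; the helper name `one_way_anova_f` and the three small groups are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration.
    """
    k = len(groups)
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = fmean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three categories
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F suggests the grouping explains more variation than would be expected by chance; the p-value comes from the F distribution with these two degrees of freedom.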

one-sample

One-proportion z-test (One-sample z test for a proportion)
One-sample t-test (One-sample t test for a mean)

two-sample

Two-proportion z-test, Two-proportion z interval
Two-sample t-test, Two sample t interval

Inference for categorical data (Chi-square tests)

Chi-square goodness-of-fit tests
Chi-square tests for relationships

Regression

Linear Regression

Instruction

Python Virtual Environment

Under the root directory of statistics-and-probability:

pipenv shell

Jupyter

Jupyter Notebook Kernels: How to Add, Change, Remove

After activating the pipenv environment, you can add a kernel to Jupyter:

ipython kernel install --name "statistics-and-probability" --user

If you don't have Jupyter installed, run pipenv install jupyter jupyterlab --dev.

List your kernels:

jupyter-kernelspec list

If Jupyter was installed in a specific virtual environment, run the list command above inside that environment.

ZacksAmber/statistics-and-probability

Folders and files

Latest commit

History

Repository files navigation

Statistics and Probability

Basic

Probability

Statistical Study

Common test statistics

Regression

Instruction

Python Virtual Environment

Jupyter

About

Resources

License

Stars

Watchers

Forks

Languages