ranja-sarkar/stats

stats

Accuracy and precision cannot be used interchangeably: accuracy is being true to intention (how close a measured value is to the true value), while precision is being true to itself (how close repeated measurements are to each other).
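The distinction can be illustrated with a minimal sketch. The measurements and the reference value below are made up for illustration; bias (mean error) stands in for accuracy and the sample standard deviation stands in for precision, which is one common way to quantify them, not the only one.

```python
from statistics import mean, stdev

TRUE_VALUE = 10.0  # hypothetical reference value

# Hypothetical repeated measurements from two instruments.
accurate_but_imprecise = [9.0, 11.0, 9.5, 10.5, 10.0]
precise_but_inaccurate = [12.0, 12.1, 11.9, 12.0, 12.0]


def bias(measurements, true_value):
    """Accuracy: distance of the average measurement from the true value."""
    return mean(measurements) - true_value


def spread(measurements):
    """Precision: how close repeated measurements are to each other."""
    return stdev(measurements)


for name, data in [
    ("accurate, imprecise", accurate_but_imprecise),
    ("precise, inaccurate", precise_but_inaccurate),
]:
    print(f"{name}: bias={bias(data, TRUE_VALUE):+.2f}, spread={spread(data):.2f}")
```

The first instrument is centered on the true value but scatters widely; the second is tightly clustered around the wrong value.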

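Probability and likelihood are different terms: probability is the chance of an outcome given a fixed data distribution, while likelihood measures how well a candidate distribution (or its parameters) explains the outcomes that were actually observed. Maximizing the likelihood gives the most plausible distribution given the outcomes.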
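A coin-toss sketch may make this concrete. It assumes a binomial model, 10 tosses, and 7 observed heads, all chosen here for illustration. The same formula is used twice: once with the parameter fixed and the outcome of interest, once with the outcome fixed and the parameter varying.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10

# Probability: the distribution is fixed (fair coin, p = 0.5),
# and we ask about the chance of an outcome (7 heads).
prob_7_heads = binomial_pmf(7, n, 0.5)

# Likelihood: the outcome is fixed (7 heads were observed),
# and we compare candidate values of p on a coarse grid.
observed_heads = 7
candidates = [i / 100 for i in range(1, 100)]
likelihoods = {p: binomial_pmf(observed_heads, n, p) for p in candidates}
best_p = max(likelihoods, key=likelihoods.get)

print(f"P(7 heads | p=0.5) = {prob_7_heads:.4f}")
print(f"p with highest likelihood on the grid = {best_p}")
```

On this grid the likelihood peaks at the observed proportion of heads, 7/10.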

DESCRIPTIVE STATISTICS: For summarizing and describing the (smaller) sample data at hand

INFERENTIAL STATISTICS: For drawing conclusions about the larger population from a sample

Depending on your goal and the datatype (parametric or non-parametric), you select a test.

If the goal is to quantify the association between two variables, we check the Pearson correlation for parametric data and the Spearman correlation for non-parametric data. If the goal is to predict a target from one or more variables, we perform simple regression (one predictor) or multiple regression (two or more predictors) for parametric data. If we have to compare unpaired (independent) groups, we perform an unpaired t-test for two groups (or one-way ANOVA for three or more groups) for parametric data, and the Mann-Whitney test (two groups) for non-parametric data.
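The difference between the two correlation measures can be sketched with the standard library alone. The data below is hypothetical, and the `ranks` helper is a simplification that assumes there are no tied values; a statistics library would normally handle ties and p-values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank each value from 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v**3 for v in x]

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Because the relationship is perfectly monotonic, the Spearman coefficient is 1, while the Pearson coefficient falls short of 1 since the relationship is not a straight line.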

Parametric test:-

Assumption: Data follows a normal distribution

Non-parametric test:-

No assumption about the distribution of the data

HYPOTHESIS TESTS: The hypothesis test to carry out depends on the data type and on the data sample.
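As one illustration of how a hypothesis test works, here is a minimal sketch of an exact permutation test for the difference in means of two independent groups. This technique is swapped in because it needs no distribution tables; it is not one of the tests named above, and the two groups are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided p-value under the null hypothesis that group labels are arbitrary.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Count splits at least as extreme as the observed one.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements from two independent groups.
group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]

p_value = exact_permutation_p_value(group_a, group_b)
print(f"p-value = {p_value:.4f}")
```

A small p-value means that a difference this large would rarely arise if the group labels carried no information, which is the logic shared by the parametric and non-parametric tests mentioned earlier.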

Data is also classified by sensitivity, for purposes of privacy, security, risk management and regulatory compliance: public, internal, confidential and restricted.

For more: https://en.wikipedia.org/wiki/F-test https://en.wikipedia.org/wiki/Analysis_of_variance

MEASURES OF CENTRAL TENDENCY: Mean, median, mode

Mean: Sum of all values in a dataset divided by the number of values.

Mode: Number that occurs most often in a dataset.

Median: Middle number/value when a dataset is ordered from least to greatest.
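These measures can be computed with Python's built-in `statistics` module. The dataset below is hypothetical and includes one large value to show how the measures react differently to an outlier; the mean is included as the usual third measure of central tendency.

```python
from statistics import mean, median, mode

# Hypothetical dataset with one large outlier (40).
data = [2, 3, 3, 5, 7, 10, 40]

data_mean = mean(data)      # pulled upward by the outlier
data_median = median(data)  # middle value of the ordered data
data_mode = mode(data)      # most frequent value

print(f"mean={data_mean}, median={data_median}, mode={data_mode}")
```

The median and mode stay near the bulk of the data, while the mean is dragged toward the outlier.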

A violin plot shows the shape (density distribution) of the data, which a boxplot does not, so it is well suited to exploring skewed data.

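Variables that follow right-skewed or left-skewed distributions can be brought closer to a symmetric, normal-like shape with power transformations (for example square root, or Box-Cox, which includes the log transform as a limiting case).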
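The effect of such a transformation can be checked by comparing skewness before and after. The sketch below uses a hypothetical right-skewed dataset and the log transform (the limiting case of the Box-Cox family); the `skewness` helper is the plain moment-based coefficient, written out here rather than taken from a library.

```python
from math import log
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    center = mean(values)
    scale = pstdev(values)
    return sum(((v - center) / scale) ** 3 for v in values) / len(values)


# Hypothetical right-skewed data (each value doubles the previous one).
right_skewed = [1, 2, 4, 8, 16, 32, 64]
log_transformed = [log(v) for v in right_skewed]

print(f"skewness before: {skewness(right_skewed):.3f}")
print(f"skewness after:  {skewness(log_transformed):.3f}")
```

For this particular dataset the log values are evenly spaced, so the skewness drops to essentially zero; real data will usually improve less cleanly.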

MEASURES OF DISPERSION: Range, quartile deviation and interquartile range (quartile deviation is half of the interquartile range), variance, standard deviation
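A minimal sketch of these measures using the `statistics` module, on a hypothetical dataset. Note that `statistics.quantiles` supports more than one quartile convention, so the exact quartile values can differ slightly between tools; the population variants (`pvariance`, `pstdev`) are used here as an assumption that the data is the whole population.

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical dataset.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Quartiles split the ordered data into four parts.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                 # interquartile range
quartile_deviation = iqr / 2  # half of the interquartile range

variance = pvariance(data)    # mean squared deviation from the mean
std_dev = pstdev(data)        # square root of the variance

print(f"range={data_range}, IQR={iqr}, quartile deviation={quartile_deviation}")
print(f"variance={variance}, standard deviation={std_dev}")
```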

Statistical models:-

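Discriminative and generative. A discriminative model learns the conditional distribution of the target given the inputs, P(y | x), while a generative model is non-conditional: it learns the joint distribution P(x, y).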

Basic Statistics for Data Sciences