Variables
- Numerical
  - Continuous (Value)
  - Discrete (Counts)
- Categorical
  - Regular (Gender)
  - Ordinal (Rating)
- Relationship
  - Associated
  - Independent
Charts
- A histogram shows the frequency distribution of a single numerical variable grouped into bins
- Relationship between 2 numerical variables: scatterplot
- Relationship between 2 categorical variables: segmented (stacked) bar chart
- Relationship between a categorical and a numerical variable: side-by-side box plots, which show the spread of the numerical variable within each category and place the categories next to each other for comparison
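
To make the idea of a binned frequency distribution concrete, here is a minimal sketch of what a histogram computes before anything is drawn. The `bin_counts` helper and the `ages` data are hypothetical, made up for illustration only.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each bin [k*bin_width, (k+1)*bin_width)."""
    counts = Counter()
    for v in values:
        # Floor division assigns each value to the left edge of its bin.
        left_edge = (v // bin_width) * bin_width
        counts[left_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample: ages of 10 people, binned in 10-year intervals.
ages = [23, 27, 31, 35, 36, 41, 44, 45, 48, 62]
histogram = bin_counts(ages, bin_width=10)

# Print a rough text histogram, one row per non-empty bin.
for left_edge, count in histogram.items():
    print(f"{left_edge:>3}-{left_edge + 9:<3} {'#' * count}")
```

The bin width is a choice: wider bins smooth out detail, narrower bins show more noise.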
Measures of center - Mean, Median
Measures of spread - Standard Deviation, Interquartile Range
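
A small sketch of these four measures using Python's standard `statistics` module. The `data` list is a made-up example with one deliberate outlier, and `statistics.quantiles` uses its default "exclusive" method here; other tools may compute quartiles slightly differently.

```python
import statistics

# Hypothetical data set with one large outlier (90).
data = [2, 4, 4, 5, 7, 9, 10, 12, 90]

mean = statistics.mean(data)
median = statistics.median(data)
std_dev = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} std_dev={std_dev:.2f} iqr={iqr}")
```

In this example the outlier pulls the mean and standard deviation upward, while the median and interquartile range barely react to it.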
Ideal Distributions we want to work with
- Normal & T for numerical variables
- Chi-Square for categorical variables
- Binomial
https://www.youtube.com/watch?v=rzFX5NWojp0&
Working with a normal curve
- Z scores: a standardized score expressed in units of standard deviation. The z score of an observation is the difference between that observation and the mean, divided by the standard deviation: z = (observation - mean) / standard deviation. From the z score we can then calculate the probability of getting an observation above or below that point. https://www.youtube.com/watch?v=2tuBREK_mgE&
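
The z score calculation can be sketched as follows, using `NormalDist` from Python's standard library. The mean, standard deviation and observation are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical example: test scores with mean 70 and standard deviation 10.
mu, sigma = 70.0, 10.0
x = 85.0

z = (x - mu) / sigma  # how many standard deviations x lies above the mean

standard_normal = NormalDist(mu=0.0, sigma=1.0)
p_below = standard_normal.cdf(z)   # P(Z < z)
p_above = 1.0 - p_below            # P(Z > z)

print(f"z={z:.2f} P(below)={p_below:.4f} P(above)={p_above:.4f}")
```

Here z is 1.5, so roughly 93% of observations are expected to fall below a score of 85, assuming the scores really are normally distributed.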
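When the data do not follow these distributions naturally, we can take many samples and build a distribution from the sample means. By the central limit theorem, this sampling distribution of the mean is approximately normal, centered at the population mean, with a standard deviation equal to the population standard deviation divided by the square root of the sample size. It is recommended to have at least 30 independent observations in each sample.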
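
A rough simulation of the central limit theorem, assuming a right-skewed exponential population (rate 1, so mean 1 and standard deviation 1). The constants `SAMPLE_SIZE` and `NUM_SAMPLES` and the helper `draw_sample` are illustrative choices, not part of the original notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 30      # observations per sample
NUM_SAMPLES = 2000    # number of repeated samples

def draw_sample(n):
    """Draw n independent observations from an exponential distribution (rate 1)."""
    return [random.expovariate(1.0) for _ in range(n)]

# One mean per sample: together these form the sampling distribution of the mean.
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
std_of_means = statistics.stdev(sample_means)
expected_std = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"std dev of sample means: {std_of_means:.3f} (sigma/sqrt(n) = {expected_std:.3f})")
```

Even though individual observations are strongly skewed, the sample means should cluster around the population mean with a spread close to sigma / sqrt(n).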