Probability Distributions and Summary Stats

Covered in 9/12 lecture

Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Bayes' rule: $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
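
A minimal sketch of Bayes' rule with made-up numbers (a test for a condition with 1% prevalence, 95% sensitivity, and a 5% false positive rate; none of these figures are from the lecture). The helper name `bayes_posterior` is hypothetical, and $P(B)$ is expanded with the law of total probability, which is an added step not stated above.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using Bayes' rule.

    P(B) is expanded as P(B | A) P(A) + P(B | not A) P(not A).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: A = "has the condition", B = "test is positive"
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior $P(A)$ is small.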

Expected value: $E[X] = \sum_{i} x_{i} \cdot p(x_{i})$, where the $x_{i}$ are the values that $X$ can take and $p(x_{i})$ is the probability that $X$ takes the value $x_{i}$.
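
As a quick worked example of the expected value formula, here is a sketch for a fair six-sided die. The function name `expected_value` and the dictionary representation of the probabilities are illustrative choices, not something from the lecture.

```python
from fractions import Fraction


def expected_value(pmf):
    """Sum x_i * p(x_i) over a dict mapping each value x_i to its probability."""
    return sum(x * p for x, p in pmf.items())


# Fair die: each face 1..6 has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}
ev = expected_value(die)
print(ev)  # 7/2
```

Note that the expected value, 3.5, is not itself a value the die can show.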

Distributions

A distribution is a function that gives the probability of each possible outcome of an experiment
Often a continuous curve; the histogram of a large enough sample converges to it
Distributions are like histograms

Types:

Uniform distribution
- Just a horizontal line: every outcome is equally likely
Normal distribution (Gaussian)
- Extremely common; often assumed unless there is reason to think otherwise
- Continuous
Poisson distribution
- Probabilities sum to 1; the mass is bunched up on one side (near the low counts) with a tail on the other
- Given the rate at which some event occurs, the Poisson distribution gives the probability of each number of occurrences over a time period
- Discrete
Zero-inflated Poisson distribution
- Poisson distribution with a spike at 0
Bernoulli distribution
- When you have a single trial with a fixed probability of success
- Example: Flipping a coin once
Binomial distribution
- The probability of getting a given number of successes in a fixed number of independent Bernoulli trials
- Discrete
- Looks like a normal distribution (if you have enough trials)
Power law distribution
- e.g. number of close friends
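
To make two of the discrete distributions above concrete, here is a sketch of the Poisson and binomial probability mass functions using only the standard library. The formulas are the standard textbook ones rather than anything written out in the lecture, and the function names and example numbers are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """P(exactly k events in one time period), given the average rate per period."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli trials with success prob p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: events arrive at an average of 3 per hour
p_two_events = poisson_pmf(2, rate=3.0)

# 10 fair coin flips: probability of exactly 5 heads
p_five_heads = binomial_pmf(5, n=10, p=0.5)
print(p_five_heads)  # 0.24609375

# A single Bernoulli trial is the binomial with n = 1
p_one_head = binomial_pmf(1, n=1, p=0.5)
```

Summing either function over all possible values of `k` gives 1, which is the "sums to 1" property noted for the Poisson distribution.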

Central Limit Theorem

If you have a distribution, such as the outcome of a die roll (uniform distribution), and you repeatedly draw samples from it and build a new distribution from the means of those samples, then that new distribution approaches a normal distribution as the sample size grows.

Useful because if you have two funky-looking distributions, you can take the means of many samples from each, and now you have two nice, approximately normal distributions of sample means that you can compare.
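
The die example can be simulated in a few lines. This is a rough sketch, not a proof: the seed, the number of samples (2000), and the sample size (30) are arbitrary choices, and `sample_means` is a hypothetical helper.

```python
import random
import statistics


def sample_means(num_samples, sample_size, rng):
    """Roll a fair die sample_size times, record the mean, repeat num_samples times."""
    means = []
    for _ in range(num_samples):
        rolls = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(statistics.fmean(rolls))
    return means


rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(num_samples=2000, sample_size=30, rng=rng)

center = statistics.fmean(means)  # should land near 3.5, the mean of one roll
spread = statistics.stdev(means)  # should land near sqrt(35/12) / sqrt(30)

# For a normal distribution, roughly 68% of values fall within one
# standard deviation of the mean; check how close the sample means come.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(center, spread, within_one_sd)
```

A single roll is uniform over 1 to 6, but a histogram of `means` would look bell-shaped and centered near 3.5.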

Criteria for samples:

Picked at random

Representative of population

Big enough to draw conclusions (rule of thumb: sample size >= 30)

Include less than 10% of the population, if you’re sampling without replacement


Summary Statistics

Examples: Mean, median, mode
Cannot rely solely on summary statistics; you first need to understand your data holistically

Measures of central tendency:

The Pythagorean means
- Arithmetic mean - Very sensitive to outliers
- Geometric mean - A measure of central tendency less sensitive to outliers
- Harmonic mean - Primarily used for rates (we won’t use it)
Median
Mode
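
A short sketch comparing these measures on a made-up data set with one large outlier, using Python's `statistics` module. The data values are invented for illustration.

```python
import statistics

data = [1, 2, 4, 8, 1000]  # made-up values; 1000 is the outlier

arithmetic = statistics.mean(data)           # 203, dragged up by the outlier
geometric = statistics.geometric_mean(data)  # much less affected
harmonic = statistics.harmonic_mean(data)
median = statistics.median(data)             # 4, unaffected by the outlier

# Mode: the most frequent value (separate made-up data with a repeat)
mode = statistics.mode([1, 2, 2, 3])         # 2

print(arithmetic, geometric, harmonic, median, mode)
```

For positive data the three Pythagorean means always satisfy harmonic <= geometric <= arithmetic.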

Measures of variance:

Variance: $s^{2} = \frac{\sum_{i} (x_{i} - \bar{x})^{2}}{n - 1}$
Standard deviation: $s = \sqrt{\frac{\sum_{i} (x_{i} - \bar{x})^{2}}{n - 1}}$
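
A small sketch that computes the sample variance (sum of squared deviations divided by $n - 1$) and the sample standard deviation (its square root) by hand, then cross-checks against Python's `statistics` module. The data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

n = len(data)
x_bar = sum(data) / n                             # sample mean: 5.0
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations: 32.0

sample_variance = sum_sq_dev / (n - 1)            # s^2, with the n - 1 denominator
sample_std = math.sqrt(sample_variance)           # s is the square root of s^2

# statistics.variance and statistics.stdev use the same n - 1 convention
print(math.isclose(sample_variance, statistics.variance(data)))  # True
print(math.isclose(sample_std, statistics.stdev(data)))          # True
```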

Other descriptors:

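Skew (left- or right-tailed): Which side the longer tail of the distribution is on
- Negative skew (left-tailed): Long tail on the left; the mean is typically pulled to the left of the median, with most of the mass on the right
- Positive skew (right-tailed): Long tail on the right; the mean is typically pulled to the right of the median, with most of the mass on the left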
Modality: How many modes (peaks) the distribution has (e.g. unimodal, bimodal, etc.)
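
One way to put a number on skew is the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 3/2). That particular definition is an assumption here, since the lecture does not name a formula, and the data below is invented.

```python
import statistics


def skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using population (divide-by-n) moments."""
    mean = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 20]        # made-up data with a long right tail
left_tailed = [-x for x in right_tailed]     # mirror image: long left tail

print(skewness(right_tailed) > 0)  # True: positive skew
print(skewness(left_tailed) < 0)   # True: negative skew

# The long tail pulls the mean toward it, away from the median
print(statistics.fmean(right_tailed) > statistics.median(right_tailed))  # True
print(statistics.fmean(left_tailed) < statistics.median(left_tailed))    # True
```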

CMSC320 Notes
