Covered in 9/28 lecture

Data Types

Exploratory Data Analysis (EDA)

Fardina’s Colab

EDA is about looking at and understanding your data before doing stuff with it so that you can:

  • Discover patterns
  • Spot anomalies
  • Test hypotheses
  • Check assumptions using summary statistics and graphical representations

Possible steps:

  • First, you want to see how variables are distributed
    • Can use box plots, histograms, etc.
    • hist() can generate a histogram of a Series
    • Data Visualization
  • A scatterplot can help by:
    • Visualizing relationships
    • Showing outliers
  • Correlation analysis (using Pearson’s Correlation Coefficient)
  • Use statistical tests to check if two datasets with different averages come from the same distribution (Hypothesis and Inference)

Useful functions for transforming DataFrames: Pandas

Checking Seasonality

With all temporal data, always check for seasonal patterns (yearly, weekly, monthly)

Identifying Errors

Look for: