---
title: "Bivariate EDA Preamble"
format:
  html:
    number-sections: false
---


## What You Will Learn in This Section

This section builds upon the univariate foundations of Exploratory Data Analysis (EDA) by introducing techniques for analyzing relationships between two continuous variables. While univariate analysis focuses on understanding the distribution of a single variable, bivariate analysis seeks to uncover patterns, associations, and dependencies between variables. The goal is not to confirm hypotheses, but to develop a visual and conceptual toolkit for modeling, diagnosing, and refining relationships in two-dimensional data.

---

### **Chapter 26: Fitting and Exploring Bivariate Models**

This chapter introduces the foundational tools for exploring relationships between two variables. You’ll learn how to use **scatter plots** to visualize associations and how to fit **polynomial models** (e.g., linear, quadratic) to capture trends. The chapter also introduces the concept of **residuals** as a way to assess model fit.
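
As a rough illustration of what "fitting a model and extracting residuals" means, here is a minimal, dependency-free Python sketch (not the chapter's own code). It fits a first-order polynomial (a straight line) by ordinary least squares and computes the residuals as observed minus fitted values. The data values and the helper name `fit_line` are made up for this example.

```python
# Made-up data for illustration: y rises roughly linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the ratio of the x-y co-variation to the x variation.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(x, y)

# Fitted values are what the model predicts at each observed x.
fitted = [intercept + slope * xi for xi in x]

# Residuals are what the model leaves unexplained: observed minus fitted.
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

Plotting `residuals` against `x` is the starting point for the diagnostic plots introduced in later chapters.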

### **Chapter 27: Non-parametric Bivariate Modeling with Loess**

When the form of the relationship between variables is unknown or nonlinear, **loess smoothing** offers a flexible, data-driven alternative to polynomial models. This chapter explains how loess works, how to tune its parameters (span and degree), and when to use it.
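
To give a feel for the role of the span, the sketch below evaluates a simplified loess-style smoother at a single location. It is an assumption-laden illustration rather than a full loess implementation: it uses a local first-degree (linear) fit only, tricube weights, and no robustness iterations. The function names `tricube` and `loess_point` are hypothetical.

```python
import math

def tricube(u):
    """Tricube weight: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def loess_point(x0, x, y, span=0.75):
    """Local linear (degree 1) weighted fit evaluated at x0.

    span is the fraction of the points that take part in each local fit.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))           # neighbours used
    h = sorted(abs(xi - x0) for xi in x)[k - 1]       # distance to k-th nearest
    w = [tricube((xi - x0) / h) for xi in x] if h > 0 else []
    sw = sum(w)
    if sw == 0:
        # Degenerate neighbourhood (ties): fall back to a local average.
        near = [yi for xi, yi in zip(x, y) if abs(xi - x0) <= h]
        return sum(near) / len(near)
    # Weighted means, then a weighted least-squares line through them.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)

# Made-up data: a curved relationship that a straight line would miss.
xs = [float(i) for i in range(10)]
ys = [xi ** 2 for xi in xs]
smooth = [loess_point(xi, xs, ys, span=0.5) for xi in xs]
```

A smaller `span` lets the curve follow local wiggles more closely; a larger one produces a smoother, stiffer curve.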

### **Chapter 28: Model Residuals**

Residuals are central to understanding how well a model captures the structure in the data. This chapter introduces **residual-dependence** and **residual-fit** plots, which help diagnose model inadequacies such as curvature that the fitted model fails to capture.

### **Chapter 29: Exploring Spread in the Residuals**

This chapter focuses on **heteroscedasticity**: situations where the spread of residuals changes across the range of the independent variable. You’ll learn to use **spread-location** and **spread-dependence** plots to detect and address unequal spread.

### **Chapter 30: Visualizing Variability Decomposition in Bivariate Models**

Here, we introduce the **variability decomposition (VD) plot**, which visually separates the variability explained by the model from the residual variability. This helps assess the **predictive power** of a model.

### **Chapter 31: Bivariate Residual-Fit Spread Plot**

The **residual-fit spread (RFS) plot** places quantile plots of the centered fitted values and of the residuals side by side, so that their spreads can be compared directly. It complements the VD plot by providing a more detailed view of model performance.

### **Chapter 32: Parameterizing the Residuals**

This chapter explores how to **characterize residuals** using statistical distributions, particularly the **Normal distribution**. You’ll learn how to use **Q-Q plots** to assess whether residuals follow a Normal distribution and what to do when they do not.

### **Chapter 33: Refining Bivariate Models Through Re-expression**

When model assumptions are violated, **re-expressing** one or both variables can improve model fit and stabilize residual spread. This chapter walks through a real-world example of iterative model refinement using power transformations.

### **Chapter 34: Robust Regression: Resistant Lines and Beyond**

Outliers can distort traditional regression models. This chapter introduces **robust regression techniques**, including **Tukey’s resistant line** and **bisquare regression**, which reduce the influence of extreme values.
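
The idea behind a resistant line can be sketched in a few lines of plain Python. This is a simplified single-pass version under stated assumptions: the data are split into three groups by x, the slope comes from the medians of the outer groups, and the intercept is taken as the median of `y - slope * x` (one of several conventions). It omits the iterative slope refinement of the full procedure, and the function name `resistant_line` is hypothetical.

```python
from statistics import median

def resistant_line(x, y):
    """Single-pass three-group resistant line; returns (intercept, slope)."""
    pts = sorted(zip(x, y))          # order the points by x
    n = len(pts)
    k = (n + 1) // 3                 # size of each outer group
    left, right = pts[:k], pts[n - k:]

    # Medians summarize the outer groups, so a single wild value in
    # either group cannot drag the summary point with it.
    xl, yl = median(p[0] for p in left), median(p[1] for p in left)
    xr, yr = median(p[0] for p in right), median(p[1] for p in right)

    slope = (yr - yl) / (xr - xl)
    # Assumed convention: intercept is the median of the slope-adjusted values.
    intercept = median(yi - slope * xi for xi, yi in pts)
    return intercept, slope

# Made-up data: y = 2x exactly, except for one extreme value at x = 9.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 4, 6, 8, 10, 12, 14, 16, 100]

intercept, slope = resistant_line(x, y)
# The outlier does not move the fit: slope is 2.0 and intercept is 0.0.
```

An ordinary least-squares fit to the same data would be pulled upward by the point at x = 9, which is the distortion the robust methods in this chapter are designed to resist.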

### **Chapter 35: Slicing Data: Exploring Discontinuities and Local Models**

Sometimes, a single global model is insufficient. This chapter introduces **data slicing** and **local modeling** to uncover **structural breaks** or **regime shifts** in the data.

### **Chapter 36: A Deeper Exploration of Residuals: Revealing Hidden Structure**

Residuals can reveal more than just model misfit: they can uncover **layered patterns** such as trends, seasonality, and anomalies. This chapter demonstrates how to iteratively model and analyze residuals to uncover hidden structure in real-world data.

-----

## The Big Picture

Together, the chapters in this section form a coherent arc for understanding and modeling relationships between two continuous variables:

+ **Visualize** the relationship between variables using scatter plots and fitted curves.
+ **Model** the association using parametric (e.g., polynomial) and non-parametric (e.g., loess) techniques.
+ **Diagnose** model fit using residual plots to uncover unmodeled curvature and heteroscedasticity.
+ **Quantify** explained versus unexplained variation using variability decomposition and residual-fit spread plots.
+ **Re-express** variables to stabilize spread, improve fit, and meet model assumptions.
+ **Refine models** using robust regression techniques to mitigate the influence of outliers.
+ **Explore** local structure and discontinuities by slicing data and fitting segmented models.
+ **Iterate** through layers of residual analysis to reveal hidden structure and deepen understanding.

This sequence equips you with a flexible and visual toolkit for uncovering patterns, refining models, and interpreting complex relationships in bivariate data.