13-Spatial-Autocorrelation.Rmd
# Spatial Autocorrelation {#chp13_0}
## Introduction
In the previous chapter, we explored how polynomial functions can be used to model spatial trends that capture broad, location-driven variation in continuous spatial fields. By fitting and removing global trends, we revealed residuals that may still exhibit spatial structure. This chapter builds on that foundation by introducing **spatial autocorrelation**, a second-order property that describes how values at one location relate to values at nearby locations.
This concept is rooted in **Tobler’s First Law of Geography** which states:
```{block type="note", eval = FALSE}
_"The first law of geography: Everything is related to everything else, but near things are more related than distant things."_ Waldo R. Tobler [@Tobler1970]
```
This principle underpins much of spatial analysis. It suggests that spatial data often exhibit **non-random patterns**, where nearby features tend to have similar attribute values. Spatial autocorrelation provides a formal way to quantify this tendency.
```{r f13-random-maps, echo=FALSE}
knitr::include_graphics("img/Random_maps.png")
```
To make the mathematical foundations of autocorrelation more accessible, we shift from the raster-based data used in the previous chapter to a simpler dataset composed of **polygonal areal units** (e.g., counties). This allows us to clearly illustrate how spatial relationships are defined, how spatial weights are constructed, and how measures like **Moran’s I** are computed.
This chapter focuses on:
- Defining spatial autocorrelation and its role in spatial statistics.
- Constructing spatial weights to represent neighborhood relationships.
- Computing and interpreting **global** and **local Moran’s I** statistics.
- Using permutation-based hypothesis testing to assess significance.
## Global Moran's *I*
While maps can sometimes reveal clusters of similar values, our visual interpretation is often limited, especially when patterns are subtle or complex. To move beyond subjective impressions, we need a **quantitative and objective measure** of spatial patterning. Specifically, we want to quantify the degree to which similar attribute values are clustered or dispersed across space.
One widely used statistic for this purpose is **Moran’s I** , a measure of **global spatial autocorrelation**. It quantifies the overall tendency for features with similar values to be located near one another, based on a defined spatial relationship (e.g., contiguity or distance). In essence, Moran’s *I* is a correlation coefficient between a variable and its spatially lagged counterpart.
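To make the definition concrete, the statistic can be computed in a few lines of base R. The example below is a minimal sketch using four made-up values and a hypothetical contiguity matrix (not the data used in this chapter):

```{r moran-by-hand}
# Toy example: four polygons where units 1-2 hold "low" values and 3-4 "high" values
x <- c(10, 12, 30, 35)

# Hypothetical binary contiguity: 1-2, 2-3 and 3-4 are neighbor pairs
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- W / rowSums(W)      # row-standardize the weights ("W" style)

z  <- x - mean(x)        # deviations from the mean
S0 <- sum(W)             # sum of all weights (equals n with row-standardized weights)

# Moran's I: cross-products of each value with its neighbors' values,
# scaled by the overall variance of the data
I <- (length(x) / S0) * sum(W * outer(z, z)) / sum(z^2)
I
```

The statistic comes out positive here because the low pair and the high pair are each other's neighbors; with row-standardized weights the leading $n/S_0$ term equals 1.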
### Computing the Moran's *I*
Let's start with a working example: 2020 median per capita income for the state of Maine.
```{r fig03, fig.height=3, fig.cap="Map of 2020 median per capita income for Maine counties (USA).", echo = FALSE, message=FALSE, fig.align='center'}
library(sf)
library(ggplot2)
library(classInt)
library(tukeyedar)
library(spdep)
options(scipen = 9999)
z <- gzcon(url("https://github.com/mgimond/Spatial/raw/main/Data/Income_schooling_sf.rds"))
s <- terra::unwrap(readRDS(z))
clint <- classIntervals(s$Income, style = "pretty", n = 5)$brks
ggplot(s, aes(fill=Income)) + geom_sf() +
theme_void() +
scale_fill_stepsn(colors = c("#D9EF8B", "darkgreen") ,
breaks = clint[2:(length(clint)-1) ],
values = scales::rescale(clint[2:(length(clint)-1) ], c(0,1)),
guide = guide_coloursteps(even.steps = FALSE,
show.limits = FALSE,
title = "Income ($)",
barheight = unit(2.0, "in"),
barwidth = unit(0.15, "in"),
label.position = "left"))
```
It may seem apparent that, when aggregated at the county level, the income distribution is clustered, with high-income counties surrounded by high-income counties and low-income counties surrounded by low-income counties (note that we are ignoring potential first-order trends here for pedagogical clarity; a spatial trend could arguably have been modeled first). But a qualitative description may not be sufficient: we might want to quantify the degree to which similar (or dissimilar) counties are clustered. One measure of this type of relationship is the **Moran's I statistic**.
The Moran’s *I* statistic is a measure of spatial autocorrelation that quantifies the degree to which similar values (like income) cluster together in space. It is computed as the correlation between a variable and its spatially lagged counterpart, based on a defined spatial weights matrix.
But before we go about computing this correlation, we need a way to define a neighbor. One approach is to define a neighbor as any contiguous polygon. For example, the northernmost county (Aroostook) has four contiguous neighbors, while the southernmost county (York) has just two. Other neighborhood definitions include distance bands (e.g. counties within 100 km) and k nearest neighbors (e.g. the 2 closest neighbors). Note that distance bands and k nearest neighbors are usually measured between polygon centroids and not their boundaries.
```{r fig04, echo=FALSE, fig.cap="Maps show the links between each polygon and its respective neighbor(s) based on the neighborhood definition. A contiguous neighbor is defined as one that shares a boundary or a vertex with the polygon of interest. Orange numbers indicate the number of neighbors for each polygon. Note that the topmost county has no neighbors when a neighborhood definition of a 100 km distance band is used (i.e. no centroids are within a 100 km search radius).", out.width=600, fig.align='center'}
knitr::include_graphics("img/Diff_neighbors.png")
```
Once we've defined a neighborhood for our analysis, we identify the neighbors for each polygon in our dataset, then summarize the values for each neighborhood cluster (by computing their mean values, for example). This summarized neighborhood value is sometimes referred to as a spatially lagged value (*X~lag~*). In our working example, we adopt a contiguity neighborhood and compute the average neighboring income value (*Income~lag~*) for each county in our dataset. We then plot *Income~lag~* against *Income* for each county. The Moran's *I* coefficient between *Income~lag~* and *Income* is nothing more than the slope of the least squares regression line that best fits the points *after* standardizing both variables (i.e., converting them to z-scores), which centers them on the mean and scales them by their standard deviation.
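These steps mirror what the (hidden) code in this chapter does with the `spdep` package. In outline, assuming the Maine income layer is loaded as an `sf` object named `s`:

```{r lag-sketch, eval=FALSE}
library(spdep)
nb <- poly2nb(s, queen = TRUE)       # contiguity neighbors (shared boundary or vertex)
lw <- nb2listw(nb, style = "W")      # row-standardized spatial weights
inc.lag <- lag.listw(lw, s$Income)   # average income of each county's neighbors
moran(s$Income, lw, length(nb), Szero(lw))$I   # the Moran's I statistic
```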
```{r fig05, fig.height=4, fig.width = 3.5, echo = FALSE, fig.cap="Scatter plot of spatially lagged income (neighboring income) vs. each county's income. If we equalize the spread between both axes (i.e. convert to z-scores), the slope of the regression line represents the Moran's *I* statistic.", fig.align='center', results='hide'}
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
I <- round(moran(s$Income, lw, length(nb), Szero(lw))$I,2)
inc.lag <- lag.listw(lw, s$Income)
dat1 <- data.frame(lag = inc.lag, income = s$Income)
OP <- par( cex = 0.75, tck = -0.01)
eda_lm(dat1, income, lag, sd= FALSE, xlab = "Income", ylab=" Lagged income", show.par = FALSE)
par(OP)
```
If there is no association between *Income* and *Income~lag~*, the slope will be close to flat (resulting in a Moran's *I* value near 0). In our working example, the slope is far from flat, with a Moran's *I* value of 0.28. This raises the question: how significant is this Moran's *I* value (i.e. is the computed slope significantly different from 0)? There are two approaches to estimating significance: an analytical solution and a Monte Carlo solution. The analytical solution makes some restrictive assumptions about the data and thus may not always be reliable. The other approach (and the one favored here) is a Monte Carlo test. This approach avoids parametric assumptions about the data distribution or spatial layout, relying instead on the assumption that values are exchangeable under the null hypothesis.
### Monte Carlo approach to estimating significance
In a Monte Carlo test (specifically, a permutation-based bootstrap test) we assess the significance of an observed Moran's *I* statistic by comparing it to a distribution of values generated under the null hypothesis of spatial randomness. This is done by **permuting** the attribute values across spatial units while keeping the spatial structure (e.g., neighborhood relationships) fixed.
For each permutation, the attribute values are shuffled among the polygons and a new Moran's *I* value is computed.
```{r fig06, fig.height=4.1, fig.width = 4, fig.cap="Results from 199 permutations. Plot shows Moran's *I* lines (in gray) computed from each random permutation of income values. The observed Moran's *I* line for the original dataset is shown in red.", fig.align='center', echo = FALSE, results='hide'}
OP <- par( cex = 0.75, tck = -0.01)
sim <- vector()
eda_lm(dat1, income, lag, sd= FALSE, xlab = "Income", ylab=" Lagged income", size = 0, show.par = FALSE)
set.seed(323)
for (i in 1:199){
s$rand <- sample(s$Income)
s$rand.lag <- lag.listw(lw, s$rand)
M <- lm(rand.lag ~ rand,s)
abline( M , col=rgb(0,0,0,0.1))
sim[i] <- coef(M)[2]
}
par(OP)
```
Repeating this process many times yields a **sampling distribution** of Moran's *I* values that would be expected if the observed values were randomly distributed across space.
```{r fig07, fig.height=3, fig.width=3, echo=FALSE, fig.cap="Histogram shows the distribution of Moran's *I* values for all 199 permutations; red vertical line shows our observed Moran's *I* value of 0.28.", fig.align='center'}
OP <- par( cex = 0.75, tck = -0.01, mgp=c(1.7,0.8,0))
hist(sim, main = NULL, xlab=("Simulated I"), breaks = 10)
abline(v=I, col = "red")
par(OP)
```
In our working example, 199 simulations indicate that our observed Moran's *I* value of `r I` is not a value we would expect to compute if the income values were randomly distributed across each county. A pseudo *p-value* can easily be computed from the simulation results:
$$
\dfrac{N_{extreme}+1}{N+1}
$$
where $N_{extreme}$ is the number of simulated Moran's *I* values more extreme than our observed statistic and $N$ is the total number of simulations. Here, out of 199 simulations, just three simulated *I* values were more extreme than our observed statistic, $N_{extreme}$ = 3, so $p$ is equal to (3 + 1) / (199 + 1) = 0.02. Given the small probability that the null hypothesis, H~o~, could have generated an outcome as clustered as ours, we would not be wrong in rejecting H~o~. Of course, had a random process truly been at play, then our observed pattern would have been a rare realization of a random process.
Note that in this permutation example we shuffled the observed income values among polygons without replacement; this is referred to as a **permutation-based randomization**. It should not be confused with a **distribution-based simulation**, where new values are randomly drawn from a *theorized distribution* (a *Normal* distribution, for example) and assigned to features, which may result in a completely different set of values for each simulation outcome. You would only adopt this latter approach if the theorized distribution underpinning the value of interest is known *a priori*.
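The distinction can be illustrated in a line or two of base R (with made-up values):

```{r perm-vs-sim}
x <- c(10, 12, 30, 35)                       # hypothetical attribute values
set.seed(1)
x.perm <- sample(x)                          # permutation: same values, new locations
x.sim  <- rnorm(length(x), mean(x), sd(x))   # simulation: new values from a theorized Normal
sort(x.perm)                                 # the original multiset is preserved...
x.sim                                        # ...while simulated values differ each run
```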
Another important consideration when computing a p-value from a permutation test is the number of permutations to perform. In the above example we ran 199 permutations; thus, the smallest p-value we could possibly come up with is 1 / (199 + 1), or 0.005. You should therefore choose a number of permutations, $N$, large enough to allow for finer resolution of p-values and more robust inference.
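The full procedure can be sketched in base R. The example below is entirely hypothetical: a 5-by-5 grid of values with a built-in gradient, a rook-contiguity weights matrix, and 199 permutations (none of this reuses the Maine data):

```{r moran-mc-sketch}
# Rook-contiguity weights for a hypothetical 5 x 5 grid
gdim <- 5
idx  <- function(r, c) (r - 1) * gdim + c
W <- matrix(0, gdim^2, gdim^2)
for (r in 1:gdim) for (c in 1:gdim) {
  if (r > 1)    W[idx(r, c), idx(r - 1, c)] <- 1
  if (r < gdim) W[idx(r, c), idx(r + 1, c)] <- 1
  if (c > 1)    W[idx(r, c), idx(r, c - 1)] <- 1
  if (c < gdim) W[idx(r, c), idx(r, c + 1)] <- 1
}
W <- W / rowSums(W)                          # row-standardize

moran_I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# A smooth surface: values increase from one corner to the opposite corner
x <- numeric(gdim^2)
for (r in 1:gdim) for (c in 1:gdim) x[idx(r, c)] <- r + c
I.obs <- moran_I(x, W)

# Permutation test: shuffle the values across the grid and recompute I
set.seed(123)
sim <- replicate(199, moran_I(sample(x), W))

# Pseudo p-value: (N_extreme + 1) / (N + 1)
p.sim <- (sum(sim >= I.obs) + 1) / (199 + 1)
c(I = I.obs, p = p.sim)
```

The observed *I* is strongly positive (the surface is a smooth gradient), so few if any shuffled grids match it and the pseudo p-value sits near its smallest attainable value of 1 / (199 + 1).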
## Moran's *I* at different distance bands
So far we have looked at spatial autocorrelation where we define neighbors as all polygons sharing a boundary with the polygon of interest. We may also be interested in studying the *ranges* of autocorrelation values as a function of distance. The steps for this type of analysis are straightforward:
1. Define a neighborhood structure based on a specific distance band
2. Compute lagged values for the defined set of neighbors.
3. Calculate the Moran's *I* value for the defined neighborhood.
4. Repeat for additional distance bands to observe how autocorrelation changes with spatial scale.
For example, the Moran's *I* values for income distribution in the state of Maine at distances of 75, 125, up to 325 km are presented in the following plot:
```{r f13-MC-distance, echo=FALSE, fig.cap="Moran's *I* at different spatial lags defined by a 50 km wide annulus at 50 km distance increments. Red dots indicate Moran's *I* values for which a p-value was 0.05 or less.", out.width=500, fig.align='center'}
knitr::include_graphics("img/MoranI_distance_band.png")
```
The plot suggests that there is significant spatial autocorrelation between counties within 75 km of one another; as the distance between counties increases, the autocorrelation shifts from positive to negative, indicating that nearby counties tend to have similar income levels while counties farther apart tend to be more dissimilar.
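In outline, such a profile can be generated with `spdep`'s `dnearneigh()` function. The following sketch assumes the Maine layer `s` from earlier (in a CRS whose units are meters); the band edges are illustrative and may not match those used to produce the figure:

```{r moran-distance-sketch, eval=FALSE}
library(spdep)
library(sf)
xy <- st_centroid(st_geometry(s))        # band distances are measured between centroids
bands <- seq(50, 300, by = 50) * 1000    # outer edge of each 50 km annulus, in meters
I.band <- sapply(bands, function(d2) {
  nb <- dnearneigh(xy, d2 - 50000, d2)   # neighbors falling within the annulus
  lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
  moran.mc(s$Income, lw, nsim = 599, zero.policy = TRUE)$statistic
})
```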
## Local Moran's *I*
We can decompose the global Moran's *I* into a *localized* measure of autocorrelation--i.e. a map of "hot spots" and "cold spots".
A local Moran's *I* analysis is best suited for relatively large datasets, especially if a hypothesis test is to be implemented. We'll therefore switch to another dataset: Massachusetts (USA) household income data aggregated at the U.S. Census' county subdivision level (as with the Maine dataset, we ignore potential first-order trends here for pedagogical clarity).
```{r ma-dataset, fig.width = 5, fig.height=2.5, echo = FALSE, fig.cap="Map of 2020 median household income for Massachusetts county subdivisions.", message=FALSE, fig.align='center'}
library(sf)
library(ggplot2)
library(classInt)
library(tukeyedar)
library(spdep)
z <- gzcon(url("https://github.com/mgimond/Spatial/raw/main/Data//ma2.rds"))
ma <- terra::unwrap(readRDS(z))
clint <- classIntervals(ma$house_inc, style = "pretty", n = 6)$brks
ggplot(ma, aes(fill=house_inc)) + geom_sf(col = NA) +
theme_void() +
scale_fill_stepsn(colors = c("#D9EF8B", "darkgreen") ,
breaks = clint[2:(length(clint)-1) ],
values = scales::rescale(clint[2:(length(clint)-1) ], c(0,1)),
guide = guide_coloursteps(even.steps = FALSE,
show.limits = FALSE,
title = "Income ($)",
barheight = unit(1.5, "in"),
barwidth = unit(0.15, "in"),
label.position = "left"))
```
Applying a contiguity based definition of a neighbor, we get the following scatter plot of spatially lagged income vs. income.
```{r chunk10, fig.width = 4, fig.height=4, echo = FALSE, fig.cap="Grey vertical and horizontal lines define the mean values for both axes values. Red points highlight counties with relatively high income values (i.e. greater than the mean) surrounded by counties whose average income value is relatively high. Likewise, dark blue points highlight counties with relatively low income values surrounded by counties whose average income value is relatively low.", fig.align='center', results='hide'}
nb <- poly2nb(ma, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
inc.lag <- lag.listw(lw, ma$house_inc)
dat1 <- data.frame(lag = inc.lag, income = ma$house_inc)
M <- localmoran(ma$house_inc, lw)
M.df <- data.frame(attr(M, "quadr"))
OP <- par( cex = 0.75, tck = -0.01)
eda_lm(dat1, income, lag, sd = FALSE, show.par = FALSE)
palette(c("blue", "lightblue", "darkgoldenrod2", "red"))
points(ma$house_inc, inc.lag, pch = 16, col = M.df$mean)
par(OP)
```
Note that the mean values for *Income* and lagged *Income*, highlighted as light grey vertical and horizontal lines in the above plot, carve the plot into four quadrants: low-low, high-low, high-high and low-high, starting from the bottom-left quadrant and working counterclockwise. Note that other measures of centrality, such as the median, could be used to delineate these quadrants.
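The quadrant assignment itself is just a pair of comparisons against the two means; a sketch with made-up values:

```{r quadrant-sketch}
# Hypothetical income values and their spatial lags (neighborhood means)
inc     <- c(40, 42, 90, 95, 41, 88)
inc.nbr <- c(45, 43, 85, 92, 44, 90)
q1 <- ifelse(inc > mean(inc), "High", "Low")         # value vs. mean value
q2 <- ifelse(inc.nbr > mean(inc.nbr), "High", "Low") # lag vs. mean lag
paste(q1, q2, sep = "-")    # quadrant label for each unit
```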
The values in the above scatter plot can be mapped to each polygon in the dataset as shown in the following figure.
```{r fig08, echo=FALSE,fig.width = 4.5, fig.height=2.5, fig.cap="A map view of the low-low (blue), high-low (light-blue), high-high (red) and low-high (orange) counties. ", fig.align='center'}
OP <- par(mar=c(0,0,0,0))
plot(st_geometry(ma), col = M.df$mean, border="white")
par(OP)
```
Each spatial unit (e.g., polygon) has its own Local Moran’s I statistic, denoted $I_i$, which quantifies the degree of spatial autocorrelation around that unit. The calculation of $I_i$ is shown later in the chapter.
While the scatterplot helps visually identify potential clusters, statistical significance must be assessed to determine whether these patterns are likely to have occurred by chance.
As with the global Moran's *I*, there are both analytical and Monte Carlo approaches to assessing the significance of $I_i$. In the Monte Carlo approach, the value of unit $i$ is held fixed while the values of all other units are shuffled. For each permutation, a new $I_i$ is computed by comparing the value at $y_i$ to the average value of its (randomized) neighbors. The permutations generate, for each feature $y_i$, the distribution of $I_i$ values we would expect if the values were randomly distributed across all features.
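This conditional permutation logic can be sketched in base R. The example below is purely hypothetical: six made-up values arranged in a ring (each unit's neighbors are the two adjacent units), testing unit 3; the scaling of $I_i$ shown is one common form and may differ from the implementation used for the figures:

```{r local-mc-sketch}
# Hypothetical ring of six units; neighbors are the two adjacent units
x <- c(10, 12, 30, 35, 11, 13)
n <- length(x)
W <- matrix(0, n, n)
for (j in 1:n) W[j, c(j %% n + 1, (j - 2) %% n + 1)] <- 0.5

# One common scaling of the local statistic: I_i = (z_i / m2) * sum_j w_ij z_j
local_I <- function(i, x, W) {
  z  <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  (z[i] / m2) * sum(W[i, ] * z)
}

i <- 3                       # the unit whose significance we are testing
I.obs <- local_I(i, x, W)

# Conditional permutation: hold x[i] fixed, shuffle all of the other values
set.seed(321)
I.null <- replicate(999, {
  x.perm     <- x
  x.perm[-i] <- sample(x[-i])
  local_I(i, x.perm, W)
})
p.i <- (sum(I.null >= I.obs) + 1) / (999 + 1)
```

Here unit 3 holds a high value, and its observed neighbors also hold relatively high values, so the observed $I_i$ is positive; the pseudo p-value measures how often a reshuffled neighborhood does at least as well.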
To illustrate, consider a polygon in eastern Massachusetts with a high observed $I_i$ value. We assess its significance by comparing it to a distribution of $I_i$ values generated through permutation.
```{r fig09, echo=FALSE, fig.width = 3.5, fig.height=3.5, fig.cap="Polygon whose *significance* value we are assessing in this example.", fig.align='center'}
col1 <- M.df$mean
levels(col1) <- c("blue", "lightblue", "darkgoldenrod2", "red")
ext1 <- st_bbox( st_buffer(ma[1,], 20000))
OP <- par(mar= c(0,0,0,0))
plot(st_geometry(ma), col = adjustcolor(col1, alpha.f = 0.1), border="white", extent = ext1)
plot(st_geometry(ma[1,]), col = col1[1], add = TRUE)
par(OP)
I_i <- M[[1]]
```
Its local Moran's *I* statistic is `r round(I_i,2)`. A permutation test shuffles the income values around it, all the while keeping its value constant. An example of an outcome of a few permutations follows:
```{r fig10, echo=FALSE, fig.width = 9, fig.height=2, fig.cap="Local Moran's I outcomes of a few permutations of income values.", fig.align='center', results='hide'}
fun1 <- function(){
Msim <- localmoran(c(ma$house_inc[1], sample(ma$house_inc[-1])), lw)
Isim <- attr(Msim, "quadr")$mean
levels(Isim) <- c("blue", "lightblue", "darkgoldenrod2", "red")
Isim }
fun2 <- function(){
col1 <- fun1()
plot(st_geometry(ma), col = adjustcolor(col1, alpha.f = 0.1), border="white", extent = ext1)
plot(st_geometry(ma[1,]), col = col1[1], add = TRUE)
}
set.seed(319)
OP <- par(mar= c(0,0,0,0), mfrow=c(1,5))
sapply(1:5, FUN = \(x)fun2())
par(OP)
```
You'll note that even though the income value remains the same in the polygon of interest, its local Moran's *I* statistic will change because of the changing income values in its surrounding polygons.
If we perform many more permutations, we come up with a distribution of $I_i$ values under the null that the income values are randomly distributed across the state of Massachusetts. The distribution of $I_i$ for the above example is plotted using a histogram.
```{r fig11, echo=FALSE, fig.width = 3, fig.height=2.5, fig.cap="Distribution of $I_i$ values under the null hypothesis that income values are randomly distributed across the study extent. The red vertical line shows the observed $I_i$ for comparison.", fig.align='center', results='hide'}
n <- 999
fun3 <- function(){localmoran(c(ma$house_inc[1], sample(ma$house_inc[-1])), lw)[1,1]}
I_null <- sapply(1:n, FUN = \(x) fun3() )
OP <- par(mar=c(4,3,0,0))
hist(I_null, breaks = 20, xlab="I(i)", main=NULL)
abline(v = I_i, col = "red", lw=2, xlab = "I(i)")
par(OP)
N.greater <- sum(I_null > I_i)
p <- min(N.greater + 1, n + 1 - N.greater) / (n + 1)
```
About `r round(p * 100, 1)`% of the simulated values are more extreme than our observed $I_i$ giving us a permutation-based p-value of `r round(p,2)`.
If we perform this permutation for all polygons in our dataset, we can map the permutation-based p-values for each polygon. Note that here, we are mapping the probability that the observed $I_i$ value is more extreme than expected (equivalent to a one-tail test).
```{r fig12, echo=FALSE,fig.width = 5, fig.height=2.5, fig.cap="Map of the permutation-based p-values for each polygons' $I_i$ statistic.", fig.align='center'}
M <- localmoran_perm(ma$house_inc, lw, nsim = 29999)
ma$p <- data.frame(M)$Pr.folded..Sim
clint <- classIntervals(ma$p, style = "fixed", n = 7,
fixedBreaks = c(0,0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5))$brks
ggplot(ma, aes(fill=p)) + geom_sf() +
theme_void() +
# scale_fill_stepsn(colors = c("#555555", "#DDDDDD" ) ,
scale_fill_stepsn(colors = c("#882020", "#ffdddd" ) ,
breaks = clint[2:(length(clint)-1) ],
values = scales::rescale(clint[2:(length(clint)-1) ], c(0,1)),
guide = guide_coloursteps(even.steps = FALSE,
show.limits = FALSE,
title = "P-value",
barheight = unit(1.7, "in"),
barwidth = unit(0.15, "in"),
label.position = "left"))
```
One can use the computed permutation-based p-values to filter the $I_i$ values based on a desired level of significance. For example, the following scatter plot and map shows the high/low "hotspots" for which a permutation-based p-value of 0.05 or less was computed from the above simulation.
```{r fig13, echo=FALSE, fig.width =9, fig.height=4, fig.cap="Local Moran's *I* values having a significance level of 0.05 or less.", fig.align='center', results = "hide"}
sig <- ma$p <= 0.05
OP2 <- par(mfrow = c(1,2), cex = 0.75, tck = -0.01)
eda_lm(dat1, income, lag, sd = FALSE, show.par = FALSE)
palette(c("blue", "lightblue", "darkgoldenrod2", "red"))
points(ma$house_inc[sig], inc.lag[sig], pch = 16, col = M.df$mean[sig])
OP <- par(mar = c(0,0,0,0))
plot(st_geometry(ma), col = "#EDEDED", border = "white")
plot(st_geometry(ma[sig,]), col = M.df$mean[sig], border = "white", add = TRUE)
par(OP)
par(OP2)
```
You'll note that the levels of significance do not apply only to the high-high and low-low regions; they can apply to *all* combinations of highs and lows.
Another example, where the $I_i$ values are filtered based on a more stringent significance level of 0.01, is shown in the following figure.
```{r fig14, echo=FALSE, fig.width =9, fig.height=4, fig.cap="Local Moran's *I* values having a significance level of 0.01 or less.", fig.align='center', results='hide'}
sig <- ma$p <= 0.01
OP2 <- par(mfrow = c(1,2), cex = 0.75, tck = -0.01)
eda_lm(dat1, income, lag, sd = FALSE, show.par = FALSE)
palette(c("blue", "lightblue", "darkgoldenrod2", "red"))
points(ma$house_inc[sig], inc.lag[sig], pch = 16, col = M.df$mean[sig])
OP <- par(mar = c(0,0,0,0))
plot(st_geometry(ma), col = "#EDEDED", border = "white")
plot(st_geometry(ma[sig,]), col = M.df$mean[sig], border = "white", add = TRUE)
par(OP)
par(OP2)
```
### A note about interpreting the permutation-based p-value
While the permutation technique highlighted in the previous section does a good job of circumventing the limitations of a parametric approach to inference (e.g. skewness in the data, irregular grids, etc.), it still has limitations when significance is interpreted across *all* features. For example, if the underlying process that generated the distribution of income values was truly random, the outcome of one such realization could look like this:
```{r fig14a, echo=FALSE, fig.width=8, fig.height=3, fig.cap="Realization of a random process. Left map is the distribution of income values under a random process. Right map is the outcome of a permutation test showing the computed permutation-based p-values. Each permutation test runs 9999 permutations."}
# Here I use rgeoda which is faster than spdep for permutations
library(rgeoda)
# P values used in custom function
p.breaks <- c(0, 0.001,0.01,0.05, 0.1, 0.51)
p.labels <- c(0.001,0.01,0.05, 0.1,">0.1")
# Custom function
plot_clust <- function(poly, rnd, cluster, lisa, p = FALSE){
clint <- classIntervals(ma[[rnd]], style = "equal", n = 6)$brks
rnd.col <- as.character(cut(ma[[rnd]], breaks = clint, labels=pal.inc))
poly$cluster <- factor(lisa$labels[lisa$c_vals+1], levels = lisa$labels)
pval <- cut(lisa$p_vals , breaks = p.breaks,labels = p.labels )
pval.col <- c( "#CB181D", "#FCAE91", "#FEE5D9" , "#FFF8DC", "#FFFFFF")
p.col <- pval.col[pval]
poly.col <- lisa$colors[lisa$c_vals+1]
# psub <- poly[rnd]
# Plot clusters
OP<- par(no.readonly = TRUE, mar=c(0,0,0,0))
m <- matrix(c(1,3,2,4),nrow = 2, ncol = 2, byrow = TRUE)
layout(mat = m, heights = c(0.80,0.2,0.8,0.2))
# layout.show(4)
# Plot values
OP2 <- par(mar=c(0,0,0,0))
plot(st_geometry(poly), col = rnd.col, reset = TRUE,
border = "#bbbbbb")
mtext("Random process", padj=3, adj=0.3, col = "#707070")
par(OP2)
OP2 <- par(mgp = c(0.5,0.5,0) )
.image_scale(3, breaks = clint, col = pal.inc,
key.length = 0.7, key.pos = 1, key.width = lcm(1),
at = round(clint,0))
par(OP2)
if(p == FALSE){
# Plot clusters
OP2 <- par(mar=c(0,0,0,0))
plot(st_geometry(poly), col = poly.col, reset = TRUE,
border = "#bbbbbb")
mtext("Clusters", padj=3, adj=0.3, col = "#707070")
par(OP2)
# Plot legend
OP2 <- par(mar=c(0,0,0,0))
plot(1, type = "n", axes=FALSE, xlab="", ylab="")
legend('top', legend = lisa$labels[-length(lisa$labels)], horiz = FALSE, ncol = 3,
text.width = sapply( lisa$labels[-length(lisa$labels)], strwidth), bty = "n",
fill = lisa$colors, border = "#eeeeee", cex = 0.9)
par(OP2)
} else {
OP2 <- par(mar=c(0,0,0,0))
plot(st_geometry(poly), col = p.col, reset = TRUE,
border = "#bbbbbb")
mtext("permutation-based p-values", padj=3, adj=0.3, col = "#707070")
par(OP2)
#Plot legend
OP2 <- par(mar=c(0,0,0,0))
plot(1, type = "n", axes=FALSE, xlab="", ylab="")
legend('top', legend = p.labels, horiz = FALSE, ncol = 5,
text.width = sapply( p.labels, strwidth), bty = "n",
fill = pval.col, border = "#eeeeee", cex = 1)
par(OP2)
}
par(OP)
}
# Create palettes
pal.inc <- RColorBrewer::brewer.pal(6, "Greens")
clint <- classIntervals(ma$house_inc, style = "quantile", n = 6)$brks
# Get neighbors
queen_w <- queen_weights(ma)
# A random process
set.seed(307)
ma$rnd <- sample(ma$house_inc)
lisar <- local_moran(queen_w, ma["rnd"], permutations = 9999)
plot_clust(poly=ma, rnd = "rnd",lisa=lisar, p= TRUE)
```
Note how some $I_i$ values are associated with a low permutation-based p-value. In fact, one polygon's $I_i$ value is associated with a permutation-based p-value of 0.001 or less! This is to be expected, since the likelihood of finding a significant $I_i$ increases with the number of permutation tests performed. Here we have 343 polygons, so we ran 343 tests. This is analogous to having multiple opportunities to pick an ace of spades from a deck of cards--the more opportunities you have, the greater your chance of grabbing an ace of spades.
Another example of a local Moran's I test on a realization of a randomly reshuffled set of income values is shown next:
```{r fig14b, echo=FALSE, fig.width=8, fig.height=3, fig.cap="Another realization of a random process. Left map is the distribution of income values under a random process. Right map is the outcome of a permutation test showing computed permutation-based p-values. "}
set.seed(521)
ma$rnd <- sample(ma$house_inc)
lisar <- local_moran(queen_w, ma["rnd"],permutations = 9999)
plot_clust(poly=ma, rnd = "rnd",lisa=lisar, p= TRUE)
```
We have quite a few $I_i$ values associated with a very low permutation-based p-value (the smallest in this example is `r min(lisar$p_vals)`). This would lead one to believe that several polygons exhibit significant spatial autocorrelation with their neighbors when, in fact, this is due to chance--a classic example of a Type I error resulting from multiple comparisons.
To further illustrate the issue, consider generating 200 independent realizations of a random spatial process. Each realization involves computing permutation-based p-values for all polygons, resulting in a large number of tests and an increased likelihood of false positives. This generates 200 x 343 permutation-based p-values. In this working example, about 10% of the outcomes result in a permutation-based p-value of 0.05 or less.
```{r fig14c, fig.width=8, fig.height=2, echo=FALSE, fig.cap="Distribution of permutation-based p-values following 200 realizations of a random process. Blue numbers are the p-value bins, red numbers under each bins are calculated percentages."}
p.dist <- vector()
for (i in 1:200){
ma$rnd <- sample(ma$house_inc)
lisa2 <- local_moran(queen_w, ma["rnd"], permutations = 9999)
p.dist <- rbind(p.dist, lisa2$p_vals)
}
#hist(p.dist)
p.sum <- cut(p.dist , breaks = p.breaks,labels = p.labels ) %>% table()
p.tab <- p.sum / sum(p.sum ) * 100
p.tab.names <- names(p.tab)
names(p.tab) <- rep(" ",5)
OP <- par(col.main = "#707070", oma = c(0,0,0,0), ylbias=2)
mosaicplot(p.tab, main="permutation-based p-values distribution", cex.axis = 0.8, off = 4,
las=1)
mtext( p.tab.names, side = 3, adj = c(0.01, 0.065, 0.12, 0.21, 0.6) ,
cex=0.8, col = "blue")
mtext( as.character(round(p.tab, 2)), side = 1, adj = c(0.01, 0.061, 0.12, 0.21, 0.6) ,
cex=0.8, col = "red")
par(OP)
```
This problem is known in the field of statistics as the **multiple comparison problem**. Several solutions have been offered, each with its pros and cons. One solution that seems to have gained traction as of late is the *False Discovery Rate* (FDR) correction. The FDR correction aims to control the expected proportion of false positives among the set of features deemed significant.
There are at least a couple of different implementations of FDR. One implementation starts by ranking the permutation-based p-values, $p$, from smallest to largest and assigning a rank, $i$, to each value. Next, a reference value is computed for each rank as $i(\alpha/n)$ where $\alpha$ is the desired significance cutoff (0.05, for example) and $n$ is the total number of observations (e.g. polygons) for which a permutation-based p-value has been computed (343 in our working example). All computed $p_i$ values that are less than $i(\alpha/n)$ are deemed significant at the desired $\alpha$ value.
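The ranking procedure just described can be sketched in a few lines of base R. The p-values below are simulated for illustration only; note that R's built-in `p.adjust(p, method = "BH")` implements the closely related step-up version of this correction.

```r
# A minimal sketch of the FDR correction described above, using
# hypothetical p-values (not the chapter's income data)
set.seed(42)
p <- runif(343)^2                     # simulated permutation-based p-values
alpha <- 0.05                         # desired significance cutoff
n <- length(p)
i <- rank(p, ties.method = "first")   # rank of each p-value (smallest = 1)
sig <- p < i * (alpha / n)            # compare each p to its reference value
sum(sig)                              # number of features deemed significant
```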
Applying the FDR correction to the permutation-based p-value cutoff of 0.05 in the household income data gives us the following cluster map.
```{r fig14d, echo=FALSE, fig.width =9, fig.height=4, fig.cap="Clusters deemed significant at the p=0.05 level when applying the FDR correction.", fig.align='center', results = "hide"}
ord <- order(ma$p)
ma <- ma[ord, ]
inc.lag <- inc.lag[ord]
M.df <- M.df[ord, ]
ma$fdr <- 1:nrow(ma) /nrow(ma) * 0.05
sig <- ma$p < ma$fdr
OP2 <- par(mfrow = c(1,2), cex = 0.75, tck = -0.01)
eda_lm(dat1, income, lag, sd = FALSE, show.par = FALSE)
palette(c("blue", "lightblue", "darkgoldenrod2", "red"))
points(ma$house_inc[sig], inc.lag[sig], pch = 16, col = M.df$mean[sig])
OP <- par(mar = c(0,0,0,0))
plot(st_geometry(ma), col = "#EDEDED", border = "white")
plot(st_geometry(ma[sig,]), col = M.df$mean[sig], border = "white", add = TRUE)
par(OP)
par(OP2)
```
Applying the FDR correction to the permutation-based p-value cutoff of 0.01 generates even fewer clusters:
```{r fig14e, echo=FALSE, fig.width =9, fig.height=4, fig.cap="Clusters deemed significant at the p=0.01 level when applying the FDR correction.", fig.align='center', results = "hide"}
ord <- order(ma$p)
ma <- ma[ord, ]
inc.lag <- inc.lag[ord]
M.df <- M.df[ord, ]
ma$fdr <- 1:nrow(ma) /nrow(ma) * 0.01
sig <- ma$p < ma$fdr
OP2 <- par(mfrow = c(1,2), cex = 0.75, tck = -0.01)
eda_lm(dat1, income, lag, sd = FALSE, show.par = FALSE)
palette(c("blue", "lightblue", "darkgoldenrod2", "red"))
points(ma$house_inc[sig], inc.lag[sig], pch = 16, col = M.df$mean[sig])
OP <- par(mar = c(0,0,0,0))
plot(st_geometry(ma), col = "#EDEDED", border = "white")
plot(st_geometry(ma[sig,]), col = M.df$mean[sig], border = "white", add = TRUE)
par(OP)
par(OP2)
```
It's important to note that there is no one perfect solution to the multiple comparison problem. This, along with other challenges facing inferential interpretation in a spatial framework, such as overlapping neighborhoods (i.e. the same polygons being used in separate permutation tests), makes traditional approaches to interpreting significance difficult. It is thus recommended that these tests be interpreted with caution.
## Moran's *I* equation explained
The Moran's *I* statistic can be expressed in several mathematically equivalent forms, each offering different insights into its structure and interpretation. One commonly used formulation is:
$$
I = \frac{N}{\sum\limits_i (X_i-\bar X)^2}
\frac{\sum\limits_i \sum\limits_j w_{ij}(X_i-\bar X)(X_j-\bar X)}{\sum\limits_i \sum\limits_j w_{ij}} \tag{1}
$$
Here, $N$ is the total number of spatial units, $X_i$ and $X_j$ are the attribute values at locations $i$ and $j$, $\bar{X}$ is the mean of $X$, and $w_{ij}$ is the spatial weight between locations $i$ and $j$--typically defined by contiguity, distance, or other spatial relationships.
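Equation (1) translates directly into base R. The four values and the binary weights matrix below are hypothetical, chosen only to make the arithmetic easy to follow:

```r
# Direct implementation of equation (1) on a hypothetical 4-cell strip,
# where each cell's neighbors are the adjacent cells (binary weights)
x <- c(25, 37, 41, 33)
w <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
n  <- length(x)
xc <- x - mean(x)                                 # deviations from the mean
I  <- n / sum(xc^2) * sum(w * outer(xc, xc)) / sum(w)
I                                                 # Moran's I for this toy layout
```

Here `outer(xc, xc)` builds the matrix of all $(X_i-\bar X)(X_j-\bar X)$ products, so `sum(w * outer(xc, xc))` is the double sum in the numerator.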
There are a few key components of Equation (1) worth highlighting. First, you'll note that both sets of values are centered by subtracting the mean of $X$ from each $X_i$ and $X_j$ value. This highlights the fact that we are comparing each value's deviation from the overall mean, not the absolute values themselves.
Second, you'll note an inverted variance term on the left-hand side of equation (1): this is a measure of spread. You might recall from an introductory statistics course that the variance can be computed as:
$$
s^2 = \frac{\sum\limits_i (X_i-\bar X)^2}{N}\tag{2}
$$
Note that the more common measure of variance, the sample variance, where the above numerator is divided by $(n-1)$, can also be adopted in the Moran's *I* calculation.
Equation (1) thus divides the large fraction on the right-hand side by the variance. This has the effect of limiting the range of possible Moran's *I* values to between -1 and 1 (though in some extreme cases, $I$ can fall outside [-1, 1]). We can rewrite the Moran's *I* equation by plugging in $s^2$ as follows:
$$
I = \frac{\sum\limits_i \sum\limits_j w_{ij}\frac{(X_i-\bar X)}{s}\frac{(X_j-\bar X)}{s}}{\sum\limits_i \sum\limits_j w_{ij}} \tag{3}
$$
Note that here $s\times s = s^2$. You might recognize the numerator as a sum of products of standardized *z*-values between neighboring features. If we let $z_i = \frac{(X_i-\bar X)}{s}$ and $z_j = \frac{(X_j-\bar X)}{s}$, the Moran's *I* equation reduces to:
$$
I = \frac{\sum\limits_i \sum\limits_j w_{ij}(z_i\ z_j)}{\sum\limits_i \sum\limits_j w_{ij}} \tag{4}
$$
Recall that we are comparing a variable $X$ at $i$ to all of its neighboring values at $j$. More specifically, we are computing a summary (such as a weighted mean) of the neighboring values at $j$ and multiplying it by the value at $i$. So, if we let $y_i = \sum\limits_j w_{ij} z_j$, the Moran's *I* coefficient can be rewritten as:
$$
I = \frac{\sum\limits_i z_i y_i}{\sum\limits_i \sum\limits_j w_{ij}} \tag{5}
$$
So, $y_i$ is the (weighted) average z-value of the neighboring features, making the product $z_i y_i$ nothing more than a *correlation coefficient*.
The product $z_iy_i$ is a local measure of spatial autocorrelation. If we *don't* sum across all locations $i$, we get the local Moran's *I* statistic, $I_i$:
$$
I_i = z_iy_i \tag{6}
$$
The global Moran's *I* statistic, $I$, is thus the sum of all $I_i$ values divided by the sum of the weights; with row-standardized weights, this is simply their average.
$$
I = \frac{\sum\limits_i I_i}{\sum\limits_i \sum\limits_j w_{ij}} \tag{7}
$$
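These equivalences are easy to verify numerically. The sketch below reuses a hypothetical four-cell example with row-standardized weights and checks that the global statistic is the average of the local $I_i$ values (since the weights sum to $n$ under row standardization):

```r
# Verifying that the global Moran's I is the average of the local I_i
# (hypothetical 4-cell strip, not the chapter's dataset)
x  <- c(25, 37, 41, 33)
wb <- matrix(c(0, 1, 0, 0,
               1, 0, 1, 0,
               0, 1, 0, 1,
               0, 0, 1, 0), nrow = 4, byrow = TRUE)
w  <- wb / rowSums(wb)                        # row-standardized weights
s  <- sqrt(sum((x - mean(x))^2) / length(x))  # population sd, equation (2)
z  <- (x - mean(x)) / s                       # standardized values
y  <- as.vector(w %*% z)                      # lagged values y_i
Ii <- z * y                                   # local Moran's I, equation (6)
I  <- sum(Ii) / sum(w)                        # global Moran's I
all.equal(I, mean(Ii))                        # TRUE: I is the average of the I_i
```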
Let's explore elements of the Moran's *I* equation using the following sample dataset.
```{r fig15, fig.width = 7, fig.height=2.5, fig.cap="Simulated spatial layer. The figure on the left shows each cell's ID value. The figure in the middle shows the values for each cell. The figure on the right shows the standardized values using equation (2).", echo=FALSE, fig.align='center', message = FALSE, warning = FALSE}
# Custom functions
gridcells <- function(df, col = "A1", breaks = 7, color = "Reds"){
if(breaks > 1){
brks <- with(df, seq(min(df[[col]]), max(df[[col]]), length.out = breaks + 1))
col1 <- RColorBrewer::brewer.pal(n = breaks, name = color)
} else {
col1 <- "grey80"
brks <- c(min(df[[col]]), max(df[[col]]))
}
col <- ensym(col)
ggplot(df, aes(fill = !!col)) + geom_sf() +
geom_text(aes(x= st_coordinates(ctr)[,1],
y= st_coordinates(ctr)[,2],
label = !!col)) +
scale_fill_stepsn(
colors = col1,
breaks = brks,
guide = guide_coloursteps(even.steps = FALSE,
show.limits = FALSE,
barheight = unit(2.2, "in"),
barwidth = unit(0.15, "in"))) +
theme_void() +
guides(fill="none")
}
sd2 <- function(x){
sqrt(sum( (x - mean(x))^2 ) / length(x))
}
# Synthetic data
library(gridExtra)
m <- matrix(c(25, 37, 41, 33, 31, 34, 18, 38, 12, 20, 11, 31, 5, 4, 6, 13),
byrow = FALSE, ncol = 4)
r <- stars::st_as_stars(m)
poly <- st_as_sf(r, as_points=FALSE )
poly$id <- 1:length(m)
ctr <- st_centroid(poly)
poly$z <- with(poly, (A1 - mean(A1)) / sd2(A1))
poly$z_disp <- round(poly$z, 2) # Rounded values used for display purposes only
# Plot the layer(s)
grid.arrange(gridcells(poly, "id", breaks = 1, color = "Reds") , gridcells(poly), gridcells(poly, "z_disp"), ncol = 3)
```
The first step in the computation of a Moran's *I* index is the generation of a spatial weights matrix. The weights can take on many different values. For example, one could assign a value of `1` to a neighboring cell as shown in the following matrix.
```{r fig16, echo=FALSE, fig.width = 3, fig.height=3, message=FALSE, warning=FALSE, fig.cap="Table 1. Weights matrix for binary weights. Rows represent the $i$th index and columns represent the $j$th index."}
library(kableExtra)
library(dplyr)
nb <- poly2nb(poly, queen=TRUE)
nb.df <- nb2mat(nb, style = "B") |> as.data.frame()
colnames(nb.df) <- 1:length(m)
nb.df2 <- mutate(nb.df, across(1:16, ~ cell_spec(.x, color = ifelse(.x == 1, "red", "grey30" ))))
knitr::kable(nb.df2, format="html", row.names = TRUE, escape = FALSE, linesep = "") %>%
kable_styling(bootstrap_options = c("striped"), full_width = FALSE, font_size = 11) |>
column_spec(1, bold=T) |>
column_spec(1:(length(m)+1), extra_css = 'padding: 5px;')|>
row_spec(1:length(m),extra_css = 'line-height: 4px;')
```
For example, cell ID `1` (whose value is `r poly$A1[1]` and whose standardized value, $z_1$, is `r poly$z_disp[1]`) has cells `2`, `5` and `6` as neighbors. Computationally (working with the standardized values), this gives us a summarized neighboring value (aka lagged value), $y_1$, of:
$$
\begin{align*}
y_1 = \sum\limits_j w_{1j} z_j {}={} & (0)(0.21)+(1)(1.17)+(0)(1.5)+ ... + \\
& (1)(0.69)+(1)(0.93)+(0)(-0.36)+...+ \\
& (0)(-0.76) = 2.79
\end{align*}
$$
Computing the spatially lagged values for the other 15 cells generates the following scatterplot:
```{r fig17, echo=FALSE, fig.height=4, fig.width=4, results='hide', fig.cap="Moran's *I* scatterplot using a binary weight. The red point is the ($z_1$, $y_1$) pair computed for cell `1`.", fig.align='center'}
lw <- nb2listw(nb, style = "B")
poly$z_lag <- lag.listw(lw, poly$z)
# M <- localmoran(poly$A1, lw)
# M.df <- data.frame(attr(M, "quadr"))
OP <- par(cex.axis=0.85)
eda_lm(poly, z, z_lag, sd = FALSE, reg=FALSE, xlab = expression(z[i]),
ylab = expression(y[i]), show.par = FALSE)
points(poly$z[1], poly$z_lag[1], pch =16, col="red")
par(OP)
```
You'll note that the range of neighboring values along the $y$-axis is much greater than that of the original values on the $x$-axis. This is not necessarily an issue given that the Moran's $I$ correlation coefficient standardizes the values by re-centering them on the overall mean, $(X - \bar{X})/s$. This is simply to re-emphasize that we are interested in how a neighboring value varies relative to a feature's value, regardless of the scale of values in either batch.
If there is a downside to adopting binary weights, it's the bias that differing numbers of neighbors can introduce into the calculation of the spatially lagged values. In other words, even in the absence of spatial autocorrelation, a feature with `5` neighbors (such as feature ID `12`) will tend to have a larger lagged value than a feature with `3` neighbors (such as feature ID `1`).
A more natural weight is one where the values are standardized across each row of the weights matrix such that the weights across each row sum to one. For example:
```{r fig18, echo=FALSE, fig.width = 3, fig.height=3, fig.cap="Table 2. Weights matrix for row-standardized weights."}
nb.df <- nb2mat(nb, style = "W") |> round(digits = 3) |> as.data.frame()
colnames(nb.df) <- 1:length(m)
nb.df2 <- mutate(nb.df, across(1:16, ~ cell_spec(.x, color = ifelse(.x > 0, "red", "grey30" ))))
knitr::kable(nb.df2, format="html", row.names = TRUE, escape = FALSE, linesep = "") |>
kable_styling(bootstrap_options = c("striped"), full_width = FALSE, font_size = 11) |>
column_spec(1, bold=T) |>
column_spec(1:(length(m)+1), extra_css = 'padding: 5px;') |>
row_spec(1:length(m), extra_css = 'line-height: 4px;')
```
The spatially lagged value for cell ID `1` is thus computed as:
$$
\begin{align*}
y_1 = \sum\limits_j w_{1j} z_j {}={} & (0)(0.21)+(0.333)(1.17)+(0)(1.5)+...+ \\
& (0.333)(0.69)+(0.333)(0.93)+(0)(-0.36)+...+ \\
& (0)(-0.76) = 0.93
\end{align*}
$$
Multiplying each neighbor by the standardized weight, then summing these values, simply computes the neighbors' mean value.
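This is easy to confirm with the numbers from cell ID `1` above: its three neighbors' standardized values, each weighted by 1/3 and summed, yield exactly the neighbors' mean.

```r
z_nbrs <- c(1.17, 0.69, 0.93)   # standardized values of cells 2, 5 and 6
sum((1/3) * z_nbrs)             # weighted sum using row-standardized weights
mean(z_nbrs)                    # identical result: the neighbors' mean (0.93)
```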
Using the standardized weights generates the following scatter plot. The plot on the left shows the raw values on the *x* and *y* axes; the plot on the right shows the standardized values $z_i$ and $y_i = \sum\limits_j w_{ij} z_j$. You'll note that the shape of the point cloud is the same in both plots given that the axes of the left plot are scaled to match the standardized scales of the right plot.
```{r fig19, echo=FALSE, fig.height=3.5, fig.width=7, results='hide', fig.cap="Moran's scatter plot with original values on the left and same Moran's I scatter plot on the right using the standardized values $z_i$ and $y_i$." , fig.align='center'}
lw <- nb2listw(nb, style = "W")
poly$val_lag <- lag.listw(lw, poly$A1)
poly$z_lag <- lag.listw(lw, poly$z)
# M <- localmoran(poly$A1, lw)
# M.df <- data.frame(attr(M, "quadr"))
#
# Mz <- localmoran(poly$z, lw)
# Mz.df <- data.frame(attr(Mz, "quadr"))
#
# M <- localmoran(poly$A1, lw)
# M.df <- data.frame(attr(M, "quadr"))
OP <- par(cex.axis=0.85, mfrow = c(1,2))
eda_lm(poly, A1, val_lag, sd = FALSE, reg=FALSE, xlab = expression(x[i]),
ylab = expression( x[i(lag)]), show.par = FALSE)
eda_lm(poly, z, z_lag, sd = FALSE, reg=FALSE, xlab = expression(z[i]),
ylab = expression(y[i]), show.par = FALSE )
par(OP)
```
Note the difference between the point cloud pattern in the above plot and the one generated using the binary weights. Other weights can be used, such as inverse distance and k-nearest neighbors, to name just a few. However, most software implementations of the Moran's *I* statistic adopt row-standardized weights.
### Local Moran's I
Once a spatial weight is chosen and both $z_i$ and $y_i$ are computed, we can compute the $z_iy_i$ product for all locations $i$, giving us the local Moran's *I* statistic. Taking feature ID `1` in our example, we compute $I_1 = 0.21 \times 0.93 = 0.19$ (using the unrounded values). Computing $I_i$ for all cells gives us the following plot.
```{r fig20, fig.width =7, fig.height=3.5, results='hide', echo=FALSE, warning=FALSE, fig.align='center', fig.cap="The left plot shows the Moran's I scatter plot with the point colors symbolizing the $I_i$ values. The figure on the right shows the matching $I_i$ values mapped to each respective cell."}
poly$Istar <- with(poly, z * z_lag)
ii <- cut(poly$Istar, breaks = c(-1.5,-1,-.5,-.25,0,.25,.5,1, 1.5),
include.lowest = TRUE)
col2 <- colorRampPalette(c( "red", "white", "#00CD00"))(8)[ii]
OP <- par(mfrow=c(1,2))
eda_lm(poly, z, z_lag, sd = FALSE, reg=FALSE, ylab = expression( z[i(lag)] ),
xlab = expression( z[i] ) , show.par = FALSE )
points(poly$z, poly$z_lag, bg = col2, col=col2, pch=21)
OP2 <- par(mar = c(4,4,4,4))
plot(st_geometry(poly), main = NULL, col=col2, reset=FALSE)
text(st_coordinates(st_centroid(poly)), labels=round(poly$Istar,2), offset=0, cex=0.8)
par(OP2)
par(OP)
```
Here, we are adopting a different color scheme from that used earlier. Green colors highlight features whose values are surrounded by similar values. These can be either positive values surrounded by standardized values that tend to be positive or negative values surrounded by values that tend to be negative. In both cases, the calculated $I_i$ will be positive. Red colors highlight features whose values are surrounded by dissimilar values. These can be either negative values surrounded by values that tend to be positive or positive values surrounded by values that tend to be negative. In both cases, the calculated $I_i$ will be negative. In our example, two features have a negative Moran's *I* coefficient: cell IDs `7` and `12`.
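The sign logic described above can be made explicit in a short sketch: $I_i = z_i y_i$ is positive when $z_i$ and its lagged value share a sign, and negative otherwise. The values below are hypothetical:

```r
# Classifying hypothetical features into Moran scatter plot quadrants
z <- c(0.21, -1.40, 1.50, -0.30)   # hypothetical standardized values
y <- c(0.93, -0.80, -0.20, 0.60)   # hypothetical lagged (neighbor) values
Ii <- z * y
quadrant <- ifelse(z >= 0 & y >= 0, "High-High",
            ifelse(z <  0 & y <  0, "Low-Low",
            ifelse(z >= 0 & y <  0, "High-Low", "Low-High")))
data.frame(z, y, Ii, quadrant, similar_neighbors = Ii > 0)
```

Positive $I_i$ values correspond to the High-High and Low-Low quadrants (similar neighbors); negative $I_i$ values correspond to the High-Low and Low-High quadrants (dissimilar neighbors).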
### Global Moran's I
The global Moran's *I* coefficient, $I$, is nothing more than a summary of the local Moran's *I* coefficients. Using row-standardized weights, $I$ is the average of all $I_i$ values:
$$
I = \frac{`r paste(round(poly$Istar,2), collapse = "+")`}{\sum\limits_i\sum\limits_j w_{ij}} = 0.446
$$
In this example, $\sum\limits_i \sum\limits_j w_{ij}$ is the sum of all 256 values in Table (2), which, using row-standardized weights, sums to 16.
$I$ is thus the slope of the line that best fits the data. This can be plotted using either the standardized values or the raw values.
```{r fig21, echo=FALSE, fig.height=3.5, fig.width=7, results='hide', fig.cap="Moran's scatter with fitted Moran's *I* slope (red line). The left plot uses the raw values $(X_i,X_i(lag))$ for its axes labels. Right plot uses the standardized values $(z_i,y_i)$ for its axes labels.", fig.align='center'}
OP <- par(mfrow = c(1,2))
eda_lm(poly, A1, val_lag, sd= FALSE, xlab = expression(x[i]),
ylab = expression( x[i(lag)]), show.par = FALSE)
eda_lm(poly, z, z_lag, sd= FALSE, xlab = expression(z[i]),
ylab = expression( y[i]), show.par = FALSE)
par(OP)
```
## Summary
This chapter introduced spatial autocorrelation as a key concept for understanding second-order spatial structure: how values at one location relate to values at nearby locations. Using Moran’s I as a central tool, we explored both global and local measures of spatial association, emphasizing the importance of spatial weights and neighborhood definitions. Through examples and permutation-based inference, we demonstrated how spatial autocorrelation can be quantified, tested, and interpreted across scales. The chapter concluded with a detailed breakdown of the Moran’s I equation, reinforcing its interpretation as a spatially weighted correlation.