This vignette demonstrates advanced techniques for examining causal relationships between time series using the patterncausality package. We will focus on three key aspects:
Through cross-validation, we aim to understand:
To demonstrate the application of cross-validation, we will begin by importing a climate dataset from the patterncausality package.
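The loading step itself is not shown here. A minimal sketch, assuming the dataset is exported under the name climate_indices (the name the later code uses) and can be loaded with data(). The requireNamespace() guard is only there so the snippet does nothing when the package is not installed:

```r
# Assumption: the dataset ships with the package as `climate_indices`,
# with columns PNA and NAO (the names used in the code further down).
have_pc <- requireNamespace("patterncausality", quietly = TRUE)

if (have_pc) {
  library(patterncausality)
  data(climate_indices)
  # Quick look at the two indices used in this vignette
  str(climate_indices[c("PNA", "NAO")])
}
```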
Now, let’s apply cross-validation to evaluate the robustness of pattern causality. We will use the Pacific North American (PNA) and North Atlantic Oscillation (NAO) climate indices as our example time series.
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = seq(100, 500, by = 10),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE
)
print(result$results)
#> , , positive
#>
#> value
#> 100 0.4444444
#> 110 0.3548387
#> 120 0.1851852
#> 130 0.3157895
#> 140 0.3157895
#> 150 0.4444444
#> 160 0.3571429
#> 170 0.3750000
#> 180 0.3469388
#> 190 0.2291667
#> 200 0.3478261
#> 210 0.3653846
#> 220 0.3000000
#> 230 0.3939394
#> 240 0.3230769
#> 250 0.2881356
#> 260 0.3166667
#> 270 0.3055556
#> 280 0.2753623
#> 290 0.3625000
#> 300 0.3382353
#> 310 0.3068182
#> 320 0.3690476
#> 330 0.2375000
#> 340 0.2727273
#> 350 0.2608696
#> 360 0.3409091
#> 370 0.3414634
#> 380 0.2826087
#> 390 0.3522727
#> 400 0.2980769
#> 410 0.3548387
#> 420 0.3238095
#> 430 0.2803738
#> 440 0.2844037
#> 450 0.3083333
#> 460 0.2905983
#> 470 0.2941176
#> 480 0.3120000
#> 490 0.2892562
#> 500 0.3030303
#>
#> , , negative
#>
#> value
#> 100 0.16666667
#> 110 0.06451613
#> 120 0.29629630
#> 130 0.07894737
#> 140 0.18421053
#> 150 0.25925926
#> 160 0.21428571
#> 170 0.12500000
#> 180 0.14285714
#> 190 0.35416667
#> 200 0.17391304
#> 210 0.30769231
#> 220 0.32000000
#> 230 0.18181818
#> 240 0.10769231
#> 250 0.30508475
#> 260 0.13333333
#> 270 0.29166667
#> 280 0.31884058
#> 290 0.11250000
#> 300 0.11764706
#> 310 0.17045455
#> 320 0.13095238
#> 330 0.25000000
#> 340 0.23863636
#> 350 0.29347826
#> 360 0.17045455
#> 370 0.14634146
#> 380 0.23913043
#> 390 0.13636364
#> 400 0.25000000
#> 410 0.15053763
#> 420 0.16190476
#> 430 0.24299065
#> 440 0.21100917
#> 450 0.25000000
#> 460 0.22222222
#> 470 0.24369748
#> 480 0.22400000
#> 490 0.23140496
#> 500 0.22727273
#>
#> , , dark
#>
#> value
#> 100 0.3888889
#> 110 0.5806452
#> 120 0.5185185
#> 130 0.6052632
#> 140 0.5000000
#> 150 0.2962963
#> 160 0.4285714
#> 170 0.5000000
#> 180 0.5102041
#> 190 0.4166667
#> 200 0.4782609
#> 210 0.3269231
#> 220 0.3800000
#> 230 0.4242424
#> 240 0.5692308
#> 250 0.4067797
#> 260 0.5500000
#> 270 0.4027778
#> 280 0.4057971
#> 290 0.5250000
#> 300 0.5441176
#> 310 0.5227273
#> 320 0.5000000
#> 330 0.5125000
#> 340 0.4886364
#> 350 0.4456522
#> 360 0.4886364
#> 370 0.5121951
#> 380 0.4782609
#> 390 0.5113636
#> 400 0.4519231
#> 410 0.4946237
#> 420 0.5142857
#> 430 0.4766355
#> 440 0.5045872
#> 450 0.4416667
#> 460 0.4871795
#> 470 0.4621849
#> 480 0.4640000
#> 490 0.4793388
#> 500 0.4696970
To better visualize the results, we will use the plot function to generate a line chart.
As the plot shows, the causality proportions fluctuate noticeably at small sample sizes and tend to stabilize as the sample size increases. This suggests that the estimates become less sensitive to the particular sample once enough data are used.
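The stabilization can also be checked numerically. The sketch below copies the positive component from the printed output and compares its spread below and above a sample size of 300. The cut-off of 300 is an arbitrary choice made for illustration, not something the package prescribes:

```r
# Positive-causality proportions copied from the printed `result$results`
# output, one value per sample size from 100 to 500 in steps of 10.
n <- seq(100, 500, by = 10)
positive <- c(
  0.4444444, 0.3548387, 0.1851852, 0.3157895, 0.3157895, 0.4444444,
  0.3571429, 0.3750000, 0.3469388, 0.2291667, 0.3478261, 0.3653846,
  0.3000000, 0.3939394, 0.3230769, 0.2881356, 0.3166667, 0.3055556,
  0.2753623, 0.3625000, 0.3382353, 0.3068182, 0.3690476, 0.2375000,
  0.2727273, 0.2608696, 0.3409091, 0.3414634, 0.2826087, 0.3522727,
  0.2980769, 0.3548387, 0.3238095, 0.2803738, 0.2844037, 0.3083333,
  0.2905983, 0.2941176, 0.3120000, 0.2892562, 0.3030303
)
stopifnot(length(positive) == length(n))

# Split at an illustrative cut-off and compare the standard deviations.
early <- positive[n < 300]
late  <- positive[n >= 300]
spread <- c(early = sd(early), late = sd(late))
print(round(spread, 3))
```

A smaller standard deviation in the later group is consistent with the visual impression that the estimates settle down as more data are used.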
So far, we have seen how to use cross-validation to assess the reliability of pattern causality estimates and how to use visualization tools to better understand the results.
Now, let’s examine the cross-validation process when the random parameter is set to FALSE. This approach uses a systematic sampling method rather than random sampling.
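The text does not say exactly how the package draws either kind of subsample. The following base R sketch shows one plausible reading of the distinction, purely as an assumption: a random subsample draws time indices at random, while a systematic one takes the first n observations. The series and variable names are hypothetical.

```r
set.seed(123)
series <- cumsum(rnorm(500))  # hypothetical stand-in for a time series
n <- 100

# Random subsample: draw n distinct time indices, then restore time order.
random_idx <- sort(sample(seq_along(series), n))

# Systematic subsample (assumed here): the first n observations, no randomness.
systematic_idx <- seq_len(n)

random_sample <- series[random_idx]
systematic_sample <- series[systematic_idx]
```

Under this reading, repeating the systematic draw always gives the same subsample, whereas the random draw changes with the seed.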
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result_non_random <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = seq(100, 500, by = 100),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE,
  random = FALSE
)
print(result_non_random$results)
#> , , positive
#>
#> value
#> 100 0.2941176
#> 200 0.2400000
#> 300 0.2972973
#> 400 0.2692308
#> 500 0.3000000
#>
#> , , negative
#>
#> value
#> 100 0.1764706
#> 200 0.3200000
#> 300 0.3108108
#> 400 0.2596154
#> 500 0.2307692
#>
#> , , dark
#>
#> value
#> 100 0.5294118
#> 200 0.4400000
#> 300 0.3918919
#> 400 0.4711538
#> 500 0.4692308
We can also visualize the results of the non-random cross-validation:
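The plotting call is not shown in the text. By analogy with the earlier section it is presumably plot() applied to result_non_random, but that is an assumption. As a package-independent sketch, the printed values can be redrawn with base R:

```r
# Values copied from the printed `result_non_random$results` output.
n <- c(100, 200, 300, 400, 500)
non_random <- cbind(
  positive = c(0.2941176, 0.2400000, 0.2972973, 0.2692308, 0.3000000),
  negative = c(0.1764706, 0.3200000, 0.3108108, 0.2596154, 0.2307692),
  dark     = c(0.5294118, 0.4400000, 0.3918919, 0.4711538, 0.4692308)
)
rownames(non_random) <- n

# In the printed output the three proportions at each sample size sum to 1.
stopifnot(all(abs(rowSums(non_random) - 1) < 1e-6))

# One line per causality type against sample size.
matplot(n, non_random, type = "b", pch = 1:3, lty = 1, col = 1:3,
        xlab = "Sample size", ylab = "Proportion",
        main = "Non-random cross-validation")
legend("topright", legend = colnames(non_random),
       pch = 1:3, lty = 1, col = 1:3)
```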
By comparing the results of the random and non-random cross-validation, you can gain a deeper understanding of how different sampling methods affect the stability and reliability of the causality analysis.
To obtain more robust results and understand the uncertainty in our causality measures, we can use bootstrap sampling in our cross-validation analysis. This approach repeatedly samples the data with replacement and provides statistical summaries of the causality measures.
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result_boot <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = seq(100, 500, by = 100),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE,
  random = TRUE,
  bootstrap = 10 # Perform 10 bootstrap iterations
)
The bootstrap analysis provides several statistical measures for each sample size:
- Mean: average causality measure across bootstrap samples
- 5% and 95% quantiles: lower and upper bounds of a 90% percentile interval for the causality measure
- Median: central tendency measure that is robust to outliers
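These summary statistics can be computed from a vector of bootstrap estimates with base R alone. The sketch below uses made-up estimates for a single sample size. The helper name summarise_bootstrap is hypothetical and is not part of the package:

```r
# Hypothetical causality estimates from 10 bootstrap replicates at one
# sample size (illustrative numbers, not package output).
boot_values <- c(0.31, 0.27, 0.35, 0.22, 0.30, 0.29, 0.40, 0.26, 0.33, 0.28)

# Collapse the replicates into the four statistics reported per sample size.
summarise_bootstrap <- function(x) {
  q <- quantile(x, probs = c(0.05, 0.95), names = FALSE)
  c(mean = mean(x), `5%` = q[1], `95%` = q[2], median = median(x))
}

boot_summary <- summarise_bootstrap(boot_values)
print(round(boot_summary, 4))
```

With only 10 replicates, as in the call above, the 5% and 95% quantiles rest on very few points, so the interval should be read as a rough indication. Raising the bootstrap argument would give steadier quantiles at the cost of run time.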
Let’s examine the results:
print(result_boot$results)
#> , , positive
#>
#> mean 5% 95% median
#> 100 0.2935145 0.00000000 0.5130952 0.3484848
#> 200 0.2128219 0.02416667 0.3593833 0.2586976
#> 300 0.3054764 0.17454545 0.4886905 0.2940141
#> 400 0.2814055 0.20677419 0.3564555 0.2854029
#> 500 0.2921291 0.18026316 0.3955296 0.2940621
#>
#> , , negative
#>
#> mean 5% 95% median
#> 100 0.3364599 0.1065934 0.6964286 0.3030303
#> 200 0.3573748 0.2100710 0.5719253 0.3361582
#> 300 0.2927158 0.1591206 0.4129583 0.2900433
#> 400 0.3209360 0.2532708 0.3866656 0.3158903
#> 500 0.3021787 0.1848489 0.4481203 0.2892157
#>
#> , , dark
#>
#> mean 5% 95% median
#> 100 0.3700256 0.1733083 0.5119565 0.4160839
#> 200 0.4298033 0.3531579 0.5409119 0.4099116
#> 300 0.4018078 0.3214286 0.4705645 0.4075221
#> 400 0.3976585 0.3348185 0.4485013 0.3989247
#> 500 0.4056922 0.3365340 0.4638937 0.4129274
We can visualize the bootstrap results using the plot function, which now shows confidence intervals:
The shaded area in the plot represents the range between the 5th and 95th percentiles of the bootstrap samples, providing a measure of uncertainty in our causality estimates. The solid line shows the median value, which is more robust to outliers than the mean.
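To make the construction of such a plot concrete, the following base R sketch redraws the positive component from the printed bootstrap output: a shaded band between the 5% and 95% quantiles with the median as a solid line. It illustrates the idea and is not the package's own plotting code:

```r
# Positive component, copied from the printed `result_boot$results` output.
n     <- c(100, 200, 300, 400, 500)
lower <- c(0.00000000, 0.02416667, 0.17454545, 0.20677419, 0.18026316)  # 5%
upper <- c(0.5130952, 0.3593833, 0.4886905, 0.3564555, 0.3955296)       # 95%
med   <- c(0.3484848, 0.2586976, 0.2940141, 0.2854029, 0.2940621)       # median

# Empty frame first, so the band can be drawn underneath the line.
plot(n, med, type = "n", ylim = range(lower, upper),
     xlab = "Sample size", ylab = "Positive causality")

# Shaded band: trace the lower bound forwards and the upper bound back.
polygon(c(n, rev(n)), c(lower, rev(upper)), col = "grey85", border = NA)

# Median as a solid line on top of the band.
lines(n, med, lwd = 2)
```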