---
title: "Scrapsheet"
---
## Other Places
- Tea Garden - 2 dinners
## Grocery list
- Lettuce
- tomato
- lunch meat
- kind protein bars (peanut butter, banana, dark chocolate)
- dried mango
- cereal
- sugarless chocolate
- frozen
- chicken breast
- frozen lunch?
- fries
- detroit pepperoni pizza
- mini-pizza
- pot pie
- strawberry juice
## Misc
- Notes created but not added to \_quarto.yml
- ide-positron
- llms-preprocessing
- llms-production
- job-consulting
- job-interview
- job-resume
- logos
- visualization-base
- Hierarchical Bootstrap
- bootstrap types
- <https://www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package/>
- [Uncertainty quantification for cross-validation](https://radlfabs.github.io/posts/thesis/)
- Procedure as described ([Davison and Hinkley 1997](https://radlfabs.github.io/posts/thesis/#ref-davison_bootstrap_1997); [Goldstein 2010](https://radlfabs.github.io/posts/thesis/#ref-goldstein_bootstrapping_2010))
- Sample with replacement from the fold indices
- Sample with replacement from the validation preds of the new set of fold indices
- Calculate CV estimate on the new set of validation preds
- CI Variants: basic, normal, studentized, percentile
- My understanding
1. Sample fold indices w/replacement
2. For each sampled fold's validation set, sample the predictions w/replacement
3. For each sampled fold's predictions on the validation set, calculate the score
4. Average the scores across the folds.
5. Repeat steps 1-4 (e.g., 10K times)
6. Calculate CI variant on the distribution of averaged scores
- Questions
- How many bootstrap iterations?
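The steps above can be sketched in Python (numpy only; percentile variant shown, and the function name and the default of 2000 replicates are illustrative, not from the cited sources):

``` python
import numpy as np

def boot_cv_ci(fold_truth, fold_preds, score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a K-fold CV score estimate.

    fold_truth / fold_preds: lists of per-fold arrays (observed values and
    the matching validation predictions); score(y_true, y_pred) -> float.
    """
    rng = np.random.default_rng(seed)
    k = len(fold_preds)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        fold_idx = rng.integers(0, k, size=k)            # 1. resample fold indices
        fold_scores = []
        for f in fold_idx:
            n = len(fold_preds[f])
            obs = rng.integers(0, n, size=n)             # 2. resample preds within each sampled fold
            fold_scores.append(score(fold_truth[f][obs], fold_preds[f][obs]))  # 3. score the fold
        stats[b] = np.mean(fold_scores)                  # 4. average across folds
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])  # 6. percentile CI
    return float(np.mean(stats)), (float(lo), float(hi))
```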
- Empirical Orthogonal Functions (EOFs)
- A form of PCA applied to spatiotemporal data that is useful for understanding patterns in fields like climate science, oceanography, and meteorology.
- Components
- Spatial Patterns (EOFs) - Shows where *variability* is concentrated
- Time series (principal components) - Shows when each pattern is *active* and with what *amplitude*
- Process
- Locations are variables and observations are time series
- Preprocess (scale) and perform PCA
- Each EOF is an eigenvector ($V$) representing a spatial pattern, and its eigenvalue tells you how much variance that pattern explains. The PC is a time series ($\text{PC1}_i = V_{(,1)} \cdot A_{(i,)}$)
- Example: In sea surface temperature data, EOF1 might reveal the El Niño/La Niña pattern — showing warming/cooling in the tropical Pacific. The associated PC would be a time series showing El Niño events as positive peaks and La Niña as negative troughs.
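A minimal numpy illustration of the EOF process above on a simulated field (the single spatial mode, its 40-step oscillation, and the noise level are all invented for the example):

``` python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)                                      # time steps (observations)
pattern = np.sin(np.linspace(0, np.pi, 50))             # one "true" spatial mode over 50 locations
field = np.outer(np.sin(2 * np.pi * t / 40), pattern)   # its time evolution
field = field + 0.1 * rng.standard_normal(field.shape)  # measurement noise

anom = field - field.mean(axis=0)                       # center each location (preprocess)
U, S, Vt = np.linalg.svd(anom, full_matrices=False)     # PCA via SVD
eofs = Vt                                               # rows: spatial patterns (eigenvectors V)
pcs = U * S                                             # columns: PC time series
explained = S**2 / np.sum(S**2)                         # variance fraction per mode
```

`explained[0]` shows the leading mode dominating, and `pcs[:, 0]` recovers (up to sign) the oscillation that drives the spatial pattern.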
## Multivariable Geostatistics
$$
\begin{align}
Z_1(s) &= X_1\beta_1 + e_1(s) \\
\vdots \\
Z_n(s) &= X_n \beta_n + e_n(s)
\end{align}
$$
- **Cross-Variogram** - For each pair of residual variables, it describes the covariance of $e_i(s)$ and $e_j(s+h)$.
- A non-zero cross-variance indicates that $e_j(s+h)$ may help predict (or simulate) $e_i(s)$, and is especially true if $Z_j(s)$ is more densely sampled than $Z_i(s)$.
- **Cokriging** and **Cosimulation** are the multivariable versions of kriging and simulation.
- Examples:
``` r
library(gstat)
demo(cokriging)
demo(cosimulation)
```
## Structural Equation Modeling (SEM)
- Packages
- [{]{style="color: #990000"}[influence.SEM](https://cran.r-project.org/web/packages/influence.SEM/index.html){style="color: #990000"}[}]{style="color: #990000"} - A set of tools for evaluating several measures of case influence for structural equation models
- "Certain datasets lead to inadmissible solutions in structural equation modeling (Paxton, Curran, Bollen, Kirby, & Chen, 2001)" ([source](https://arxiv.org/abs/2509.11741))
## Drawdown Implied Correlation (DIC)
- Notes from
- [Drawdown Implied Correlations (Part 1)](https://cssanalytics.wordpress.com/2024/12/23/drawdown-implied-correlations-part-1/)
- [Drawdown Implied Correlations Part 2: Generalized Downside Implied Correlations](https://cssanalytics.wordpress.com/2025/01/09/drawdown-implied-correlations-part-2-generalized-downside-implied-correlations/)
- [Iterative PSD Shrinkage (IPS)](https://cssanalytics.wordpress.com/2025/01/21/iterative-psd-shrinkage-ips/)
- Formula
$$
\text{DIC} = \frac{4\text{MDD}_{AB}^2-\text{MDD}_{A}^2-\text{MDD}_{B}^2}{2\cdot \text{MDD}_{A} \cdot \text{MDD}_{B}}
$$
- $\text{MDD}$ is the Maximum Drawdown
- $A$ is an asset, $B$ is an asset and $AB$ is a portfolio with both $A$ and $B$
- Asset returns are not Normal — they have “fat tails”
- Two assets can lose money simultaneously, even while maintaining a negative (Pearson) correlation, leaving the portfolio exposed to significant losses.
- We typically rely on a dynamic or rolling correlation to measure diversification
- In contrast to correlations and volatility, drawdowns are nonlinear and path-dependent making them complementary for risk analysis.
- For the DIC measure, you can certainly use drawdowns entirely within a lookback window to keep the measure mathematically consistent, but a much bigger window is recommended for the calculation to avoid a lot of noise.
- Regardless, using drawdowns from all-time highs will slightly change the final values in such a way that they can be more negative than -1, which is why you need to bound the DIC between -1 and 1 to provide a practical correlation measure.
- When constructing a portfolio with multiple assets, the portfolio’s drawdown series (the peak-to-trough losses of the combined portfolio) behaves differently than the individual drawdown series of the constituent assets. This difference arises from how the assets interact in a portfolio.
- The drawdown of the portfolio is not simply the sum or average of individual asset drawdowns; instead, it reflects the combined behavior of the assets as they interact over time. Two or more assets in the portfolio may experience drawdowns at different times or to different extents, and their drawdown implied correlations will directly influence how the portfolio’s total drawdown evolves.
- If two assets experience drawdowns simultaneously, their joint drawdown will be greater than what you would expect from either asset alone, which leads to a measurement of high correlation.
- If the portfolio drawdown is moderate to low compared to the individual assets' drawdowns, this leads to a measurement of low correlation.
- Therefore, the drawdown of the portfolio can reflect behavior and interactions between assets that individual asset drawdowns and returns cannot capture. This is the key reason correlating individual asset drawdowns will not fully explain the portfolio drawdowns.
- Single Reference Process
1. Drawdown Calculation for Each Asset:
- For Asset A and Asset B, calculate the drawdowns from their respective all-time highs over a rolling window (e.g., 60 days).
- For the joint time series (AB), calculate the combined drawdown from all-time highs over the same 60-day window.
2. Find Maximum Drawdown for AB:
- Identify the maximum drawdown for the joint time series (AB) over the 60-day rolling window.
- Retrieve the corresponding drawdown values for Asset A and Asset B on the same day that the maximum drawdown for AB occurs.
3. Compute the DIC:
- Calculate the implied correlation between the drawdowns of A, B, and AB on the specific day.
- This gives the DIC for the pair of assets based on the maximum drawdown for the joint time series (AB).
- The DIC can be calculated using only one drawdown point while you need a minimum of 3 data points to compute a correlation between drawdowns.
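A Python sketch of the single-reference calculation above (a 50/50 daily-rebalanced portfolio and drawdowns measured against the starting value are assumptions of the sketch, not of the posts):

``` python
import numpy as np

def drawdowns(returns):
    """Running drawdown series (positive fractions) from the path's running peak."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(np.r_[1.0, wealth])[1:]  # include the starting value as a peak
    return 1.0 - wealth / peak

def dic(ret_a, ret_b):
    """Single-reference DIC: read off A and B on the day AB's drawdown peaks."""
    ret_ab = 0.5 * (np.asarray(ret_a) + np.asarray(ret_b))  # 50/50 rebalanced portfolio
    dd_a, dd_b, dd_ab = drawdowns(ret_a), drawdowns(ret_b), drawdowns(ret_ab)
    day = int(np.argmax(dd_ab))                     # day of AB's maximum drawdown
    mdd_ab, mdd_a, mdd_b = dd_ab[day], dd_a[day], dd_b[day]
    raw = (4 * mdd_ab**2 - mdd_a**2 - mdd_b**2) / (2 * mdd_a * mdd_b)
    return float(np.clip(raw, -1.0, 1.0))           # bound to [-1, 1] per the notes
```

Passing an asset against itself returns 1, as the formula requires; the triple-reference variant would repeat the lookup with A's and B's max-drawdown days and average the three results.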
- The “standard” version of the DIC uses the max drawdown over some window.
- Note you can certainly use the top % of drawdowns or drawdowns above a threshold as well. But because we are only looking at maximum drawdowns with this variation, in order to create a rolling daily measurement I suggest a slight modification to the original calculation by using a “triple point” reference.
- Triple Reference
- This means we are going to look at three reference points which represent the maximum drawdown for each asset and the portfolio. The purpose of a triple reference is to get as much information as possible from a shorter window and increase accuracy while reducing indicator volatility.
- Calculating DIC at the point of maximum drawdown (individually) for A, B, and the joint series AB and averaging the three results.\
(need pic)
- The maximum drawdown for AB could be influenced by an unusually strong movement in one asset, which might not reflect the risk dynamics between A and B themselves. By averaging the DIC from the three scenarios (max drawdown of AB, A, and B), you smooth out this potential bias and get a more robust measure of correlation.
- Triple Reference Process
1. Find the point of Max Drawdown for Asset A:
- Now, repeat the process but for Asset A as the reference. Find the maximum drawdown for A over the same 60-day rolling window.
- Retrieve the corresponding drawdown values for Asset B and AB on the same day that the maximum drawdown for A occurs. Calculate the DIC using the exact same formula.
2. Find the point of Max Drawdown for Asset B:
- Similarly, find the maximum drawdown for Asset B over the same 60-day window.
- Retrieve the corresponding drawdown values for A and AB on the same day as the maximum drawdown for B. Calculate the DIC using the same formula.
3. Calculate and Average DICs:
- You now have three DICs: one from the maximum drawdown for AB, one from the maximum drawdown for A, and one from the maximum drawdown for B.
- The final DIC is the average of these three DICs, providing a comprehensive view of the correlation during drawdown periods for both individual assets and their joint performance.
## DBT
### dbt-expectations
- Features
- Free package
- Integrates into already existing dbt project
- Assertive testing
- Set-Up
- Specify in packages.yml
``` yaml
packages:
- package: calogica/dbt_expectations
version: 0.10.4
```
- Run `dbt deps` to install
#### Tests
- Tests can be specified on a model, source, seed, or column in one of your YAML files
- Source Data
- Always apply your tests to the source data if possible
- Because sources are defined in your YAML files, this is where you will want to write your tests
``` yaml
sources:
- name: company_customers
database: company
schema: customers
description: "Contains personal customer information for company"
tables:
- name: addresses
description: "Customer addresses for the company"
tests:
- dbt_expectations.expect_table_column_count_to_be_between:
min_value: 1
max_value: 10
```
- Specify the test under one of the source’s tables and not the source itself
- Checks to make sure the [customers.addresses]{.var-text} table has between 1 and 10 columns.
- Models
- Adding tests to your complex data models is great for ensuring your data is as expected *after* the transformation process.
- Similar syntax to specifying in Source Data
``` yaml
models:
- name: stg_addresses
description: "Customer addresses for the company"
tests:
- dbt_expectations.expect_table_column_count_to_be_between:
min_value: 1
max_value: 10
```
- Tests on the addresses staging model
- Column
- Can only apply the test to one column
- Preferable on source data
- Make sure the column names of your sources and models are fully documented before implementing column tests.
``` yaml
models:
- name: stg_addresses
description: "Customer addresses for the company"
columns:
- name: address_id
description: "The primary key of this table"
tests:
- dbt_expectations.expect_column_to_exist
```
- Tests whether [address_id]{.var-text} exists as a column within the model
- Check for Recent Data
- Applies to one column only
- Recommendation: Make this interval 3 days max, so you can catch the issue at the source fairly quickly
- Use Case: FiveTran shows data connector syncs working, but you aren't sure
``` yaml
tests:
- dbt_expectations.expect_row_values_to_have_recent_data:
datepart: day
interval: 3
```
- If there is no new data from the last 3 days then the test will throw an error.
- Compare Column Values
- Applies to models, seeds, and sources
- Compares if the value in column A is greater than the value in column B.
``` yaml
tests:
- dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
column_A: total_amount
column_B: sub_total
or_equal: False
```
- If the source contains high-quality data, the total should always be greater than the subtotal
- Check Column Type
- Use Case: If your source is a spreadsheet, they are particularly prone to these types of data entry errors
``` yaml
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: timestamp_ntz
```
- This makes sure the timestamp is in a consistent format.
- Different time stamps across different models and data sources can become an issue with joins, etc.
- Add Row Conditions
- These can be added to other tests
- A common condition is “id is not null”
``` yaml
tests:
- dbt_expectations.expect_column_values_to_be_in_set:
value_set: ['cat','dog','pig']
quote_values: true
row_condition: "animal_id is not null"
```
- This test only looks at the column values whose row does *not* have a null [animal_id]{.var-text}.
## Functional Data Analysis and Forecasting
- Each observation in a functional dataset consists of a collection of points representing a continuous curve or surface over a compact domain (e.g., a fixed length of time or region of space).
- Functional data provide detailed information of a continuous process
- Apply a dimension reduction technique, such as fPCA, and use a subset of the transformed features as predictor variables
- Only accounts for vertical functional variability, where *vertical* variability (also known as y or amplitude variability) is the variability in the height of functions
- Horizontal variability (also known as x or phase variability) is the variability in the location of peaks and valleys of the functions.
- Requires enough data so that smoothing functions can accurately interpolate the curve.
- Me
- Seems like uber-nonlinear modeling. The coefficients and the predictors are smoothing functions. No flexibility on which terms get smoothers — all do. Also, the RHS is an integral.
- Seems like this can be used for multivariate/group time series forecasting or as a feature reduction technique
- Packages
- [{]{style="color: #990000"}[FastFGEE](https://cran.r-project.org/web/packages/fastFGEE/index.html){style="color: #990000"}[}]{style="color: #990000"} - Fits functional generalized estimating equations for longitudinal functional outcomes and covariates using a one-step estimator that is fast even for large cluster sizes or large numbers of clusters
- [{]{style="color: #990000"}[fdars](https://cran.r-project.org/web/packages/fdars/index.html){style="color: #990000"}[}]{style="color: #990000"} - Written in Rust. Provides methods for functional data manipulation, depth computation, distance metrics, regression, and statistical testing. Supports both 1D functional data (curves) and 2D functional data (surfaces).
- [{]{style="color: #990000"}[fkcentroids](https://cran.r-project.org/web/packages/fkcentroids/index.html){style="color: #990000"}[}]{style="color: #990000"} - Functional K-Centroids Clustering Using Phase and Amplitude Components
- [{]{style="color: #990000"}[hdftsa](https://cran.r-project.org/web/packages/hdftsa/index.html){style="color: #990000"}[}]{style="color: #990000"} - High-Dimensional Functional Time Series Analysis
- [{]{style="color: #990000"}[MFSD](https://cran.r-project.org/web/packages/MFSD/index.html){style="color: #990000"}[}]{style="color: #990000"} - Analysis of multivariate functional spatial data, including spectral multivariate functional principal component analysis and related statistical procedures
- [{]{style="color: #990000"}[mlr3fda](https://mlr3fda.mlr-org.com/){style="color: #990000"}[}]{style="color: #990000"} - fda in mlr3
- [{]{style="color: #990000"}[refund](https://cran.r-project.org/web/packages/refund/index.html){style="color: #990000"}[}]{style="color: #990000"} - Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression.
- [{]{style="color: #990000"}[refundBayes](https://cran.r-project.org/web/packages/refundBayes/index.html){style="color: #990000"}[}]{style="color: #990000"} - Bayesian regression with functional data, including regression with scalar, survival, or functional outcomes
- [{]{style="color: #990000"}[roahd](https://astamm.github.io/roahd/){style="color: #990000"}[}]{style="color: #990000"} - The Robust Analysis of High-dimensional Data package provides a set of statistical tools for the exploration and robustification of univariate and multivariate functional datasets through the use of depth-based statistical methods.
- Functions for generating functional data
- Band depths and modified band depths,
- Modified band depths for multivariate functional data,
- Epigraph and hypograph indexes,
- Spearman and Kendall’s correlation indexes for functional data,
- Confidence intervals and tests on Spearman’s correlation coefficients for univariate and multivariate functional data.
- [{]{style="color: #990000"}[SelectBoost.FDA](https://fbertran.github.io/SelectBoost.FDA/){style="color: #990000"}[}]{style="color: #990000"} - SelectBoost-Style Variable Selection for Functional Data Analysis
- [{]{style="color: #990000"}[tidyfun](https://cran.r-project.org/web/packages/tidyfun/index.html){style="color: #990000"}[}]{style="color: #990000"} - Represent, visualize, describe and wrangle functional data in tidy data frames
- [{]{style="color: #990000"}[veesa](https://cran.r-project.org/web/packages/veesa/index.html){style="color: #990000"}[}]{style="color: #990000"} ([Paper](https://arxiv.org/abs/2501.07602)) - Pipeline for Explainable Machine Learning with Functional Data
- Accounts for the vertical and horizontal variability in the functional data
- Provides an explanation in the original data space of how the model uses variability in the functional data for prediction
- Papers
- [Penalized Spline M-Estimators for Discretely Sampled Functional Data: Existence and Asymptotics](https://arxiv.org/abs/2508.12000)
- Types of Functional Data
- age-specific mortality rates (Shang et al., 2024)
- [Nonstationary functional time series forecasting](https://arxiv.org/abs/2411.12423)
- [Forecasting high-dimensional functional time series: Application to sub-national age-specific mortality](https://arxiv.org/abs/2305.19749)
- age- and sex-specific mortality rates in the United States, France, and Japan, in which there are 51 states, 95 departments, and 47 prefectures
- heights of children measured over time (Ramsay and Silverman, 2005)
- silhouettes of animals extracted from images (Srivastava and Klassen, 2016)
- glucose monitoring (Danne et al., 2017)
- fitness tracking (Henriksen et al., 2018)
- environmental sensors (Butts-Wilmsmeyer, Rapp and Guthrie, 2020)
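A toy numpy sketch of the fPCA feature-reduction idea mentioned above: curves sampled on a shared grid, a mean function subtracted, and the leading fPC scores kept as predictor variables (the two-mode simulation is invented, and, as cautioned earlier, this captures only vertical/amplitude variability):

``` python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 101)                 # shared evaluation grid
n = 40                                     # number of curves (observations)
amp = rng.standard_normal((n, 2))          # per-curve amplitude scores
curves = (amp[:, :1] * np.sin(2 * np.pi * t)
          + amp[:, 1:] * np.cos(2 * np.pi * t)
          + 0.05 * rng.standard_normal((n, t.size)))  # noisy functional data

centered = curves - curves.mean(axis=0)    # subtract the estimated mean function
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)            # variance per functional PC
features = (U * S)[:, :2]                  # fPC scores: low-dim predictors for a model
```

In practice the curves would first be smoothed (e.g., a basis expansion), which is what packages like {refund} handle; the SVD step is the same idea.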
## Gaussian Processes
- Notes from
- [Compactly-supported nonstationary kernels for computing exact Gaussian processes on big data](https://arxiv.org/abs/2411.05869)
- alternative kernel that can discover and encode both sparsity and nonstationarity
- [Reading Notes on BRISC: Bootstrap for Rapid Inference on Spatial Covariances](https://juanxie19.github.io/posts/brisc/)
- Packages
- bigGP - Implements parallel linear algebra operations using threading and message-passing, which is useful for kriging and Gaussian process regression
- laGP - Implements local approximate Gaussian process regression for large-scale modeling and sparse computation with massive data sets.
- A preeminent framework for stochastic function approximation, statistical modeling of real-world measurements, and non-parametric and nonlinear regression within machine learning (ML) and surrogate modeling.
- Unlike many other machine learning methods, GPs include an implicit characterization of uncertainty
- Traditional implementations of GPs involve stationary kernels (also termed covariance functions) that limit their flexibility and exact methods for inference that prevent application to data sets with more than about ten thousand points. (paper and its packages fix this)
- Other methods to fix this are generally difficult to implement for large data sets due to their large numbers of hyperparameters which leads to overfitting and the need for specialized algorithms for training
- In regression and density estimation, Gaussian processes have been widely used as nonparametric priors for unknown random functions.
- Reasons for Popularity
- **Analytical Tractability**:
- GPs provide a **closed-form solution** for many problems, making them analytically tractable.
- For example, the posterior distribution of a GP can be derived explicitly, allowing for exact inference in many cases.
- **Marginal and Conditional Distributions**:
- Any **marginal distribution** of a GP is also Gaussian. This means that if you take a subset of the random variables in a GP, their joint distribution remains Gaussian.
- Similarly, the **conditional distribution** of a GP is Gaussian. This property is particularly useful for making predictions at new locations, as the conditional distribution can be computed analytically.
- **Flexibility in Modeling**:
- GPs can model complex, **non-linear relationships** by choosing an appropriate covariance (kernel) function.
- Common kernel functions include the **Radial Basis Function (RBF)**, **Matérn**, and **Exponential kernels**, each of which captures different types of relationships in the data.
- **Probabilistic Predictions**:
- GPs provide **uncertainty estimates** along with predictions. This is crucial for decision-making in applications like Bayesian optimization, where understanding the uncertainty is as important as the prediction itself.
- **Applications**:
- GPs are widely used in **geostatistics** (e.g., kriging), **machine learning** (e.g., regression, classification), and **Bayesian optimization** (e.g., hyperparameter tuning).
- They are also used in **time series analysis**, **robotics**, and **environmental modeling**.
- **Kernel Design**:
- The choice of kernel function allows GPs to capture a wide range of behaviors, such as periodicity, smoothness, and trends.
- Kernels can also be combined or adapted to create more complex models.
- **Interpretability**:
- The parameters of the kernel function often have intuitive interpretations, such as length scales or variance, making GPs more interpretable than some other machine learning models.
- Despite their many advantages, GPs face a significant computational bottleneck: the need to invert the covariance matrix $K(s, s')$, which has $O(n^3)$ complexity, where $n$ is the number of observations. For large datasets, this becomes infeasible, limiting the scalability of traditional GPs.
- To address the computational challenges of traditional GPs, Nearest Neighbor Gaussian Process (NNGP) has been developed. NNGP approximates the full GP by limiting dependencies between data points to a small subset of nearest neighbors. This reduces the computational complexity while retaining the key properties of GPs, making it a scalable alternative for large datasets.
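The closed-form conditional described above, sketched for a 1-D RBF kernel with fixed (not fitted) hyperparameters; the Cholesky factorization of the training covariance is where the $O(n^3)$ cost lives:

``` python
import numpy as np

def rbf(a, b, ell=1.0, var=1.0):
    """Radial Basis Function (squared-exponential) kernel."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-2, ell=1.0):
    """Exact GP posterior mean and pointwise sd at test inputs."""
    K = rbf(x_tr, x_tr, ell) + noise * np.eye(x_tr.size)   # training covariance + noise
    Ks = rbf(x_tr, x_te, ell)
    L = np.linalg.cholesky(K)                  # stable alternative to inverting K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))  # K^{-1} y
    mean = Ks.T @ alpha                        # conditional (posterior) mean
    v = np.linalg.solve(L, Ks)
    cov = rbf(x_te, x_te, ell) - v.T @ v       # conditional covariance
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

x = np.linspace(0, 6, 30)
mean, sd = gp_posterior(x, np.sin(x), np.array([1.5, 3.0]))
```

The returned `sd` is the implicit uncertainty characterization: small where training points are dense, reverting to the prior far from the data.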
## Wavelets
- Notes from
- [TrendLSW: Trend and Spectral Estimation of Nonstationary Time Series in R](https://arxiv.org/abs/2406.05012)
- ChatGPT
- [My link](https://chatgpt.com/c/672cd99f-7480-8002-80cd-39263f52950b)
- [Public Link](https://chatgpt.com/share/672cddd9-ff2c-8002-a105-be470cc41dc2)
- Packages
- [{]{style="color: #990000"}[DWaveNARDL](https://cran.r-project.org/web/packages/DWaveNARDL/index.html){style="color: #990000"}[}]{style="color: #990000"} - Dual Wavelet Based NARDL Model
- Nonlinear Autoregressive Distributed Lag model for noisy time series analysis
- Designed to capture both short-run and long-run relationships
- Useful for analyzing economic and financial time series data that exhibit both long-term trends and short-term fluctuations
- Wavelets are particularly suited for analyzing nonstationary time series because they can capture both time and frequency information.
- Spectral estimation
- Spectral estimation is a technique used to analyze the frequency content of time series data, particularly focusing on how the variance (or power) is distributed across different frequencies. This is especially useful in nonstationary time series, where statistical properties, such as the mean and variance, change over time.
- In classical settings, spectral estimation often involves the Fourier transform, where stationary processes are assumed. For nonstationary processes, wavelet-based methods are popular.
- Usage
- Identifying underlying periodicities, understanding the evolution of variance across different time periods, and detecting anomalies or regime shifts in time series.
- If the time series contains periodic behavior (such as seasonal patterns), the spectral plot will show high power at the corresponding frequency
- In an EWS plot, you may see that a certain frequency band has high power only during certain time intervals, indicating a time-localized periodicity.
- A flat or consistent spectral plot indicates stationarity, while time-varying plots (especially with wavelet-based approaches) show how different scales contribute at different times, revealing nonstationarity. An increasing or decreasing trend in a particular frequency band over time might indicate a nonstationary process.
- Sudden spikes or drops in spectral power at specific times and frequencies could indicate anomalies, abrupt changes, or unusual behavior in the time series.
- A sudden burst (i.e. transient, not consistent) of power at low scales (high frequencies) may indicate an abrupt event, such as a machine breakdown in industrial monitoring data
- If the burst occurs at high scales (low frequencies), it may indicate a sudden, large-scale trend change, such as a long-term shift or event.
- In economic time series, higher scales (low frequencies) might represent long-term economic cycles, while lower scales (high frequencies) might correspond to short-term market volatility.
- Analysts can detect sub-seasonal or irregular cycles that might not be immediately obvious.
- If the model’s spectral plot aligns with the observed data's spectral plot, this indicates a good fit. Discrepancies in power or patterns suggest areas where the model might need improvement.
- e.g. After fitting a time series model, an analyst might generate a simulated spectral plot and compare it with the real data to see if the model captures both the trends and the variabilities at different scales.
- Time-frequency or time-scale plots provide a localized view of how spectral properties change, enabling analysts to detect events or features that occur intermittently.
- e.g. In seismic data, an EWS plot could reveal bursts of energy at different scales corresponding to earthquake tremors and aftershocks.
- High power at certain frequencies suggests strong periodicity or correlation at corresponding time lags.
- e.g. In biological signals, changes in autocorrelation can be related to transitions between different physiological states (e.g., sleep stages).
- Scales
- Scales refer to the different levels of resolution at which the data is analyzed. These scales are analogous to frequencies in Fourier analysis but provide a time-localized view of how signal components of different frequencies evolve over time.
- Each scale $j$ corresponds to a specific level of detail, and the evolutionary wavelet spectrum $S_j(z)$ estimates the power at scale $j$ over time $z$. This allows for the identification of how variance at different frequencies changes across time, which is crucial for analyzing nonstationary time series.
- The wavelet function at a particular scale acts like a band-pass filter, capturing specific ranges of frequencies. Lower scales capture higher-frequency details (short-term fluctuations), while higher scales capture lower-frequency trends (long-term patterns).
- At higher scales, the wavelet stretches over a longer portion of the time series, capturing slower, low-frequency changes. At lower scales, the wavelet is more compressed, capturing faster, high-frequency changes. Unlike Fourier transforms, which assume a fixed frequency range, wavelets allow for more adaptive, localized analysis.
- **Low Scales (High Frequencies)**: Capture fine, fast-varying details, like noise or short-term oscillations.
<!-- -->
- **High Scales (Low Frequencies)**: Capture broad, slow-varying features, such as long-term trends or cycles.
- Scales to Periodicity
- Frequency Relationship\
$$
f_j = \frac{k}{2^j \Delta t}
$$
- $f_j$ : Frequency at scale $j$
- $\Delta t$ : Time step of your data (e.g. 1 day for daily, 1 sec for seconds, etc.)
- $k$ : Constant that depends on the choice of wavelet
- e.g. Morlet wavelet, $k \approx 1.03$
- Scale to Periodicity\
$$
\begin{align}
P_j &= \frac{1}{f_j} \\
&\approx \frac{2^j \Delta t}{k}
\end{align}
$$
- [Example]{.ribbon-highlight}: 1 sec data with power detected at $S_5$ (scale 5) using a Morlet wavelet\
$$
P_5 \approx \frac{2^5 \times 1}{1.03} \approx 31.07 \;\mbox{sec}
$$
- If the high power persists over time in your wavelet spectrum (such as in an Evolutionary Wavelet Spectrum or Scalogram plot), it suggests a consistent periodic signal at that time scale.
<!-- -->
- If the high power is transient, it indicates that the periodic behavior is localized in time, occurring only during certain time intervals.
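- The scale-to-period conversion above is easy to wrap in a helper. A minimal sketch (the function name is mine; $k \approx 1.03$ is the Morlet value quoted above):
``` r
# Convert a dyadic wavelet scale j to an approximate period, using the
# relation f_j = k / (2^j * dt), so P_j = 1/f_j = 2^j * dt / k.
# k is wavelet-dependent (k ~ 1.03 for the Morlet wavelet).
scale_to_period <- function(j, dt = 1, k = 1.03) {
  (2^j * dt) / k
}

scale_to_period(5, dt = 1)
#> [1] 31.06796
```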
- Trend Estimation
- Wavelets with more vanishing moments can handle smoother and more complex polynomial trends.
<!-- -->
- Wavelets with fewer vanishing moments are better at capturing sharp, localized features but may not handle smooth trends as well.
- Example: A wavelet with 4 vanishing moments means it can remove cubic trends from the data.
- Choosing the number of vanishing moments
- For smooth trends: If the data is expected to have a smooth, slowly varying trend, you should choose a wavelet with a higher number of vanishing moments (e.g., 4 or more). This allows the wavelet to annihilate polynomial trends up to a higher degree and isolate the trend from noise or high-frequency components.
<!-- -->
- For sharp changes or noise: If the data contains sharp changes or you’re more interested in detecting localized features, wavelets with fewer vanishing moments (e.g., 1 or 2) may be more appropriate.
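- The vanishing-moments property can be checked numerically in base R. A wavelet with $p$ vanishing moments has a high-pass filter $g$ with $\sum_k k^m g_k = 0$ for $m < p$, so convolving it with a degree-$(p-1)$ polynomial gives zero. A sketch with hardcoded Haar (1 vanishing moment) and Daubechies D4 (2 vanishing moments) filter coefficients:
``` r
haar <- c(1, -1) / sqrt(2)  # 1 vanishing moment
s3 <- sqrt(3)
d4 <- c(1 - s3, -(3 - s3), 3 + s3, -(1 + s3)) / (4 * sqrt(2))  # 2 vanishing moments

trend_const  <- rep(5, 32)
trend_linear <- 1:32

# One-sided convolution of the high-pass filter with the series
detail <- function(g, x) as.numeric(stats::filter(x, g, sides = 1))

max(abs(detail(haar, trend_const)), na.rm = TRUE)   # ~0: Haar annihilates constants
max(abs(detail(haar, trend_linear)), na.rm = TRUE)  # > 0: but not linear trends
max(abs(detail(d4, trend_linear)), na.rm = TRUE)    # ~0: D4 annihilates linear trends
```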
- Wavelet Types
- **Phase distortion** occurs when the phase of different frequency components of a signal is altered unevenly, leading to a misalignment in the reconstructed signal (i.e. shifts in position). This matters for trend estimation: the more symmetric a wavelet is, the less phase distortion it introduces.
- The **phase** describes the position of the waveform relative to a reference point in time. For example, in a sine wave, the phase tells us where the peaks and troughs of the wave occur.
- When a signal is passed through a filter or transformation (such as a wavelet or Fourier transform), each frequency component might experience a shift in its phase and by different amounts which results in the distortion.
- Wavelets with strong time localization allow them to detect sharp, abrupt changes
- Guidelines
- Daubechies EP Wavelets: Choose for *sharp, localized features* (e.g. spikes, discontinuities) or if phase shifts are not a major concern. They are ideal for compression or denoising while preserving features like discontinuities and detecting abrupt changes, but may introduce phase distortion and boundary effects.
- e.g. seismic, financial data
<!-- -->
- Daubechies LA Wavelets: Choose for *smooth trends* and when minimizing phase distortion is important. They are particularly effective for trend estimation and nonstationary data with long-term, smooth features.
- e.g. temperature changes, economic growth
## Ranking Algo
- Notes from [Multi-Attribute Preferences: A Transfer Learning Approach](https://arxiv.org/abs/2408.10558)
- preference data are typically elicited by individuals, whether in the form of pairwise comparisons, partial rankings or click-through data, which are aggregated into a single coherent ranking that best reflects these preferences
- Use Cases
- data consisting of hotel rankings, where consumers rank various attributes of hotels such as breakfast, hygiene, price, quality of service, but also their overall satisfaction with the hotel
- different types of food that are ranked on various properties, such as different aspects of taste, smell, visual aspects, but also their overall ranking
- primary attribute - the main attribute of interest
- Typically the overall preference or satisfaction, but not necessarily
- secondary attributes - The other attributes on which the objects are evaluated
- jointly learning tasks
- multi-task learning - concerns the improvement of multiple related learning tasks by borrowing relevant information among these tasks and therefore coincides with existing methods that aim to model multi-attribute preference data
- learning a single task
- transfer learning - aims to optimise the efficiency of learning a single task by utilising relevant information from other tasks
- the single task of interest is called the target, whilst the other tasks are sources, which parallels the primary and secondary attributes
- only the Gaussian graphical model and the Gaussian mixture model have been enriched by the transfer learning framework.
- Paper goals
- Utilizing Bradley-Terry and its generalization the Plackett-Luce models – in order to improve inference on parameters underlying a primary attribute by utilising information contained in the secondary attributes
- Models frequently used in pairwise comparison data
- method is then incorporated into the transfer learning framework and extended upon, resulting in algorithms that generate estimates for the primary attribute with and without a known set of informative secondary attributes
- since typically only a subset of the secondary attributes is useful when estimating the primary attribute parameters, the authors adapt the framework proposed by Tian and Feng, introducing an algorithm that can effectively infer the set of informative secondary attributes
- Bradley-Terry\
$$
\begin{aligned}
&P(o_j > o_l) = \frac{e^{\alpha_j}}{e^{\alpha_j} + e^{\alpha_l}} \\
&\text{where} \; 1 \le j \ne l \le M
\end{aligned}
$$
- Each individual $i$ assigns their preference for one object $j$ over another object $l$ from a total pool of $M$ objects.
- Assumes that underlying each object there exists some worth $\alpha$ that relates to its probability of being preferred over another object.
- These pairwise comparisons can be presented by an undirected graph $\mathcal{G} =(\mathcal{V}, \mathcal{E})$, with vertices $\mathcal{V} = \{1, \ldots, M\}$ and edge set $\mathcal{E}$ that has the property that $(j, l) \in \mathcal{E}$ if and only if objects $j$ and $l$ are compared at least once in the data. The following conditions are postulated for the pairwise comparison graph.
- Assumptions
- Data
- partial ranking: $\{o_1 \gt \cdots \gt o_m\}$
- pairwise comparisons: $\bigcap_{1 \le j \ne l \le M} \: \{o_j \gt o_l\}$
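- The Bradley-Terry win probability above is a one-liner. A minimal sketch (the worths `alpha` are made-up values for illustration):
``` r
# P(o_j > o_l) = exp(alpha_j) / (exp(alpha_j) + exp(alpha_l))
bt_prob <- function(alpha_j, alpha_l) {
  exp(alpha_j) / (exp(alpha_j) + exp(alpha_l))
}

alpha <- c(A = 1.2, B = 0.3, C = -0.5)
bt_prob(alpha[["A"]], alpha[["B"]])  # P(A beats B)
#> [1] 0.7109495
```
Note that only differences in worths matter: adding a constant to every $\alpha$ leaves the probabilities unchanged, which is why the worths are only identified up to a shift.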
## Trend Following/Momentum
- Notes from [Beyond Trend Following: Deep Learning for Market Trend Prediction](https://arxiv.org/abs/2407.13685)
- Read
- [Designing Robust Trend-Following System](https://quantpedia.com/designing-robust-trend-following-system/)
- [Diversifying Trend Following Strategies Improves Portfolio Efficiency](https://alphaarchitect.com/2024/12/portfolio-efficiency/)
- Diversify across CTAs
- It's a tail hedge, so there will be long periods of loss — i.e. not for the faint of heart
- Improve the efficiency of their portfolio by also adding allocations to the other uncorrelated strategies
- [Exploration of CTA Momentum Strategies Using ETFs](https://quantpedia.com/exploration-of-cta-momentum-strategies-using-etfs/)
- Trend following
- Trend following or trend trading is an investment strategy based on the expectation of price movements to continue in the same direction: buy an asset when its price goes up, sell it when its price goes down.
- Detecting when prices move in a particular direction over time requires a criterion, and every investor uses their own.
- Traditional trend following is usually done on futures. Just follow trends on a large, diversified set of futures markets, covering major asset classes.
- Diversification is key: with multiple assets with low or negative correlations, you can achieve higher returns at a lower risk.
- Trend following on stocks can easily yield negative returns on the short side (when prices go down), and trading only the long side does not always add real value.
- Standard trend following is not expected to work with stocks, since their correlation is too high.
- Compared with a passive index ETF, trend following requires additional work and creates potential risks, yet it does not always yield actual benefits.
- Trend following on single stocks, or a few of them, however, is not attractive for the risk you have to assume.
- Bear regime strategy (Meb Faber on [Odd Lots](https://www.bloomberg.com/news/articles/2024-10-18/meb-faber-on-why-prudent-investors-keep-getting-punished?srnd=oddlots))
- momentum investing
- When a stock price goes up for a while, the likelihood of rising higher is greater than the likelihood of falling. Likewise, a stock going up faster than other stocks is likely to keep going up faster than other stocks.
- One explanation is that people who buy past winners and sell past losers temporarily move prices. An alternative explanation is that the market underreacts to information on short-term prospects but overreacts to information on long-term prospects.
- Andreas Clenow employs the following trading rules on a weekly basis:
- A. F. Clenow, Stocks on the Move: Beating the Market with Hedge Fund Momentum Strategies.Equilateral Publishing, 2015.
- rank stocks on volatility-adjusted momentum (using an exponential 90-day regression, multiplied by its coefficient of determination),
- calculate position sizes (targeting a daily move of 10 basis points),
- check the index filter (S&P 500 above its 200-day moving average), and build your portfolio.
- Individual stocks are disqualified when they are below their 100-day moving average or have experienced a gap over 15%.
- When, in the weekly portfolio rebalancing, a stock is no longer in the top 20% of the S&P 500 ranking or fails to meet the qualification criteria (moving average and gap), it is sold. It is replaced by other stocks only if the index is in a positive trend. Twice per month, position sizes are also rebalanced to control risk.
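- The ranking step can be sketched in base R. This is my reading of the rule above (annualized slope of an exponential, i.e. log-linear, 90-day regression, multiplied by its $R^2$), not Clenow's exact code; function and variable names are mine:
``` r
# Volatility-adjusted momentum score: annualized exponential regression
# slope times the regression's coefficient of determination.
clenow_momentum <- function(price, window = 90) {
  y <- log(tail(price, window))
  t <- seq_along(y)
  fit <- lm(y ~ t)
  slope <- coef(fit)[["t"]]
  annualized <- exp(slope)^252 - 1  # daily log-slope -> annualized return
  annualized * summary(fit)$r.squared
}

# A clean exponential uptrend scores higher than a noisy flat series
set.seed(1)
up   <- 100 * exp(cumsum(rep(0.001, 120)))  # smooth 0.1%/day uptrend
flat <- 100 + rnorm(120)                    # no trend, just noise
clenow_momentum(up) > clenow_momentum(flat)
#> [1] TRUE
```
The $R^2$ multiplier penalizes choppy series: two stocks with the same slope rank differently if one trends smoothly and the other whipsaws.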
## Clustering/Hierarchical TS
- Notes from [Constructing hierarchical time series through clustering: Is there an optimal way for forecasting?](https://arxiv.org/html/2404.06064v1)
- Code: <https://github.com/AngelPone/project_hierarchy>
- The models used to obtain base forecasts and the reconciliation method are fixed throughout the experiments
- coherent, that is they respect the aggregation constraints implied by the hierarchical structure. Coherent forecasts facilitate aligned decisions by agents acting upon different variables within the hierarchy. For example, consider a retail setting, where a warehouse manager supplies stock to individual store managers within their region. Forecasts could be incoherent when the warehouse manager forecasts low total demand while store managers forecast high demand, leading to supply shortages.
- Clustered different representations (the original time series, forecast errors, features of both), different distance metrics (Euclidean, dynamic time warping), and different clustering paradigms (k-medoids, hierarchical).
- For features, they used 56 features from {tsfeatures}
- in-sample one-step-ahead forecast error as a representation of the time series, since a key step in MinT reconciliation is to estimate the $\boldsymbol{W}_h$ matrix.
- It is important to note that raw time series and in-sample error representations are standardized to eliminate the impact of scale variations.
- rq1
- natural hierarchy outperforms the two-level hierarchy, and data-driven hierarchy via clustering can further improve forecast performance compared to natural hierarchy.
- “grouping” is the idea that some correct subsets of series are chosen to form new middle-level series
- “structure” of the hierarchy, includes the number of middle-level series, the depth of the hierarchy, and the distribution of group sizes in the middle layer(s).
- optimal clustering method depends on the dataset characteristics
- rq2
- the driver of forecast improvement is the enlarged number of series in the hierarchy and/or its **structure**, rather than similarities between the time series (i.e. grouping).
- rq3
- an equally-weighted combination of reconciled forecasts derived from multiple hierarchies improves forecast reconciliation performance
- our approach averages not only different coherent forecasts, but also across hierarchies with completely different middle level series. This is possible since only coherent bottom and top level forecasts are averaged and evaluated.
- Section 2 describes the trace minimization reconciliation method (MinT, from {forecast})
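- The linear reconciliation step can be sketched with base R matrix algebra: $\tilde{y} = S(S^{\top}W^{-1}S)^{-1}S^{\top}W^{-1}\hat{y}$. With $W = I$ this reduces to OLS reconciliation; MinT replaces $W$ with an estimate of the forecast error covariance. The toy hierarchy and numbers are mine:
``` r
# Summing matrix S maps bottom-level series to the full hierarchy
S <- rbind(c(1, 1),   # total = A + B
           c(1, 0),   # A
           c(0, 1))   # B
y_hat <- c(10, 4, 5)  # incoherent base forecasts: 4 + 5 != 10
W <- diag(3)          # identity stand-in for the estimated error covariance

Winv <- solve(W)
G <- solve(t(S) %*% Winv %*% S) %*% t(S) %*% Winv
y_tilde <- S %*% G %*% y_hat
y_tilde  # coherent: the first element now equals the sum of the bottom two
```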
## lab 91
- clvtools for prob type, h2o::automl for ML
- agg, cohort, prob, ml, fcast
- group by, summarize, pad_by_time, ungroup
- lag - use horizon and use 2\*horizon
- rolling - 2,3,6 (uses lag parameters and 2; lags were 3 and 6 months with horizon = 3)
- splits: timetk::time_series_split, cumulative = TRUE says use all previous data(?)
## Rhino
- rhinoverse.dev
- opinionated project structure, development toolbox, guides you towards best practices
- `rhino::init()` or RStudio New Project wizard
- github discussions for questions
- Can use other UI packages and not just those in rhinoverse
- Project structure
- config.yml for different environments (e.g. dev, prod)
- main.R with server and ui
- view - modules that rely on reactivity
- static - imgs
- styles - sass files (css stuff)
- dependencies - explicit list of packages
- cypress - unit tests of functions
- `options(shiny.autoreload = TRUE)` - once you save, the app reloads automatically
- addins
- formatting, lintr
- create rhino module
- build sass - automatically shows changes in app when changing and saving sass file
- build javascript - same as build sass but for react components
- Uses {box} for function imports from packages and has a box linter
- dependency management
- `pkg_install`/`remove` - install packages from everywhere and not just cran. Updates dependency.R and renv.lock
- Add react components with {shiny.react}
## Signature Transform
- Todo
- Continue reading
- Look at the separate papers from which the applied data examples are taken.
- Go back to original Amazon paper and see if signature parts and its appendix make more sense.
- Look at Discussion section in Signatory github and ask questions
- Misc
- $e$ is a monomial (pg 13)
- $\lambda$ is a real number (pg 13)
- $\otimes$ is defined as the *joining* (i.e. concatenating) of multi-indexes of monomials: $e_{i_1} \cdots e_{i_k} \otimes e_{j_1} \cdots e_{j_m} = e_{i_1} \cdots e_{i_k} e_{j_1} \cdots e_{j_m}$ (pg 13)
- Chen's Identity: $S(X*Y)_{a,c} = S(X)_{a,b} \otimes S(Y)_{b,c}$ where $X*Y$ is the concatenation of two paths (pg 14)
- So the signature of a concatenated path is equal to the circle-product of the signatures of the component paths.
- $\otimes n$ is the n^th^ power with respect to the circle product, $\otimes$ (pg 15)\
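- Chen's identity can be checked numerically at truncation level 2 for a 1-D path, where the signature of an increment $a$ is $(1, a, a^2/2)$. The circle product of two truncated signatures has level-2 term $a^2/2 + ab + b^2/2 = (a+b)^2/2$, matching the signature of the concatenated path. A small sketch (function names are mine):
``` r
# Level-2 truncated signature of a 1-D path with total increment `increment`
sig2 <- function(increment) c(1, increment, increment^2 / 2)

# Circle product of two level-2 truncated signatures
otimes2 <- function(s, t) {
  c(1,
    s[2] + t[2],                 # level 1: increments add
    s[3] + s[2] * t[2] + t[3])   # level 2: cross term from the tensor product
}

a <- 2; b <- 3
lhs <- sig2(a + b)                # signature of the concatenated path
rhs <- otimes2(sig2(a), sig2(b))  # Chen's identity right-hand side
all.equal(lhs, rhs)
#> [1] TRUE
```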
- Workflow
- Create a continuous path $X_i$ from each time-series $\{Y_i\}$ (row-wise)
- If needed, make use of the lead-lag transform to account for the variability in data
- The cumulative sum transform is another option
- Compute the truncated signature $S(X_i)|_L$ of the path $X_i$ up to level $L$
- Either a Full or Log signature
- Standardize each signature column
- Use the terms of signature $\{S^I_i\}$ as features
- Issue
- Degeneracy in the terms of the signature causes this representation not to be unique and introduces collinearity among the signature terms.
- Solution: LASSO, ridge or elastic net regularization
- Paper uses a 2-step lasso where signature features are selected by LASSO. Then those selected features are used in a second regression with other predictors.
- Signature
- $S^{(1)}_{a,b} = X_b - X_a$
- $S^{(1,1)}_{a,b} = \frac{(X_b - X_a)^2}{2!}$
- $S^{(1,1,1)}_{a,b} = \frac{(X_b - X_a)^3}{3!}$
- Cumulative + Lead Lag Signature Truncated to Level 2
- Signature
- $S(\tilde X)|_{L=2} = (1, S^{(1)}, S^{(2)}, S^{(1,1)}, S^{(1,2)}, S^{(2,1)}, S^{(2,2)})$
- $S^{(1)} = S^{(2)} = \sum_i^N X_i$
- $S^{(1,1)} = S^{(2,2)} = \frac{1}{2} \left(\sum_i^N X_i \right)^2$
- $S^{(1,2)} = \frac{1}{2} \left[\left(\sum_i^N X_i\right)^2 - \sum_i^N X_i^2 \right]$
- $S^{(2,1)} = \frac{1}{2} \left[\left(\sum_i^N X_i\right)^2 + \sum_i^N X_i^2 \right]$
- Moments
- Mean(X): $\frac{1}{N}S^{(1)}$
- Var(X): $-\frac{N+1}{N^2}S^{(1,2)} + \frac{N-1}{N^2}S^{(2,1)}$
- Lead Lag Signature Truncated to Level 2
- $S(\tilde X)|_{L=2} = (1, S^{(1)}, S^{(2)}, S^{(1,1)}, S^{(1,2)}, S^{(2,1)}, S^{(2,2)})$
- $S^{(1)} = S^{(2)} = \sum_i^{N-1} (X_{i+1} - X_i)$
- $S^{(1,1)} = S^{(2,2)} = \frac{1}{2} \left(\sum_i^{N-1} (X_{i+1} - X_i) \right)^2$
- $S^{(1,2)} = \frac{1}{2} \left[\left(\sum_i^{N-1} (X_{i+1} - X_i)\right)^2 - \sum_i^{N-1} (X_{i+1} - X_i)^2 \right]$
- $S^{(2,1)} = \frac{1}{2} \left[\left(\sum_i^{N-1} (X_{i+1} - X_i)\right)^2 + \sum_i^{N-1} (X_{i+1} - X_i)^2 \right]$
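- These terms are easy to compute directly. A sketch assuming the lead-lag convention in which $S^{(1,2)}$ carries the minus sign and the inner sums are squared increments (this choice reproduces the $-2$ and $27$ quoted from eq 2.18 below), with $S^{(2,1)} - S^{(1,2)}$ recovering the quadratic variation:
``` r
# Level-2 lead-lag signature terms from the total increment and the
# quadratic variation of the series
leadlag_sig2 <- function(x) {
  d  <- diff(x)
  dx <- sum(d)    # total increment X_N - X_1
  qv <- sum(d^2)  # quadratic variation
  c(S1  = dx, S2 = dx,
    S11 = dx^2 / 2, S22 = dx^2 / 2,
    S12 = (dx^2 - qv) / 2,
    S21 = (dx^2 + qv) / 2)
}

s <- leadlag_sig2(c(1, 4, 2, 6))
# dx = 5, qv = 29, so S11 = 12.5, S12 = -2, S21 = 27
s[["S21"]] - s[["S12"]]  # recovers the quadratic variation
#> [1] 29
```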
- Log Signature Truncated to Level 2\
$$
\begin{aligned}
&\log S(X) = (\Delta X, \Delta X, \frac{1}{2}\text{QV}(X))\\
&\begin{aligned}
\text{where} \quad &\Delta X = X_N - X_1 \\
&\text{QV}(X) = \sum_{i=1}^{N-1} (X_{i+1} - X_i)^2
\end{aligned}
\end{aligned}
$$
- [Example]{.ribbon-highlight}: eq 2.17, Calculation for Quadratic Variation (QV)
``` r
x <- c(1, 4, 2, 6)
x_lead <- dplyr::lead(x)  # was dplyr::lead(x1), which referenced a nonexistent object
QV <- function(x1, x2) {
  # drop the last element of each (x2 ends in the NA introduced by lead())
  x1_a <- x1[-length(x1)]
  x2_a <- x2[-length(x2)]
  # sum of squared increments
  sum((x2_a - x1_a)^2)
}
QV(x, x_lead)
#> [1] 29
# equivalently: sum(diff(x)^2)
```
- Questions
- $\otimes n$ makes no sense (pg 15) — just literally
- Log transform makes no sense (pg 15, 16) in terms of applying it to data
- Formula never provided lag-lead dimensions (pg 20)
- What's a lead-lag embedding and how are these things connected.
- How can it be -14.5 in 2.17 when QV is always positive? (pg 26)
- eq 2.18, Standard Signature, $S(X^2) = (1, 5, 5, 12.5, −2, 27, 12.5)$ --- How is this calculated? (pg 26)
- S^(1,2)^ and S^(2,1)^ don't match the lead-lag algorithm. Nowhere in the paper is the formula for these terms explicitly given. It might have to do with the shuffle product.
- What is the last column in the feature matrix (pg 31)? Paragraph makes it sound like it's the mean.
- In eq 2.34, what is going on in the 3rd dimension? It's supposed to be an indicator variable (i.e. 0 or 1)
- Are there guidelines on when to use cum/lead-lag transforms and full/log signatures and Level 1,2, or 3?
- In order to capture the quadratic variation of the price, the path is extended by means of a lead-lag transform
- Annoying phrases
- One can easily rewrite (pg 26) - then just spits out an answer with no previous example of how it was obtained.
- Full Signature
- The terms of the signature are iterated integrals of a path, while the path is normally constructed by an interpolation of data points. One can compute such iterated integrals using several computational algorithms (cubature methods) which are generally straightforward to implement. (sect 2.3, pg 34)
- Signature approach is to convert data into paths and then compute the iterated integrals of the resulting paths (sect 2.4.1, pg 35)
## Reproducibility
- Notes from <https://www.brodrigues.co/blog/2023-07-13-nix_for_r_part1/>
- To ensure that a project is reproducible you need to deal with at least four things:
- Make sure that the required/correct version of R (or any other language) is installed
- Make sure that the required versions of packages are installed
- Make sure that system dependencies are installed (for example, you'd need a working Java installation to install the {rJava} R package on Linux)
- Make sure that you can install all of this for the hardware you have on hand.
- Consensus seems to be a mixture of Docker to deal with system dependencies,`{renv}`for the packages (or`{groundhog}`, or a fixed CRAN snapshot like those [Posit provides](https://packagemanager.posit.co/__docs__/user/get-repo-url/#ui-frozen-urls)) and the [R installation manager](https://github.com/r-lib/rig) to install the correct version of R (unless you use a Docker image as base that already ships the required version by default). As for the last point, the only way out is to be able to compile the software for the target architecture.
- Nix
- a package manager for Linux distributions, macOS and apparently it even works on Windows if you enable WSL2.
- huge package repository, over 80K packages
- possible to install software in (relatively) isolated environments
## Text Tiling
- Previous articles
- 5 sentence chunks - Instead of creating chunks large enough to fit into a context window (langchain default), I propose that the chunk size should be the number of sentences it generally takes to express a discrete idea. This is because we will later embed this chunk of text, essentially distilling its semantic meaning into a vector. I currently use 5 sentences (but you can experiment with other numbers). I tend to have a 1-sentence overlap between chunks, just to ensure continuity so that each chunk has some contextual information about the previous chunk. ([2-stage summarizing method](https://towardsdatascience.com/summarize-podcast-transcripts-and-long-texts-better-with-nlp-and-ai-e04c89d3b2cb))
- Chunk markdown documents by section using header tags (h1, h2, etc.) ([company doc searchable db](https://towardsdatascience.com/how-i-turned-my-companys-docs-into-a-searchable-database-with-openai-4f2d34bd8736#8ed1))
- Chunk size = Context window size (langchain default)
- [nltk.tokenize.texttiling](https://www.nltk.org/_modules/nltk/tokenize/texttiling.html) - text tiling method from {{nltk}}
- Created a test document that is an amalgam of 4 different articles that can be used to test the tiling method
## Propensity Score Models?
- Articles
- Stephen Senn paper on why propensity scores are redundant, <https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3133>. I think this might only be for RCTs though.
- From [thread](https://x.com/stephensenn/status/1788917201476423756)
- Also, [thread](https://x.com/AleksanderMolak/status/1788479511958352322) and a [video](https://www.youtube.com/watch?v=nT_yCwXSz54&ab_channel=CausalPythonwithAlexMolak) of 1hr podcast with Senn describing it
- [Thread](https://bsky.app/profile/noahgreifer.bsky.social/post/3lgiz64ikak2t) on covariate balancing propensity score (CBPS)
- bayesian vid (currently at 23:14)
- I don't remember which video this is
- in observational analysis its imagined that sample is drawn from a joint distribution of all the variables
- in causal inf, imagine intervening to change Z, treatment, independent of X, confounders.
- Frequentist method: G-Estimation
- Models
- propensity score models, $b(x;\gamma)$
- Equivalent to $\text{Pr} [Z = 1 |x]$
- Estimating Equation for $\gamma$
$$
\sum \limits_{i=1}^n x_i^T (z_i - b(x_i; \gamma)) = 0
$$
- treatment free mean model, $\mu_0(x; \beta)$
- Used for doubly robust estimation
- treatment effect (or blip) model $\tau z$, which can be extended to $z\mu_1(x;\tau)$
- Propensity Score Regression\
$$
Y = Z \tau + b(X; \hat{\gamma}) \phi + \epsilon
$$
- $\phi$ is the estimated coefficient for the propensity score model
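- A sketch of the two pieces above on simulated data: a logistic fit for $\gamma$ (whose score equations are exactly the estimating equation $\sum_i x_i^{\top}(z_i - b(x_i;\gamma)) = 0$), then the fitted propensity score plugged into the outcome regression. All names and numbers are illustrative:
``` r
set.seed(42)
n <- 500
x <- cbind(1, rnorm(n))                        # intercept + one confounder
z <- rbinom(n, 1, plogis(x %*% c(-0.2, 0.8)))  # treatment assignment
y <- 2 * z + 1.5 * x[, 2] + rnorm(n)           # outcome, true tau = 2

# Propensity score model b(x; gamma_hat)
ps_fit <- glm(z ~ x[, 2], family = binomial)
b <- fitted(ps_fit)

# The logistic MLE solves the estimating equation: sum_i x_i (z_i - b_i) = 0
colSums(x * (z - b))  # both components ~ 0

# Propensity score regression: Y = Z tau + b(X; gamma_hat) phi + eps
coef(lm(y ~ z + b))[["z"]]  # estimate of tau (true value is 2)
```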
- Bayesian
- There are other ways but this procedure is recommended
- Perform full Bayesian estimation of $\gamma$ , plug that (best) estimate into the propensity score model, $b(x_i; \hat{\gamma})$ , and then perform Bayesian analysis of $\tau$ (i.e. propensity score regression)
- The propensity score model part of the formula is basically a hack and not mimicking any part of the dgp; therefore, for Bayesians, the regression model is a misspecification.
## Copulas
- Todo
- <https://twiecki.io/blog/2018/05/03/copulas/> - explainer, good starting point
- <https://copulae.readthedocs.io/en/latest/explainers/introduction.html> - another beginner explainer
- <https://www.r-bloggers.com/2015/10/modelling-dependence-with-copulas-in-r/> - practical example using returns of two stocks
- [Paper](https://arxiv.org/abs/2403.15862)- Non-monotone dependence modeling with copulas: an application to the volume-return relationship - no code but discusses how r pkg is used
- Packages
- {[copula](https://cran.r-project.org/web/packages/copula/index.html)} - Multivariate Dependence with Copulas — lots of vignettes
- {{[copulae](https://github.com/DanielBok/copulae)}} - Multivariate data modelling with Copulas in Python
- I think copulas are used for bias correction in post-processing separate forecasts of variables that are related. See [paper](https://www.annualreviews.org/doi/10.1146/annurev-statistics-062713-085831#_i59) (section 4.2 and 4.3)
- stocks of the same sector
- ensemble forecasts
- weather - meteorologists will forecast a variable (e.g. temp) many times, but each time the model uses a different set of atmospheric conditions. These forecasts are put into a regression (i.e. the ensemble) to create the final forecast. But that forecast is biased because the forecasts are related to each other. Post-processing with a copula corrects this.
- Ensemble Copula Coupling (ECC) applies the empirical copula of the original ensemble to samples from the postprocessed predictive distributions. ([paper](https://arxiv.org/pdf/1302.7149.pdf))
1. Generate a raw ensemble, consisting of multiple runs of the computer model that differ in the inputs or model parameters in suitable ways.
2. Apply statistical postprocessing techniques, such as Bayesian model averaging or nonhomogeneous regression, to correct for systematic errors in the raw ensemble, to obtain calibrated and sharp predictive distributions for each univariate output variable individually.
3. Draw a sample from each postprocessed predictive distribution.
4. Rearrange the sampled values in the rank order structure of the raw ensemble to obtain the ECC postprocessed ensemble
- Depending on the use of Quantiles, Random draws or Transformations at the sampling stage, we distinguish the ECC-Q, ECC-R and ECC-T variants
- ECC is based on empirical copulas aimed at restoring the dependence structure of the forecast and is derived from the rank order of the members in the raw ensemble forecast, under a perfect model assumption, with exchangeable ensemble members. For Schaake shuffle (SSH), on the other hand, the dependence structure is derived from historical observations instead. (Overview of subject - [paper](https://journals.ametsoc.org/view/journals/bams/102/3/BAMS-D-19-0308.1.xml))
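- The rank-reordering step (step 4 above) can be sketched in base R for a single variable: sort the postprocessed sample, then index it by the rank order of the raw ensemble, so the ECC ensemble inherits the raw members' rank structure. Function and example values are mine:
``` r
# Rearrange a postprocessed sample into the rank order of the raw ensemble
ecc_reorder <- function(raw, post) {
  # raw, post: vectors of equal length (one variable, m ensemble members)
  sort(post)[rank(raw, ties.method = "first")]
}

raw  <- c(3.1, 2.4, 5.0, 4.2)  # raw ensemble members (ranks 2, 1, 4, 3)
post <- c(10, 30, 20, 40)      # sample from the postprocessed distribution
ecc  <- ecc_reorder(raw, post)
ecc  # same rank order as raw
#> [1] 20 10 40 30
```
Applied independently to each output variable, this restores the cross-variable dependence of the raw ensemble while keeping the calibrated margins.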
- Packages
- <https://cran.r-project.org/web/packages/ensembleBMA/index.html>
- <https://cran.r-project.org/web/packages/ensemblepp/index.html>
- Data for book, Statistical Postprocessing of Ensemble Forecasts
- <https://cran.r-project.org/web/packages/ensembleMOS/index.html>
- Meteorology bias-corrected forecast ([chatgpt 4o](https://chatgpt.com/share/4a17c9f3-e3ec-4ebc-8b9e-fcc38a583dfc))
``` r
# Load necessary packages (install first if needed)
# install.packages(c("copula", "fitdistrplus"))
library(copula)        # copula fitting and sampling
library(fitdistrplus)  # marginal distribution fits
# Simulated data for demonstration
set.seed(123)
n <- 1000
# Simulating ensemble forecasts from three different models
forecast1 <- rnorm(n, mean = 20, sd = 5)
forecast2 <- rnorm(n, mean = 21, sd = 5)
forecast3 <- rnorm(n, mean = 19, sd = 5)
# Simulating observed temperatures
observed <- 0.5 * forecast1 + 0.3 * forecast2 + 0.2 * forecast3 + rnorm(n, mean = 0, sd = 2)
# Combine forecasts and observations into a data frame
data <- data.frame(forecast1, forecast2, forecast3, observed)
# Fit normal distributions to each forecast and the observed temperature
margins <- list(
forecast1 = fitdist(data$forecast1, "norm"),
forecast2 = fitdist(data$forecast2, "norm"),
forecast3 = fitdist(data$forecast3, "norm"),
observed = fitdist(data$observed, "norm")
)
# Transform data to uniform margins
u1 <- pnorm(data$forecast1, mean = margins$forecast1$estimate["mean"], sd = margins$forecast1$estimate["sd"])
u2 <- pnorm(data$forecast2, mean = margins$forecast2$estimate["mean"], sd = margins$forecast2$estimate["sd"])
u3 <- pnorm(data$forecast3, mean = margins$forecast3$estimate["mean"], sd = margins$forecast3$estimate["sd"])
u_obs <- pnorm(data$observed, mean = margins$observed$estimate["mean"], sd = margins$observed$estimate["sd"])
# Combine uniform margins into a matrix
u_matrix <- cbind(u1, u2, u3, u_obs)
# Fit a normal copula to the pseudo-observations
normal.cop <- normalCopula(dim = 4)
fit.cop <- fitCopula(normal.cop, u_matrix, method = "ml")
# Generate new samples from the fitted copula
copula.samples <- rCopula(n, fit.cop@copula)
# Transform copula samples back to original scale
bias_corrected_data <- data.frame(
forecast1 = qnorm(copula.samples[, 1], mean = margins$forecast1$estimate["mean"], sd = margins$forecast1$estimate["sd"]),
forecast2 = qnorm(copula.samples[, 2], mean = margins$forecast2$estimate["mean"], sd = margins$forecast2$estimate["sd"]),
forecast3 = qnorm(copula.samples[, 3], mean = margins$forecast3$estimate["mean"], sd = margins$forecast3$estimate["sd"]),
observed = qnorm(copula.samples[, 4], mean = margins$observed$estimate["mean"], sd = margins$observed$estimate["sd"])
)
# Compute the mean and standard deviation of the original and bias-corrected forecasts
original_stats <- data.frame(
Mean = colMeans(data[, 1:3]),
SD = apply(data[, 1:3], 2, sd)
)
bias_corrected_stats <- data.frame(
Mean = colMeans(bias_corrected_data[, 1:3]),
SD = apply(bias_corrected_data[, 1:3], 2, sd)
)
# Print original and bias-corrected statistics
cat("Original Forecast Statistics:\n")
print(original_stats)
cat("Bias-Corrected Forecast Statistics:\n")
print(bias_corrected_stats)
# Plot the original and bias-corrected forecasts
par(mfrow = c(2, 3))
for (i in 1:3) {
plot(data[, i], data$observed, main = paste("Original Forecast", i), xlab = "Forecast", ylab = "Observed")
plot(bias_corrected_data[, i], bias_corrected_data$observed, main = paste("Bias-Corrected Forecast", i), xlab = "Forecast", ylab = "Observed")
}
```