Commit 9b3c12f

Merge branch 'main' into spr-dist-test
2 parents d08e84d + 5a151d6 commit 9b3c12f

126 files changed

Lines changed: 7198 additions & 2514 deletions

.Rbuildignore

Lines changed: 6 additions & 0 deletions
@@ -5,9 +5,11 @@
 ^memcheck$
 ^CRAN-RELEASE$
 ^src\-
+^src\.
 CONTRIBUTING
 README.md
 ^\.github
+^\.lintr
 cran-comments.md
 man-roxygen
 data-raw
@@ -25,6 +27,10 @@ revdep
 ^vignettes/.*_cache$
 ^codemeta\.json$
 ^CODE_OF_CONDUCT\.md$
+^\codecov\.yml$
 ^\.travis\.yml$
 ^\.zenodo\.json$
 ^\.covrignore$
+^\.vs.*$
+^\.vscode.*$
+^CRAN-SUBMISSION$

.github/copilot-instructions.md

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
# TreeDist R Package Development

TreeDist is an R package providing efficient implementations of functions for the comparison of phylogenetic trees. It includes C++ code for performance, comprehensive testing, benchmarking infrastructure, and extensive CI/CD workflows.
Correctness, speed and user-friendliness are priorities.

Always reference these instructions first and fall back to search or bash commands only when you encounter unexpected information that does not match the info here.


## Context: Object structures

TreeDist handles phylogenetic trees, which are typically presented in one of two formats:

- A `phylo` object is a list containing elements:
  - `"edge"`: a 2D matrix in which each row identifies the parent vertex (column 1) and child vertex (column 2) of an edge
    in the tree
  - `"Nnode"`: Number of internal nodes
  - `"tip.label"`: Labels for tips with index 1, 2, ... `NTip(tree) == length(tree[["tip.label"]])`
  A phylo object with the attribute "preorder" has edges and internal nodes listed in a strict preorder sequence
  (see `TreeTools::Preorder()`); the attribute "postorder" indicates that edges are numbered in an arbitrary postorder sequence.

- A `Splits` object (from `TreeTools::as.Splits()`) is a raw matrix where each row corresponds to an edge in the tree.
  Each row is named with the integer index of a node associated with the edge (=bipartition split); the bits of the raw
  vector determine which of the two bipartitions each of the `attr(x, "nTip")` leaves (labelled as in the "tip.label" attribute)
  belongs to. The `TRUE/FALSE` labelling is arbitrary unless `TreeTools::PolarizeSplits` is called.

TreeDist can be considered a "descendant package" of TreeTools, which I also maintain;
TreeTools contains core functionality for tree and split manipulation, designed to be compatible with TreeDist.
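
As a rough illustration of the `"edge"` layout described above, the sketch below reads an edge matrix as whitespace-separated `parent child` rows and counts the tips, taking a tip to be any vertex that appears as a child but never as a parent. The function name `count_tips` and the plain-text input format are assumptions made for this example; neither is part of TreeDist or TreeTools.

```bash
# Hypothetical helper, not part of TreeDist: count tips in an edge list.
# Input (stdin): one edge per line, "parent child", mirroring the two
# columns of a phylo object's "edge" matrix.
count_tips() {
  awk '
    NF >= 2 { is_parent[$1] = 1; is_child[$2] = 1 }
    END {
      n = 0
      # A tip is a child vertex that is never the parent of another edge.
      for (v in is_child) if (!(v in is_parent)) n++
      print n
    }
  '
}

# Example input: the rooted tree ((t1, t2), t3), with tips 1-3,
# root 4 and one further internal node 5.
# printf '4 5\n5 1\n5 2\n4 3\n' | count_tips
```

For the example tree in the comment, vertices 1, 2 and 3 never occur in the parent column, so the function reports three tips.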


## Working Effectively

### Bootstrap and Development Setup
- Install R and development tools:
  ```bash
  sudo apt update && sudo apt install -y r-base r-base-dev build-essential
  sudo apt install -y libcurl4-openssl-dev libssl-dev libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev
  ```
- Install R development packages:
  ```bash
  sudo R -e "install.packages(c('devtools', 'testthat', 'roxygen2', 'lintr'), repos='https://cran.r-project.org/')"
  ```
- Install TreeDist dependencies:
  ```bash
  sudo R -e "install.packages(c('ape', 'bit64', 'lifecycle', 'colorspace', 'fastmatch', 'RCurl', 'R.cache', 'Rdpack', 'stringi', 'PlotTools', 'TreeTools'), repos='https://cran.r-project.org/', dependencies=TRUE)"
  ```

### Building and Checking
- **NEVER CANCEL**: R package builds can take 10-30+ minutes depending on dependencies and system performance. Set timeout to 60+ minutes.
- Build the package:
  ```bash
  R CMD build .
  ```
- **NEVER CANCEL**: Full R CMD check takes 15-45+ minutes. Set timeout to 90+ minutes.
- Check package (full validation):
  ```bash
  R CMD check TreeDist_*.tar.gz
  ```
- Quick check (development, ~2-5 minutes):
  ```bash
  R CMD check --no-build-vignettes --no-manual .
  ```
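
One way to apply the timeouts suggested above from a shell is sketched here. It assumes GNU coreutils `timeout` is available, and the wrapper name `run_with_timeout` is invented for this example rather than taken from the repository.

```bash
# Hypothetical wrapper: run a command with a time limit given in minutes.
# Relies on GNU coreutils `timeout`, which exits with status 124 if the
# limit is reached; otherwise the command's own exit status is returned.
run_with_timeout() {
  local minutes="$1"
  shift
  timeout "$((minutes * 60))" "$@"
}

# Possible usage, mirroring the limits suggested above:
# run_with_timeout 60 R CMD build .
# run_with_timeout 90 R CMD check TreeDist_*.tar.gz
```

Expressing the limit in minutes keeps the call sites aligned with the figures quoted in this section.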

### Testing
- **NEVER CANCEL**: Test suite has 61+ test files and can take 10-30+ minutes. Set timeout to 60+ minutes.
- Run all tests:
  ```bash
  R -e "devtools::test()"
  ```
- Run tests with testthat directly:
  ```bash
  R -e "testthat::test_dir('tests/testthat')"
  ```
- Run specific test file:
  ```bash
  R -e "testthat::test_file('tests/testthat/test-tree_display.R')"
  ```

### Development Workflow Commands
- Load package for development:
  ```bash
  R -e "devtools::load_all()"
  ```
- Check code style (~1-2 minutes):
  ```bash
  R -e "lintr::lint_dir('.')"
  ```
- Build documentation:
  ```bash
  R -e "devtools::document()"
  ```
- **NEVER CANCEL**: Build vignettes takes 5-15+ minutes. Set timeout to 30+ minutes.
- Build vignettes:
  ```bash
  R -e "devtools::build_vignettes(install = FALSE)"
  ```

### Key Dependencies
**Critical**: TreeDist requires these packages to build successfully:
- `ape` (>= 5.0) - Phylogenetic analysis package
- `Rcpp` (>= 1.0.8) - C++ integration (must install before TreeTools)
- `TreeTools` (>= 1.16) - Core tree manipulation (large dependency)
- `Rdpack` (>= 0.7) - Bibliography and citation support
- `shinyjs` - Interactive web applications
- `colorspace` - Color space manipulation

## Validation

**CRITICAL: Always run the full test suite before proposing changes**
```bash
R -e "devtools::test()"
```

- Always run R CMD check for complete validation before finalizing changes.
- ALWAYS run the full test suite when modifying C++ code in src/ directory.
- ALWAYS run lintr to ensure code style compliance before committing.
- For performance-critical changes, run benchmarks in benchmark/ directory:
  ```bash
  R -e "source('benchmark/_run_benchmarks.R')"
  ```
- Memory checking is available but optional (takes significant time):
  ```bash
  R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < memcheck/tests.R
  ```
  **Note**: Memory check scripts available in `memcheck/` directory:
  - `memcheck/tests.R` - Run test suite with valgrind
  - `memcheck/all.R` - Run tests, examples, and build vignettes with memory checking

## Validation Scenarios
After making code changes, validate functionality by testing core phylogenetic tree operations:

### Essential Pre-commit Validation Steps
1. **Install core dependencies first**:
   ```bash
   sudo R -e "install.packages(c('ape', 'colorspace', 'Rdpack', 'shinyjs', 'TreeTools'), repos='https://cran.r-project.org/', dependencies=TRUE)"
   ```

2. **Basic build test** (quick validation):
   ```bash
   R CMD build . --no-build-vignettes
   ```

3. **Run linting** (expects GitHub Actions format):
   ```bash
   R -e "lintr::lint_dir('.')"
   ```

4. **Load package for testing**:
   ```bash
   R -e "devtools::load_all()"
   ```


## Time Expectations & Critical Warnings
- **R startup**: ~0.1 seconds
- **Linting**: 1-3 minutes for full codebase
- **Quick check** (no vignettes/manual): 2-5 minutes
- **Documentation building**: 2-5 minutes
- **Test suite**: 10-30+ minutes (NEVER CANCEL - set 60+ minute timeout)
- **Full R CMD check**: 15-45+ minutes (NEVER CANCEL - set 90+ minute timeout)
- **Package build**: 10-30+ minutes (NEVER CANCEL - set 60+ minute timeout)
- **Vignette building**: 5-15+ minutes (NEVER CANCEL - set 30+ minute timeout)
- **Benchmarks**: 5-20+ minutes (NEVER CANCEL - set 30+ minute timeout)
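
The timeouts in the list above could be kept in one place with a small lookup, sketched below. The function name `timeout_minutes_for` and the short task names are assumptions made for this example; only the minute values come from the list.

```bash
# Hypothetical lookup: suggested timeout in minutes for each long-running
# task, copied from the list above. Task names are invented for this sketch.
timeout_minutes_for() {
  case "$1" in
    tests)      echo 60 ;;
    check)      echo 90 ;;
    build)      echo 60 ;;
    vignettes)  echo 30 ;;
    benchmarks) echo 30 ;;
    *)          echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# Possible usage with GNU coreutils `timeout` (argument in seconds):
# timeout "$(( $(timeout_minutes_for check) * 60 ))" R CMD check TreeDist_*.tar.gz
```

An unknown task name returns a non-zero status, so a typo does not silently run a command without any limit.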

## Repository Structure
### Key Directories
- `R/` - R source code (31 R files with 500+ exported functions)
- `src/` - C++ source code requiring compilation (14 C++ files, SystemRequirements: C++17)
- `tests/testthat/` - Test suite (30+ test files)
- `benchmark/` - Performance benchmarking (10+ benchmark files)
- `man/` - Generated documentation (do not edit manually)
- `vignettes/` - Package tutorials and documentation
- `data/` - Package data files
- `.github/workflows/` - Extensive CI/CD with R-CMD-check, benchmarks, memory checks

### Important Files
- `DESCRIPTION` - Package metadata, dependencies, and system requirements
- `NAMESPACE` - Generated by roxygen2 (do not edit manually)
- `NEWS.md` - Version history (update for user-facing changes)
- `tests/testthat.R` - Test runner entry point
- `inst/_pkgdown.yml` - Template for online documentation
  (Check that new functions are registered here, ideally using `@family`)

## Common Tasks
### After Making Changes
1. Load and test changes interactively:
   ```bash
   R -e "devtools::load_all(); # test your changes interactively"
   ```
2. Run linting:
   ```bash
   R -e "lintr::lint_dir('.')"
   ```
3. Run relevant tests:
   ```bash
   R -e "devtools::test()"
   ```
4. For C++ changes, always run full check:
   ```bash
   R CMD check --no-build-vignettes .
   ```
5. Ensure that temporary files are not included in the commit,
   either by deleting them, not `git add`ing them, or adding a parsimonious
   pattern to `.gitignore`.
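
For step 5, one possible way to spot stray temporary files before committing is to list what git reports as untracked. The sketch below filters `git status --porcelain` output, in which untracked entries begin with `?? `; the helper name `untracked_from_porcelain` is an assumption made for this example.

```bash
# Hypothetical helper: print the paths that git reports as untracked.
# Reads `git status --porcelain` output on stdin; untracked entries are
# the lines that start with "?? ".
untracked_from_porcelain() {
  sed -n 's/^?? //p'
}

# Possible usage from the repository root:
# git status --porcelain | untracked_from_porcelain
```

Anything listed that is not meant to be committed can then be deleted or covered by a narrow `.gitignore` pattern.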

### Code Style Guidelines
- Follow Google's R style guide
- Use camelCase for variable names, TitleCase for function names
- Use Oxford ending 'ize' (not 'ise') and UK spelling where applicable
- Document functions with roxygen2 comments
- Include test cases for new functionality

### CI Will Fail If
- R CMD check fails
- Tests fail
- Code style violations (lintr)
- Missing or inadequate documentation
- Missing test coverage for new code

### CI/CD Workflows Available
- **R-CMD-check.yml**: Comprehensive checks on Windows, macOS, Ubuntu across R versions
- **benchmark.yml**: Performance regression testing triggered on PRs
- **memcheck.yml**: Memory checking with valgrind (runs `tests`, `examples`, `vignettes`)
- **ASan.yml**: Address sanitizer checks
- **pkgdown.yml**: Documentation site generation
- **revdepcheck.yml**: Downstream dependency validation

## Troubleshooting
### Common Build Issues
- **Missing dependencies**: Install system packages first: `sudo apt install -y libcurl4-openssl-dev libssl-dev libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev`
- **Rdpack warnings**: These are normal when `Rdpack` isn't installed but don't prevent building
- **Package won't build**: Install core dependencies: `ape`, `colorspace`, `Rdpack`, `shinyjs`, `TreeTools`
- **TreeTools installation fails**: This is a large dependency - allow 10+ minutes for compilation
- **Tests fail after C++ changes**: Rebuild package completely with `R CMD build .`
- **Documentation warnings**: Run `devtools::document()` to regenerate documentation
- **Benchmarks fail**: Performance regressions may need investigation
- **Memory issues**: Use valgrind checking for C++ code validation

### System Requirements Validation
- R version 4.0+ required (tested with R 4.3.3)
- C++17 compiler support required
- Minimum 2GB RAM recommended for building with dependencies
- Allow 60+ minutes for full dependency installation from scratch

.github/workflows/ASan.yml

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# Address Sanitizer: Replicate CRAN's gcc-ASAN 'Additional Test'
on:
  workflow_dispatch:
  push:
    branches:
      - main
      - master
      - '**asan**'
    paths:
      - '.github/workflows/ASan.yml'
      - 'src/**'
      - 'inst/include/**'
      - 'memcheck/**'
      - 'tests/testthat/**.R'
      - 'vignettes/**.Rmd'
  pull_request:
    paths:
      - '.github/workflows/ASan.yml'
      - 'src/**'
      - 'inst/include/**'
      - 'memcheck/**'
      - 'tests/testthat/**.R'
      - 'vignettes/**.Rmd'

name: gcc-ASAN

jobs:
  mem-check:
    runs-on: ubuntu-24.04 # Update RSPM when increasing

    name: AddressSanitizer ${{ matrix.config.test }}

    strategy:
      fail-fast: false
      matrix:
        config:
          - {test: 'tests'}
          - {test: 'examples'}
          - {test: 'vignettes'}

    env:
      R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
      _R_CHECK_FORCE_SUGGESTS_: false
      RSPM: https://packagemanager.rstudio.com/cran/__linux__/noble/latest
      USING_ASAN: true
      STRINGI_DISABLE_PKG_CONFIG: true
      BIOCONDUCTOR_USE_CONTAINER_REPOSITORY: FALSE # For stringi
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      ASAN_OPTIONS: detect_container_overflow=1:verify_asan_link_order=0

    steps:
      - uses: actions/checkout@v5

      - name: Initialize ASan configuration
        run: |
          export LD_PRELOAD=$(gcc -print-file-name=libasan.so)

          echo "PKG_CFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" > src/Makevars
          echo "PKG_CXXFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" >> src/Makevars

          mkdir ~/.R
          echo "LDFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" >> ~/.R/Makevars

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: release # CRAN uses devel, but takes ages to load deps.
          extra-repositories: https://ms609.github.io/packages/

      - name: Set up R dependencies
        uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: |
            ms609/TreeDistData
          dependencies: "'soft'"
          needs: |
            memcheck

      - name: Install package
        run: |
          cd ..
          R CMD build --no-build-vignettes --no-manual --no-resave-data TreeDist
          R CMD INSTALL TreeDist*.tar.gz
          cd TreeDist

      - name: ASAN - memcheck ${{ matrix.config.test }}
        run: |
          Rscript memcheck/${{ matrix.config.test }}.R
