FastSurvival provides fast alternatives to the standard survival analysis
functions in the
survival package.
Every function is designed for repeated evaluation inside large simulation
loops — adaptive sample-size re-estimation, probability-of-success
calculations, regional consistency evaluation in multi-regional trials —
where the standard iterative or object-building overhead of survfit(),
survdiff(), and coxph() becomes a computational bottleneck. Core
computations are implemented in C++ via
Rcpp for maximum performance.

| Function | Replaces | Approximate speed gain |
|---|---|---|
| survfit_fast() | survfit() + summary() at a single time point | ~50x |
| survdiff_fast() | survdiff() | ~40x |
| coxph_fast() | coxph() (point estimate + Wald CI) | ~30x |
| simdata_fast() | Custom simulation scripts | n/a |

Speed gains are based on median times from 1,000 microbenchmark replicates on a typical phase-3 trial dataset (n = 600, event rate 80%) with `presorted = TRUE`. Results vary by hardware and sample size.
# Install from GitHub
# install.packages("remotes")
remotes::install_github("gosukehommaEX/FastSurvival")

library(FastSurvival)
library(survival)
# ----------------------------------------------------------------
# survfit_fast(): Kaplan-Meier at a single time point
# ----------------------------------------------------------------
ord <- order(ovarian$futime)
t_s <- ovarian$futime[ord]
e_s <- ovarian$fustat[ord]
survfit_fast(t_s, e_s, t_eval = 500, conf.type = "log")
# ----------------------------------------------------------------
# survdiff_fast(): log-rank test
# ----------------------------------------------------------------
survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx,
control = 1, side = 2)
# ----------------------------------------------------------------
# coxph_fast(): hazard ratio via the Pike-Halley Estimator (closed-form)
# ----------------------------------------------------------------
coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1)
# ----------------------------------------------------------------
# simdata_fast(): clinical trial simulation
# ----------------------------------------------------------------
# Two-group trial, simple exponential, no dropout
df <- simdata_fast(
nsim = 1000,
n = c(100, 100),
a.time = c(0, 12),
a.rate = 200 / 12,
e.median = list(18, 24),
seed = 1
)
head(df)
# Two-group trial, piecewise exponential (delayed treatment effect)
df2 <- simdata_fast(
nsim = 1000,
n = c(100, 100),
a.time = c(0, 12),
a.rate = 200 / 12,
e.hazard = list(c(0.08, 0.08), c(0.08, 0.04)),
e.time = c(0, 6, Inf),
seed = 2
)

The three analysis functions return S3-class objects
(survfit_fast, survdiff_fast, coxph_fast) with print() methods that
display the results in a format similar to the corresponding survival
package functions. Each object is internally a numeric vector, so it can be
used directly in arithmetic, subsetting, and aggregation after stripping
the class (see the simulation example below).
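
The stripping step can be tried without the package installed. The sketch below builds a mock object with the structure described above (a classed, named numeric vector); the class name `mock_fast` and the element names are illustrative assumptions, not the package's actual fields.

```r
# Mock constructor: a named numeric vector carrying an S3 class.
# Class and element names are hypothetical, chosen for illustration only.
make_mock <- function(hr) {
  structure(c(hr = hr, lower = hr * 0.8, upper = hr * 1.25),
            class = "mock_fast")
}

# A print method that formats the vector, as the package objects are said to do.
print.mock_fast <- function(x, ...) {
  cat(sprintf("HR %.3f (%.3f, %.3f)\n", x[["hr"]], x[["lower"]], x[["upper"]]))
  invisible(x)
}

fits <- lapply(c(0.7, 0.8, 0.9), make_mock)

# unclass() drops the S3 class, so rbind() yields a plain numeric matrix
# with one row per object and one column per named element.
mat <- do.call(rbind, lapply(fits, unclass))
colMeans(mat)
```

Because the names travel with the vector, the columns of the resulting matrix keep their labels after aggregation.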
survfit_fast evaluates the Kaplan-Meier estimator at a single specified
time point. A C++ backend locates the evaluation cutoff via binary search,
then accumulates the Kaplan-Meier product and the Greenwood variance sum in
a single scan over event positions, without constructing intermediate
vectors. This makes it substantially faster than survfit() plus
summary() (about 50x in the benchmark above) when the same sorted data are
evaluated repeatedly at a fixed landmark time inside a simulation loop.
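
For reference, the quantity being computed can be written in a few lines of base R. This is a minimal sketch, assuming the input is already sorted by time; `km_at` is a hypothetical helper and not the package's C++ implementation, and it returns only the estimate and its Greenwood standard error.

```r
# Kaplan-Meier estimate and Greenwood standard error at one time point.
# Assumes `time` is sorted in ascending order. Hypothetical helper.
km_at <- function(time, event, t_eval) {
  n <- length(time)
  # Binary search: index of the last observation with time <= t_eval
  cut <- findInterval(t_eval, time)
  surv <- 1
  gw <- 0  # Greenwood sum: sum of d / (r * (r - d)) over event times
  i <- 1L
  while (i <= cut) {
    # Extend j over observations tied with time[i]
    j <- i
    while (j < cut && time[j + 1L] == time[i]) j <- j + 1L
    d <- sum(event[i:j])  # events at this distinct time
    r <- n - i + 1L       # number at risk just before this time
    if (d > 0) {
      surv <- surv * (1 - d / r)
      if (r > d) gw <- gw + d / (r * (r - d))
    }
    i <- j + 1L
  }
  c(surv = surv, se = surv * sqrt(gw))
}

time  <- c(1, 2, 3, 4, 5)
event <- c(1, 0, 1, 1, 0)
res <- km_at(time, event, t_eval = 3.5)
res
```

The loop touches only observations up to the cutoff, which is why evaluating at an early landmark is cheaper than building the full survival curve.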
survdiff_fast computes the log-rank statistic using a two-pointer merge
scan over the pooled sorted data, walking the time axis once while
maintaining per-group at-risk counters and processing tied event times
atomically. The C++ backend avoids the rank construction, tabulate(), and
reverse cumulative sum operations of the standard R implementation, and the
function itself bypasses the S3 object construction and formula parsing of
survdiff(). It returns either a one-sided Z-score (side = 1) or a
two-sided chi-square statistic (side = 2).
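
The statistic itself is easy to state in base R, which can help when checking results on small data. The sketch below is a plain reference version that rescans the data at each event time rather than using a merge scan; `logrank_ref` is a hypothetical name, and the sign convention for Z (positive when the non-control group has more events than expected) is an assumption that may differ from the package.

```r
# Reference log-rank statistic for two groups, base R only.
# Hypothetical helper; not the package's implementation.
logrank_ref <- function(time, event, group, control) {
  is_trt <- group != control
  ev_times <- sort(unique(time[event == 1]))
  o_minus_e <- 0  # observed minus expected events in the non-control group
  v <- 0          # hypergeometric variance
  for (s in ev_times) {
    at_risk <- time >= s
    n  <- sum(at_risk)                           # total at risk
    n1 <- sum(at_risk & is_trt)                  # non-control at risk
    d  <- sum(time == s & event == 1)            # events at this time
    d1 <- sum(time == s & event == 1 & is_trt)   # non-control events
    o_minus_e <- o_minus_e + d1 - d * n1 / n
    if (n > 1) {
      v <- v + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    }
  }
  z <- o_minus_e / sqrt(v)
  c(z = z, chisq = z^2)
}

lr <- logrank_ref(time = c(1, 2, 3, 4), event = c(1, 1, 1, 1),
                  group = c(1, 2, 1, 2), control = 1)
lr
```

With two groups the chi-square statistic is the square of the Z-score, so the two return modes carry the same information apart from the direction of the effect.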
coxph_fast implements the Pike-Halley Estimator proposed by Homma
(2025), a closed-form approximation to the Cox partial likelihood maximizer.
The estimator anchors at the Pike closed-form estimate and applies a single
analytic Halley correction to the Cox score, giving a residual error of order
O_p(n^{-3/2}) relative to the Cox maximum partial likelihood estimate. On the
pharmacoSmoking dataset (tie rate 77.5%), the Pike-Halley Estimator
reproduces the Breslow-based Cox estimate to within roughly 1e-08.
The Wald confidence interval uses the observed information at the Pike
anchor as the variance estimate. The C++ backend performs group splitting,
at-risk counting, and per-distinct-event-time accumulation in a single pass.
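
To make the anchor-plus-correction idea concrete, here is a minimal base R sketch for a binary covariate. It assumes the textbook Pike estimator (ratio of observed to log-rank expected events in each group) as the anchor and a generic Halley update applied to the Breslow score. The exact formulas in Homma (2025) and in the package may differ, and `pike_halley_ref` is a hypothetical name.

```r
# Sketch: Pike anchor plus one Halley step on the Breslow score.
# `trt` is coded 0 (control) / 1 (treatment). Hypothetical helper.
pike_halley_ref <- function(time, event, trt) {
  ev_times <- sort(unique(time[event == 1]))
  count <- function(f) vapply(ev_times, f, numeric(1))
  d  <- count(function(s) sum(time == s & event == 1))             # events
  d1 <- count(function(s) sum(time == s & event == 1 & trt == 1))  # trt events
  n0 <- count(function(s) sum(time >= s & trt == 0))               # control at risk
  n1 <- count(function(s) sum(time >= s & trt == 1))               # trt at risk

  # Pike anchor: (O1 / E1) / (O0 / E0) with log-rank expected counts
  e1 <- sum(d * n1 / (n0 + n1))
  e0 <- sum(d) - e1
  o1 <- sum(d1)
  o0 <- sum(d) - o1
  b0 <- log((o1 / e1) / (o0 / e0))

  # Breslow score U(beta) and its first two derivatives
  derivs <- function(beta) {
    p <- n1 * exp(beta) / (n0 + n1 * exp(beta))
    c(u  = sum(d1 - d * p),
      u1 = -sum(d * p * (1 - p)),
      u2 = -sum(d * p * (1 - p) * (1 - 2 * p)))
  }
  g <- derivs(b0)
  # Halley update: beta - 2 U U' / (2 U'^2 - U U'')
  b1 <- b0 - 2 * g[["u"]] * g[["u1"]] /
    (2 * g[["u1"]]^2 - g[["u"]] * g[["u2"]])
  list(pike = b0, halley = b1, score = function(beta) derivs(beta)[["u"]])
}

set.seed(1)
trt   <- rep(0:1, each = 100)
time  <- round(rexp(200, rate = ifelse(trt == 1, 0.05, 0.08)), 1)
event <- as.integer(time <= 30)  # administrative censoring at t = 30
time  <- pmin(time, 30)
fit   <- pike_halley_ref(time, event, trt)
# Root of the Breslow score, found iteratively, for comparison
root  <- uniroot(fit$score, c(-3, 3), tol = 1e-12)$root
c(pike = fit$pike, halley = fit$halley, root = root)
```

Comparing the three values shows how much of the gap between the Pike anchor and the iteratively solved score root a single Halley step closes on simulated data with ties.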
simdata_fast generates individual patient data for one- or two-group time-to-event trials. Accrual times follow a piecewise uniform distribution. Survival and dropout times follow either a simple or piecewise exponential distribution, selected automatically based on whether a scalar or vector hazard is supplied. C++ backends handle piecewise sampling and two-group interleaving, and random number generation uses dqrng for speed.
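
Piecewise exponential sampling can be sketched in base R by inverting the cumulative hazard. The helper below is an illustration under the assumption that the break vector has the same layout as the `e.time` argument in the examples (interval start points followed by `Inf`); `rpwexp_ref` is a hypothetical name, and it uses base R's `rexp()` rather than dqrng.

```r
# Draw n event times from a piecewise exponential distribution.
# hazard: one rate per interval; breaks: interval starts plus a final Inf.
# Hypothetical helper; not the package's C++ sampler.
rpwexp_ref <- function(n, hazard, breaks) {
  starts <- breaks[-length(breaks)]
  widths <- diff(breaks)
  # Cumulative hazard accumulated at the start of each interval
  cum_h <- c(0, cumsum(hazard * widths)[-length(hazard)])
  e <- rexp(n)                 # unit exponential = target cumulative hazard
  k <- findInterval(e, cum_h)  # interval in which the target is reached
  starts[k] + (e - cum_h[k]) / hazard[k]
}

set.seed(123)
# Delayed effect: hazard 0.08 up to month 6, then 0.04
x <- rpwexp_ref(1e5, hazard = c(0.08, 0.04), breaks = c(0, 6, Inf))
# A single interval reduces to a simple exponential distribution
y <- rpwexp_ref(1e5, hazard = 0.08, breaks = c(0, Inf))
c(surv_at_6 = mean(x > 6), mean_simple = mean(y))
```

Under these settings the survival probability at month 6 is exp(-0.08 * 6), and the simple exponential has mean 1 / 0.08, which gives two quick sanity checks for the empirical values.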
A typical simulation workflow generates the trials with simdata_fast and
applies one or more of the analysis functions to each simulated dataset.
Inside the loop the returned objects can be stored as-is. When aggregating
results across simulations, strip the S3 class with unclass() so that
rbind() produces an ordinary numeric matrix:
library(FastSurvival)
df <- simdata_fast(
nsim = 1000,
n = c(100, 100),
a.time = c(0, 12),
a.rate = 200 / 12,
e.hazard = list(0.08, 0.05),
d.hazard = list(0.01, 0.01),
seed = 42
)
results <- vector("list", 1000L)
for (s in seq_len(1000L)) {
d <- df[df$sim == s, ]
results[[s]] <- coxph_fast(d$tte, d$event, d$group, control = 1)
}
hr_mat <- do.call(rbind, lapply(results, unclass))
colMeans(hr_mat)

Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.
Collett, D. (2014). Modelling Survival Data in Medical Research (3rd ed.). Chapman and Hall/CRC.
MIT © 2025 Gosuke Homma