This document is the long-form companion to Table S2 in the paper
(functionality_table.tex). For every operation listed in the table,
it records the cell assignment (✓ full, ◐ partial, blank none) for each
of the three comparison libraries — ARGneedle-lib, matUtils (BTE), and
DendroPy — together with the actual function/class/CLI subcommand
that supports the assignment and a documentation link. The
comparisons are illustrative, not exhaustive: the goal is to defend
the markers in Table S2 with concrete pointers a reader can verify.
- tskit: Python API reference, https://tskit.dev/tskit/docs/stable/python-api.html
- ARGneedle-lib: manual, https://palamaralab.github.io/software/argneedle/manual/; Python source under https://github.com/PalamaraLab/arg-needle-lib/tree/main/src (the public package re-exports `arg_needle_lib_pybind`, `grm`, `metrics`, `serialize_arg`, and `convert`).
- matUtils (BTE): matUtils CLI, https://usher-wiki.readthedocs.io/en/latest/matUtils.html; BTE Python interface source, https://github.com/jmcbroome/BTE/blob/main/src/bte.pyx.
- DendroPy: library reference, https://jeetsukumaran.github.io/DendroPy/library/index.html.

Markers are deliberately conservative: a partial mark (◐) is used when the library can perform the operation only on restricted inputs, only via conversion through another library, or only with substantial caveats relative to the tskit equivalent.

This section is about properties of the in-memory data structure each library exposes: what biological entities can be represented and how they are accessed in code. File-format concerns are in section 2 (file formats).

- tskit (✓): the tree-sequence data model encodes a full ancestral recombination graph as a shared-edge structure. See `tskit.TreeSequence` and the data-model overview at https://tskit.dev/tskit/docs/stable/data-model.html.
- ARGneedle-lib (✓): the headline data structure of the library; the `arg_needle_lib.ARG` class stores a full recombination graph and supports round-tripping via `serialize_arg`/`deserialize_arg`.
- matUtils/BTE (blank): the MAT data model is a single phylogeny; there is no notion of recombination or alternative topologies along a genome.
- DendroPy (blank): stores a `Tree` or `TreeList`; no ARG data structure.
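
To make the shared-edge idea concrete, here is a minimal pure-Python sketch. It is not tskit's implementation. It assumes a toy edge list in which each edge carries a half-open genomic interval `[left, right)`, as in the tskit data model; the `Edge` tuple and the `local_tree` helper are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    """Hypothetical edge record: child inherits from parent on [left, right)."""
    left: float
    right: float
    parent: int
    child: int


def local_tree(edges: List[Edge], position: float) -> Dict[int, int]:
    """Return the child -> parent map of the local tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


# Toy example: samples 0, 1, 2 on a genome of length 10 with one
# recombination breakpoint at position 5. Two edges span the whole genome
# and are therefore stored once but used by both local trees.
edges = [
    Edge(0, 10, 3, 1),
    Edge(0, 10, 4, 3),
    Edge(0, 5, 3, 0),
    Edge(0, 5, 4, 2),
    Edge(5, 10, 3, 2),
    Edge(5, 10, 4, 0),
]

left_tree = local_tree(edges, 2.0)    # tree on [0, 5)
right_tree = local_tree(edges, 7.0)   # tree on [5, 10)
shared = [e for e in edges if e.left == 0 and e.right == 10]
```

In this toy example the two local trees differ in topology (samples 0 and 2 swap parents), yet the edges 3→1 and 4→3 are recorded only once. A single-phylogeny model such as a MAT or a DendroPy `Tree` corresponds to the special case with no breakpoints.
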
- tskit (✓): the `Site` and `Mutation` tables are first-class members of `TableCollection`; mutations are anchored to specific edges and inherited by descendants during tree traversal.
- ARGneedle-lib (✓): mutations are an intrinsic part of the `ARG`; `generate_mutations`, `get_mutations_matrix`, and `get_genotype` (in `arg_needle_lib_pybind.cpp`) operate on mutations attached to ARG edges.
- matUtils/BTE (✓): mutation annotation is the entire point of the data model: `MATNode.mutations` is a first-class node property and the protobuf format encodes per-edge mutations natively.
- DendroPy (blank): sequence/character data lives in a separate `CharacterMatrix` keyed by taxon; mutations are not anchored to tree edges and traversal does not propagate them.
- tskit (✓): `TreeSequence.reference_sequence`/`has_reference_sequence` and `TableCollection.reference_sequence` store the reference inside the data structure itself, alongside schema metadata.
- ARGneedle-lib (blank): no reference-sequence concept anywhere in the pybind layer or Python wrappers.
- matUtils/BTE (blank): the reference is supplied as an external `--input-fasta` (`-f`) argument to `matUtils summary` and `matUtils extract`; see `lib_docs/matutils.rst` lines 142 and 246. It is not stored inside the protobuf.
- DendroPy (blank): no reference concept.
- tskit (✓): the `IndividualTable` groups one or more sample nodes under a single biological individual, capturing ploidy directly. This makes diploids, pedigrees, and family-structured simulations first-class.
- ARGneedle-lib (◐): there is no individual table, but the Python API has diploid-aware helpers: `exact_arg_grm` and `monte_carlo_arg_grm` accept a `diploid=True` flag, and `haploid_grm_to_diploid` (in `lib_docs/argneedle_grm.py` lines 26–37) pairs neighbouring sample IDs as a haploid couple. Counted as partial because the convention is implicit.
- matUtils/BTE (blank): `MATNode` is a single-leaf abstraction; no individual or ploidy concept. (The word "individual" appears in `lib_docs/bte.pyx` only in unrelated docstrings.)
- DendroPy (blank): `Taxon` and `TaxonNamespace` carry labels but there is no individual-vs-sample distinction.
- tskit (✓): the `PopulationTable` stores population assignments inside the data structure, and the population-aware statistics (Fst, divergence, joint AFS) consume it directly.
- ARGneedle-lib (blank): samples are a flat haploid list; no population concept in the pybind layer.
- matUtils/BTE (blank): matUtils accepts geographic region annotations as an external TSV (used by `introduce`) but the MAT data structure itself has no population concept.
- DendroPy (blank): population labels can be encoded as taxon labels and passed externally to `popgenstat`, but they are not a typed part of the tree data structure.
- tskit (✓): every column of every Table is a NumPy array (`nodes.time`, `edges.left`, `edges.right`, `mutations.node`, …). This enables vectorised bulk operations and underpins much of tskit's performance.
- ARGneedle-lib (blank): the ARG is a C++ node-pointer object exposed through per-node Python wrappers; no NumPy column views.
- matUtils/BTE (blank): `MATNode` is a per-node wrapper around the underlying C++ object; tree-wide attributes are accessed by traversal, not bulk array.
- DendroPy (blank): the `Tree`/`Node`/`Edge` graph is a pure Python object hierarchy.

This section is about which named on-disk file formats each library can read or write, including each library's native binary serialisation format.

- tskit (✓): the `.trees` file format (kastore-backed) loaded via `tskit.load` and written via `TreeSequence.dump`.
- ARGneedle-lib (✓): `.argn` HDF5 format via `arg_needle_lib.serialize_arg`/`deserialize_arg`.
- matUtils/BTE (✓): UShER mutation-annotated tree protobuf (`.pb`) via `matUtils extract --write-pb` and `MATree.save_pb`/`MATree.from_pb`.
- DendroPy (blank): all on-disk formats are text (Newick, NEXUS, NeXML, PHYLIP, FASTA).
- tskit (✓): `TreeSequence.write_vcf` and `TreeSequence.as_vcf`.
- ARGneedle-lib (blank): no VCF writer in the pybind or wrapper layers; users round-trip via `arg_to_tskit` and call `write_vcf` from tskit.
- matUtils/BTE (✓): `matUtils extract --write-vcf` and `MATree.write_vcf` (`lib_docs/bte.pyx`).
- DendroPy (blank): no VCF reader or writer.
- tskit (✓): `Tree.as_newick` and `TreeSequence.as_newick`.
- ARGneedle-lib (✓): `arg_needle_lib.arg_to_newick` (registered in the pybind layer as `arg_to_newwick`).
- matUtils/BTE (✓): `matUtils extract --write-newick`; `MATree.get_newick`/`MATree.write_newick`.
- DendroPy (✓): `Tree.write(schema="newick")` and `Tree.as_string(schema="newick")`.
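
For readers unfamiliar with the format, the sketch below shows what a Newick writer has to produce. It is a pure-Python illustration, not the code of any of the four libraries; the `to_newick` function and its `children`/`branch_length` dictionaries are hypothetical names chosen for this example.

```python
from typing import Dict, List


def to_newick(children: Dict[str, List[str]],
              branch_length: Dict[str, float],
              root: str) -> str:
    """Serialise a rooted tree to a Newick string with branch lengths.

    `children` maps each internal node to its child nodes; leaves are any
    nodes without an entry. `branch_length` maps each non-root node to the
    length of the branch above it.
    """
    def render(node: str) -> str:
        kids = children.get(node, [])
        if kids:
            # Internal nodes are written as a parenthesised list of children.
            text = "(" + ",".join(render(k) for k in kids) + ")"
        else:
            # Leaves are written as their label.
            text = node
        if node != root:
            text += f":{branch_length[node]:g}"
        return text

    return render(root) + ";"


children = {"root": ["n1", "C"], "n1": ["A", "B"]}
branch_length = {"A": 1.0, "B": 1.0, "n1": 0.5, "C": 1.5}
newick = to_newick(children, branch_length, "root")
```

Here `newick` is `((A:1,B:1):0.5,C:1.5);`. The format describes exactly one tree, which is why the ARG-capable libraries emit one Newick string per local tree.
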
- tskit (✓): `TreeSequence.as_fasta`/`write_fasta` and `TreeSequence.alignments`.
- ARGneedle-lib (blank): no FASTA writer.
- matUtils/BTE (blank): none of the six FASTA mentions in the matUtils documentation (`lib_docs/matutils.rst` lines 59, 63, 122, 142, 186, 246) describes a FASTA writer. They refer to the input reference FASTA consumed by `--translate` and `--write-taxodium`, and line 186 explicitly tells users to convert a matUtils-emitted VCF to FASTA via the external `vcf2fasta` tool. BTE's `bte.pyx` has no FASTA writer either.
- DendroPy (✓): `CharacterMatrix.write(schema="fasta")`.
- tskit (blank): not supported.
- ARGneedle-lib (blank): not supported.
- matUtils/BTE (blank): not supported.
- DendroPy (✓): first-class NEXUS reader/writer via the unified `read`/`write` schemas (NeXML is also supported).
- tskit (✓): `TreeSequence.trees` iterator and `TreeSequence.breakpoints`.
- ARGneedle-lib (blank): the C++ side iterates local trees via stab queries (e.g. `bitset_overlap_stab`, `stab_return_all_bitsets`), but the Python API does not expose a per-tree iterator. The documented route to per-tree analysis is `arg_needle_lib.arg_to_tskit` followed by tskit's own iterator, so the operation is not natively available.
- matUtils/BTE (blank): the data model is a single phylogeny, not a sequence of trees along a genome; no notion of recombination.
- DendroPy (blank): same: single tree or list of trees, no genomic positioning.
- tskit (✓): `Tree.preorder`, `Tree.postorder`, `Tree.timeasc`/`timedesc`.
- ARGneedle-lib (blank): the C++ `arg_traversal.hpp` machinery traverses ARG nodes (`time_efficient_visit`), but no Python-level pre/post-order iterator over local trees is exposed. Users convert to tskit for traversal, so the operation is not natively available.
- matUtils/BTE (✓): `MATree.depth_first_expansion` (pre-order) and `MATree.breadth_first_expansion`.
- DendroPy (✓): `Tree.preorder_node_iter`, `postorder_node_iter`, `levelorder_node_iter`, `leaf_node_iter`.
- tskit (✓): `Tree.mrca` and `Tree.tmrca`.
- ARGneedle-lib (✓): `arg_needle_lib.most_recent_common_ancestor` in the pybind layer; also `tmrca_mse` for ARG-vs-ARG TMRCA comparisons.
- matUtils/BTE (✓): `MATree.LCA(node_ids)` (last common ancestor) and `MATNode.parent` for traversal upward.
- DendroPy (✓): `Tree.mrca(taxa=...)`.
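
The MRCA and TMRCA queries listed above reduce to walking parent pointers. The following is a minimal sketch under that assumption; the `mrca` function and the `parent` and `time` dictionaries are hypothetical and do not reproduce any library's internal algorithm.

```python
from typing import Dict, Optional


def mrca(parent: Dict[int, int], u: int, v: int) -> Optional[int]:
    """Return the most recent common ancestor of u and v, or None.

    `parent` maps each non-root node to its parent in a single tree.
    """
    # Collect u and all of its ancestors up to the root.
    ancestors = set()
    node: Optional[int] = u
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)
    # Walk upward from v until we hit one of them.
    node = v
    while node is not None and node not in ancestors:
        node = parent.get(node)
    return node


# Toy tree: samples 0, 1, 2; node 3 joins 0 and 1; node 4 is the root.
parent = {0: 3, 1: 3, 3: 4, 2: 4}
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.5}

mrca_01 = mrca(parent, 0, 1)
mrca_02 = mrca(parent, 0, 2)
tmrca_02 = time[mrca_02]   # TMRCA is simply the time of the MRCA node
```

This walk costs time proportional to tree depth per query; the libraries above may use faster schemes, which this sketch does not attempt to reproduce.
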
- tskit (✓): `Tree.branch_length`, `Tree.total_branch_length`.
- ARGneedle-lib (◐): branch lengths derive from `ARGNode` time attributes and the C++ traversal returns time intervals (`local_volume`, `total_volume` give cumulative branch length); no per-edge `branch_length` accessor in the Python API.
- matUtils/BTE (✓): `MATNode.branch_length`/`MATNode.set_branch_length`.
- DendroPy (✓): `Edge.length`, `Tree.length()`, `Node.distance_from_root`.
- tskit (✓): `Tree.kc_distance`, `TreeSequence.kc_distance`, and `Tree.rf_distance`.
- ARGneedle-lib (✓): `arg_needle_lib.kc_topology` and `arg_needle_lib.metrics.kc2_tmrca_mse_stab`/`rf_total_variation_stab` compute KC² and scaled RF between ARGs.
- matUtils/BTE (blank): no Robinson-Foulds, KC, or quartet distance is exposed in either matUtils or BTE; topology comparison is not part of the matUtils workflow.
- DendroPy (✓): `treecompare.symmetric_difference` (unweighted RF), `weighted_robinson_foulds_distance`, `euclidean_distance`. No KC, but RF is the standard metric so this still counts as full support.
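
As a reminder of what the unweighted Robinson-Foulds metric measures, here is a small sketch of its rooted, clade-based variant: the number of clades present in one tree but not the other. It assumes trees written as nested tuples of leaf labels, and `clades` and `rf_distance` are hypothetical helper names. The libraries above may differ in details such as rooted versus unrooted comparison or normalisation.

```python
def clades(tree):
    """Return the set of leaf sets (clades) below each internal node."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])          # a leaf label
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)                      # record this internal clade
        return below

    leaves(tree)
    return found


def rf_distance(a, b):
    """Size of the symmetric difference between the two clade sets."""
    return len(clades(a) ^ clades(b))


balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
caterpillar = ((("A", "B"), "C"), "D")

d_same = rf_distance(balanced, balanced)
d_swapped = rf_distance(balanced, swapped)
d_caterpillar = rf_distance(balanced, caterpillar)
```

The root clade is always shared when both trees have the same leaves, so it never contributes to the distance.
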
- tskit (✓): `Tree.colless_index`, `Tree.sackin_index`, `Tree.b1_index`, and `Tree.b2_index` cover the standard balance/imbalance indices.
- ARGneedle-lib (blank): no balance metrics exposed.
- matUtils/BTE (◐): `MATree.tree_entropy` reports per-split entropy and `count_clades_inclusive` gives clade sizes, which are usable as shape descriptors but are not the standard balance indices.
- DendroPy (✓): `treemeasure.colless_tree_imbalance`, `sackin_index`, `pybus_harvey_gamma`.
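
The two most common indices named above have short definitions. For a strictly binary tree, the Colless index sums, over internal nodes, the absolute difference in leaf counts between the two children, and the Sackin index sums the depth of every leaf. The sketch below computes both for nested-tuple trees; `balance_indices` is a hypothetical helper, and no normalisation is applied, so values may differ from library outputs that normalise.

```python
def balance_indices(tree):
    """Return (colless, sackin) for a strictly binary nested-tuple tree."""
    colless = 0
    sackin = 0

    def visit(node):
        nonlocal colless, sackin
        if not isinstance(node, tuple):
            return 1                       # a leaf contributes one tip
        left, right = (visit(child) for child in node)
        colless += abs(left - right)
        # Each internal node adds one unit of depth to every leaf below it,
        # so summing leaf counts over internal nodes gives total leaf depth.
        sackin += left + right
        return left + right

    visit(tree)
    return colless, sackin


balanced = balance_indices((("A", "B"), ("C", "D")))
caterpillar = balance_indices(((("A", "B"), "C"), "D"))
```

The fully balanced four-leaf tree gives `(0, 8)` and the caterpillar gives `(3, 9)`, so both indices increase with imbalance.
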
- tskit (✓): `TreeSequence.simplify`.
- ARGneedle-lib (blank): no native equivalent; the ARG retains all ancestry. Users convert to tskit and call `simplify` there.
- matUtils/BTE (blank): the closest analogue is extracting a subtree with `extract`, which prunes leaves but does not remove internal nodes that no longer contribute history.
- DendroPy (blank): `prune_taxa` removes leaves; there is no notion of a sample-restricted ancestral graph to simplify.
- tskit (✓): `TreeSequence.subset` and `simplify` with a sample list.
- ARGneedle-lib (blank): no Python-level subset operation; users convert to tskit. Downgraded from the table's first-pass mark.
- matUtils/BTE (✓): `matUtils extract --samples`/`--clade`/`--regex`; `MATree.subtree`, `MATree.get_clade`, `MATree.get_regex`, `MATree.get_random`.
- DendroPy (✓): `Tree.extract_tree_with_taxa`, `extract_tree_without_taxa`, `prune_taxa`, `prune_leaves_without_taxa`.
- tskit (✓): `TreeSequence.union`.
- ARGneedle-lib (blank): no ARG union operation.
- matUtils/BTE (blank): no union of two MATs.
- DendroPy (blank): `TreeList` concatenates lists of trees but there is no shared-history union.
- tskit (✓): `TreeSequence.keep_intervals` and `delete_intervals`.
- ARGneedle-lib (◐): `arg_needle_lib.trim_arg` restricts an ARG to a single contiguous interval, but there is no general multi-interval keep/delete.
- matUtils/BTE (blank): the data model has no notion of genomic intervals over which the tree changes.
- DendroPy (blank): same.
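
The difference between single-interval and multi-interval restriction can be sketched on a plain edge list. The code below is an illustration only: it assumes edges stored as `(left, right, parent, child)` tuples with half-open intervals, and `clip_edges` is a hypothetical helper. A real implementation also has to handle sites and mutations, and may simplify the result afterwards.

```python
def clip_edges(edges, intervals):
    """Keep only the parts of each edge that overlap the given intervals.

    `edges` holds (left, right, parent, child) tuples and `intervals`
    holds non-overlapping (start, end) pairs, all half-open.
    """
    kept = []
    for left, right, parent, child in edges:
        for start, end in intervals:
            lo, hi = max(left, start), min(right, end)
            if lo < hi:                     # non-empty overlap
                kept.append((lo, hi, parent, child))
    return kept


edges = [(0, 10, 3, 0), (0, 5, 4, 2), (5, 10, 3, 2)]

# A single contiguous interval, the case the partial mark refers to.
single = clip_edges(edges, [(2, 4)])

# Several disjoint intervals, the general keep-intervals case.
multi = clip_edges(edges, [(2, 4), (6, 8)])
```

An edge that spans a gap between kept intervals is split into one piece per interval, which is why a single-interval trim cannot be composed directly into a multi-interval keep without a way to merge the results.
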
- tskit (✓): `TreeSequence.trim`.
- ARGneedle-lib (✓): `arg_needle_lib.trim_arg` provides exactly this operation. Promoted from blank in the first-pass table.
- matUtils/BTE (blank): not applicable.
- DendroPy (blank): not applicable.
- tskit (✓): `TreeSequence.diversity` and `segregating_sites`; branch and site mode, windowed.
- ARGneedle-lib (blank): no diversity statistics in the Python API.
- matUtils/BTE (◐): `MATree.compute_nucleotide_diversity` returns mean pairwise nucleotide differences over the MAT. Counted as partial because it is mean π only: no windowing, no branch mode, no segregating-sites variant. Promoted from blank in the first-pass table.
- DendroPy (✓): `popgenstat.nucleotide_diversity`, `popgenstat.num_segregating_sites`, and `popgenstat.average_number_of_pairwise_differences`. Promoted from blank in the first-pass table.
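
The site-based quantities compared above have simple definitions on aligned haplotypes. The following sketch computes the mean number of pairwise differences and the number of segregating sites from equal-length strings; the function names are hypothetical, and the values are not divided by sequence length, whereas some library functions report per-site values.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def segregating_sites(haplotypes):
    """Number of alignment columns showing more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


haplotypes = ["ACGT", "ACGA", "TCGA"]
pi = mean_pairwise_differences(haplotypes)   # (1 + 2 + 1) / 3
s = segregating_sites(haplotypes)            # columns 0 and 3 vary
```

Windowed and branch-mode variants, which the partial mark for matUtils/BTE refers to, need genomic coordinates and tree branch lengths in addition to the alignment.
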
- tskit (✓): `TreeSequence.Tajimas_D`.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not computed.
- DendroPy (✓): `popgenstat.tajimas_d`. Promoted from blank.
- tskit (✓): `TreeSequence.Fst` and `divergence`.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): the popgenstat module computes within-sample diversity statistics but does not implement Fst or between-population divergence directly.
- tskit (✓): `f2`, `f3`, `f4`.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): not exposed.
- tskit (✓): `TreeSequence.allele_frequency_spectrum` in branch and site mode, single- and multi-population.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (◐): `popgenstat.unfolded_site_frequency_spectrum` computes the 1D unfolded SFS for a character matrix. Counted as partial because there is no joint/multi-population SFS and no branch mode. Promoted from blank.
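
For reference, the one-dimensional unfolded SFS mentioned above is just a histogram of derived-allele counts. The sketch assumes biallelic genotypes coded 0 for ancestral and 1 for derived, with one list per site; `unfolded_sfs` is a hypothetical helper and not any library's function.

```python
def unfolded_sfs(genotypes, num_samples):
    """Count sites by the number of samples carrying the derived allele.

    Entry k of the result is the number of sites at which exactly k of
    the `num_samples` samples carry the derived allele.
    """
    sfs = [0] * (num_samples + 1)
    for site in genotypes:
        sfs[sum(site)] += 1
    return sfs


# Four sites observed in four samples.
genotypes = [
    [1, 0, 0, 0],   # singleton
    [1, 1, 0, 0],   # doubleton
    [0, 0, 1, 0],   # singleton
    [1, 1, 1, 0],   # tripleton
]
sfs = unfolded_sfs(genotypes, num_samples=4)
```

A joint spectrum would replace the single count with one count per population, which is the capability the partial mark for DendroPy says is missing.
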
- tskit (✓): `TreeSequence.ld_matrix` and the `LdCalculator` interface.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): not exposed.
- tskit (✓): every statistic in `tskit.TreeSequence` accepts `mode="branch"`, computing the corresponding branch-length statistic on the trees themselves rather than on observed sites. This is the duality property and has no analogue in the comparison libraries.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): not exposed.
- tskit (✓): `TreeSequence.link_ancestors` (and the underlying `TableCollection.link_ancestors`) returns an edge table describing, for each sample in a specified set, which segments of the genome are inherited from which members of a specified set of ancestors. Introduced in Tsambos et al. (2023).
- ARGneedle-lib (blank): no equivalent in the Python API; the library's ARG primitives operate on the whole ARG rather than restricting to sample-to-ancestor paths.
- matUtils/BTE (blank): not applicable — a mutation-annotated tree has no notion of multiple local ancestors along a genome.
- DendroPy (blank): not applicable.
- tskit (✓): `TreeSequence.ibd_segments` with multiple `within`/`between`, length, and MRCA filters.
- ARGneedle-lib (blank): the library has no IBD-segment extractor in its Python API. The practical route is `arg_to_tskit` followed by `tskit.TreeSequence.ibd_segments`, but the operation is not natively available.
- matUtils/BTE (blank): not applicable to a single phylogeny.
- DendroPy (blank): not applicable.
- tskit (✓): `TreeSequence.genealogical_nearest_neighbours`.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): not exposed.
- tskit (✓): `TreeSequence.genetic_relatedness_matrix`, `genetic_relatedness`, `genetic_relatedness_weighted`, `genetic_relatedness_vector`.
- ARGneedle-lib (✓): `arg_needle_lib.exact_arg_grm` and `monte_carlo_arg_grm` (in `arg_needle_lib.grm`) are headline features and are accompanied by `gower_center`, `row_column_center`, and `write_grm` for downstream use.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): not exposed.
- tskit (✓): `TreeSequence.divergence`, `divergence_matrix`, and `Tree.tmrca`.
- ARGneedle-lib (✓): `arg_needle_lib.distance_matrix` and `distance_matrix_v2` give pairwise distances; `tmrca_mse` and `kc_tmrca_vectors` give TMRCA-based comparisons. ARG nodes carry times directly so coalescence-time queries are first-class.
- matUtils/BTE (◐): distances on a MAT are mutation-count parsimony distances along the tree (recoverable via traversal of `MATNode.mutations` and branch lengths); no native pairwise divergence-time matrix. Counted as partial.
- DendroPy (◐): `treemeasure.patristic_distance` gives branch-length sums between taxa, and `node_ages` returns coalescence times for an ultrametric tree. Counted as partial because there is no built-in pairwise distance matrix function beyond looping over `patristic_distance` calls.
- tskit (✓): `TreeSequence.mean_descendants`.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (◐): `MATree.count_clades_inclusive` returns the number of leaves under each annotated clade, a related but coarser per-clade descendant count. Counted as partial.
- DendroPy (blank): no direct equivalent (users iterate `leaf_iter` per node).
- tskit (✓): `TreeSequence.extend_haplotypes` returns a new tree sequence in which the span of each ancestral node is extended across adjacent marginal trees wherever the relevant parent–child relationship continues to hold, producing a more parsimonious edge table without changing the genotypes. Introduced in Fritze et al. (2026).
- ARGneedle-lib (blank): no equivalent operation; the library's ARG editing primitives are restricted to trimming.
- matUtils/BTE (blank): not applicable to mutation-annotated trees, which lack the multi-tree structure this operation acts on.
- DendroPy (blank): not applicable.
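- tskit (✓): `TreeSequence.variants` and `genotype_matrix`.
- ARGneedle-lib (◐): `arg_needle_lib.get_mutations_matrix` and `get_genotype` return mutation/genotype matrices but the API is oriented around mapping genotypes onto an ARG rather than iterating per-site Variant objects. Promoted from blank.
- matUtils/BTE (✓): `MATree.get_mutation_samples`, `get_mutation`, `count_mutation_types`, and `count_haplotypes` enumerate variants and the samples carrying each mutation.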
MATree.get_mutation_samples,get_mutation,count_mutation_types, andcount_haplotypesenumerate variants and the samples carrying each mutation. - DendroPy (blank): character matrices store sequences but there is no per-variant iterator.
- tskit (✓): `TreeSequence.haplotypes` and `alignments`.
- ARGneedle-lib (blank): no haplotype iteration is exposed (`write_mutations_to_haps` writes a HAPS file but there is no Python iterator).
- matUtils/BTE (✓): `MATree.get_haplotype` reconstructs the full mutation set carried by a sample relative to the reference; `count_haplotypes` enumerates unique haplotypes.
- DendroPy (blank): sequences are stored verbatim in `CharacterMatrix`; no reconstruction from a tree.
- tskit (✓): `Tree.map_mutations` performs Hartigan parsimony to place mutations on a tree.
- ARGneedle-lib (✓): `map_genotype_to_ARG`, `map_genotype_to_ARG_diploid`, `mutation_match`, and `mutation_best` place genotypes optimally onto an ARG. Promoted from blank.
- matUtils/BTE (✓): parsimony placement is the central operation of UShER and matUtils; `MATree.simple_parsimony` and `MATree.get_parsimony_score` expose it from BTE.
- DendroPy (blank): no parsimony placement (DendroPy targets Bayesian/likelihood workflows externally).
- tskit (✓): `Tree.draw_svg` and `TreeSequence.draw_svg` produce styled, configurable SVG.
- ARGneedle-lib (blank): no built-in plotting.
- matUtils/BTE (blank): matUtils emits Auspice JSON for visualisation in an external viewer; it does not draw SVG itself.
- DendroPy (blank): no SVG drawing (TikZ output is the closest vector format — see below).
- tskit (✓): `TreeSequence.draw_svg` draws all trees along the genome with shared coordinate axes.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not applicable.
- DendroPy (blank): not applicable.
- tskit (✓): `Tree.draw_text` and `TreeSequence.draw_text`.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (✓): `Tree.as_ascii_plot` and `Tree.print_plot`; `Tree.as_tikz_plot` for vector output.
- tskit (✓): `tskit.MetadataSchema` with JSON, struct, and permissive codecs; every table has its own metadata schema and validation.
- ARGneedle-lib (blank): ARG nodes carry numeric attributes (time, IDs) but no general metadata-schema mechanism.
- matUtils/BTE (◐): `MATNode.annotations` carries clade-level annotations and `matUtils annotate` adds named clade labels, which is a fixed, unstructured kind of metadata. No user-defined schema.
- DendroPy (blank): taxa and trees carry free-form `annotations` collections, but there is no schema or validation layer.
- tskit (✓): `TreeSequence.provenances` and the provenance schema; every operation that produces a new tree sequence appends a validated provenance record.
- ARGneedle-lib (blank): not exposed.
- matUtils/BTE (blank): not exposed.
- DendroPy (blank): not exposed.