-
causal-cmd v1.0.x
+
causal-cmd v1.10.x
Introduction
-
Causal-cmd is a Java application that provides a Command-Line Interface (CLI) tool for causal discovery algorithms produced by the Center for Causal Discovery. The application currently includes the following algorithms:
-bpc, eb, fang-concatenated, fas, fask-concatenated, fci, fges, fges-mb, fofc, ftfc, gfci, glasso, imgs_cont, imgs_disc, mbfs, mgm, pc-all, pc-stable-max, r-skew, r-skew-e, r1, r2, r3, r4, rfci, skew, skew-e, ts-fci, ts-gfci, ts-imgs
+
Causal-cmd is a Java application that provides a Command-Line Interface (CLI) tool for causal discovery algorithms produced by the Center for Causal Discovery. The application currently includes the following algorithms:
+boss, bpc, ccd, cpc, cstar, fas, fask, fask-pw, fci, fcimax, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, pcmax, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci
Causal discovery algorithms are a class of search algorithms that explore a space of graphical causal models, i.e., graphical models where directed edges imply causation, for a model (or models) that are a good fit for a dataset. We suggest that newcomers to the field review Causation, Prediction and Search by Spirtes, Glymour and Scheines for a primer on the subject.
Causal discovery algorithms allow a user to uncover the causal relationships between variables in a dataset. These discovered causal relationships can then be put to further use: understanding the underlying processes of a system (e.g., the metabolic pathways of an organism), generating hypotheses (e.g., which variables best explain an outcome), guiding experimentation (e.g., which gene knockout experiments should be performed), or prediction (e.g., parameterizing the causal graph using data and then using it as a classifier).
Command Line Usage
@@ -163,42 +132,45 @@
Command Line Usage
Keep in mind that causal-cmd has different switches for different algorithms. To start, type the following command in your terminal:
java -jar causal-cmd-<version number>-jar-with-dependencies.jar
-
-
Note: we are using causal-cmd-<version number>-jar-with-dependencies.jar to indicate the actual executable jar of specific version number that is being used.
+
**Note:** we use causal-cmd-<version number>-jar-with-dependencies.jar as a placeholder for the executable jar of the specific version you are using.
You will then see the following instructions:
-
Missing required options: algorithm, data-type, dataset, delimiter
-usage: java -jar causal-cmd-1.0.0.jar --algorithm <string> [--comment-marker <string>] --data-type <string> --dataset <files> --delimiter <string> [--help] [--help-all] [--json-graph] [--metadata <file>] [--no-header] [--out <directory>] [--prefix <string>] [--quote-char <character>] [--skip-latest] [--skip-validation] [--thread <string>] [--version]
- --algorithm <string> Algorithm: bpc, eb, fas, fask, fask-concatenated, fci, fges, fges-mb, fofc, ftfc, gfci, glasso, imgs_cont, imgs_disc, lingam, mbfs, mgm, mimbuild, multi-fask, pc-all, r-skew, r-skew-e, r1, r2, r3, r4, rfci, rfci-bsc, skew, skew-e, ts-fci, ts-gfci, ts-imgs
+Missing required options: algorithm, data-type, dataset, delimiter
+usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm <string> [--comment-marker <string>] --data-type <string> --dataset <files> [--default] --delimiter <string> [--experimental] [--help] [--help-algo-desc] [--help-all] [--help-score-desc] [--help-test-desc] [--json-graph] [--metadata <file>] [--no-header] [--out <directory>] [--prefix <string>] [--quote-char <character>] [--skip-validation] [--version]
+ --algorithm <string> Algorithm: boss, bpc, ccd, cpc, cstar, dagma, direct-lingam, fas, fask, fask-pw, fci, fci-iod, fci-max, fges, fges-mb, fofc, ftfc, gfci, grasp, grasp-fci, ica-ling-d, ica-lingam, images, mgm, pag-sampling-rfci, pc, pc-mb, r-boss, r-skew, r3, rfci, skew, spfci, svar-fci, svar-gfci
--comment-marker <string> Comment marker.
- --data-type <string> Data type: continuous, covariance, discrete, mixed
+ --data-type <string> Data type: all, continuous, covariance, discrete, mixed
--dataset <files> Dataset. Multiple files are seperated by commas.
+ --default Use Tetrad default parameter values.
--delimiter <string> Delimiter: colon, comma, pipe, semicolon, space, tab, whitespace
+ --experimental Show experimental algorithms, tests, and scores.
--help Show help.
+ --help-algo-desc Show all the algorithms along with their descriptions.
--help-all Show all options and descriptions.
+ --help-score-desc Show all the scores along with their descriptions.
+ --help-test-desc Show all the independence tests along with their descriptions.
--json-graph Write out graph as json.
--metadata <file> Metadata file. Cannot apply to dataset without header.
--no-header Indicates tabular dataset has no header.
--out <directory> Output directory
- --prefix <string> Output file name prefix.
+ --prefix <string> Replace the default output filename prefix in the format of <algorithm>_<numeric timestamp>.
--quote-char <character> Single character denotes quote.
- --skip-latest Skip checking for latest software version.
--skip-validation Skip validation.
- --thread <string> Number threads.
--version Show version.
Use --help for guidance list of options. Use --help-all to show all options.
-
-
When you specify an algorithm with the --algorithm switch, the program reports any additional switches that algorithm requires. In general, most algorithms require --data-type, --dataset, --delimiter and a score (or, for some algorithms, an independence test). The switch --help-all displays an extended list of switches for the algorithm.
Example of listing all available options for an algorithm:
-$ java -jar causal-cmd-1.0.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic --help
+$ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score --help
- usage: java -jar causal-cmd-1.0.0.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic [--addOriginalDataset] [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker <string>] [--exclude-var <file>] [--extract-struct-model] [--faithfulnessAssumed] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge <file>] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxDegree <integer>] [--metadata <file>] [--missing-marker <string>] [--no-header] [--numberResampling <integer>] [--out <directory>] [--penaltyDiscount <double>] [--percentResampleSize <integer>] [--prefix <string>] [--quote-char <character>] [--resamplingEnsemble <integer>] [--resamplingWithReplacement] [--skip-latest] [--skip-validation] [--symmetricFirstStep] [--thread <string>] [--verbose]
- --addOriginalDataset Yes, if adding an original dataset as another bootstrapping
+usage: java -jar Causal-cmd Project-1.10.0.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score [--addOriginalDataset] [--choose-dag-in-pattern] [--choose-mag-in-pag] [--comment-marker <string>] [--default] [--exclude-var <file>] [--experimental] [--external-graph <file>] [--extract-struct-model] [--faithfulnessAssumed] [--generate-complete-graph] [--genereate-pag-from-dag] [--genereate-pag-from-tsdag] [--genereate-pattern-from-dag] [--json-graph] [--knowledge <file>] [--make-all-edges-undirected] [--make-bidirected-undirected] [--make-undirected-bidirected] [--maxDegree <integer>] [--meekVerbose] [--metadata <file>] [--missing-marker <string>] [--no-header] [--numberResampling <integer>] [--out <directory>] [--parallelized] [--penaltyDiscount <double>] [--percentResampleSize <integer>] [--precomputeCovariances] [--prefix <string>] [--quote-char <character>] [--resamplingEnsemble <integer>] [--resamplingWithReplacement] [--saveBootstrapGraphs] [--seed <long>] [--semBicRule <integer>] [--semBicStructurePrior <double>] [--skip-validation] [--symmetricFirstStep] [--timeLag <integer>] [--verbose]
+ --addOriginalDataset Yes, if adding the original dataset as another bootstrapping
--choose-dag-in-pattern Choose DAG in Pattern graph.
--choose-mag-in-pag Choose MAG in PAG.
--comment-marker <string> Comment marker.
+ --default Use Tetrad default parameter values.
--exclude-var <file> Variables to be excluded from run.
+ --experimental Show experimental algorithms, tests, and scores.
+ --external-graph <file> External graph file.
--extract-struct-model Extract sturct model.
--faithfulnessAssumed Yes if (one edge) faithfulness should be assumed
--generate-complete-graph Generate complete graph.
@@ -211,33 +183,37 @@ Command Line Usage
--make-bidirected-undirected Make bidirected edges undirected.
--make-undirected-bidirected Make undirected edges bidirected.
--maxDegree <integer> The maximum degree of the graph (min = -1)
+ --meekVerbose Yes if verbose output for Meek rule applications should be printed or logged
--metadata <file> Metadata file. Cannot apply to dataset without header.
--missing-marker <string> Denotes missing value.
--no-header Indicates tabular dataset has no header.
--numberResampling <integer> The number of bootstraps/resampling iterations (min = 0)
--out <directory> Output directory
+ --parallelized Yes if the search should be parallelized
--penaltyDiscount <double> Penalty discount (min = 0.0)
- --percentResampleSize <integer> The percentage of resample size (min = 0.1)
- --prefix <string> Output file name prefix.
+ --percentResampleSize <integer> The percentage of resample size (min = 10%)
+ --precomputeCovariances True if covariance matrix should be precomputed for tubular continuous data
+ --prefix <string> Replace the default output filename prefix in the format of <algorithm>_<numeric timestamp>.
--quote-char <character> Single character denotes quote.
- --resamplingEnsemble <integer> Ensemble method: Preserved (0), Highest (1), Majority (2)
+ --resamplingEnsemble <integer> Ensemble method: Preserved (1), Highest (2), Majority (3)
--resamplingWithReplacement Yes, if sampling with replacement (bootstrapping)
- --skip-latest Skip checking for latest software version.
+ --saveBootstrapGraphs Yes if individual bootstrapping graphs should be saved
+ --seed <long> Seed for pseudorandom number generator (-1 = off)
+ --semBicRule <integer> Lambda: 1 = Chickering, 2 = Nandy
+ --semBicStructurePrior <double> Structure Prior for SEM BIC (default 0)
--skip-validation Skip validation.
--symmetricFirstStep Yes if the first step step for FGES should do scoring for both X->Y and Y->X
- --thread <string> Number threads.
+ --timeLag <integer> A time lag for time series data, automatically applied (zero if none)
--verbose Yes if verbose output should be printed or logged
-
-
In this example, we'll run the FGES algorithm on the dataset Retention.txt:
- java -jar causal-cmd-1.0.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic
+$ java -jar causal-cmd-1.10.0-jar-with-dependencies.jar --algorithm fges --data-type continuous --dataset Retention.txt --delimiter tab --score sem-bic-score
-
-This command will output by default two files fges_.txt which is a log of the algorithm's activity and fges__graph.txt which is a textual description of the graph output from the algorithm.
+By default, this command writes a single file, fges_<unix timestamp>.txt, which contains both the log of the algorithm's activity and its result.
+The --json-graph option additionally writes fges_<unix timestamp>_graph.json, a JSON version of the output graph that is equivalent to the JSON file exported from tetrad-gui.
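If you want to script runs from Java instead of a terminal, the example command can be assembled as an argument list and handed to ProcessBuilder. The sketch below is a minimal illustration and not part of causal-cmd: the class and method names are hypothetical, the jar name is assumed to match the example above, and only switches shown in this document are used. It also picks the most recent fges_<unix timestamp>.txt from a list of file names, assuming the default naming described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for scripting causal-cmd runs from Java (not part of causal-cmd). */
public class CausalCmdRunner {

    /** Builds the argument list for the FGES example run; uses only documented switches. */
    static List<String> buildFgesCommand(String jar, String dataset) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", jar));
        cmd.addAll(List.of("--algorithm", "fges"));
        cmd.addAll(List.of("--data-type", "continuous"));
        cmd.addAll(List.of("--dataset", dataset));
        cmd.addAll(List.of("--delimiter", "tab"));
        cmd.addAll(List.of("--score", "sem-bic-score"));
        cmd.add("--json-graph"); // also write fges_<unix timestamp>_graph.json
        return cmd;
    }

    // Assumed default log/result name: <algorithm>_<timestamp>.txt
    private static final Pattern LOG_NAME = Pattern.compile("(.+)_(\\d+)\\.txt");

    /** Returns the file name with the largest timestamp for the given algorithm, if any. */
    static Optional<String> latestLog(List<String> fileNames, String algorithm) {
        String best = null;
        long bestTimestamp = Long.MIN_VALUE;
        for (String name : fileNames) {
            Matcher m = LOG_NAME.matcher(name);
            if (m.matches() && m.group(1).equals(algorithm)) {
                long timestamp = Long.parseLong(m.group(2));
                if (timestamp > bestTimestamp) {
                    bestTimestamp = timestamp;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> cmd = buildFgesCommand("causal-cmd-1.10.0-jar-with-dependencies.jar", "Retention.txt");
        System.out.println(String.join(" ", cmd));
        // Launching requires the jar and Retention.txt in the working directory:
        // Process process = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = process.waitFor();
    }
}
```

Keeping each switch and value as a separate list element avoids shell quoting problems when a dataset path contains spaces.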
Example log output from causal-cmd:
-================================================================================
-FGES (Thu, March 07, 2019 01:35:52 PM)
+================================================================================
+FGES (Wed, October 04, 2023 01:42:43 PM)
================================================================================
Runtime Parameters
@@ -262,62 +238,61 @@ Command Line Usage
--------------------------------------------------------------------------------
addOriginalDataset: no
faithfulnessAssumed: no
-maxDegree: 4
+maxDegree: 1000
+meekVerbose: no
numberResampling: 0
+parallelized: no
penaltyDiscount: 2.0
percentResampleSize: 100
+precomputeCovariances: no
resamplingEnsemble: 1
resamplingWithReplacement: no
+saveBootstrapGraphs: no
+seed: -1
+semBicRule: 1
+semBicStructurePrior: 0.0
symmetricFirstStep: no
+timeLag: 0
verbose: no
-Thu, March 07, 2019 01:35:52 PM: Start data validation on file Retention.txt.
-Thu, March 07, 2019 01:35:52 PM: End data validation on file Retention.txt.
+Wed, October 04, 2023 01:42:45 PM: Start data validation on file Retention.txt.
+Wed, October 04, 2023 01:42:45 PM: End data validation on file Retention.txt.
There are 170 cases and 8 variables.
-Thu, March 07, 2019 01:35:52 PM: Start reading in file Retention.txt.
-Thu, March 07, 2019 01:35:52 PM: Finished reading in file Retention.txt.
-Thu, March 07, 2019 01:35:52 PM: File Retention.txt contains 170 cases, 8 variables.
+Wed, October 04, 2023 01:42:45 PM: Start reading in file Retention.txt.
+Wed, October 04, 2023 01:42:45 PM: Finished reading in file Retention.txt.
+Wed, October 04, 2023 01:42:45 PM: File Retention.txt contains 170 cases, 8 variables.
-Start search: Thu, March 07, 2019 01:35:52 PM
-Model Score = -10405.015309407505
-stdt_accept_rate Score = -885.0945664409373
-rjct_rate Score = -975.5205793071359
-stdt_tchr_ratio Score = -482.16396573676974
-tst_scores Score = -670.6795165893456
-fac_salary Score = -3135.015062099098
-grad_rate Score = -994.4511569099334
-stdt_clss_stndng Score = -1082.2947239986954
-spending_per_stdt Score = -2938.5267971188982
-End search: Thu, March 07, 2019 01:35:52 PM
-
+Start search: Wed, October 04, 2023 01:42:45 PM
+End search: Wed, October 04, 2023 01:42:45 PM
-Example graph output from causal-cmd:
-Graph Nodes:
+================================================================================
+Graph Nodes:
spending_per_stdt;grad_rate;stdt_clss_stndng;rjct_rate;tst_scores;stdt_accept_rate;stdt_tchr_ratio;fac_salary
Graph Edges:
-1. fac_salary --- spending_per_stdt
-2. fac_salary --- stdt_accept_rate
+1. spending_per_stdt --- fac_salary
+2. spending_per_stdt --- rjct_rate
3. spending_per_stdt --- stdt_tchr_ratio
-4. stdt_clss_stndng --- rjct_rate
-5. tst_scores --- fac_salary
-6. tst_scores --- grad_rate
-7. tst_scores --- spending_per_stdt
-8. tst_scores --- stdt_clss_stndng
+4. stdt_accept_rate --- fac_salary
+5. stdt_clss_stndng --- rjct_rate
+6. stdt_clss_stndng --- tst_scores
+7. tst_scores --- fac_salary
+8. tst_scores --- grad_rate
+9. tst_scores --- rjct_rate
+10. tst_scores --- spending_per_stdt
Graph Attributes:
-BIC: -10405.015309
+Score: -5181.565079
Graph Node Attributes:
-BIC: [spending_per_stdt: -2938.526797;grad_rate: -994.451157;stdt_clss_stndng: -1082.294724;rjct_rate: -975.520579;tst_scores: -670.679517;stdt_accept_rate: -885.094566;stdt_tchr_ratio: -482.163966;fac_salary: -3135.015062]
+Score: [spending_per_stdt: -1408.4382541909688;grad_rate: -416.7933531919986;stdt_clss_stndng: -451.79480827547627;rjct_rate: -439.8087229322177;tst_scores: -330.2039598576225;stdt_accept_rate: -429.64771587695884;stdt_tchr_ratio: -208.85274641239832;fac_salary: -1496.025518245214]
-
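In the example above, the per-node scores add up to the graph-level Score (-5181.565079), as expected for a score that decomposes over nodes, such as SEM BIC. The hypothetical Java helper below parses a "Score: [...]" node-attribute line and sums it. It is a sketch, not a causal-cmd API, and it assumes entries are "name: value" pairs separated by semicolons, exactly as printed above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: parses the per-node score line from causal-cmd output. */
public class NodeScores {

    /** Parses "Score: [a: -1.5;b: -2.0]" into an insertion-ordered map. */
    static Map<String, Double> parse(String line) {
        int open = line.indexOf('[');
        int close = line.lastIndexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("No [...] block in: " + line);
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String entry : line.substring(open + 1, close).split(";")) {
            if (entry.isBlank()) {
                continue; // tolerate an empty list or a trailing separator
            }
            int colon = entry.lastIndexOf(':');
            String name = entry.substring(0, colon).trim();
            double value = Double.parseDouble(entry.substring(colon + 1).trim());
            scores.put(name, value);
        }
        return scores;
    }

    /** Sums the node scores. */
    static double total(Map<String, Double> scores) {
        double sum = 0.0;
        for (double v : scores.values()) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        String line = "Score: [spending_per_stdt: -1408.4382541909688;grad_rate: -416.7933531919986;"
                + "stdt_clss_stndng: -451.79480827547627;rjct_rate: -439.8087229322177;"
                + "tst_scores: -330.2039598576225;stdt_accept_rate: -429.64771587695884;"
                + "stdt_tchr_ratio: -208.85274641239832;fac_salary: -1496.025518245214]";
        Map<String, Double> scores = parse(line);
        System.out.printf(Locale.ROOT, "%d nodes, total score %.6f%n", scores.size(), total(scores));
    }
}
```

Run on the example line, it prints "8 nodes, total score -5181.565079", matching the Score reported under Graph Attributes.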
Interpretation of graph output
The end of the file contains the causal graph edges from the search procedure. Here is a key to the edge types:
-- A --- B - There is causal relationship between variable A and B but we cannot determine the direction of the relationship
+- A --- B - There is a causal relationship between variables A and B, but we cannot determine the direction of the relationship
- A --> B - There is a causal relationship from variable A to B
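Because each edge line has the form "<index>. <node> <endpoint symbol> <node>", the graph section is easy to post-process. This is a minimal Java sketch, not a causal-cmd API: the class names are hypothetical, it assumes node names contain no spaces, and it keeps the endpoint symbol verbatim so that edge types other than --- and --> (such as those produced by GFCI) are preserved rather than misclassified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for the "Graph Edges:" section of causal-cmd output. */
public class EdgeReader {

    /** One edge line; the endpoint symbol (e.g. "---" or "-->") is kept verbatim. */
    static final class Edge {
        final String from;
        final String endpoints;
        final String to;

        Edge(String from, String endpoints, String to) {
            this.from = from;
            this.endpoints = endpoints;
            this.to = to;
        }

        boolean isUndirected() { return endpoints.equals("---"); }

        boolean isDirected() { return endpoints.equals("-->"); }

        @Override
        public String toString() { return from + " " + endpoints + " " + to; }
    }

    // "<index>. <node> <endpoints> <node>", ignoring anything after the second node.
    private static final Pattern EDGE_LINE =
            Pattern.compile("\\s*\\d+\\.\\s+(\\S+)\\s+(\\S+)\\s+(\\S+).*");

    static List<Edge> parse(List<String> lines) {
        List<Edge> edges = new ArrayList<>();
        boolean inEdges = false;
        for (String line : lines) {
            if (line.startsWith("Graph Edges:")) {
                inEdges = true;
                continue;
            }
            if (!inEdges || line.isBlank()) {
                continue;
            }
            Matcher m = EDGE_LINE.matcher(line);
            if (!m.matches()) {
                break; // first non-edge line, e.g. "Graph Attributes:", ends the section
            }
            edges.add(new Edge(m.group(1), m.group(2), m.group(3)));
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> output = List.of(
                "Graph Edges:",
                "1. spending_per_stdt --- fac_salary",
                "2. spending_per_stdt --- rjct_rate",
                "Graph Attributes:");
        for (Edge e : parse(output)) {
            String kind = e.isDirected() ? "directed" : e.isUndirected() ? "undirected" : "other";
            System.out.println(e + " (" + kind + ")");
        }
    }
}
```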
The GFCI algorithm has additional edge types:
@@ -350,38 +325,29 @@ Sample Prior Knowledge File
requiredirect
x1 x2
-
The first line of the prior knowledge file must be /knowledge. A prior knowledge file consists of three sections:
- addtemporal - tiers of variables, where the first tier precedes the last. Adding an asterisk next to the tier id prohibits edges between variables within that tier
-- forbiddirect - forbidden edges indicated by a list of pairs of variables
-- requireddirect - required edges indicated by a list of pairs of variables
+- forbiddirect - forbidden directed edges, listed as pairs of variables in from -> to order
+- requiredirect - required directed edges, listed as pairs of variables in from -> to order
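A knowledge file for the --knowledge switch can also be generated programmatically. The Java sketch below is hypothetical and only mirrors the layout described here: a /knowledge first line, an addtemporal section with one "<tier id>[*] variable ..." line per tier, and forbiddirect/requiredirect sections of "from to" pairs. The space-separated tier lines, tier ids starting at 1, and blank lines between sections are assumptions modeled on the sample file; compare the output with the sample before using it.

```java
import java.util.List;

/**
 * Hypothetical helper that renders a prior knowledge file. The exact layout
 * (tier-line format, blank lines between sections) is an assumption.
 */
public class KnowledgeFile {

    /** One temporal tier; forbidWithin adds the asterisk that bans edges inside the tier. */
    static final class Tier {
        final boolean forbidWithin;
        final List<String> variables;

        Tier(boolean forbidWithin, List<String> variables) {
            this.forbidWithin = forbidWithin;
            this.variables = variables;
        }
    }

    /** Each directed pair is {from, to}. */
    static String render(List<Tier> tiers, List<String[]> forbidden, List<String[]> required) {
        StringBuilder sb = new StringBuilder("/knowledge\n");
        sb.append("addtemporal\n");
        for (int i = 0; i < tiers.size(); i++) {
            Tier tier = tiers.get(i);
            sb.append(i + 1); // assumed: tier ids start at 1, earlier tiers first
            if (tier.forbidWithin) {
                sb.append('*');
            }
            for (String v : tier.variables) {
                sb.append(' ').append(v);
            }
            sb.append('\n');
        }
        appendPairs(sb, "forbiddirect", forbidden);
        appendPairs(sb, "requiredirect", required);
        return sb.toString();
    }

    private static void appendPairs(StringBuilder sb, String section, List<String[]> pairs) {
        sb.append('\n').append(section).append('\n');
        for (String[] p : pairs) {
            sb.append(p[0]).append(' ').append(p[1]).append('\n');
        }
    }

    public static void main(String[] args) {
        String text = render(
                List.of(new Tier(false, List.of("x1", "x2")), new Tier(true, List.of("x3", "x4"))),
                List.<String[]>of(new String[] {"x2", "x1"}),
                List.<String[]>of(new String[] {"x1", "x2"}));
        System.out.print(text);
        // java.nio.file.Files.writeString(java.nio.file.Path.of("knowledge.txt"), text) would save it.
    }
}
```

The explicit List.<String[]>of(...) type witness matters: without it, List.of(new String[] {...}) would be inferred as a List<String> of the array's elements rather than a list holding one pair.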