
Conversation

@Shu-Feather
Contributor

Add a cell growth rate method for multivariant analysis and modify the cell mass method.
Both are compatible with multigeneration analysis, and the variant and generation params can be set to specify the variant and generation IDs you want to analyze.

Shu-Feather and others added 30 commits June 25, 2025 03:08
GCSFS schedules futures, which does not work in atexit callbacks, so we revert to PyArrow. Additionally, we now use os._exit to explicitly set an exit code of 1 when an exception is raised during finalization, because Python ignores exceptions in atexit callbacks by default.
Exceptions in atexit hooks are ignored and not reflected in the final exit code. Additionally, new futures cannot be scheduled in atexit hooks (after interpreter shutdown) so asyncio does not work.
@thalassemia thalassemia left a comment

I haven't looked very deeply at most of these scripts yet, but I figured I might as well post the feedback I currently have, as it'll completely change how this PR looks. The biggest issue right now is the amount of duplicated code. You could tackle this by creating the helper function I outlined in one of my comments and importing it into the relevant modules.
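For reference, one possible shape for such a helper, assuming the history DataFrame carries variant and generation columns as in the snippets below (the module path and function name here are illustrative, not existing project code):

```python
# e.g. a new shared module such as ecoli/analysis/filters.py (name illustrative)
import polars as pl


def filter_by_params(df: pl.DataFrame, params: dict) -> pl.DataFrame:
    """Restrict a history DataFrame to the variants/generations requested in
    the analysis params, leaving it untouched when a key is absent."""
    target_variants = params.get("variant", None)
    target_generations = params.get("generation", None)
    if target_variants is not None:
        df = df.filter(pl.col("variant").is_in(target_variants))
    if target_generations is not None:
        df = df.filter(pl.col("generation").is_in(target_generations))
    return df
```

Each analysis script could then drop its own copy of this logic and just import the one helper.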

target_variants = params.get("variant", None)
target_generation = params.get("generation", None)

# Filter by specified variants and generations
Contributor

You could add this filter to the SQL query to speed that up (potentially by a lot if only reading a small subset of a large workflow output) and reduce RAM usage.
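A rough sketch of what that could look like, with placeholder names (`conn` for a DuckDB connection, `history_sql` for whatever the script already reads the workflow output from; the selected columns are illustrative):

```python
import duckdb

conn = duckdb.connect()

# Build the filter into the query so unwanted variants/generations are
# never read into Python at all.
clauses = []
if target_variants is not None:
    clauses.append(
        "variant IN (" + ", ".join(str(v) for v in target_variants) + ")"
    )
if target_generation is not None:
    clauses.append(f"generation = {target_generation}")
where = ("WHERE " + " AND ".join(clauses)) if clauses else ""

df = conn.sql(
    f"SELECT time, variant, generation, listeners__mass__cell_mass AS cell_mass "
    f"FROM ({history_sql}) {where}"
).pl()  # only the requested variants/generations are materialized
```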

Contributor

This comment still applies. I think keeping RAM usage down is particularly important for multivariant/multiseed analyses where the amount of data being read can potentially be insanely high.

@thalassemia
Contributor

thalassemia commented Aug 6, 2025

Also, to solve the import issue with Escher in pytest, you'll need to add it to the package list with uv; I think `uv add Escher`. Then commit the changes to `pyproject.toml` and `uv.lock`.

How big are the Escher package and its dependencies, by the way? You can get this by comparing `du -sh .venv` for a fresh environment (`uv sync --extra dev --frozen`) against the same environment after `uv add Escher`. I want to avoid adding anything too big as a mandatory installation requirement, if possible.

Can you check the size of plotly and ipykernel as well?

@Shu-Feather
Contributor Author

Here are some plots produced by these methods:

By cell_growth_rate.py in multivariant: [2 images]

By cell_mass.py in multivariant: [image]

By fba_flux.py in multivariant (stacked or grid mode): [2 images]

By fba_flux_heatmap.py in multigeneration: [image]

By fba_flux_pca.py in single: [image]

@Shu-Feather
Contributor Author

Add several visualization methods for protein counts:

By catalyst_count.py in multivariant (grid or stacked mode): [image]

By protein_count.py in multigeneration: [image]

@thalassemia thalassemia left a comment

I only skimmed most of this because there's just too much to review in depth. The biggest area for improvement right now is redundancy across a lot of scripts. You have, for the most part, already packaged a lot of duplicated code into helper functions, so refactoring should hopefully not be too hard. Fixing the Mypy type errors might require explicit type hints, and my previous comment about adding Escher with uv still applies (to fix the failing pytests).

# Filter by specified generations if provided
if target_generations is not None:
print(f"[INFO] Target generations: {target_generations}")
df = df.filter(pl.col("generation").is_in(target_generations))
Contributor

You could build this generation filter into your SQL query with a WHERE clause to be more efficient.
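Something along these lines, as a sketch (assuming `target_generations` has already been parsed from `params`, and reusing the query this script builds below):

```python
# Hypothetical: append the generation filter to the SQL instead of
# filtering the Polars DataFrame afterwards.
gen_filter = ""
if target_generations is not None:
    gen_list = ", ".join(str(g) for g in target_generations)
    gen_filter = f"AND generation IN ({gen_list})"

query = f"""
    SELECT time, generation, counts AS protein_count
    FROM unnested_counts
    WHERE idx = {protein_idx + 1} {gen_filter}
"""
```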

SELECT time, generation,
counts as protein_count
FROM unnested_counts
WHERE idx = {protein_idx + 1} -- SQL uses 1-based indexing
Contributor

This is overkill. I'd recommend `SELECT listeners__monomer_counts[{protein_idx + 1}] AS cnt` or using `ecoli.library.parquet_emitter.named_idx`.
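Roughly, for the first option (the FROM target is a placeholder for wherever the script reads the emitter output):

```python
# Hypothetical simplification: index the list column directly instead of
# unnesting it. DuckDB list indexing is 1-based, hence the +1.
query = f"""
    SELECT time, generation,
           listeners__monomer_counts[{protein_idx + 1}] AS cnt
    FROM history  -- placeholder for the actual emitter-output source
"""
```

(Or build the same selection with `named_idx`, as mentioned above.)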

)
gen_summary_pd["lower"] = (
gen_summary_pd["mean_count"] - gen_summary_pd["std_count"]
)
Contributor

Do you use the upper and lower columns?


# Filter by specified variants and generations if provided
target_variants = params.get("variant", None)
target_generations = params.get("generation", None)
Contributor

Highly recommend building these filters into the SQL query. For a multivariant analysis, the time saved by not reading some of the data can become significant.

avg_catalyst_counts[biocyc_id] = variant_avgs

# Create visualization based on layout mode
if layout_mode == "grid":
Contributor

Consider moving the code in this if...else into the above one for continuity.

@@ -0,0 +1,834 @@
"""
This script preprocesses FBA flux data by mapping extended reactions to base reactions,
Contributor

There's a lot of redundant code across the multigeneration fba_flux_ analyses. You've already organized things into helper functions, so it may not be too hard to fix this.

@@ -0,0 +1,1004 @@
"""
This script preprocesses FBA flux data by mapping extended reactions to base reactions,
Contributor

Also a lot of redundant code.

@@ -0,0 +1,380 @@
"""
Visualize FBA reaction flux dynamics using PCA trajectory analysis.
Contributor

Interesting idea. Do you have any examples of reaction IDs that yielded interesting plots?
