
Introduce node-embeddings domain #572

Merged
JohT merged 8 commits into main from feature/introduce-node-embeddings-domain on May 1, 2026

Conversation

JohT self-assigned this on May 1, 2026
JohT changed the base branch from feature/introduce-algorithms-and-embeddings-domain to main on May 1, 2026 17:22
JohT marked this pull request as ready for review on May 1, 2026 17:22
JohT requested a review from Copilot on May 1, 2026 17:23

Copilot AI left a comment

Pull request overview

Introduces a new node-embeddings domain that generates Neo4j GDS node-embedding outputs (CSV + UMAP SVG charts + Markdown summary) and updates the pipeline to support Java “internal/connected” type subsets and more robust CSV-result parsing.

Changes:

  • Adds domains/node-embeddings/ with Cypher queries, CSV/Python/Markdown entrypoints, and report templates.
  • Updates projection + preparation flow to work with ConnectedInternalJavaType for Java type projections and adds Java Type dependency enrichment steps.
  • Extends CSV parsing helpers with stricter error handling and a helper that safely handles “empty result” cases.
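The "result + column > 0" idea can be illustrated with a small bash sketch. The real helper in scripts/parseCsvFunctions.sh is not shown on this page, so the function below is hypothetical (named sketch_is_result_and_csv_column_greater_zero to keep it distinct) and assumes a query result shaped as one comma-separated header line followed by at most one value line, with no quoted fields containing commas. Its point is that an empty result (header only, or nothing at all) is answered with "no" instead of failing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the project's implementation.
# Assumption: line 1 = comma-separated header, line 2 = values (may be missing).
sketch_is_result_and_csv_column_greater_zero() {
  local csv_string="$1"
  local column_name="$2"
  local header values
  header=$(printf '%s\n' "${csv_string}" | sed -n '1p')
  values=$(printf '%s\n' "${csv_string}" | sed -n '2p')

  # Empty result (no data row): report "not greater than zero" instead of erroring.
  if [ -z "${values}" ]; then
    return 1
  fi

  local -a header_fields value_fields
  IFS=',' read -r -a header_fields <<< "${header}"
  IFS=',' read -r -a value_fields <<< "${values}"

  local index
  for index in "${!header_fields[@]}"; do
    if [ "${header_fields[$index]}" = "${column_name}" ]; then
      local value="${value_fields[$index]}"
      # Only plain non-negative integers count; anything else is treated as "no".
      [[ "${value}" =~ ^[0-9]+$ ]] && [ "${value}" -gt 0 ]
      return
    fi
  done

  echo "sketch: column '${column_name}' not found" >&2
  return 1
}

# Usage: a verification result with a positive node count.
result=$'nodeCount,relationshipCount\n42,100'
if sketch_is_result_and_csv_column_greater_zero "${result}" "nodeCount"; then
  echo "projectable"
fi
```

Returning a non-zero status rather than exiting lets callers use the check directly in an `if`, which fits a file that is sourced by other scripts.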

Reviewed changes

Copilot reviewed 53 out of 56 changed files in this pull request and generated 8 comments.

File Description
scripts/projectionFunctions.sh Uses is_result_and_csv_column_greater_zero for verification and routes Java type projections through ConnectedInternalJavaType.
scripts/prepareAnalysis.sh Adjusts enrichment flow (Java artifact/type deps) and adds label management for internal/connected Java types.
scripts/parseCsvFunctions.sh Adds centralized error handling and a helper for “result + column > 0” checks.
domains/node-embeddings/summary/report_no_embedding_data.template.md Fallback content when embedding data is missing.
domains/node-embeddings/summary/report.template.md Main Markdown report template with include/fallback structure.
domains/node-embeddings/summary/nodeEmbeddingsSummary.sh Assembles the Markdown report and embeds generated include fragments.
domains/node-embeddings/queries/statistics/Node_embeddings_with_community_and_centrality.cypher Markdown stats query showing embeddings alongside community/centrality properties.
domains/node-embeddings/queries/statistics/Embedding_properties_overview.cypher Markdown stats query summarizing which embedding properties exist per label.
domains/node-embeddings/queries/node-embeddings/Set_Parameters.cypher Example params payload for running node-embedding queries.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4e_GraphSAGE_Write.cypher Writes GraphSAGE embeddings back to nodes.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4d_GraphSAGE_Stream.cypher Streams GraphSAGE embeddings with contextual columns for export.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4b_GraphSAGE_Train.cypher Trains a GraphSAGE model on the cleaned projection.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3e_Node2Vec_Write.cypher Writes Node2Vec embeddings back to nodes.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Tuneable_Stream.cypher Tuneable Node2Vec streaming variant.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Stream.cypher Streams Node2Vec embeddings with contextual columns for export.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3c_Node2Vec_Mutate.cypher Mutates Node2Vec embeddings into the projection.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3a_Node2Vec_Estimate.cypher Estimates Node2Vec memory usage.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Tuneable_Stream.cypher Tuneable HashGNN streaming variant.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Stream.cypher Streams HashGNN embeddings with contextual columns for export.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2c_Hash_GNN_Mutate.cypher Mutates HashGNN embeddings into the projection.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2a_Hash_GNN_Estimate.cypher Estimates HashGNN memory usage.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Write.cypher Writes FastRP embeddings back to nodes.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Tuneable_Write.cypher Tuneable FastRP write variant.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Tuneable_Stream.cypher Tuneable FastRP streaming variant.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Stream.cypher Streams FastRP embeddings with contextual columns for export.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1c_Fast_Random_Projection_Mutate.cypher Mutates FastRP embeddings into the projection.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1b_Fast_Random_Projection_Statistics.cypher FastRP stats query.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1a_Fast_Random_Projection_Estimate.cypher FastRP estimate query.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0c_Drop_Model.cypher Drops a GraphSAGE model by name.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0b_Prepare_Degree.cypher Prepares a degree feature for GraphSAGE.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0a_Query_Calculated.cypher Queries already-written embedding properties from nodes.
domains/node-embeddings/nodeEmbeddingsPython.sh Python report entrypoint; optionally triggers CSV step if missing.
domains/node-embeddings/nodeEmbeddingsMarkdown.sh Markdown report entrypoint delegating to summary script.
domains/node-embeddings/nodeEmbeddingsCsv.sh CSV report entrypoint running multiple embedding algorithms and writing properties/CSVs.
domains/node-embeddings/nodeEmbeddingsCharts.py Python chart generator: loads embeddings from Neo4j, UMAP-reduces, writes SVGs.
domains/node-embeddings/explore/NodeEmbeddingsTypescriptExploration.ipynb TypeScript embeddings exploration notebook.
domains/node-embeddings/explore/NodeEmbeddingsJavaExploration.ipynb Java embeddings exploration notebook.
domains/node-embeddings/README.md Domain documentation (structure, entrypoints, outputs).
domains/node-embeddings/PREREQUISITES.md Domain prerequisite documentation and execution order.
domains/node-embeddings/COPIED_FILES.md Traceability of copied Cypher sources into the domain.
domains/anomaly-detection/explore/NodeEmbeddingsHyperparameterTuningExploration.ipynb Updates tuning notebook to align with new Java type label usage.
cypher/Overview/Words_for_Wordcloud.cypher Switches type word sources to internal Java types and renames intermediate variables.
cypher/Java/Remove_internal_java_type_labels.cypher Removes the InternalJavaType label (utility query).
cypher/Java/Remove_connected_internal_java_type_labels.cypher Removes the ConnectedInternalJavaType label (utility query).
cypher/Java/Label_internal_java_types.cypher Adds InternalJavaType label based on bytecode presence and filters.
cypher/Java/Label_external_types_and_annotations.cypher Narrows match scope to Java types using label expressions.
cypher/Java/Label_connected_internal_java_types.cypher Adds ConnectedInternalJavaType label for internally connected Java types.
cypher/Dependency_Enrichment/Set_maven_artifact_version.cypher Adds Maven-derived version property enrichment to artifacts.
cypher/Dependency_Enrichment/Set_Outgoing_Java_Type_Dependencies.cypher Refines outgoing dependency calculation for Java types; skips recalculation when already populated.
cypher/Dependency_Enrichment/Set_Incoming_Java_Type_Dependencies.cypher Refines incoming dependency calculation for Java types; skips recalculation when already populated.
cypher/Dependency_Enrichment/Outgoing_Java_Artifact_Dependencies.cypher Adds outgoing dependency count/weight enrichment for artifacts.
cypher/Dependency_Enrichment/Incoming_Java_Artifact_Dependencies.cypher Adds incoming dependency count/weight enrichment for artifacts.
cypher/Dependencies_Projection/Dependencies_4c_Create_Undirected_Java_Type_Projection.cypher Removes the legacy specialized undirected Java type projection query.
cypher/Dependencies_Projection/Dependencies_3d_Create_Java_Method_Projection.cypher Coalesces node/relationship counts to avoid nulls in output.
cypher/Dependencies_Projection/Dependencies_3c_Create_Java_Type_Projection.cypher Removes the legacy specialized directed Java type projection query.
cypher/Dependencies_Projection/Dependencies_0_Verify_Projectable.cypher Generalizes verification query across labels and improves example formatting.
Comments suppressed due to low confidence (1)

scripts/parseCsvFunctions.sh:46

  • In get_csv_column_value, variables like csv_string, column_name, header, values, index, and the arrays are not declared local. Since this file is sourced, these assignments can leak into/overwrite variables in caller scripts. Please make these function-scoped with local to avoid hard-to-debug side effects.
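The leak described in this comment can be reproduced with a short bash illustration. The function names leaky_get_header and scoped_get_header are hypothetical and only mimic the header-extraction step; they are not taken from parseCsvFunctions.sh.

```bash
#!/usr/bin/env bash
# Illustration only: why helpers in a sourced file should declare `local`.
header="caller value"

leaky_get_header() {
  header=$(printf '%s\n' "$1" | sed -n '1p')   # no `local`: overwrites the caller's global
  printf '%s\n' "${header}"
}

scoped_get_header() {
  local header                                  # declared first so `$(...)` status is not masked
  header=$(printf '%s\n' "$1" | sed -n '1p')   # confined to this function call
  printf '%s\n' "${header}"
}

scoped_get_header $'a,b\n1,2' > /dev/null
echo "after scoped: ${header}"   # prints "after scoped: caller value"

leaky_get_header $'a,b\n1,2' > /dev/null
echo "after leaky:  ${header}"   # prints "after leaky:  a,b"
```

Declaring with `local` on its own line and assigning afterwards also keeps the exit status of the command substitution visible, which `local header=$(...)` would hide.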

Comment thread scripts/parseCsvFunctions.sh
Comment thread scripts/parseCsvFunctions.sh
Comment thread scripts/prepareAnalysis.sh Outdated
Comment thread domains/node-embeddings/nodeEmbeddingsCharts.py
Comment thread cypher/Java/Label_connected_internal_java_types.cypher Outdated
JohT force-pushed the feature/introduce-node-embeddings-domain branch from 37bfb04 to 160672c on May 1, 2026 18:12
JohT requested a review from Copilot on May 1, 2026 18:26

Copilot AI left a comment

Pull request overview

Introduces a new vertical-slice node-embeddings domain that owns node-embedding generation (GDS), Python UMAP charting, and Markdown summary reporting, while updating the shared projection + enrichment pipeline to support Java internal type labeling and safer verification behavior.

Changes:

  • Added domains/node-embeddings/ with CSV generation, Python charts (UMAP), and Markdown summary templates + statistics queries.
  • Updated projection + preparation pipeline to use ConnectedInternalJavaType and to better handle empty verification-query results.
  • Added/adjusted Cypher enrichment and labeling queries (internal/connected Java types, artifact versions, dependency counts) and removed legacy Java-type projection queries.

Reviewed changes

Copilot reviewed 54 out of 57 changed files in this pull request and generated 4 comments.

File Description
scripts/reports/NodeEmbeddingsCsv.sh Removed legacy node-embeddings CSV report script (domain version replaces it).
scripts/projectionFunctions.sh Uses is_result_and_csv_column_greater_zero; defers verification until projectable; Java type projections now delegate via ConnectedInternalJavaType.
scripts/prepareAnalysis.sh Adds Java internal/connected type labeling and Java type dependency enrichment; updates verification helper usage.
scripts/parseCsvFunctions.sh Adds logError and introduces is_result_and_csv_column_greater_zero for “no result” cases.
domains/node-embeddings/summary/report_no_embedding_data.template.md Adds fallback include when embedding data is missing.
domains/node-embeddings/summary/report.template.md Adds node-embeddings Markdown report template with include points.
domains/node-embeddings/summary/nodeEmbeddingsSummary.sh Generates Markdown includes + assembles final node embeddings report.
domains/node-embeddings/queries/statistics/Node_embeddings_with_community_and_centrality.cypher Adds statistics query for sample embeddings + community + centrality.
domains/node-embeddings/queries/statistics/Embedding_properties_overview.cypher Adds statistics query summarizing which embedding properties exist and counts/dimensions.
domains/node-embeddings/queries/node-embeddings/Set_Parameters.cypher Adds example :params setup for node-embeddings queries.
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4e_GraphSAGE_Write.cypher Adds GraphSAGE write-back query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4d_GraphSAGE_Stream.cypher Adds GraphSAGE stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4b_GraphSAGE_Train.cypher Adds GraphSAGE train query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3e_Node2Vec_Write.cypher Adds Node2Vec write query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Tuneable_Stream.cypher Adds tuneable Node2Vec stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Stream.cypher Adds Node2Vec stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3c_Node2Vec_Mutate.cypher Adds Node2Vec mutate query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3a_Node2Vec_Estimate.cypher Adds Node2Vec estimate query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Tuneable_Stream.cypher Adds tuneable HashGNN stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Stream.cypher Adds HashGNN stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2c_Hash_GNN_Mutate.cypher Adds HashGNN mutate query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2a_Hash_GNN_Estimate.cypher Adds HashGNN estimate query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Write.cypher Adds FastRP write query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Tuneable_Write.cypher Adds tuneable FastRP write query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Tuneable_Stream.cypher Adds tuneable FastRP stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Stream.cypher Adds FastRP stream query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1c_Fast_Random_Projection_Mutate.cypher Adds FastRP mutate query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1b_Fast_Random_Projection_Statistics.cypher Adds FastRP stats query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1a_Fast_Random_Projection_Estimate.cypher Adds FastRP estimate query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0c_Drop_Model.cypher Adds GraphSAGE model drop query (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0b_Prepare_Degree.cypher Adds degree feature preparation for GraphSAGE (domain-local copy).
domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0a_Query_Calculated.cypher Adds query to fetch already-written embeddings with community/centrality.
domains/node-embeddings/nodeEmbeddingsPython.sh Adds Python report entrypoint (auto-runs CSV step if needed).
domains/node-embeddings/nodeEmbeddingsMarkdown.sh Adds Markdown report entrypoint (delegates to summary script).
domains/node-embeddings/nodeEmbeddingsCsv.sh Adds CSV generation entrypoint for embeddings + write-back to graph.
domains/node-embeddings/nodeEmbeddingsCharts.py Adds Neo4j-backed embedding loader + UMAP chart generator.
domains/node-embeddings/explore/NodeEmbeddingsTypescriptExploration.ipynb Updates notebook title + disables validation.
domains/node-embeddings/explore/NodeEmbeddingsJavaExploration.ipynb Updates notebook title + disables validation.
domains/node-embeddings/README.md Adds domain documentation and entrypoint overview.
domains/node-embeddings/PREREQUISITES.md Adds domain prerequisites and execution order documentation.
domains/anomaly-detection/explore/NodeEmbeddingsHyperparameterTuningExploration.ipynb Updates projection creation logic and switches Type → ConnectedInternalJavaType.
cypher/Overview/Words_for_Wordcloud.cypher Switches type selection to InternalJavaType and cleans up variable naming.
cypher/Java/Remove_internal_java_type_labels.cypher Adds query to remove InternalJavaType label.
cypher/Java/Remove_connected_internal_java_type_labels.cypher Adds query to remove ConnectedInternalJavaType label.
cypher/Java/Label_internal_java_types.cypher Adds query to label internal Java types based on bytecode/version heuristics.
cypher/Java/Label_external_types_and_annotations.cypher Tightens matching to Java types only (removes TS exclusion).
cypher/Java/Label_connected_internal_java_types.cypher Adds query to label connected internal Java types.
cypher/Dependency_Enrichment/Set_maven_artifact_version.cypher Adds artifact version enrichment from Maven POM.
cypher/Dependency_Enrichment/Set_Outgoing_Java_Type_Dependencies.cypher Adjusts outgoing Java type dependency enrichment flow/guards.
cypher/Dependency_Enrichment/Set_Incoming_Java_Type_Dependencies.cypher Adjusts incoming Java type dependency enrichment flow/guards.
cypher/Dependency_Enrichment/Outgoing_Java_Artifact_Dependencies.cypher Adds outgoing Java artifact dependency counts/weights.
cypher/Dependency_Enrichment/Incoming_Java_Artifact_Dependencies.cypher Adds incoming Java artifact dependency counts/weights.
cypher/Dependencies_Projection/Dependencies_4c_Create_Undirected_Java_Type_Projection.cypher Removes legacy undirected Java type projection query (superseded).
cypher/Dependencies_Projection/Dependencies_3d_Create_Java_Method_Projection.cypher Coalesces projection counts to avoid missing-result handling issues.
cypher/Dependencies_Projection/Dependencies_3c_Create_Java_Type_Projection.cypher Removes legacy directed Java type projection query (superseded).
cypher/Dependencies_Projection/Dependencies_0_Verify_Projectable.cypher Generalizes verification query beyond TS modules; improves example output.
README.md Updates top-level documentation links to new domain notebook and script location.

Comment thread scripts/parseCsvFunctions.sh
Comment thread domains/node-embeddings/nodeEmbeddingsCharts.py Outdated
Comment thread domains/node-embeddings/README.md Outdated
Comment thread domains/node-embeddings/PREREQUISITES.md Outdated
JohT force-pushed the feature/introduce-node-embeddings-domain branch from 160672c to 0c177e6 on May 1, 2026 18:43
JohT merged commit 75a6fc1 into main on May 1, 2026 (11 checks passed)
JohT deleted the feature/introduce-node-embeddings-domain branch on May 1, 2026 19:04