Pull request overview
Introduces a new node-embeddings domain that generates Neo4j GDS node-embedding outputs (CSV + UMAP SVG charts + Markdown summary) and updates the pipeline to support Java “internal/connected” type subsets and more robust CSV-result parsing.
Changes:
- Adds `domains/node-embeddings/` with Cypher queries, CSV/Python/Markdown entrypoints, and report templates.
- Updates projection + preparation flow to work with `ConnectedInternalJavaType` for Java type projections and adds Java Type dependency enrichment steps.
- Extends CSV parsing helpers with stricter error handling and a helper that safely handles “empty result” cases.
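The “empty result” handling mentioned in the last bullet might look roughly like the sketch below. The function name `csv_column_greater_zero_sketch`, its arguments, and the parsing logic are assumptions made for illustration; the actual helper in `scripts/parseCsvFunctions.sh` may work differently. The sketch also assumes a simple CSV with unquoted header names and no commas inside values.

```shell
# Hypothetical sketch, not the helper from scripts/parseCsvFunctions.sh.
# Succeeds (exit status 0) only if the CSV contains a data row and the first
# value of the named column is an integer greater than zero.
csv_column_greater_zero_sketch() {
    local csv_string="$1"
    local column_name="$2"
    local header values field value
    local index=0
    local position=0

    header=$(printf '%s\n' "$csv_string" | sed -n '1p')
    values=$(printf '%s\n' "$csv_string" | sed -n '2p')

    # An empty query result or a header-only result counts as "not greater than zero".
    if [ -z "$header" ] || [ -z "$values" ]; then
        return 1
    fi

    # Find the 1-based position of the requested column in the header.
    local IFS=','
    for field in $header; do
        index=$((index + 1))
        if [ "$field" = "$column_name" ]; then
            position=$index
            break
        fi
    done
    if [ "$position" -eq 0 ]; then
        return 1
    fi

    # Extract the value, drop double quotes, and reject anything that is not a plain integer.
    value=$(printf '%s\n' "$values" | cut -d ',' -f "$position" | tr -d '"')
    case "$value" in
        '' | *[!0-9]*) return 1 ;;
    esac
    [ "$value" -gt 0 ]
}

# Usage with made-up column names: only a positive count passes the check.
if csv_column_greater_zero_sketch "$(printf 'nodeCount,relationshipCount\n12,0')" "nodeCount"; then
    echo "projectable"
fi
```

Treating "no rows" and "column missing" as a plain non-zero exit status lets callers use the check directly in an `if` without first testing whether the query returned anything.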
Reviewed changes
Copilot reviewed 53 out of 56 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/projectionFunctions.sh | Uses is_result_and_csv_column_greater_zero for verification and routes Java type projections through ConnectedInternalJavaType. |
| scripts/prepareAnalysis.sh | Adjusts enrichment flow (Java artifact/type deps) and adds label management for internal/connected Java types. |
| scripts/parseCsvFunctions.sh | Adds centralized error handling and a helper for “result + column > 0” checks. |
| domains/node-embeddings/summary/report_no_embedding_data.template.md | Fallback content when embedding data is missing. |
| domains/node-embeddings/summary/report.template.md | Main Markdown report template with include/fallback structure. |
| domains/node-embeddings/summary/nodeEmbeddingsSummary.sh | Assembles the Markdown report and embeds generated include fragments. |
| domains/node-embeddings/queries/statistics/Node_embeddings_with_community_and_centrality.cypher | Markdown stats query showing embeddings alongside community/centrality properties. |
| domains/node-embeddings/queries/statistics/Embedding_properties_overview.cypher | Markdown stats query summarizing which embedding properties exist per label. |
| domains/node-embeddings/queries/node-embeddings/Set_Parameters.cypher | Example params payload for running node-embedding queries. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4e_GraphSAGE_Write.cypher | Writes GraphSAGE embeddings back to nodes. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4d_GraphSAGE_Stream.cypher | Streams GraphSAGE embeddings with contextual columns for export. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4b_GraphSAGE_Train.cypher | Trains a GraphSAGE model on the cleaned projection. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3e_Node2Vec_Write.cypher | Writes Node2Vec embeddings back to nodes. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Tuneable_Stream.cypher | Tuneable Node2Vec streaming variant. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Stream.cypher | Streams Node2Vec embeddings with contextual columns for export. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3c_Node2Vec_Mutate.cypher | Mutates Node2Vec embeddings into the projection. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3a_Node2Vec_Estimate.cypher | Estimates Node2Vec memory usage. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Tuneable_Stream.cypher | Tuneable HashGNN streaming variant. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Stream.cypher | Streams HashGNN embeddings with contextual columns for export. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2c_Hash_GNN_Mutate.cypher | Mutates HashGNN embeddings into the projection. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2a_Hash_GNN_Estimate.cypher | Estimates HashGNN memory usage. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Write.cypher | Writes FastRP embeddings back to nodes. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Tuneable_Write.cypher | Tuneable FastRP write variant. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Tuneable_Stream.cypher | Tuneable FastRP streaming variant. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Stream.cypher | Streams FastRP embeddings with contextual columns for export. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1c_Fast_Random_Projection_Mutate.cypher | Mutates FastRP embeddings into the projection. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1b_Fast_Random_Projection_Statistics.cypher | FastRP stats query. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1a_Fast_Random_Projection_Estimate.cypher | FastRP estimate query. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0c_Drop_Model.cypher | Drops a GraphSAGE model by name. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0b_Prepare_Degree.cypher | Prepares a degree feature for GraphSAGE. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0a_Query_Calculated.cypher | Queries already-written embedding properties from nodes. |
| domains/node-embeddings/nodeEmbeddingsPython.sh | Python report entrypoint; optionally triggers CSV step if missing. |
| domains/node-embeddings/nodeEmbeddingsMarkdown.sh | Markdown report entrypoint delegating to summary script. |
| domains/node-embeddings/nodeEmbeddingsCsv.sh | CSV report entrypoint running multiple embedding algorithms and writing properties/CSVs. |
| domains/node-embeddings/nodeEmbeddingsCharts.py | Python chart generator: loads embeddings from Neo4j, UMAP-reduces, writes SVGs. |
| domains/node-embeddings/explore/NodeEmbeddingsTypescriptExploration.ipynb | TypeScript embeddings exploration notebook. |
| domains/node-embeddings/explore/NodeEmbeddingsJavaExploration.ipynb | Java embeddings exploration notebook. |
| domains/node-embeddings/README.md | Domain documentation (structure, entrypoints, outputs). |
| domains/node-embeddings/PREREQUISITES.md | Domain prerequisite documentation and execution order. |
| domains/node-embeddings/COPIED_FILES.md | Traceability of copied Cypher sources into the domain. |
| domains/anomaly-detection/explore/NodeEmbeddingsHyperparameterTuningExploration.ipynb | Updates tuning notebook to align with new Java type label usage. |
| cypher/Overview/Words_for_Wordcloud.cypher | Switches type word sources to internal Java types and renames intermediate variables. |
| cypher/Java/Remove_internal_java_type_labels.cypher | Removes the InternalJavaType label (utility query). |
| cypher/Java/Remove_connected_internal_java_type_labels.cypher | Removes the ConnectedInternalJavaType label (utility query). |
| cypher/Java/Label_internal_java_types.cypher | Adds InternalJavaType label based on bytecode presence and filters. |
| cypher/Java/Label_external_types_and_annotations.cypher | Narrows match scope to Java types using label expressions. |
| cypher/Java/Label_connected_internal_java_types.cypher | Adds ConnectedInternalJavaType label for internally connected Java types. |
| cypher/Dependency_Enrichment/Set_maven_artifact_version.cypher | Adds Maven-derived version property enrichment to artifacts. |
| cypher/Dependency_Enrichment/Set_Outgoing_Java_Type_Dependencies.cypher | Refines outgoing dependency calculation for Java types; skips recalculation when already populated. |
| cypher/Dependency_Enrichment/Set_Incoming_Java_Type_Dependencies.cypher | Refines incoming dependency calculation for Java types; skips recalculation when already populated. |
| cypher/Dependency_Enrichment/Outgoing_Java_Artifact_Dependencies.cypher | Adds outgoing dependency count/weight enrichment for artifacts. |
| cypher/Dependency_Enrichment/Incoming_Java_Artifact_Dependencies.cypher | Adds incoming dependency count/weight enrichment for artifacts. |
| cypher/Dependencies_Projection/Dependencies_4c_Create_Undirected_Java_Type_Projection.cypher | Removes the legacy specialized undirected Java type projection query. |
| cypher/Dependencies_Projection/Dependencies_3d_Create_Java_Method_Projection.cypher | Coalesces node/relationship counts to avoid nulls in output. |
| cypher/Dependencies_Projection/Dependencies_3c_Create_Java_Type_Projection.cypher | Removes the legacy specialized directed Java type projection query. |
| cypher/Dependencies_Projection/Dependencies_0_Verify_Projectable.cypher | Generalizes verification query across labels and improves example formatting. |
Comments suppressed due to low confidence (1)
scripts/parseCsvFunctions.sh:46
- In `get_csv_column_value`, variables like `csv_string`, `column_name`, `header`, `values`, `index`, and the arrays are not declared `local`. Since this file is sourced, these assignments can leak into or overwrite variables in caller scripts. Please make these function-scoped with `local` to avoid hard-to-debug side effects.
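The leak described above can be reproduced with a small sketch. Both functions below are hypothetical stand-ins, not the code from `scripts/parseCsvFunctions.sh`; they differ only in whether the working variables are declared `local`.

```shell
# Hypothetical helper WITHOUT local: assigns to the caller's "csv_string" and "header".
first_csv_line_leaky() {
    csv_string="$1"
    header=$(printf '%s\n' "$csv_string" | sed -n '1p')
    printf '%s\n' "$header"
}

# Same helper with function-scoped variables.
first_csv_line_scoped() {
    local csv_string="$1"
    local header
    header=$(printf '%s\n' "$csv_string" | sed -n '1p')
    printf '%s\n' "$header"
}

# The functions are called directly (not inside a command substitution),
# so they run in the caller's shell, as they would in a sourcing script.
header="caller header"

first_csv_line_scoped "$(printf 'name,count\nexample,3')" > /dev/null
echo "after scoped call: ${header}"   # prints "after scoped call: caller header"

first_csv_line_leaky "$(printf 'name,count\nexample,3')" > /dev/null
echo "after leaky call: ${header}"    # prints "after leaky call: name,count"
```

A call wrapped in `$(...)` runs in a subshell and would hide the problem, which is one reason this kind of leak tends to surface only in some call sites.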
…ctedInternalJavaType label Co-authored-by: Copilot <copilot@github.com>
37bfb04 to 160672c
Pull request overview
Introduces a new vertical-slice node-embeddings domain that owns node-embedding generation (GDS), Python UMAP charting, and Markdown summary reporting, while updating the shared projection + enrichment pipeline to support Java internal type labeling and safer verification behavior.
Changes:
- Added `domains/node-embeddings/` with CSV generation, Python charts (UMAP), and Markdown summary templates + statistics queries.
- Updated projection + preparation pipeline to use `ConnectedInternalJavaType` and to better handle empty verification-query results.
- Added/adjusted Cypher enrichment and labeling queries (internal/connected Java types, artifact versions, dependency counts) and removed legacy Java-type projection queries.
Reviewed changes
Copilot reviewed 54 out of 57 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/reports/NodeEmbeddingsCsv.sh | Removed legacy node-embeddings CSV report script (domain version replaces it). |
| scripts/projectionFunctions.sh | Uses is_result_and_csv_column_greater_zero; defers verification until projectable; Java type projections now delegate via ConnectedInternalJavaType. |
| scripts/prepareAnalysis.sh | Adds Java internal/connected type labeling and Java type dependency enrichment; updates verification helper usage. |
| scripts/parseCsvFunctions.sh | Adds logError and introduces is_result_and_csv_column_greater_zero for “no result” cases. |
| domains/node-embeddings/summary/report_no_embedding_data.template.md | Adds fallback include when embedding data is missing. |
| domains/node-embeddings/summary/report.template.md | Adds node-embeddings Markdown report template with include points. |
| domains/node-embeddings/summary/nodeEmbeddingsSummary.sh | Generates Markdown includes + assembles final node embeddings report. |
| domains/node-embeddings/queries/statistics/Node_embeddings_with_community_and_centrality.cypher | Adds statistics query for sample embeddings + community + centrality. |
| domains/node-embeddings/queries/statistics/Embedding_properties_overview.cypher | Adds statistics query summarizing which embedding properties exist and counts/dimensions. |
| domains/node-embeddings/queries/node-embeddings/Set_Parameters.cypher | Adds example :params setup for node-embeddings queries. |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4e_GraphSAGE_Write.cypher | Adds GraphSAGE write-back query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4d_GraphSAGE_Stream.cypher | Adds GraphSAGE stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_4b_GraphSAGE_Train.cypher | Adds GraphSAGE train query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3e_Node2Vec_Write.cypher | Adds Node2Vec write query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Tuneable_Stream.cypher | Adds tuneable Node2Vec stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3d_Node2Vec_Stream.cypher | Adds Node2Vec stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3c_Node2Vec_Mutate.cypher | Adds Node2Vec mutate query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_3a_Node2Vec_Estimate.cypher | Adds Node2Vec estimate query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Tuneable_Stream.cypher | Adds tuneable HashGNN stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2d_Hash_GNN_Stream.cypher | Adds HashGNN stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2c_Hash_GNN_Mutate.cypher | Adds HashGNN mutate query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_2a_Hash_GNN_Estimate.cypher | Adds HashGNN estimate query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Write.cypher | Adds FastRP write query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1e_Fast_Random_Projection_Tuneable_Write.cypher | Adds tuneable FastRP write query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Tuneable_Stream.cypher | Adds tuneable FastRP stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1d_Fast_Random_Projection_Stream.cypher | Adds FastRP stream query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1c_Fast_Random_Projection_Mutate.cypher | Adds FastRP mutate query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1b_Fast_Random_Projection_Statistics.cypher | Adds FastRP stats query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_1a_Fast_Random_Projection_Estimate.cypher | Adds FastRP estimate query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0c_Drop_Model.cypher | Adds GraphSAGE model drop query (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0b_Prepare_Degree.cypher | Adds degree feature preparation for GraphSAGE (domain-local copy). |
| domains/node-embeddings/queries/node-embeddings/Node_Embeddings_0a_Query_Calculated.cypher | Adds query to fetch already-written embeddings with community/centrality. |
| domains/node-embeddings/nodeEmbeddingsPython.sh | Adds Python report entrypoint (auto-runs CSV step if needed). |
| domains/node-embeddings/nodeEmbeddingsMarkdown.sh | Adds Markdown report entrypoint (delegates to summary script). |
| domains/node-embeddings/nodeEmbeddingsCsv.sh | Adds CSV generation entrypoint for embeddings + write-back to graph. |
| domains/node-embeddings/nodeEmbeddingsCharts.py | Adds Neo4j-backed embedding loader + UMAP chart generator. |
| domains/node-embeddings/explore/NodeEmbeddingsTypescriptExploration.ipynb | Updates notebook title + disables validation. |
| domains/node-embeddings/explore/NodeEmbeddingsJavaExploration.ipynb | Updates notebook title + disables validation. |
| domains/node-embeddings/README.md | Adds domain documentation and entrypoint overview. |
| domains/node-embeddings/PREREQUISITES.md | Adds domain prerequisites and execution order documentation. |
| domains/anomaly-detection/explore/NodeEmbeddingsHyperparameterTuningExploration.ipynb | Updates projection creation logic and switches Type → ConnectedInternalJavaType. |
| cypher/Overview/Words_for_Wordcloud.cypher | Switches type selection to InternalJavaType and cleans up variable naming. |
| cypher/Java/Remove_internal_java_type_labels.cypher | Adds query to remove InternalJavaType label. |
| cypher/Java/Remove_connected_internal_java_type_labels.cypher | Adds query to remove ConnectedInternalJavaType label. |
| cypher/Java/Label_internal_java_types.cypher | Adds query to label internal Java types based on bytecode/version heuristics. |
| cypher/Java/Label_external_types_and_annotations.cypher | Tightens matching to Java types only (removes TS exclusion). |
| cypher/Java/Label_connected_internal_java_types.cypher | Adds query to label connected internal Java types. |
| cypher/Dependency_Enrichment/Set_maven_artifact_version.cypher | Adds artifact version enrichment from Maven POM. |
| cypher/Dependency_Enrichment/Set_Outgoing_Java_Type_Dependencies.cypher | Adjusts outgoing Java type dependency enrichment flow/guards. |
| cypher/Dependency_Enrichment/Set_Incoming_Java_Type_Dependencies.cypher | Adjusts incoming Java type dependency enrichment flow/guards. |
| cypher/Dependency_Enrichment/Outgoing_Java_Artifact_Dependencies.cypher | Adds outgoing Java artifact dependency counts/weights. |
| cypher/Dependency_Enrichment/Incoming_Java_Artifact_Dependencies.cypher | Adds incoming Java artifact dependency counts/weights. |
| cypher/Dependencies_Projection/Dependencies_4c_Create_Undirected_Java_Type_Projection.cypher | Removes legacy undirected Java type projection query (superseded). |
| cypher/Dependencies_Projection/Dependencies_3d_Create_Java_Method_Projection.cypher | Coalesces projection counts to avoid missing-result handling issues. |
| cypher/Dependencies_Projection/Dependencies_3c_Create_Java_Type_Projection.cypher | Removes legacy directed Java type projection query (superseded). |
| cypher/Dependencies_Projection/Dependencies_0_Verify_Projectable.cypher | Generalizes verification query beyond TS modules; improves example output. |
| README.md | Updates top-level documentation links to new domain notebook and script location. |
160672c to 0c177e6
🚀 Feature
⚙️ Optimization