Detect duplicate JSON keys in flow files #77

Description

@sfc-gh-pvillard

Problem

NiFi flow files are stored as JSON. When a merge conflict is not cleanly resolved, the resulting JSON can contain duplicate keys in the same object:

{
  "flowContents": { "processors": [ ...branch A... ] },
  "flowContents": { "processors": [ ...branch B... ] }
}

Parsers are not required to reject this input (RFC 8259 says names within an object SHOULD be unique, and leaves the behavior with duplicates unpredictable), and Jackson's POJO binding silently applies last-wins: the first flowContents is discarded and the second is used. The diff is then computed against incomplete data, so the PR comment can be wrong (missing changes or showing phantom ones) with no indication of a problem. The exit code is 0 and the comment looks normal.
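The last-wins behavior can be reproduced in a few lines. This is a minimal sketch assuming Jackson 2.x (`jackson-databind`) on the classpath; the `Flow` class is a hypothetical stand-in for the real flow model, with `flowContents` flattened to a string for brevity.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class LastWinsDemo {

    // Hypothetical, simplified model: the real flow model is far richer.
    public static class Flow {
        public String flowContents;
    }

    // Binds the JSON to the POJO with a default ObjectMapper and returns the field value.
    static String bind(String json) throws Exception {
        return new ObjectMapper().readValue(json, Flow.class).flowContents;
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"flowContents\":\"branch A\",\"flowContents\":\"branch B\"}";
        // No exception is thrown; the first value is silently overwritten.
        System.out.println(bind(json)); // prints: branch B
    }
}
```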

Proposed fix

Before parsing either flow file, scan it with a streaming JSON parser that has strict duplicate detection enabled. If a duplicate key is found:

  • Post a [CAUTION] block in the PR comment identifying the file and the exact line/column
  • Exit with a non-zero status code so the PR check is blocked until the file is fixed
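The pre-scan described above could look roughly like the following. It is a sketch, not the project's actual implementation: it assumes Jackson 2.x (`jackson-core`), and the class name, method name, and exit code 2 are illustrative choices. The only Jackson-specific piece is the `STRICT_DUPLICATE_DETECTION` parser feature, which is off by default.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonLocation;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;

import java.io.File;
import java.io.IOException;
import java.util.Optional;

public class DuplicateKeyScan {

    /**
     * Streams through the file token by token without building a model.
     * Returns a description of the first parse problem, or empty if the file is clean.
     */
    static Optional<String> findFirstProblem(File flowFile) throws IOException {
        JsonFactory factory = new JsonFactory();
        // Makes the parser throw on a repeated field name within one object.
        factory.enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);

        try (JsonParser parser = factory.createParser(flowFile)) {
            while (parser.nextToken() != null) {
                // Nothing to do: we only care whether the parser throws.
            }
            return Optional.empty();
        } catch (JsonParseException e) {
            // Note: any malformed JSON ends up here too, not only duplicate keys
            // (for example leftover conflict markers).
            JsonLocation loc = e.getLocation();
            return Optional.of(e.getOriginalMessage()
                    + " at line " + loc.getLineNr() + ", column " + loc.getColumnNr());
        }
    }

    public static void main(String[] args) throws IOException {
        Optional<String> problem = findFirstProblem(new File(args[0]));
        if (problem.isPresent()) {
            System.err.println(args[0] + ": " + problem.get());
            System.exit(2); // non-zero status blocks the PR check (value is arbitrary)
        }
    }
}
```

Scanning with the streaming parser keeps the check independent of the POJO model, so it reports duplicates at any nesting depth, including fields the model would otherwise ignore.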

Example output

Flow file `submitted-changes/flows/my-flow.json` contains duplicate JSON keys
(this typically indicates a merge conflict that was not fully resolved): Duplicate field 'flowContents'
Line 7, column 3
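For illustration, the comment body could be assembled as below. This sketch assumes the [CAUTION] block means GitHub's `> [!CAUTION]` alert syntax; the `CautionComment` class and its `render` method are hypothetical names, and the wording is copied from the example output above.

```java
public class CautionComment {

    /** Builds the Markdown for the PR comment from the parser's message and location. */
    static String render(String path, String message, int line, int column) {
        return String.join("\n",
                "> [!CAUTION]",
                "> Flow file `" + path + "` contains duplicate JSON keys",
                "> (this typically indicates a merge conflict that was not fully resolved): " + message,
                "> Line " + line + ", column " + column);
    }

    public static void main(String[] args) {
        System.out.println(render(
                "submitted-changes/flows/my-flow.json",
                "Duplicate field 'flowContents'",
                7, 3));
    }
}
```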
