use AI for data analysis and maybe report generation #7

@lukpueh

Description

The current design (see #6) has three distinct phases:

data collection → data analysis → report generation
  1. Data collection seems well covered: basic API queries should provide us with a rich data set.

  2. Data analysis could benefit from AI assistance. Alongside simple analyzers with deterministic rules (e.g. checking for the presence or absence of data, or counting data points and comparing them against a threshold), we could experiment with prompt-based analyzers that, for example, evaluate documentation quality (SECURITY.md or other community files) against existing best-practice examples.

  3. At a later stage, we might also experiment with AI-assisted report generation, e.g. to synthesize analysis results into summaries or recommendations. (Note that separating analysis from report generation lets us keep individual analyzers independent, while the report generation phase operates on the full set of analyzer results.)
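The three phases, and the split between deterministic and prompt-based analyzers, could be sketched roughly as follows. This is a minimal sketch under assumptions: the data shape, the names (`Result`, `has_security_policy`, `min_count`, `prompt_check`, `analyze`, `generate_report`) and the PASS/FAIL prompt convention are hypothetical and not taken from the existing check runner. The `llm` argument stands in for whatever model client (ideally a local one) ends up being used.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shape of the collected data: a plain dict filled by API queries.
RepoData = dict


@dataclass
class Result:
    check: str
    passed: bool
    detail: str


# An analyzer is any function from collected data to a single Result.
Analyzer = Callable[[RepoData], Result]


def has_security_policy(data: RepoData) -> Result:
    """Deterministic rule: presence or absence of a file."""
    found = "SECURITY.md" in data.get("files", [])
    return Result("security-policy", found,
                  "SECURITY.md found" if found else "SECURITY.md missing")


def min_count(name: str, key: str, minimum: int) -> Analyzer:
    """Deterministic rule: count data points and compare against a threshold."""
    def check(data: RepoData) -> Result:
        count = len(data.get(key, []))
        return Result(name, count >= minimum, f"{count} {key} (minimum {minimum})")
    return check


def prompt_check(name: str, llm: Callable[[str], str], example: str) -> Analyzer:
    """Prompt-based rule (hypothetical): ask a model to grade SECURITY.md
    against a best-practice example. `llm` is any str -> str callable."""
    def check(data: RepoData) -> Result:
        prompt = (
            "Compare this SECURITY.md with the best-practice example. "
            "Answer 'PASS: <reason>' or 'FAIL: <reason>'.\n\n"
            f"Example:\n{example}\n\nCandidate:\n{data.get('security_md', '')}"
        )
        answer = llm(prompt).strip()
        return Result(name, answer.upper().startswith("PASS"), answer)
    return check


def analyze(data: RepoData, analyzers: list[Analyzer]) -> list[Result]:
    # Each analyzer runs independently on the same collected data.
    return [analyzer(data) for analyzer in analyzers]


def generate_report(results: list[Result]) -> str:
    # Report generation sees the whole set of results. It is deterministic
    # here; an AI-assisted summary could be swapped in at this stage.
    lines = [f"[{'PASS' if r.passed else 'FAIL'}] {r.check}: {r.detail}" for r in results]
    passed = sum(r.passed for r in results)
    lines.append(f"{passed}/{len(results)} checks passed")
    return "\n".join(lines)


if __name__ == "__main__":
    data = {
        "files": ["README.md", "SECURITY.md"],
        "maintainers": ["alice"],
        "security_md": "Please report vulnerabilities to security@example.org.",
    }
    # Stub in place of a real model, so the sketch runs offline.
    fake_llm = lambda prompt: "PASS: contact address given"
    analyzers = [
        has_security_policy,
        min_count("maintainers", "maintainers", 2),
        prompt_check("security-md-quality", fake_llm,
                     "Report vulnerabilities privately; expect a reply within 7 days."),
    ]
    print(generate_report(analyze(data, analyzers)))
```

Because every analyzer only returns a `Result`, a prompt-based analyzer can be added or removed without touching the deterministic ones, and only `generate_report` needs to look at all results together.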

Notes on security and privacy

  • If we need to make authenticated requests, we must maintain full control of the credentials.
  • Some of the collected data might be privacy-sensitive and must not be shared with third-party AI providers, so the tool should support local LLMs.
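One way to enforce the local-LLM requirement is to refuse any model endpoint that is not on the loopback interface before collected data leaves an analyzer. This is a minimal sketch, assuming LLM access goes through an HTTP endpoint URL; `is_local_endpoint` and `require_local` are hypothetical helpers, not part of any existing code.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True only if `url` points at the local machine (loopback)."""
    host = urlparse(url).hostname  # lowercased; IPv6 brackets are stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname (e.g. a third-party API) is treated as remote.
        return False


def require_local(url: str) -> str:
    """Guard to call before sending collected data to an LLM endpoint."""
    if not is_local_endpoint(url):
        raise ValueError(f"refusing to send collected data to non-local endpoint: {url}")
    return url
```

A loopback-only rule is deliberately strict: a model served elsewhere on a trusted network would be rejected and would need an explicit allowlist instead.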

Alternatives

In an alternative approach, an overarching "agentic tool" could orchestrate the entire process, deciding which data to collect, which analyses to run, and when to stop. The check runner implemented here would then become a set of tools (or skills) that the AI can call as needed.

While this sounds interesting, it has a few important downsides:

  • Less predictable/reproducible (it might skip checks or hallucinate conclusions, and behave differently between runs)
  • Less control (see privacy/security concerns above)
  • More expensive (in money for a third-party provider, or in local compute)

I suggest we start with the "pipeline tool" described above, as it already covers most of what we are interested in without any AI at all, while still allowing us to experiment with AI in a controlled manner.
