[Bug?] Health NLOC may undercount route-style files and inflate churn risk #375

@joptimus

Describe the Bug

I am seeing code health scores move in a way that looks surprising because file-level nloc appears to be calculated from detected function or method bodies, not from the whole file.

That may be intentional for complexity metrics, but it seems risky when the same nloc is used for file-level signals like churn risk and score weighting.

In my game-server repo, one JavaScript API route file is about 922 lines long, with roughly 728 non-empty/non-comment lines by a simple local count. Repowise stored nloc = 41 for that file in health_file_metrics.

That small denominator then made the churn-risk finding much harsher:

90-day churn rewrote 1076% of the file's lines
441 changed lines over 41 NLOC

Using the approximate whole-file NLOC, the same churn would be closer to:

441 / 728 ~= 61%
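The two percentages above follow from the same division with different denominators. A minimal sketch of that arithmetic, assuming churn risk is simply changed lines divided by NLOC (the function name `churn_ratio` is hypothetical, not Repowise's actual API or formula):

```python
def churn_ratio(changed_lines: int, nloc: int) -> float:
    """Changed lines as a fraction of NLOC (assumed formula, for illustration)."""
    if nloc <= 0:
        # Files with a stored nloc of 0 have no meaningful ratio.
        raise ValueError("nloc must be positive")
    return changed_lines / nloc


# Figures reported in this issue: 441 changed lines in 90 days.
stored = churn_ratio(441, 41)    # denominator stored in health_file_metrics
whole = churn_ratio(441, 728)    # approximate whole-file NLOC

print(f"{stored:.0%}")  # 1076%
print(f"{whole:.0%}")   # 61%
```

The same 441 changed lines read as a rewrite of the file more than ten times over with one denominator, and as a partial rewrite with the other.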

This caused the file score to drop from about 9.7 to 6.2 between snapshots, even though the actual latest diff touching that file was small. The trend page then showed it as one of the biggest score drops, which was confusing because the main work I had done was improving a different service.

Steps to Reproduce

  1. Run repowise init or repowise update on a JavaScript or TypeScript repo with Express-style route modules.
  2. Include route files where much of the executable logic is in route callbacks or top-level module code.
  3. Open the Health tab or inspect health_file_metrics.
  4. Compare the stored nloc for those files against a simple whole-file non-empty/non-comment line count.
  5. Check churn-risk findings for files where stored nloc is much smaller than the real file size.
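For step 4, the "simple whole-file count" I used can be approximated with something like the sketch below. It is deliberately naive: it understands only `//` and `/* ... */` comments and does not recognize comment markers inside string literals, so treat the result as a rough comparison value. The function name `simple_nloc` and the sample route file are made up for illustration.

```python
def simple_nloc(source: str) -> int:
    """Count lines that contain something other than whitespace or comments.

    Naive JS/TS-style scan: handles // and /* */ only, and does not
    special-case comment markers that appear inside string literals.
    """
    count = 0
    in_block = False
    for line in source.splitlines():
        code = ""
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    break  # block comment continues on the next line
                in_block = False
                i = end + 2
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                code += line[i]
                i += 1
        if code.strip():
            count += 1
    return count


# Hypothetical Express-style route module.
sample = """\
// routes/players.js (hypothetical example)
const express = require("express");
const router = express.Router();

/* list players
   for the lobby */
router.get("/players", (req, res) => {
  res.json([]); // empty for now
});

module.exports = router;
"""

print(simple_nloc(sample))  # 6
```

Comparing this number against the stored nloc for the same file is enough to spot the large gaps described in this issue.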

Expected Behavior

For file-level health metrics, I would expect Repowise to distinguish between:

  • file_nloc: whole-file non-comment/non-blank lines
  • function_nloc: lines covered by detected function or method bodies

Then file-level process signals like churn risk would use file_nloc, while complexity biomarkers could still use function-level metrics.
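A rough sketch of what that split could look like, purely as an illustration of the proposal. `FileSize`, `churn_risk_ratio`, and `function_coverage` are hypothetical names, not existing Repowise code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileSize:
    file_nloc: int       # whole-file non-comment/non-blank lines
    function_nloc: int   # lines inside detected function or method bodies


def churn_risk_ratio(changed_lines: int, size: FileSize) -> Optional[float]:
    """File-level signal: use the whole-file denominator."""
    if size.file_nloc <= 0:
        return None  # nothing to compare against
    return changed_lines / size.file_nloc


def function_coverage(size: FileSize) -> Optional[float]:
    """Share of the file that the function walker actually covered."""
    if size.file_nloc <= 0:
        return None
    return size.function_nloc / size.file_nloc


# Numbers from the route file in this issue.
route_file = FileSize(file_nloc=728, function_nloc=41)
print(f"{churn_risk_ratio(441, route_file):.0%}")   # 61%
print(f"{function_coverage(route_file):.1%}")       # 5.6%
```

Keeping both numbers would also make it possible to flag files where function coverage is very low, since function-level metrics say little about most of the code in those files.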

If this is by design, it would be helpful to understand the intended meaning of nloc in health_file_metrics, because the current value is presented and used as a file-level denominator.

Actual Behavior

The stored nloc appears to be the sum of detected function or method body NLOC. For JS/TS route modules, this can undercount the real file substantially.

In the same repo, several API route files had very small or zero stored NLOC values despite being real source files:

  • example API route file: actual ~922 lines, simple NLOC ~728, stored nloc 41
  • other API route files: stored nloc values of 0, 3, 20, 35, etc.

That makes churn-per-NLOC and related health scores look much worse than they should for files where the AST walker does not cover most of the file.

Environment

  • OS: macOS
  • Python version: 3.14.2
  • Repowise version: 0.16.0
  • Installation method: pip

Additional Context

This seems most visible in JavaScript and TypeScript API route files, especially Express-style modules where route callbacks and top-level wiring are common.

It may also affect other languages or patterns where meaningful code can live outside normal detected function/method nodes:

  • Python scripts with module-level setup or if __name__ == "__main__" blocks
  • C# top-level programs or minimal API style code
  • C++ global/static initializers, macro-heavy code, or lambdas assigned outside functions
  • JavaScript/TypeScript config modules, route modules, and top-level callbacks
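To illustrate the Python case from the list above, here is a small sketch using the standard `ast` module. It is not Repowise's walker; it only shows how counting lines inside function nodes can miss module-level code. The sample script and helper names are made up, and the comment check is naive (it only skips lines that start with `#`).

```python
import ast

# Hypothetical script with module-level setup and a __main__ block.
SOURCE = '''\
import sys

CONFIG = {"retries": 3}

def greet(name):
    return f"hello {name}"

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(greet(arg))
    print(CONFIG)
'''


def function_covered_lines(source: str) -> set:
    """Line numbers that fall inside any function or method definition."""
    covered = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            covered.update(range(node.lineno, node.end_lineno + 1))
    return covered


def code_lines(source: str) -> set:
    """Line numbers that are neither blank nor a full-line # comment."""
    return {
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line.strip() and not line.strip().startswith("#")
    }


all_code = code_lines(SOURCE)
in_functions = function_covered_lines(SOURCE) & all_code
print(len(in_functions), len(all_code))  # 2 8
```

Only 2 of the 8 code lines sit inside a function here, so a function-based NLOC would understate the file by a factor of four.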

The trend page adds to the confusion because “Files that moved most since last index” appears to mean “score changed most,” not necessarily “file changed most.” That part may just need clearer wording, but the NLOC denominator issue looks like the main cause of the misleading score movement.

Could you confirm whether health_file_metrics.nloc is intended to represent whole-file NLOC or only function-covered NLOC? If it is meant to be whole-file NLOC, I think this is a scoring bug. If it is intentionally function-covered NLOC, maybe churn risk should use a separate whole-file denominator.
