Skip to content

Interval Analysis#965

Merged
hernanponcedeleon merged 65 commits intohernanponcedeleon:developmentfrom
tcherrou:pr-interval-analysis
Feb 25, 2026
Merged

Interval Analysis#965
hernanponcedeleon merged 65 commits intohernanponcedeleon:developmentfrom
tcherrou:pr-interval-analysis

Conversation

@tcherrou
Copy link
Copy Markdown
Contributor

Interval Analysis

Overview

This pull request adds interval analysis (also known as range analysis or value analysis) to dartagnan's analysis pipeline after preprocessing.
The analysis determines the intervals of integer register in the intermediate representation.
These intervals can be used for program simplification.
Here, it is used to add bounds to variables in the smt encoding to speed up verification.
This work was done for my master's thesis and concluded that using interval analysis.
master_thesis.pdf
The experimental results are a bit outdated as the most recent version is more conservative during the encoding phase (mostly with the bit vector encoding).

Features

  • An interval analysis for integer registers that derives bounds.
  • Use of interval analysis to encode bounds on SMT variables to limit the search space of the solver.
  • Tests for interval analysis and interval objects.
  • Encoding support for bit vector and integer encoding.

Specifics

Intervals

Representation

An interval for a register consists of a lower bound and upper bound such that r is within that range of values.
Since registers have a certain bitwidth, the length of the interval is bounded by the number of values representable with that bitwidth.
For example, for an 8-bit registers, the widest possible interval is [-128,255].

Signedness

To my knowledge, registers are neither signed nor unsigned and signedness by determined by operations.
This ambiguity is dealt with by not making a choice on signedness.
A drawback with this approach is that some interval are sensitive the sign while some are not.
For example, the interval [-1,1] is sound when interpreted as signed but unsound when interpreted as unsigned.
These type of intervals can occur when

In contrast, the interval [-5,-1] has the same set of values unsigned as signed in terms of their bit vector representation.

No wrapping

The "top" value of the interval lattice is [MIN_SIGNED,MAX_UNSIGNED].
Additionally, interval are not wrapped.
This means that overflows and underflows are conservatively dealt with by widening to the top value for that specific bitwidth.

Note

As in #955 llvm represents values above the signed upper bound (e.g. 128 for an 8-bit integer) as their signed versions.
This can lead to unsound intervals as values are interpreted as they are represented in the IR.
Suggestions to remedy this are always welcome.

Analysis

The pull request introduces an interface for querying the interval of an integer register at a specific program event.
There are three analyses, which differ in their precision.

  1. Naive Analysis just returns top for each register.
  2. Local Analysis analyses each thread in isolation by looking at Local events and propagating information using a work list algorithm.
  3. Global Analysis extends local analysis by leveraging dartagnan's relation analysis and employing read-from pairs to analyse Load instructions for additional precision.

Note

Since Global analysis depends on Relation analysis, it must be performed after Relation analysis.

Encoding

The interval of a register is used in the encoding phase by using the bounds to constrain the SMT variable associated.

For example, a register r0 with interval [0,5] will have the SMT constraints (assuming the integer encoding) (assert (0 <= r0)) and (assert (r0 <= 5)).

Notes

To the extent of my knowledge, knowledge of the sign bit is necessary when encoding bounds on bit vectors.
As registers have inherently no sign, only sign insensitive interval are encoded.
Maybe it is possible to add some sort of sign analysis here but I am unsure.

Usage

By default, naive analysis is enabled. One can choose a specific by using the option: program.analysis.interval.
The following choices are available:

  • naive
  • local
  • global

Tests

  • Analysis correctness tests
  • Unit tests for interval operations and initialisation.

Existing tests pass on my machine besides the ones that ran out of memory.
The RelationAnalysisTest file has been updated to include the updated analysis pipeline.

@ThomasHaas
Copy link
Copy Markdown
Collaborator

Thanks for the PR.

Have you ever tried only adding the bound constraints for registers that have small intervals?
Some while ago, a student of mine tried something similar, but it turned out that encoding bounds on some registers was actually bad, i.e., hurt the solver performance.
On the other hand, bounds that proved some register to be strictly positive (or negative) looked to be quite helpful, i.e.,
r in [0, 2^32-1] was bad, but r in [1, 2^32-1] was quite good.
If the bounds are trivial (TOP), you don't want to encode them I think, especially when dealing with integer encodings.

@hernanponcedeleon
Copy link
Copy Markdown
Owner

Hi @tcherrou ... happy to see the PR from the work you did for your thesis.

Can you please rebase on top of the latest development? I will take a look to the code during the weekend but in the meantime, I would like to run some experiments similar to those in your thesis to see if the performance improvement still holds based on our latest code base.

Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ProgramEncoder.java Outdated
@tcherrou
Copy link
Copy Markdown
Contributor Author

tcherrou commented Feb 2, 2026

Thanks for the PR.

Have you ever tried only adding the bound constraints for registers that have small intervals? Some while ago, a student of mine tried something similar, but it turned out that encoding bounds on some registers was actually bad, i.e., hurt the solver performance. On the other hand, bounds that proved some register to be strictly positive (or negative) looked to be quite helpful, i.e., r in [0, 2^32-1] was bad, but r in [1, 2^32-1] was quite good. If the bounds are trivial (TOP), you don't want to encode them I think, especially when dealing with integer encodings.

@ThomasHaas

  1. Could you please elaborate on your definition of "small"?
  2. Only considering strictly positive or negative intervals is an interesting idea and something I have not tried yet.
    I will try it out actually and see what the effects are. Which type of intervals yield the best performance is still an open question to me unfortunately. The software verification benchmark suite that seemed to benefit the most of interval analysis was pthread-wmm and all found an average interval size of around less than 3.
  3. I briefly encoded TOP intervals at the start but removed it later. The reason was that it did not seem to make sense in the bit vector encoding I think. As a result for ease of code I also do not add them for the integer encoding.

@ThomasHaas
Copy link
Copy Markdown
Collaborator

1. Could you please elaborate on your definition of "small"?

Small is anything that does not get too close to the bit vector sizes, but I think your analysis reaches TOP in most of those cases anyways. From what I have seen on some local tests is that all your intervals are basically small or TOP.

2. Only considering strictly positive or negative intervals is an interesting idea and something I have not tried yet.
   I will try it out actually and see what the effects are. Which type of intervals yield the best performance is still an open question to me unfortunately. The software verification benchmark suite that seemed to benefit the most of  interval analysis was  `pthread-wmm` and all found an average interval size of around less than 3.

After testing a little bit locally, I noticed that your approach already avoids encoding intervals crossing a sign-boundary, at least if doing BV encodings. So in that sense, you get that result already. However, what could be interesting is to only encode the signedness (>0. <0, =0) as an extreme lightweight encoding. IIRC, this information was useful on the fib_bench benchmarks (the unsat versions), especially when doing the lazy/caat method (and integer encoding?).

3. I briefly encoded TOP intervals at the start but removed it later. The reason was that it did not seem to make sense in the bit vector encoding I think. As a result for ease of code I also do not add them for the integer encoding.

For BVs, they don't make sense, I agree. I think they should also be avoided on integers, cause IIRC the bounds just degrade the performance of the solver heavily. But maybe you can try once to see for yourself :)

@tcherrou
Copy link
Copy Markdown
Contributor Author

tcherrou commented Feb 2, 2026

@hernanponcedeleon It should be pushed now.
I am expecting the bit vector the performance gains to be less for the bit vector encoding with some of the new changes. I have not tested the integer encoding with the new changes yet.

@hernanponcedeleon
Copy link
Copy Markdown
Owner

@tcherrou after #964, you need to register all classes using reflection in order for the options to work in native mode.

You need at least the following patch, maybe even more if you added some further options (I have not yet check the code in details)

diff --git a/dartagnan/pom.xml b/dartagnan/pom.xml
index cba2b1653..d61b51fc4 100644
--- a/dartagnan/pom.xml
+++ b/dartagnan/pom.xml
@@ -265,6 +265,7 @@
                             <className>com.dat3m.dartagnan.encoding.WmmEncoder</className>
                             <className>com.dat3m.dartagnan.program.analysis.ReachingDefinitionsAnalysis$Config</className>
                             <className>com.dat3m.dartagnan.program.analysis.alias.AliasAnalysis$Config</className>
+                            <className>com.dat3m.dartagnan.program.analysis.interval.IntervalAnalysis$Config</className>
                             <className>com.dat3m.dartagnan.program.processing.BranchReordering</className>
                             <className>com.dat3m.dartagnan.program.processing.Inlining</className>
                             <className>com.dat3m.dartagnan.program.processing.Intrinsics</className>
diff --git a/dartagnan/src/main/resources/META-INF/native-image/com.dat3m.dartagnan/dartagnan/reflect-config.json b/dartagnan/src/main/resources/META-INF/native-image/com.dat3m.dartagnan/dartagnan/reflect-config.json
index 9c421d5ec..bbd052476 100644
--- a/dartagnan/src/main/resources/META-INF/native-image/com.dat3m.dartagnan/dartagnan/reflect-config.json
+++ b/dartagnan/src/main/resources/META-INF/native-image/com.dat3m.dartagnan/dartagnan/reflect-config.json
@@ -59,6 +59,11 @@
         "allDeclaredFields":true,
         "queryAllDeclaredMethods":true
       },
+      {
+        "name":"com.dat3m.dartagnan.program.analysis.interval.IntervalAnalysis$Config",
+        "allDeclaredFields":true,
+        "queryAllDeclaredMethods":true
+      },
       {
         "name":"com.dat3m.dartagnan.program.processing.BranchReordering",
         "allDeclaredFields":true,

@tcherrou tcherrou force-pushed the pr-interval-analysis branch 2 times, most recently from 3cfadf6 to c36ec66 Compare February 3, 2026 13:46
@tcherrou
Copy link
Copy Markdown
Contributor Author

tcherrou commented Feb 3, 2026

There were duplicate commits in the commit history after rebasing. Removed them with a force push

Copy link
Copy Markdown
Owner

@hernanponcedeleon hernanponcedeleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First quick pass up to Interval.java:L241

Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/EncodingContext.java Outdated
Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java Outdated
Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java Outdated
Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java Outdated
Copy link
Copy Markdown
Owner

@hernanponcedeleon hernanponcedeleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments after a quick check.

@tcherrou please fix the conflicts

Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ProgramEncoder.java Outdated
Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ProgramEncoder.java Outdated
@tcherrou tcherrou force-pushed the pr-interval-analysis branch from 99f8c46 to f1d4570 Compare February 9, 2026 11:25
Comment thread dartagnan/src/main/java/com/dat3m/dartagnan/encoding/EncodingContext.java Outdated
@tcherrou tcherrou force-pushed the pr-interval-analysis branch from c3a0dc5 to fdbed3b Compare February 12, 2026 15:01
@tcherrou tcherrou marked this pull request as draft February 13, 2026 11:30
@tcherrou tcherrou force-pushed the pr-interval-analysis branch 3 times, most recently from d2688e8 to 762c7d4 Compare February 23, 2026 09:57
Also remove computing redundant information: total number of reads

Suggested-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>
Suggested-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>
The analysis does not require the task.
Global analysis only needs the memory model to get the `RF` relation.

Suggested-by: Thomas Haas <t.haas@tu-bs.de>
Suggested-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>
Change interval analysis option NAIVE to NONE.
Introduces optional analyses where an analysis may not be present.
Optional analyses are registered via `Context.registerOptional` method.
Users must check if an optional analysis has been registered.

Suggested-by: Hernan Ponce de Leon <hernanl.leon@huawei.com> and Thomas Haas <t.haas@tu-bs.de>
Suggested-by: Thomas Haas <t.haas@tu-bs.de>
Extract `AbstractionExpressionEvaluator` from `IntervalAnalysisWorklist`
Rename `RegisterStateVisitor`.
Move `RegisterState` class to separate file.
Removed unnecessary cast

Suggested-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>
More descriptive names.
Eliminate local variables that don't help with readibility.
Remove redundant assigments.
Grouped related methods together.
Removed unnecessary `getType` method.
Match relation analysis naming convention.

Suggested-by: Thomas Haas <t.haas@tu-bs.de>
Remove custom exception classes and replace them with Preconditions.
@tcherrou tcherrou force-pushed the pr-interval-analysis branch from 7ecbf5f to 8957a2c Compare February 25, 2026 10:41
@hernanponcedeleon
Copy link
Copy Markdown
Owner

@tcherrou there is only the open discussion in IntervalAnalysis.java about the statistics report in the log. Other than that, unless you have further planned changes, you can already remove the draft status. I think this is almost ready to be merged.

Changed `double` counters to `int`.

Suggested-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>

Use more descriptive labels for #Register reads/top.
@tcherrou
Copy link
Copy Markdown
Contributor Author

There is one more operator I want to try and implement and I want to run some benchmarks with some of the suggestions in this conversation.
However, this can be done in a separate pull request.

@tcherrou tcherrou marked this pull request as ready for review February 25, 2026 11:12
@hernanponcedeleon
Copy link
Copy Markdown
Owner

There is one more operator I want to try and implement and I want to run some benchmarks with some of the suggestions in this conversation. However, this can be done in a separate pull request.

Yes, I would suggest getting this one merged (I am fine with the current status, so unless @ThomasHaas has further suggestions I will merge).

If you come up with improvements, PRs will always be welcomed :D

@ThomasHaas
Copy link
Copy Markdown
Collaborator

I am fine with merging.

There is one more operator I want to try and implement and I want to run some benchmarks with some of the suggestions in this conversation.

What I would find interesting is if you bounded the result of register writes rather than register reads, because the bounds on register writes imply bounds on register reads, but there are less writers than readers in general.
Another interesting thing is exploiting control-flow branching conditions to reduce the intervals.

If you ever feel bored, you can try out these things :P.

@tcherrou
Copy link
Copy Markdown
Contributor Author

What I would find interesting is if you bounded the result of register writes rather than register reads, because the bounds on register writes imply bounds on register reads, but there are less writers than readers in general.

This sounded interesting to me too.
Only encoding strictly positive or negative bounds for integers is something I want to try too.

Another interesting thing is exploiting control-flow branching conditions to reduce the intervals.

This was actually present in an early version of the analysis but removed for simplicity.
It will probably be easier to add now with the cleaner code.

@hernanponcedeleon hernanponcedeleon merged commit 9594f82 into hernanponcedeleon:development Feb 25, 2026
7 checks passed
@hernanponcedeleon
Copy link
Copy Markdown
Owner

Thanks @tcherrou for bearing with us over the long review.

Happy to see your thesis work finally merged (@apaolillo might also be happy about this).

@apaolillo
Copy link
Copy Markdown

Excellent! Congratulations @tcherrou ! 🎉

@tcherrou
Copy link
Copy Markdown
Contributor Author

I am happy to see it finally merged.
Thank you for the lengthy review it improved the code a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants