
Static Analysis

Syntactic Analysis

Pointer Analysis

Call Graph Analysis

Data-flow Analysis

Symbolic Execution

  • Large Language Model powered Symbolic Execution, (arXiv2025)

    • Abstract: Large Language Models (LLMs) have emerged as a promising alternative to traditional static program analysis methods, such as symbolic execution, offering the ability to reason over code directly without relying on theorem provers or SMT solvers. However, LLMs are also inherently probabilistic by nature, and therefore face significant challenges in relation to the accuracy and scale of the analysis in real-world application. Such issues often necessitate the use of larger LLMs with higher token l...
    • Labels: static analysis, symbolic execution
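
For context on the baseline technique the entry above builds on: classic symbolic execution enumerates path conditions and asks a solver for a satisfying input per path. The sketch below is purely illustrative (not from any listed paper): a fixed three-path toy program, with brute-force search over a small range standing in for an SMT solver.

```python
# Classic symbolic execution, minimally sketched: enumerate the path
# conditions of a toy program and find a witness input for each path.
# Brute-force search over a small domain stands in for an SMT solver;
# all names here are illustrative.

def symbolic_paths():
    """Paths of:  if x > 10: return "A" if x < 20 else "B"
                  else:      return "C"
    """
    paths = [
        ([lambda x: x > 10, lambda x: x < 20], "A"),
        ([lambda x: x > 10, lambda x: x >= 20], "B"),
        ([lambda x: x <= 10], "C"),
    ]
    witnesses = {}
    for conds, label in paths:
        # "Solve" the path condition by scanning a small input range.
        witnesses[label] = next(
            (x for x in range(-50, 50) if all(c(x) for c in conds)), None)
    return witnesses
```

Running `symbolic_paths()` yields one concrete input per feasible path, e.g. `{"A": 11, "B": 20, "C": -50}`; an infeasible path would map to `None`.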

Abstract Interpretation

  • AbsInt-AI: Language Models for Abstract Interpretation, (ICLR2025)

    • Abstract: Static program analysis is a popular technique in software engineering. Traditional static analysis algorithms treat programs as sets of logical statements with well-defined semantics. These traditional analyzers can provide guarantees of their performance, such as guaranteeing that they will never miss a bug. However, they leave out lots of very rich information such as variable and field names. Language models for code on the other hand, take full advantage of information such as variable name...
    • Labels: static analysis, abstract interpretation, bug detection
  • Can LLMs Formally Reason as Abstract Interpreters for Program Analysis?, (arXiv2025)

    • Abstract: LLMs have demonstrated impressive capabilities in code generation and comprehension, but their potential in being able to perform program analysis in a formal, automatic manner remains under-explored. To that end, we systematically investigate whether LLMs can reason about programs using a program analysis framework called abstract interpretation. We prompt LLMs to follow two different strategies, denoted as Compositional and Fixed Point Equation, to formally reason in the style of abstract inte...
    • Labels: static analysis, abstract interpretation
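
The two entries above prompt LLMs to reason in the style of abstract interpretation; for reference, a conventional interval-domain analysis of straight-line code looks roughly like the sketch below. The tiny IR and the function names are illustrative assumptions, not any paper's artifact.

```python
# Interval-domain abstract interpretation over a tiny assignment IR.
# Each variable maps to an interval (lo, hi) that over-approximates its
# concrete values. The IR and names are illustrative only.

def interval_join(a, b):
    """Least upper bound: the smallest interval containing both."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def interval_add(a, b):
    """Abstract addition: sound for every concrete choice of operands."""
    return (a[0] + b[0], a[1] + b[1])

def analyze(stmts, env):
    """stmts: list of (var, ('const', c)) or (var, ('add', x, y))."""
    for var, op in stmts:
        if op[0] == 'const':
            env[var] = (op[1], op[1])
        elif op[0] == 'add':
            env[var] = interval_add(env[op[1]], env[op[2]])
    return env
```

For example, with `x` in `[0, 10]`, analyzing `one = 1; y = x + x; z = y + one` derives `y` in `[0, 20]` and `z` in `[1, 21]`; `interval_join` is what a full analyzer would apply at control-flow merge points.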

Type Inference

  • An Empirical Study of Large Language Models for Type and Call Graph Analysis, (arXiv2024)

    • Abstract: Large Language Models (LLMs) are increasingly being explored for their potential in software engineering, particularly in static analysis tasks. In this study, we investigate the potential of current LLMs to enhance call-graph analysis and type inference for Python and JavaScript programs. We empirically evaluated 24 LLMs, including OpenAI's GPT series and open-source models like LLaMA and Mistral, using existing and newly developed benchmarks. Specifically, we enhanced TypeEvalPy, a micro-bench...
    • Labels: static analysis, type inference, call graph analysis
  • CKTyper: Enhancing Type Inference for Java Code Snippets by Leveraging Crowdsourcing Knowledge in Stack Overflow, (FSE2025)

    • Abstract: Code snippets are widely used in technical forums to demonstrate solutions to programming problems. They can be leveraged by developers to accelerate problem-solving. However, code snippets often lack concrete types of the APIs used in them, which impedes their understanding and reuse. To enhance the description of a code snippet, a number of approaches are proposed to infer the types of APIs. Although existing approaches can achieve good performance, their performance is limited by ignoring oth...
    • Labels: static analysis, type inference
  • Generative Type Inference for Python, (ASE2023)

    • Abstract: Python is a popular dynamic programming language, evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to potential type errors, leading researchers to explore automatic type inference approaches for Python programs. Existing type inference approaches can be generally grouped into three categories, i.e., rule-based, supervised, and cloze-style approaches. The rule-based type inference approaches can ensure the accuracy of predic...
    • Labels: static analysis, type inference
  • Neurosymbolic Modular Refinement Type Inference, (ICSE2025)

    • Abstract: Refinement types, a type-based generalization of Floyd-Hoare logics, are an expressive and modular means of statically ensuring a wide variety of correctness, safety, and security properties of software. However, their expressiveness and modularity means that to use them, a developer must laboriously annotate all the functions in their code with potentially complex type specifications that specify the contract for that function. We present LHC, a neurosymbolic agent that uses LLMs to automatical...
    • Labels: static analysis, type inference
  • PyTy: Repairing Static Type Errors in Python, (ICSE2024)

    • Abstract: Gradual typing enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type checkers detect an increasingly large number of type errors. Unfortunately, fixing these errors requires manual effort, hampering the adoption of gradual typing in practice. This paper presents PyTy, an automated program repair approach targeted at statically dete...
    • Labels: code generation, program repair, static analysis, type inference
  • Risky Dynamic Typing-related Practices in Python: An Empirical Study, (TOSEM2024)

    • Abstract: Python’s dynamic typing nature provides developers with powerful programming abstractions. However, many type-related bugs are accumulated in code bases of Python due to the misuse of dynamic typing. The goal of this article is to aid in the understanding of developers’ high-risk practices toward dynamic typing and the early detection of type-related bugs. We first formulate the rules of six types of risky dynamic typing-related practices (type smells for short) in Python. We then develop a rule...
    • Labels: static analysis, type inference, bug detection, empirical study
  • TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference, (ICSE2025)

    • Abstract: Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex parameterized types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, des...
    • Labels: static analysis, type inference
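
To contrast with the learning-based systems above, the rule-based baseline category they compare against can be sketched in a few lines using Python's `ast` module. Only literals, names, and same-type binary operations are handled; everything else falls back to `Any`. This is a toy illustration, not any listed paper's tool.

```python
import ast

# A toy rule-based type inferencer for simple top-level assignments:
# the "rule-based" category that the learning-based papers above
# contrast against. Coverage is deliberately minimal and illustrative.

def infer_expr(node, env):
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    if isinstance(node, ast.Name):
        return env.get(node.id, "Any")
    if isinstance(node, ast.BinOp):
        left = infer_expr(node.left, env)
        right = infer_expr(node.right, env)
        return left if left == right else "Any"
    return "Any"  # anything unhandled stays unknown

def infer_types(source):
    env = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            env[stmt.targets[0].id] = infer_expr(stmt.value, env)
    return env
```

For example, `infer_types("x = 1\ny = x + 2\nz = 'hi'")` returns `{'x': 'int', 'y': 'int', 'z': 'str'}`. The learning-based and LLM-based systems above aim to recover types precisely in the many cases such rules cannot cover.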

Specification Inference

Equivalence Checking

Code Similarity Analysis

Bug Detection

Program Verification

  • Agentic Program Verification, (arXiv2025)

    • Abstract: Automatically generated code is gaining traction recently, owing to the prevalence of Large Language Models (LLMs). Further, the AlphaProof initiative has demonstrated the possibility of using AI for general mathematical reasoning. Reasoning about computer programs (software) can be accomplished via general mathematical reasoning; however, it tends to be more structured and richer in contexts. This forms an attractive proposition, since then AI agents can be used to reason about voluminous code ...
    • Labels: static analysis, program verification
  • Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus, (POPL2025)

    • Abstract: Recently, the rise of code-centric large language models (LLMs) appears to have reshaped the software engineering world with low-barrier tools like Copilot that can generate code easily. However, there is no correctness guarantee for the code generated by LLMs, which suffer from the hallucination problem, and their output is fraught with risks. Besides, the end-to-end process from specification to code through LLMs is a non-transparent and uncontrolled black box. This opacity makes it difficult ...
    • Labels: code generation, program transformation, static analysis, program verification
  • Baldur: Whole-Proof Generation and Repair with Large Language Models, (FSE2023)

    • Abstract: Formally verifying software is a highly desirable but labor-intensive task. Recent work has developed methods to automate formal verification using proof assistants, such as Coq and Isabelle/HOL, e.g., by training a model to predict one proof step at a time and using that model to search through the space of possible proofs. This paper introduces a new method to automate formal verification: We use large language models, trained on natural language and code and fine-tuned on proofs, to generat...
    • Labels: static analysis, program verification
  • Can ChatGPT support software verification?, (FASE2024)

    • Abstract: Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification....
    • Labels: static analysis, program verification
  • Can large language models reason about program invariants?, (ICML2023)

    • Abstract: Identifying invariants is an important program analysis task with applications towards program understanding, bug finding, vulnerability analysis, and formal verification. Existing tools for identifying program invariants rely on dynamic analysis, requiring traces collected from multiple executions in order to produce reliable invariants. We study the application of large language models to invariant prediction, finding that models trained on source code and fine-tuned for invariant generation c...
    • Labels: static analysis, program verification
  • ClassInvGen: Class Invariant Synthesis using Large Language Models, (arXiv2025)

    • Abstract: Formal program specifications in the form of preconditions, postconditions, and class invariants have several benefits for the construction and maintenance of programs. They not only aid in program understanding due to their unambiguous semantics but can also be enforced dynamically (or even statically when the language supports a formal verifier). However, synthesizing high-quality specifications in an underlying programming language is limited by the expressivity of the specifications or the n...
    • Labels: static analysis, program verification
  • Clause2Inv: A Generate-Combine-Check Framework for Loop Invariant Inference, (ISSTA2025)

    • Abstract: Loop invariant inference is a fundamental, yet challenging, problem in program verification. Recent work adopts the guess-and-check framework, where candidate loop invariants are iteratively generated in the guess step and verified in the check step. A major challenge of this general framework is to produce high-quality candidate invariants in each iteration so that the inference procedure can converge quickly. Empirically, we observe that existing approaches may struggle with guessing the compl...
    • Labels: static analysis, program verification
  • CoqPilot, a plugin for LLM-based generation of proofs, (ASE2024)

    • Abstract: We present CoqPilot, a VS Code extension designed to help automate writing of Coq proofs. The plugin collects the parts of proofs marked with the admit tactic in a Coq file, i.e., proof holes, and combines LLMs along with non-machine-learning methods to generate proof candidates for the holes. Then, CoqPilot checks if each proof candidate solves the given subgoal and, if successful, replaces the hole with it. The focus of CoqPilot is twofold. Firstly, we want to allow users to seamlessly combine...
    • Labels: code generation, program synthesis, static analysis, program verification
  • Enchanting program specification synthesis by large language models using static analysis and program verification, (CAV2024)

    • Abstract: Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view of such needs, an automated approach for specification synthesis is desired. While existing automated approaches are limited in their versatility, i.e., they either focus only on synthesizing loop invariants for numerical programs, or are tailored for specific...
    • Labels: static analysis, program verification, specification inference
  • Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models, (arXiv2024)

    • Abstract: Automated program verification has always been an important component of building trustworthy software. While the analysis of real-world programs remains a theoretical challenge, the automation of loop invariant analysis has effectively resolved the problem. However, real-world programs that often mix complex data structures and control flows pose challenges to traditional loop invariant generation tools. To enhance the applicability of invariant generation techniques, we proposed ACInv, an Auto...
    • Labels: static analysis, program verification
  • Finding inductive loop invariants using large language models, (arXiv2023)

    • Abstract: Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding in...
    • Labels: static analysis, program verification
  • Hypothesis search: Inductive reasoning with language models, (ICLR2024)

    • Abstract: Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them yielding "in context learning." This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose...
    • Labels: code generation, program synthesis, static analysis, program verification
  • LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference, (ASE2024)

    • Abstract: Loop invariant inference, a key component in program verification, is a challenging task due to the inherent undecidability and complex loop behaviors in practice. Recently, machine learning based techniques have demonstrated impressive performance in generating loop invariants automatically. However, these methods highly rely on the labeled training data, and are intrinsically random and uncertain, leading to unstable performance. In this paper, we investigate a synergy of large language models...
    • Labels: static analysis, program verification
  • LLM-Aided Automatic Modeling for Security Protocol Verification, (ICSE2025)

    • Abstract: Symbolic protocol analysis serves as a pivotal technique for protocol design, security analysis, and the safeguarding of information assets. Several modern tools such as Tamarin and ProVerif have been proven successful in modeling and verifying real-world protocols, including complex protocols like TLS 1.3 and 5G AKA. However, developing formal models for protocol verification is a non-trivial task, which hinders the wide adoption of these powerful tools in practical protocol analysis. In this w...
    • Labels: static analysis, program verification
  • LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling, (ASE2024)

    • Abstract: We investigate a modification of the classical Bounded Model Checking (BMC) procedure that does not handle loops through unrolling but via modifications to the control flow graph (CFG). A portion of the CFG representing a loop is replaced by a node asserting invariants of the loop. We generate these invariants using Large Language Models (LLMs) and use a first-order theorem prover to ensure the correctness of the generated statements. We thus transform programs to loop-free variants in a sound m...
    • Labels: static analysis, program verification
  • Large Language Models for Safe Minimization, (ICSE2025)

    • Abstract: Several tasks in program analysis, verification, and testing are modeled as constraint solving problems, utilizing SMT solvers as the reasoning engine. In this work, we aim to investigate the reasoning capabilities of large language models (LLMs) toward reducing the size of an infeasible string constraint system by exploiting inter-constraint interactions such that the remaining ones are still unsatisfiable. We term this safe minimization. Motivated by preliminary observations of hallucination a...
    • Labels: static analysis, program verification
  • Laurel: Unblocking Automated Verification with Large Language Models, (OOPSLA2025)

    • Abstract: Program verifiers such as Dafny automate proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires hints in the form of assertions, creating a burden for the proof engineer. In this paper, we propose, a tool that alleviates this burden by automatically generating assertions using large language models (LLMs). To improve the success rate of LLMs in this task, we design two domain-specific prompting techniques. First, we help the LLM determ...
    • Labels: static analysis, program verification
  • Lemur: Integrating large language models in automated program verification, (ICLR2024)

    • Abstract: The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that typically demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of derivation rules and prove its soundness. We instantiate the...
    • Labels: static analysis, program verification
  • Leveraging LLMs for Program Verification, (FMCAD2024)

    • Abstract: Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop’s behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program’s runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding induct...
    • Labels: static analysis, program verification
  • PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation, (NDSS2025)

    • Abstract: Formal verification is a technique that can prove the correctness of a system with respect to a certain specification or property. It is especially valuable for security-sensitive smart contracts that manage billions in cryptocurrency assets. Although existing research has developed various static verification tools (or provers) for smart contracts, a key missing component is the automated generation of comprehensive properties, including invariants, pre-/post-conditions, and rules. Hence, indust...
    • Labels: static analysis, bug detection, program verification
  • QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning, (ICSE2025)

    • Abstract: Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis...
    • Labels: static analysis, program verification
  • Ranking llm-generated loop invariants for program verification, (EMNLP2023)

    • Abstract: Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a {\it re-ranking} approach for the generated results ...
    • Labels: static analysis, program verification, agent design, prompt strategy, sampling and ranking
  • SpecGen: Automated Generation of Formal Program Specifications via Large Language Models, (ICSE2025)

    • Abstract: In the software development process, formal program specifications play a crucial role in various stages, including requirement analysis, software testing, and verification. However, manually crafting formal program specifications is rather difficult, making the job time-consuming and labor-intensive. Moreover, it is even more challenging to write specifications that correctly and comprehensively describe the semantics of complex programs. To reduce the burden on software developers, automated s...
    • Labels: code generation, program synthesis, static analysis, specification inference, program verification
  • Towards AI-Assisted Synthesis of Verified Dafny Methods, (FSE2024)

    • Abstract: Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs’ specifications as well as code so that the code can be proved correct with respect to the specifications. Unfortunately, existing large language models show a severe lack of proficiency in ver...
    • Labels: code generation, program synthesis, static analysis, program verification
  • Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation, (NeurIPS2024)

    • Abstract: Program verification is vital for ensuring software reliability, especially in the context of increasingly complex systems. Loop invariants, remaining true before and after each iteration of loops, are crucial for this verification process. Traditional provers and machine learning based methods for generating loop invariants often require expert intervention or extensive labeled data, and typically only handle numerical property verification. These methods struggle with programs involving comple...
    • Labels: static analysis, program verification, benchmark
  • Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming, (ICSE2025)

    • Abstract: Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging f...
    • Labels: code generation, program synthesis, static analysis, program verification, benchmark, empirical study
  • VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners, (arXiv2024)

    • Abstract: Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based approaches can theoretically produce correct transpilations that maintain input-output equivalence to th...
    • Labels: code generation, program transformation, static analysis, program verification
  • Verified Code Transpilation with LLMs, (NeurIPS2024)

    • Abstract: Domain-specific languages (DSLs) are integral to various software workflows. Such languages offer domain-specific optimizations and abstractions that improve code readability and maintainability. However, leveraging these languages requires developers to rewrite existing code using the specific DSL's API. While large language models (LLMs) have shown some success in automatic code transpilation, none of them provide any functional correctness guarantees on the transpiled code. Another approach f...
    • Labels: code generation, program synthesis, static analysis, program verification
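
Several entries above follow a guess-and-check loop for invariant inference. The sketch below mimics only the shape of that loop: candidate invariants (which an LLM would propose) are filtered against recorded loop-head states, whereas a real system would replace the trace check with a verifier or SMT solver. All names are illustrative.

```python
# Guess-and-check invariant inference, minimally sketched. The fixed
# candidate list stands in for LLM proposals; checking against observed
# loop states stands in for a real verifier. Illustrative only.

def loop_states():
    """States (i, s) at the loop head of:  s = 0; for i in range(5): s += i."""
    s, states = 0, []
    for i in range(6):
        states.append((i, s))
        if i < 5:
            s += i
    return states

def guess_and_check(candidates):
    """Return the name of the first candidate holding in every state."""
    states = loop_states()
    for name, inv in candidates:
        if all(inv(i, s) for i, s in states):
            return name
    return None

candidates = [
    ("s == i",           lambda i, s: s == i),
    ("2*s == i*(i - 1)", lambda i, s: 2 * s == i * (i - 1)),
]
```

Here `guess_and_check(candidates)` returns `"2*s == i*(i - 1)"`, the closed form of the partial sum; the first guess is rejected because it already fails at the state `(2, 1)`.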

Program Optimization

Code Summarization

Code Search

  • BinQuery: A Novel Framework for Natural Language-Based Binary Code Retrieval, (ISSTA2025)

    • Abstract: Binary Function Retrieval (BFR) is crucial in reverse engineering for identifying specific functions in binary code, especially those associated with malicious behavior or vulnerabilities. Traditional BFR methods rely on heuristics, often lacking the efficiency and adaptability needed for large-scale or diverse binary analysis tasks. To address these challenges, we present BinQuery, a Natural Language-based BFR (NL-based BFR) framework that uses natural language queries to retrieve relevant bina...
    • Labels: static analysis, code search, code summarization
  • LiSSA: Toward Generic Traceability Link Recovery Through Retrieval-Augmented Generation, (ICSE2025)

    • Abstract: There are a multitude of software artifacts which need to be handled during the development and maintenance of a software system. These artifacts interrelate in multiple, complex ways. Therefore, many software engineering tasks are enabled - and even empowered - by a clear understanding of artifact interrelationships and also by the continued advancement of techniques for automated artifact linking. However, current approaches in automatic Traceability Link Recovery (TLR) target mostly the links...
    • Labels: software maintenance and deployment, static analysis, code summarization, code search
  • Natural Is the Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models, (FSE2024)

    • Abstract: Pre-trained Large Language Models (LLM) have achieved remarkable successes in several domains. However, code-oriented LLMs are often heavy in computational complexity, which grows quadratically with the length of the input code sequence. Toward simplifying the input program of an LLM, the state-of-the-art approach uses strategies to filter the input code tokens based on the attention scores given by the LLM. The decision to simplify the input program should not rely on the attention patterns of an LL...
    • Labels: static analysis, code search, code summarization, code model, code model training, source code model
  • On the Effectiveness of Transfer Learning for Code Search, (TSE2023)

    • Abstract: The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to and improve code search. To this end, we pre-train a BERT-based model on combinations of natural language and source code data and fine-tune it on pairs of StackOverflow question titles and code answers. Our results show that the pre-trained models consistently ...
    • Labels: static analysis, code search, code model, code model training, source code model
  • Self-Supervised Query Reformulation for Code Search, (FSE2023)

    • Abstract: Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code...
    • Labels: static analysis, code search
  • Survey of Code Search Based on Deep Learning, (TOSEM2024)

    • Abstract: Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given natural language query by effectively capturing the semantic similarity between the query and code. Deep learning, being able to extract complex semantics information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pretraining models, have b...
    • Labels: survey, static analysis, code search
  • Virtual Compiler Is All You Need For Assembly Code Search, (ACL2024)

    • Abstract: Assembly code search is vital for reducing the burden on reverse engineers, allowing them to quickly identify specific functions using natural language within vast binary programs. Despite its significance, this critical task is impeded by the complexities involved in building high-quality datasets. This paper explores training a Large Language Model (LLM) to emulate a general compiler. By leveraging Ubuntu packages to compile a dataset of 20 billion tokens, we further continue pre-train CodeLlam...
    • Labels: code generation, program transformation, static analysis, code search, code model, code model training, source code model
  • Zero-Shot Cross-Domain Code Search without Fine-Tuning, (FSE2025)

    • Abstract: Code search is a crucial task in software engineering, aiming to retrieve code snippets that are semantically relevant to a natural language query. Recently, Pre-trained Language Models (PLMs) have shown remarkable success and are widely adopted for code search tasks. However, PLM-based methods often struggle in cross-domain scenarios. When applied to a new domain, they typically require extensive fine-tuning with substantial data. Even worse, the data scarcity problem in new domains often force...
    • Labels: static analysis, code search
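
As a point of reference for the retrieval systems above, a bag-of-words cosine-similarity search over code snippets fits in a few lines; the neural encoders in these papers essentially replace this scoring function with learned semantic embeddings. The tokenization and names below are illustrative.

```python
import math
import re
from collections import Counter

# Bag-of-words code search: rank snippets by cosine similarity to a
# natural-language query. This is the classical lexical baseline that
# the neural retrieval papers above improve upon; purely illustrative.

def tokenize(text):
    # Split on non-letters so snake_case and punctuation both break words.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, snippets):
    """Return the snippet most similar to the query."""
    q = tokenize(query)
    return max(snippets, key=lambda s: cosine(q, tokenize(s)))
```

For example, searching "read a file" over `["def read_file(p): return open(p).read()", "def sort_list(xs): return sorted(xs)"]` returns the first snippet; such lexical matching is exactly what fails in the cross-domain and zero-shot settings the papers above target.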

Software Composition Analysis