
Bootstrapping

Peter Goodman edited this page Sep 19, 2023 · 2 revisions

What does PASTA's bootstrap process do, and why do we do it?

  1. Maintain liveness of the AST, via an underlying shared_ptr to pasta's ASTImpl. This part of the design had eventual Python bindings in mind. If you've ever used Clang's Python bindings, then you can be bitten by this annoying issue where the Python object tracking the clang::ASTContext goes out of scope, and then you start getting UAFs on all of your accesses to other things inside of the AST. In general, I find these kinds of clang/llvm/mlir designs, where you have a big context object that isn't kept alive by its children/etc., to be problematic from a binding perspective.

  2. Comprehensively replace clang::SourceLocation and clang::SourceRange with pasta::Token and pasta::TokenRange. Clang's source locations / ranges, and even clang::Token, are underpowered APIs that suck to use.

  3. Present value types everywhere, and communicate when something can be optional/missing. Clang's AST API is all pointer-based. You don't know if a given method returning a clang::Decl * can return nullptr or not. The same goes for whether or not that method will assert. So the bootstrap process is responsible for knowing the set of "nullable" methods, and wrapping their return values with std::optional. A lot of this was just manually hitting these cases, or a superficial ctrl+f for return nullptr; in various .h or .cpp files, and then we also have a more automated "blowtorch" method of trying to discover these to better keep on top of the problem.

  4. Regularize things. E.g. Decls, Stmts, Types, Attrs should all have a Kind enum. In Clang, it's close but not quite like that. Sometimes the enums are scoped inside a class, or called things like TypeClass. Other regularization is renaming to make things nicer to read, e.g. dropping the get on getters, expanding abbreviations like Decl to Declaration, etc.

  5. Provide all the other things, like all the enums, in one place (pasta/Forward.h) so that pasta can present a self-contained API that is completely independent of Clang's APIs. Redistributing libraries+headers+cmake of a pre-compiled clang (e.g. in vcpkg) is annoying/problematic, and reduces the utility of pasta as a library. Thus, pasta is designed to be more easily packaged.

  6. Replace problematic methods, and provide better variants. Examples are FunctionDecl::getParameterSourceRange and FunctionDecl::getEllipsisLoc. Their results are often completely wrong, because they rely on clang::SourceLocations possibly embedded in clang::Types. Those types are deduplicated, so the locations can come from a much earlier, far-away occurrence of the same type.

  7. Introduce new utility functions, e.g. pasta::Type::SizeInBits.

  8. Unify clang::Type and clang::QualType under a single type hierarchy in PASTA.
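
To make points 1 and 3 above more concrete, here is a minimal sketch of the two ideas working together: a value-type wrapper that co-owns the AST through a shared_ptr, and a nullable accessor surfaced as std::optional. This is not PASTA's actual code. RawDecl, AstImpl, Decl, and MakeInnerDecl are hypothetical stand-ins invented for illustration, and the real generated wrappers will differ. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pointer-based AST node (think clang::Decl).
struct RawDecl {
  std::string name;
  const RawDecl *parent;  // May be nullptr, but the type doesn't say so.
};

// Hypothetical stand-in for the big context object that owns every node.
struct AstImpl {
  std::vector<std::unique_ptr<RawDecl>> decls;
};

// Value-type wrapper. Every wrapper co-owns the context, so the node it
// points into cannot be freed while any wrapper is still alive.
class Decl {
 public:
  Decl(std::shared_ptr<const AstImpl> ast, const RawDecl *raw)
      : ast_(std::move(ast)), raw_(raw) {
    assert(raw_ != nullptr);  // A wrapper never wraps a null node.
  }

  const std::string &Name() const { return raw_->name; }

  // The nullable raw pointer is surfaced as std::optional, so the
  // signature itself tells the caller that the result can be missing.
  std::optional<Decl> Parent() const {
    if (!raw_->parent) {
      return std::nullopt;
    }
    return Decl(ast_, raw_->parent);
  }

 private:
  std::shared_ptr<const AstImpl> ast_;
  const RawDecl *raw_;
};

// Builds a two-node AST and returns only a wrapper for the inner node.
// The local `ast` handle goes out of scope here, yet the AST survives
// because the returned wrapper still holds a reference to it.
Decl MakeInnerDecl() {
  auto ast = std::make_shared<AstImpl>();
  ast->decls.push_back(std::make_unique<RawDecl>(RawDecl{"outer", nullptr}));
  const RawDecl *outer = ast->decls.back().get();
  ast->decls.push_back(std::make_unique<RawDecl>(RawDecl{"inner", outer}));
  return Decl(ast, ast->decls.back().get());
}
```

A caller can keep using the result of MakeInnerDecl() after the function returns, and can walk to Parent() without risking a use-after-free. With a bare RawDecl * and a context that had already gone out of scope, the same walk would be exactly the kind of UAF described in point 1.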
