Writing Schematics

Rodi edited this page Apr 6, 2022 · 4 revisions

The schematic language has a syntax very similar to Java's. To create a schematic, create a new text file.

A schematic file follows this format:

directive*

importStatement*

classDefinition*

Where * denotes 0 or more of the previous expression.

Directives

Directives are statements at the start of a schematic file indicating certain rules to the parser and resolver. The following directives are currently available:

| Directive | Parameters | Description |
| --- | --- | --- |
| option | <name> <value> | Sets the parser/resolver option with the given name to the given value. |

The available options are given below:

| Option | Values (default) | Description |
| --- | --- | --- |
| access_modifiers | strict, loose (strict) | When resolving classes, if access_modifiers is strict, classes and fields are required to have the same access modifiers as declared in the schematic. |

Import Statements

To make writing schematics easier, you can declare type name aliases (import types) at the top of the schematic file. Adding these statements, for example,

import java.lang.String;
import java.io.InputStream;

tells the schematic parser to resolve String to java.lang.String and InputStream to java.io.InputStream.
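The alias rule can be pictured as a lookup from a simple name to a fully qualified name. The Java sketch below is illustrative only: ImportAliasSketch and its methods are hypothetical names, not part of dexsearch, and it assumes that any name containing a dot is already fully qualified. Primitive types and the schematic-specific type expressions are out of scope here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of dexsearch: models import aliases as a map.
class ImportAliasSketch {
    private final Map<String, String> aliases = new HashMap<>();

    // Records "import a.b.Name;" as the alias Name -> a.b.Name.
    void addImport(String qualifiedName) {
        String simpleName = qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        aliases.put(simpleName, qualifiedName);
    }

    // Resolves a type name as it is written in a schematic.
    String resolve(String name) {
        if (name.contains(".")) {
            return name; // assumption: a dotted name is already fully qualified
        }
        String qualified = aliases.get(name);
        if (qualified == null) {
            throw new IllegalArgumentException("Type not imported: " + name);
        }
        return qualified;
    }

    public static void main(String[] args) {
        ImportAliasSketch imports = new ImportAliasSketch();
        imports.addImport("java.lang.String");
        imports.addImport("java.io.InputStream");
        System.out.println(imports.resolve("InputStream"));
    }
}
```

A name that was never imported and is not fully qualified is rejected in this sketch, which mirrors the rule that unimported types must be written out in full.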

Class Definitions

The main content of a schematic file will likely be the class definitions. These declare patterns to search for within a given set of dex files. Classes are defined almost exactly as in Java but fields must be declared before methods.

A simple example is shown here:

import java.lang.String;
import java.io.InputStream;

class MyClass {
   
    String filePath;
    String parentFilePath;

    InputStream open()
}

This class definition tells dexsearch to find any class in the given dex files which has at least two fields of type String and one method with return type InputStream (which are mapped to java.lang.String and java.io.InputStream respectively by the import statements).

It is important to note that the field and method names are not enforced here (i.e. they do not have to be filePath, parentFilePath and open in the matched class found in the dex file). To enforce field/method names, they must be prefixed by a $ as follows:

class MyClass {

    String $filePath;
    String $parentFilePath;

    InputStream $open()
}
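To make the difference concrete, here is a rough Java sketch of the two matching rules, using reflection on an ordinary class. This is only an analogy: dexsearch works on dex files, and ObfuscatedCandidate and NameEnforcementSketch are hypothetical names invented for this illustration.

```java
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical obfuscated class: same shape as MyClass, different names.
class ObfuscatedCandidate {
    String a;
    String b;

    InputStream c() {
        return null;
    }
}

// Illustration only; this is not how dexsearch is implemented.
class NameEnforcementSketch {
    static long countFields(Class<?> c, Class<?> type) {
        return Arrays.stream(c.getDeclaredFields())
                .filter(f -> f.getType() == type)
                .count();
    }

    static boolean hasField(Class<?> c, Class<?> type, String name) {
        return Arrays.stream(c.getDeclaredFields())
                .anyMatch(f -> f.getType() == type && f.getName().equals(name));
    }

    // A null name means "any name", like a member declared without $.
    static boolean hasMethod(Class<?> c, Class<?> returnType, String nameOrNull) {
        return Arrays.stream(c.getDeclaredMethods())
                .anyMatch(m -> m.getReturnType() == returnType
                        && (nameOrNull == null || m.getName().equals(nameOrNull)));
    }

    // Schematic without $: only the member types count.
    static boolean matchesWithoutNames(Class<?> c) {
        return countFields(c, String.class) >= 2
                && hasMethod(c, InputStream.class, null);
    }

    // Schematic with $: the member names must match as well.
    static boolean matchesWithNames(Class<?> c) {
        return hasField(c, String.class, "filePath")
                && hasField(c, String.class, "parentFilePath")
                && hasMethod(c, InputStream.class, "open");
    }

    public static void main(String[] args) {
        System.out.println(matchesWithoutNames(ObfuscatedCandidate.class));
        System.out.println(matchesWithNames(ObfuscatedCandidate.class));
    }
}
```

The obfuscated candidate satisfies the first schematic but not the second, since none of its member names survive.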

Access Modifiers

Access modifiers for classes, fields and methods can be declared as in Java:

public class MyClass {

    private String $filePath;
    public String $parentFilePath;

    static InputStream $open()
}

These are ignored if the access_modifiers option is set to loose. The class keyword can also be replaced with interface or enum (e.g. interface MyInterface { ...).

Attributes

Class definitions and their members can be marked with attributes, which change the behaviour of the parser and resolver. Attributes are written in square brackets, separated by commas, immediately before the definition they apply to: [attribute1, attribute2, ...].

[exact]
class MyClass {

    Object usefulField;

    [discard]
    String $methodName()
}

Here, the exact and discard attributes have been used as examples on MyClass and methodName respectively. All attributes and their usages are shown in the following table:

| Attribute | Usage | Description |
| --- | --- | --- |
| late | class, field, method | Indicates that the class, field or method should not be resolved naturally. Late objects are resolved only when another object requests them, for example through an out type or a binding event (see Out Types and Binding Events). |
| exact | class | Indicates a class' members must match those defined in the schematic exactly. If a candidate class has even one field or method too few or too many, it will not be bound. |
| certain | class | Indicates that a class' resolution is guaranteed to be successful. Certain classes are resolved first and cannot depend on uncertain classes (or their resolution will fail). |
| fuzzy | class | Indicates a class should be resolved fuzzily. This means dexsearch will gather metrics about the best or closest match to the schematic-defined class, but will not bind this best match unless it would have been matched had the class not been marked fuzzy. |
| very_fuzzy | class | Indicates a class should be resolved very fuzzily. This is the same as fuzzy, except the resolver will actually bind the closest match to the class, even if it would not have been bound non-fuzzily. |
| expected | class | Indicates that a class should not be bound to any class other than its expected class. See Expected Bindings. |
| discard | class, field, method | Indicates the class, field or method should not be included in any mapping or code generation. Discarded objects can still be referenced in other parts of the schematic, however. |
| optional | field, method | Indicates that the field or method is not required for a successful binding of the parent class. It is often sensible to combine this with discard, as there is no guarantee optional members will be resolved. |
| not | field, method | Indicates this field or method should not appear in the class bound to the parent class schematic. |
| conserve | instruction | Indicates this instruction should not consume the instruction it binds to, allowing later instructions to match the same instruction as the one marked with conserve. See Method Bodies. |
| strict | instruction | Indicates this instruction's binding event must be successful for the method containing it to successfully match. |
| marker | none | Currently unsupported. Indicates a marker whose member must be resolved after all previously declared members in the schematic and after all subsequently declared members. |

Note, attribute names are case insensitive.

Expected Bindings

If a class is known to keep the same name in a dex file across builds, the resolver can be told to check that expected class first.

import android.content.Context;

class MainActivity expects com.example.ui.MainActivity {

   void $attachBaseContext(Context)
}

This tells the resolver to attempt to bind MainActivity to com.example.ui.MainActivity before trying any other class. Normal binding rules still apply: com.example.ui.MainActivity must contain a method named exactly attachBaseContext which takes exactly one argument of type Context.

If this initial expected binding fails and the class is not marked with the expected attribute, dexsearch will attempt to bind MainActivity normally, as if the expects statement was not present.
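The order of attempts described above can be sketched in Java as follows. Everything here is hypothetical (ExpectedBindingSketch, its parameters, and the use of plain class names in place of dex classes). Picking the first matching class in the fallback step is also an assumption, as the real resolver may choose differently.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Illustration only: the order in which an "expects" class is tried.
class ExpectedBindingSketch {
    static Optional<String> resolve(String expectedName,
                                    boolean hasExpectedAttribute,
                                    List<String> dexClasses,
                                    Predicate<String> matchesSchematic) {
        // 1. Try the expected class first; normal binding rules still apply.
        if (dexClasses.contains(expectedName) && matchesSchematic.test(expectedName)) {
            return Optional.of(expectedName);
        }
        // 2. With the expected attribute, no other class may be bound.
        if (hasExpectedAttribute) {
            return Optional.empty();
        }
        // 3. Otherwise search as if the expects statement was not present.
        return dexClasses.stream().filter(matchesSchematic).findFirst();
    }

    public static void main(String[] args) {
        // The expected class has been renamed to a.c in this build.
        List<String> classes = List.of("a.b", "a.c");
        Predicate<String> matcher = name -> name.equals("a.c");
        System.out.println(resolve("com.example.ui.MainActivity", false, classes, matcher));
        System.out.println(resolve("com.example.ui.MainActivity", true, classes, matcher));
    }
}
```

Without the expected attribute the renamed class is still found by the fallback search. With it, the schematic simply stays unbound.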

Types and Type References

dexsearch supports several kinds of type expression. You can reference Java types, types already resolved in the schematic, arrays and so on. These type expressions can be used anywhere in a schematic where you'd normally see a type in Java (e.g. field types, method return types and in instruction matchers, which will be shown later).

Basic Java Types

As shown already, basic Java types are written the same way they would be in Java. You can import them at the top of the schematic file and use them as field types or method return types. If not imported, they must be fully qualified, even if they are in the java.lang package and you wouldn't normally have to import them (i.e. java.lang.String not just String).

You can also reference known Java types in the dex files loaded by dexsearch (something like com.example.ui.MainActivity, for example).

Inheritance

Type inheritance can be modelled in a schematic in the same way it is done in Java.

class MyClass {

    < extends java.io.InputStream > inputStream;
    < implements android.view.View$OnClickListener > onClickListener;
}

Here, simple Java types were used (InputStream and View$OnClickListener). These types could be substituted with any of the other types shown in this section. Recursive inheritance definitions are also supported such as

< extends < implements android.view.View$OnClickListener > > nestedInheritanceExample;

Multiple interfaces can also be required by delimiting them with a comma (< implements Interface1, Interface2, ... >).

Note the lack of ? before extends and implements and the whitespace after < and before >. This is because the ? is verbose and Antlr's parser does not like it when there is no whitespace around the < and > for some reason.
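As a rough analogy, the inheritance expressions correspond to the following reflection checks in plain Java. This is a sketch under assumptions, not dexsearch code: InheritanceTypeSketch is a hypothetical name, the standard-library types stand in for the Android ones, and it is assumed that < extends T > also accepts T itself and that a nested expression means "extends some class which implements the interface".

```java
import java.io.Closeable;
import java.io.FileInputStream;
import java.io.InputStream;

// Illustration only, using reflection on ordinary Java classes.
class InheritanceTypeSketch {
    // < extends T >: T appears in the candidate's superclass chain
    // (assumption: T itself is accepted too).
    static boolean matchesExtends(Class<?> candidate, Class<?> superType) {
        return !superType.isInterface() && superType.isAssignableFrom(candidate);
    }

    // < implements I1, I2, ... >: the candidate implements every listed interface.
    static boolean matchesImplements(Class<?> candidate, Class<?>... interfaces) {
        for (Class<?> i : interfaces) {
            if (!i.isInterface() || !i.isAssignableFrom(candidate)) {
                return false;
            }
        }
        return true;
    }

    // < extends < implements I > >: some superclass of the candidate implements I.
    static boolean matchesExtendsImplements(Class<?> candidate, Class<?> iface) {
        for (Class<?> s = candidate.getSuperclass(); s != null; s = s.getSuperclass()) {
            if (matchesImplements(s, iface)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesExtends(FileInputStream.class, InputStream.class));
        System.out.println(matchesImplements(String.class, Comparable.class, CharSequence.class));
        System.out.println(matchesExtendsImplements(FileInputStream.class, Closeable.class));
    }
}
```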

Current Type

The candidate type that dexsearch is currently trying to bind to the class schematic can be referenced with the this keyword. This is particularly useful for finding enums, as their constants are static fields of the same type as the enum they are declared in.

enum MyEnum {
    static this $ENUM_FIELD_1;
    static this $ENUM_FIELD_2;
}
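The reason this works for enums can be seen in plain Java: each enum constant compiles to a static field whose type is the enum itself. The sketch below counts such fields with reflection. Direction and SelfTypedFieldsSketch are hypothetical names used only for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical enum used as the candidate class.
enum Direction { UP, DOWN }

class SelfTypedFieldsSketch {
    // Counts static fields whose type is the declaring class itself,
    // which is what "static this" describes in a schematic.
    static int countSelfTypedStaticFields(Class<?> c) {
        int count = 0;
        for (Field f : c.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == c) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSelfTypedStaticFields(Direction.class));
    }
}
```

Compiler-generated helper fields, such as the array of all values, have a different type and are not counted.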

Referenced Types

Types already resolved from the current schematic file can be referenced by prefixing their schematic name with a !.

import java.lang.Object;

class MyClass {
    Object exampleField;

    void $<init>(Object)
}

class MyOtherClass {
    !MyClass myClassField;
}

Note, if MyClass fails to resolve before MyOtherClass, MyOtherClass will also fail to resolve. Classes are resolved in the order they are defined in the schematic file, with the exception of certain classes, which are always resolved first.
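The ordering rule can be modelled with a small Java sketch. It is a simplification under stated assumptions: ResolutionOrderSketch and Schematic are hypothetical names, every class is assumed to match something in the dex files on its own, and late classes and out types are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ResolutionOrderSketch {
    // Hypothetical model of one class schematic.
    static final class Schematic {
        final String name;
        final boolean certain;         // marked with the certain attribute
        final List<String> references; // names used as !Name

        Schematic(String name, boolean certain, String... references) {
            this.name = name;
            this.certain = certain;
            this.references = Arrays.asList(references);
        }
    }

    // Certain classes first, then the rest, each group in definition order.
    static List<Schematic> order(List<Schematic> file) {
        List<Schematic> ordered = new ArrayList<>();
        for (Schematic s : file) if (s.certain) ordered.add(s);
        for (Schematic s : file) if (!s.certain) ordered.add(s);
        return ordered;
    }

    // A class resolves only if everything it references has already resolved.
    static Set<String> resolve(List<Schematic> file) {
        Set<String> resolved = new LinkedHashSet<>();
        for (Schematic s : order(file)) {
            if (resolved.containsAll(s.references)) {
                resolved.add(s.name);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Schematic other = new Schematic("MyOtherClass", false, "MyClass");
        // MyOtherClass is defined first, so its reference is not yet resolved.
        System.out.println(resolve(Arrays.asList(other, new Schematic("MyClass", false))));
        // Marking MyClass as certain moves it to the front.
        System.out.println(resolve(Arrays.asList(other, new Schematic("MyClass", true))));
    }
}
```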

Out Types

If a class will surely be resolved correctly, it can be used to resolve another undefined or late class. Out types are prefixed with a #.

import java.lang.String;

class CertainlyCorrect extends #ParentType {
    
    void $knownMethod(#KnownArgumentType)
}

[late]
class KnownArgumentType {

    String $knownStringField;
}

In the above example, both usages of out types are shown. They can either refer to other classes in the schematic marked as late, or completely undefined classes. Here, ParentType gets bound to whatever class CertainlyCorrect extends but KnownArgumentType only gets bound if knownMethod's first argument is a type which matches the late attributed KnownArgumentType's schematic definition. It is easy to see how powerful out types are at defining constraints on the other class schematics they're used within.
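In reflection terms, the two out types in the example behave roughly like the Java sketch below. This is only an analogy, and every name in it (OutBase, OutArg, OutFound, OutTypeSketch) is hypothetical. OutFound plays the class that CertainlyCorrect was bound to.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for classes found in a dex file.
class OutBase {}

class OutArg {
    String knownStringField;
}

class OutFound extends OutBase {
    void knownMethod(OutArg arg) {}
}

class OutTypeSketch {
    // #ParentType: bound to whatever the resolved class extends.
    static Class<?> bindParentType(Class<?> resolved) {
        return resolved.getSuperclass();
    }

    // #KnownArgumentType: bound to the first argument type of knownMethod,
    // but only if that type satisfies the late schematic
    // (a String field named knownStringField). Returns null otherwise.
    static Class<?> bindKnownArgumentType(Class<?> resolved) {
        for (Method m : resolved.getDeclaredMethods()) {
            if (m.getName().equals("knownMethod") && m.getParameterCount() == 1) {
                Class<?> candidate = m.getParameterTypes()[0];
                if (hasStringField(candidate, "knownStringField")) {
                    return candidate;
                }
            }
        }
        return null;
    }

    static boolean hasStringField(Class<?> c, String name) {
        try {
            return c.getDeclaredField(name).getType() == String.class;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(bindParentType(OutFound.class).getSimpleName());
        System.out.println(bindKnownArgumentType(OutFound.class).getSimpleName());
    }
}
```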

Primitive Types

Primitive types (such as int, char, long etc.) are defined as they are in Java.

Any Type

The any type matches any type and does not enforce any constraints. A * is used to indicate the any type.

* myFieldAnyType;

* methodReturningAnyTypeAndTakingAnySingleArgument(*)

Arrays

Array types are defined in the same way as in Java. Just postfix any type (basic Java type, inheritance type, any type, out type etc.) with [] and it is considered an array of that type.

< extends java.io.InputStream >[] inputStreamArray;
*[] anyArray;

Varargs Type

If the number of arguments for a method is not known or does not matter, the varargs type can be used, denoted by three dots (...).

void varargsMethod1(...)

void varargsMethod2(java.lang.String, !MyClass, ...)

The varargs type allows 0 or more arguments of any type from the point it is used. For example, varargsMethod1 above can have 0 or more arguments of any type, while varargsMethod2 must have a String argument, then a MyClass argument followed by 0 or more arguments of any type.
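The combined behaviour of the any type and the varargs type in an argument list can be sketched in Java like this. ParameterPatternSketch is a hypothetical helper, types are compared as plain strings, and the varargs marker is assumed to end the pattern.

```java
import java.util.List;

// Illustration only: matches an argument pattern against actual argument types.
class ParameterPatternSketch {
    static boolean matches(List<String> pattern, List<String> actual) {
        for (int i = 0; i < pattern.size(); i++) {
            String p = pattern.get(i);
            if (p.equals("...")) {
                return true; // 0 or more arguments of any type from here on
            }
            if (i >= actual.size()) {
                return false; // the pattern needs an argument that is missing
            }
            if (!p.equals("*") && !p.equals(actual.get(i))) {
                return false; // a concrete type that does not match
            }
        }
        // Without varargs the argument counts must be equal.
        return pattern.size() == actual.size();
    }

    public static void main(String[] args) {
        List<String> varargsMethod2 = List.of("java.lang.String", "MyClass", "...");
        System.out.println(matches(varargsMethod2, List.of("java.lang.String", "MyClass")));
        System.out.println(matches(varargsMethod2, List.of("java.lang.String", "MyClass", "int", "long")));
        System.out.println(matches(varargsMethod2, List.of("java.lang.String")));
    }
}
```

The first two calls match (zero and two trailing arguments), while the last does not because the required MyClass argument is missing.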

Method Bodies

Method bodies are very useful for identifying classes and their members. dexsearch allows you to match method bodies based on a number of different features (and many more to come).

Strings

String matchers are the simplest method body matchers. They indicate that the method must contain a const-string instruction with the given string value. The usage is simple:

void exampleMethod() {
    // Matches a full string equal to "example String"
    .string "example String";
    // Matches any string that contains "example string"
    .string contains "example string";
    // Matches any string that contains only numbers
    .string regex "[0-9]+";
}
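The three matching modes map onto familiar Java string operations. The sketch below is an approximation: StringMatcherSketch is a hypothetical name, comparisons are assumed to be case sensitive, and the regex mode is assumed to require the whole string to match, which is what the "contains only numbers" comment suggests.

```java
import java.util.regex.Pattern;

// Illustration only: the three .string matcher modes as Java string checks.
class StringMatcherSketch {
    enum Mode { EQUALS, CONTAINS, REGEX }

    static boolean matches(Mode mode, String expected, String constString) {
        switch (mode) {
            case EQUALS:
                return constString.equals(expected);
            case CONTAINS:
                return constString.contains(expected);
            case REGEX:
                // Assumption: the pattern must match the entire string.
                return Pattern.matches(expected, constString);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Mode.EQUALS, "example String", "example String"));
        System.out.println(matches(Mode.CONTAINS, "example string", "an example string here"));
        System.out.println(matches(Mode.REGEX, "[0-9]+", "12345"));
        System.out.println(matches(Mode.REGEX, "[0-9]+", "12a45"));
    }
}
```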

Type References

References to types can also be used as matchers for method bodies. Any of the type expressions described in Types and Type References can be used.

void exampleMethod() {
    .type java.lang.String;
    .type !MyClass;
}

Constructor Calls

More specifically, if the method constructs a new instance of a type, the .new [type] matcher can be used.

void exampleMethod() {
    // This method must create a new instance of MyClass
    .new !MyClass;
}

Field and Method References

References to members of types can be used as matchers too. The syntax is simple: .field|.method [type]->[memberName];

class MyClass {
    android.net.Uri uri;

    void exampleMethod() {
        // This method must reference java.lang.Double.isNaN
        .method java.lang.Double->isNaN;
        // This method must reference whatever field uri was resolved to
        .field this->!uri;
        // This method must reference any method in java.util.Collections
        .method java.util.Collections->*;
    }
}

Note, resolved field names must be prefixed with the ! reference operator. Referencing resolved fields and methods from other types is also acceptable (e.g. .field !MyClass->!uri in the method body of another class schematic defined after MyClass).

Bytecode Expressions

Currently, a handful of more specific bytecode matchers are supported. The only ones at the moment are bytecode expressions that set fields to integers, strings, parameters and registers.

import android.content.Context;
import android.net.Uri;
import java.lang.String;

class MyClass {
    String stringField;
    int intField;
    Uri uri;

    void $<init>(String, int) {
        // This method sets this class' stringField to the first parameter (p0) of this method.
        .expr this->!stringField = .p0;
        // This method sets this class' intField to the second parameter (p1) of this method.
        .expr this->!intField = .p1;
    }

    void $reset() {
        // This method sets this class' stringField to "constant string";
        .expr this->!stringField = "constant string";
        // This method sets this class' intField to the integer constant 0.
        .expr this->!intField = 0;
    }

    void $setUri(Context, Uri) {
        // This method sets this class' uri field to the second parameter (p1).
        .expr this->!uri = .p1;
    }
}

The parameter expressions (.p0, .p1, ...) can be replaced with .r0, .r1, ... to refer to registers (usually local variables) rather than parameters. Note that these expressions can reference any field of any type, not just resolved fields of this.

Binding Events

All method body matchers support binding events. Binding events are operations that occur when a matcher is successfully bound (matched) to an instruction or set of instructions. This is particularly useful when obfuscated code keeps its toString methods intact, as well as for obfuscated enums.

Binding events are specified directly after a method body matcher, with a comma (,) separating the matcher from the binding event. The currently supported event operations are shown in the table below.

| Operation | Arguments | Description |
| --- | --- | --- |
| bind | <target> <identifier> <modifier> <source> | Binds the source to the given identifier with type target, after traversing the method's instructions using the specified modifier. Target is either method or field. Supported modifiers are previous, current and next. The only source is reference. |

This only really makes sense with an example.

import java.lang.String;

class MyClass {
    [late] String usefulString;

    void $<init>(String, String, String, String, ...) {
        // Here, we want to late bind the usefulString field to whatever field gets set to the 4th parameter in this constructor
        .expr this->* = .p3, bind field usefulString current reference;
    }
}

Here, the target is a field with identifier usefulString (the late attributed field at the top of the class definition). The modifier is current as the usefulString field must be bound to the reference in the current bytecode expression (the expression which sets any field * to the 4th parameter .p3).

Sometimes enums are obfuscated and their field names are not what they look like when they are referenced. Instead the correct names for the enum values are declared in the static constructor for the enum class, and we can use this information to late bind the enum's fields to their nice names.

enum MyEnum {
    // Static constructor
    static void $<clinit>() {
        .string "ENUM_VALUE_1", bind field ENUM_VALUE_1 next reference;
        .string "ENUM_VALUE_2", bind field ENUM_VALUE_2 next reference;
        .string "ENUM_VALUE_3", bind field ENUM_VALUE_3 next reference;
        .string "ENUM_VALUE_4", bind field ENUM_VALUE_4 next reference;
    }
}

Here, the next modifier is used as the const-string instructions matched to the .string bytecode matchers do not contain a reference to the field we want to bind. Conveniently, the next instruction which references a field actually references the correct field we would like to bind.
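How the previous, current and next modifiers pick a reference can be sketched in Java over a simplified instruction list. This is a model under assumptions, not dexsearch's implementation: BindModifierSketch and Instruction are hypothetical, each instruction carries at most one field reference, and previous is assumed to mirror next by walking backwards. The instruction sequence is a typical shape for an obfuscated enum's static constructor.

```java
import java.util.Arrays;
import java.util.List;

class BindModifierSketch {
    // Hypothetical, simplified instruction: an opcode and the field it references (or null).
    static final class Instruction {
        final String opcode;
        final String fieldRef;

        Instruction(String opcode, String fieldRef) {
            this.opcode = opcode;
            this.fieldRef = fieldRef;
        }
    }

    // Returns the field reference selected by the modifier, or null if none is found.
    static String findReference(List<Instruction> code, int matched, String modifier) {
        switch (modifier) {
            case "current":
                return code.get(matched).fieldRef;
            case "next":
                for (int i = matched + 1; i < code.size(); i++) {
                    if (code.get(i).fieldRef != null) return code.get(i).fieldRef;
                }
                return null;
            case "previous":
                for (int i = matched - 1; i >= 0; i--) {
                    if (code.get(i).fieldRef != null) return code.get(i).fieldRef;
                }
                return null;
            default:
                throw new IllegalArgumentException("Unknown modifier: " + modifier);
        }
    }

    // Two enum values being created and stored in obfuscated fields a and b.
    static List<Instruction> exampleStaticConstructor() {
        return Arrays.asList(
                new Instruction("new-instance", null),
                new Instruction("const-string \"ENUM_VALUE_1\"", null),
                new Instruction("const/4", null),
                new Instruction("invoke-direct", null),
                new Instruction("sput-object", "a"),
                new Instruction("new-instance", null),
                new Instruction("const-string \"ENUM_VALUE_2\"", null),
                new Instruction("const/4", null),
                new Instruction("invoke-direct", null),
                new Instruction("sput-object", "b"));
    }

    public static void main(String[] args) {
        List<Instruction> code = exampleStaticConstructor();
        // The .string matchers bind to indices 1 and 6.
        System.out.println("ENUM_VALUE_1 -> " + findReference(code, 1, "next"));
        System.out.println("ENUM_VALUE_2 -> " + findReference(code, 6, "next"));
    }
}
```

In this model the const-string instructions carry no field reference themselves, so current would find nothing, while next reaches the sput-object that stores the freshly created enum value.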