Commit e583759

Tweak and document more
1 parent b2a94d3 commit e583759

4 files changed

Lines changed: 281 additions & 39 deletions

README.md

Lines changed: 113 additions & 5 deletions
@@ -1,7 +1,20 @@
11
# Flow Agent
22

3-
A Java agent that records a dynamic call graph of the chosen methods.
3+
A Java agent that records a dynamic call tree of the chosen methods.
44

5+
When attached to a JVM, the Flow Agent:
6+
- Instruments classes whose fully qualified names match configured prefixes
7+
- Records method entry and exit events
8+
- Writes one call tree file per thread
9+
- Generates a method ID mapping file (ids.properties)
10+
11+
Two output formats are supported:
12+
- binary: compact .flow files
13+
- jsonl: human-readable .jsonl files (JSON Lines: one JSON value per line)
14+
15+
The binary format is significantly smaller and recommended for large call trees.
16+
17+
---
518

619
## Build
720

@@ -10,14 +23,109 @@ A Java agent that records a dynamic call graph of the chosen methods.
1023
```
1124
The jar is generated in `build/libs/`.
1225

26+
---
1327

1428
## Usage
1529

30+
Attach the agent using the `-javaagent` option and provide arguments as comma-separated `key=value` pairs:
31+
1632
```sh
17-
java -javaagent:path/to/flow-agent.jar=prefix=<prefixList>,out=</path/to/file> \
33+
java -javaagent:path/to/flow-agent.jar=target=<prefix[+prefix...]>,out=<dir>[,format=binary|jsonl][,optimize=<dir>][,ids=<file>] \
1834
-jar your-application.jar
1935
```
2036

21-
- `prefix` takes a `+`-separated list of prefixes matching fully qualified class names. Some examples:
22-
- match all classes from the JaCoCo project: `org.jacoco.`
23-
- match all classes in the `fr.inria` AND `org.moosetechnology` packages: `fr.inria.+org.moosetechnology.`
37+
### Arguments
38+
39+
* **`target`** (required)
40+
A `+`-separated list of fully qualified class name prefixes to instrument.
41+
Only methods in classes whose names start with one of these prefixes will be recorded.
42+
43+
* **`out`** (required)
44+
Output directory where the agent will write:
45+
* `ids.properties`, the method ID mapping
46+
* One call tree file per thread, where invoked methods are referenced by ID
47+
48+
* **`format`** (optional)
49+
Output format for call tree files:
50+
* `binary` (default): compact and recommended
51+
* `jsonl`: human-readable JSON Lines
52+
53+
* **`optimize`** (optional)
54+
Path to an existing flow output directory.
55+
The agent will analyze previous trace data to generate an optimized method ID mapping.
56+
57+
* **`ids`** (optional)
58+
Path to an existing ID mapping file to reuse.
59+
60+
### Examples
61+
62+
Record calls using the default (binary) format:
63+
```sh
64+
java -javaagent:flow-agent.jar=target=com.myapp.,out=/tmp/flow/ \
65+
-jar myapp.jar
66+
```
67+
68+
Instrument multiple packages:
69+
```sh
70+
java -javaagent:flow-agent.jar=target=com.myapp.service.+com.myapp.utils.+org.lib.,out=/tmp/flow/ \
71+
-jar myapp.jar
72+
```
73+
74+
### Optimization Workflow
75+
76+
When using the binary format, method IDs are encoded using a compact variable-length representation.
77+
Smaller method IDs produce smaller trace files.
78+
The agent can optimize method IDs based on observed call frequencies.
79+
80+
#### Why optimize?
81+
82+
* Frequently called methods receive smaller numeric IDs.
83+
* Smaller IDs result in smaller files.
84+
* Optimization can reduce storage size significantly in large call trees.
85+
86+
Optimization is useful in two scenarios:
87+
88+
1. **Minimizing the trace size of a follow-up run of the same scenario**
89+
If you perform an initial trace to observe method usage, you can optimize the method IDs and then run the application a second time.
90+
The second run will produce a smaller trace file, which is more efficient to store and process.
91+
92+
2. **Best-effort optimization for similar runs**
93+
Even if the next run differs, previous runs can serve as a prediction of typical method usage.
94+
If the earlier run is representative, the optimized mapping will still reduce trace size.
95+
96+
#### How to optimize?
97+
98+
**Step 1**: Generate baseline trace
99+
```sh
100+
java -javaagent:flow-agent.jar=target=com.myapp.,out=/tmp/flow/ \
101+
-jar myapp.jar
102+
```
103+
104+
This produces:
105+
```
106+
/tmp/flow/
107+
ids.properties
108+
<thread>.flow
109+
```
110+
111+
**Step 2**: Generate optimized mapping
112+
113+
```sh
114+
java -javaagent:flow-agent.jar=target=com.myapp.,optimize=/tmp/flow/,out=/tmp/optimized-flow/ \
115+
-jar myapp.jar
116+
```
117+
118+
The agent will:
119+
* Scan files from `/tmp/flow/`
120+
* Count how often each method was called
121+
* Assign smaller IDs to more frequently called methods
122+
* Write a new optimized ID mapping in `/tmp/optimized-flow/`
123+
124+
**Step 3**: Reuse optimized IDs
125+
126+
```sh
127+
java -javaagent:flow-agent.jar=target=com.myapp.,ids=/tmp/optimized-flow/ids.properties,out=/tmp/flow-2/ \
128+
-jar myapp.jar
129+
```
130+
131+
Subsequent runs will produce smaller call tree files.
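
The ID assignment described in Step 2 of the README can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: the class `IdOptimizerSketch`, the method `assignIds`, and the input shape (a map from method signature to call count) are hypothetical, and starting IDs at 0 is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Comparator;

/** Hypothetical sketch: give the most frequently called methods the smallest IDs. */
public class IdOptimizerSketch {

  /**
   * Assigns IDs 0, 1, 2, ... in order of decreasing call count.
   * Ties are broken by signature so the result is deterministic.
   */
  public static Map<String, Integer> assignIds(Map<String, Long> callCounts) {
    List<Map.Entry<String, Long>> entries = new ArrayList<>(callCounts.entrySet());
    entries.sort(
        Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
            .thenComparing(Map.Entry.<String, Long>comparingByKey()));

    // LinkedHashMap keeps the entries in ID order when printed.
    Map<String, Integer> ids = new LinkedHashMap<>();
    int nextId = 0; // assumption: IDs start at 0
    for (Map.Entry<String, Long> e : entries) {
      ids.put(e.getKey(), nextId++);
    }
    return ids;
  }

  public static void main(String[] args) {
    // Made-up call counts, for illustration only.
    Map<String, Long> counts = Map.of(
        "com.myapp.A#rare()", 3L,
        "com.myapp.B#hot()", 1_000_000L,
        "com.myapp.C#warm()", 500L);
    System.out.println(assignIds(counts));
  }
}
```

With the packed first byte described in `BinaryThreadRecorder`, IDs 0 to 63 fit in a single byte, so under this scheme the 64 hottest methods would each cost one byte per ENTER event.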

src/main/java/fr/bl/drit/flow/agent/AgentMain.java

Lines changed: 1 addition & 7 deletions
@@ -21,7 +21,7 @@
2121
import net.bytebuddy.matcher.ElementMatcher;
2222

2323
/**
24-
* The Java flow agent entry point. It records method call trees for target classes and writes them
24+
* The Java Flow agent entry point. It records method call trees for target classes and writes them
2525
* to files in a supplied directory. The directory will contain a method ID mapping file and a call
2626
* tree file for each thread. The call tree file format can be configured using the format argument,
2727
* which currently supports two formats: a compact binary format (.flow) and a more verbose JSON
@@ -229,12 +229,6 @@ private static Map<String, String> parseArgs(String args) {
229229
if (trimmed.isEmpty()) continue;
230230

231231
String[] kv = trimmed.split("=", 2);
232-
if (kv.length != 2) {
233-
System.err.println(
234-
"[flow-agent] Ignoring invalid agent argument entry (expected key=value): " + trimmed);
235-
continue;
236-
}
237-
238232
String key = kv[0].trim();
239233
String value = kv[1].trim();
240234

src/main/java/fr/bl/drit/flow/agent/BinaryThreadRecorder.java

Lines changed: 115 additions & 13 deletions
@@ -8,26 +8,127 @@
88
import java.nio.file.StandardOpenOption;
99

1010
/**
11-
* Call tree binary recorder.
11+
* Records a call tree in a compact binary format.
1212
*
13-
* <pre>
14-
* [ENTER:0x80][METHOD_ID:packed_varint]
15-
* [EXIT:0x00][COUNT:packed_varint]
16-
* </pre>
13+
* <h2>Overview</h2>
1714
*
18-
* Where METHOD_ID refers to the ID of the method stored... TODO
15+
* <p>The generated {@code .flow} file is a linear sequence of events without any global header. Two
16+
* event types exist:
17+
*
18+
* <ul>
19+
* <li><b>ENTER</b>: a method entry event
20+
* <li><b>EXIT</b>: one or more consecutive method exits
21+
* </ul>
22+
*
23+
* <p>Method identifiers are numeric values assigned by a {@link MethodIdMapping} and stored in a
24+
* separate file. The mapping associates a method identifier with a fully qualified method
25+
* signature.
26+
*
27+
* <h2>Event Structure</h2>
28+
*
29+
* <p>Each event starts with a single byte that encodes both:
30+
*
31+
* <ul>
32+
* <li>the event type (ENTER or EXIT), and
33+
* <li>the beginning of an unsigned variable-length integer.
34+
* </ul>
35+
*
36+
* <p>The most significant bit (bit 7) determines the event type:
37+
*
38+
* <ul>
39+
* <li>{@code 1} ({@link #F_ENTER}): ENTER event
40+
* <li>{@code 0} ({@link #F_EXIT}): EXIT event
41+
* </ul>
42+
*
43+
* <h2>Packed First Byte Layout</h2>
44+
*
45+
* <p>The first byte of every event has the following structure:
46+
*
47+
* <pre><code class="language-text">
48+
* bit 7 : flag (1 = ENTER, 0 = EXIT)
49+
* bit 6 : continuation bit (1 = more bytes follow)
50+
* bits 5-0 : lowest 6 bits of the encoded value
51+
* </code></pre>
52+
*
53+
* <p>The encoded value depends on the event type:
54+
*
55+
* <ul>
56+
* <li>For ENTER events: the value is the method ID.
57+
* <li>For EXIT events: the value is the number of <i>additional</i> consecutive exits.
58+
* </ul>
59+
*
60+
* <h2>Variable-Length Integer Encoding</h2>
61+
*
62+
* <p>All integer values are encoded as unsigned variable-length integers in an LEB128-style format;
63+
* the first byte of each event carries only 6 payload bits because it also holds the flag.
64+
*
65+
* <p>After the first byte, if the continuation bit (bit 6) is set, the remaining higher-order bits
66+
* of the value are encoded using standard 7-bit groups:
67+
*
68+
* <pre><code class="language-text">
69+
* bit 7 : continuation (1 = more bytes follow)
70+
* bits 6-0 : next 7 bits of the value
71+
* </code></pre>
72+
*
73+
* <p>Each subsequent byte contributes 7 additional bits to the value. Decoding proceeds by
74+
* accumulating payload bits while continuation bits are set.
75+
*
76+
* <h2>ENTER Event</h2>
77+
*
78+
* <p>An ENTER event encodes a single method entry as {@code [flag=1][methodId]}. The {@code
79+
* methodId} refers to the numeric identifier defined in the ID mapping file.
80+
*
81+
* <p>Example with {@code methodId = 1}: {@code 1000 0001}
82+
*
83+
* <p>Example with {@code methodId = 8192}: {@code 1100 0000 1000 0000 0000 0001}
84+
*
85+
* <h2>EXIT Event</h2>
86+
*
87+
* <p>EXIT events are run-length encoded.
88+
*
89+
* <p>Instead of writing one byte per exit, consecutive exits are accumulated internally and flushed
90+
* as a single event.
91+
*
92+
* <ul>
93+
* <li>If exactly one exit occurred: {@code [flag=0]} (a single {@code 0x00} byte: no continuation, value 0)
94+
* <li>If {@code n > 1} consecutive exits occurred: {@code [flag=0][n - 1]}
95+
* </ul>
96+
*
97+
* <p>This means:
98+
*
99+
* <ul>
100+
* <li>{@code value == 0} represents a single exit.
101+
* <li>{@code value == k} represents {@code k + 1} consecutive exits.
102+
* </ul>
103+
*
104+
* <p>This run-length encoding significantly reduces file size when methods return in bursts.
105+
*
106+
* <h2>Decoding Algorithm (High-Level)</h2>
107+
*
108+
* <ol>
109+
* <li>Read first byte.
110+
* <li>Extract event type from bit 7.
111+
* <li>Extract continuation bit from bit 6.
112+
* <li>Extract lower 6 bits as initial value.
113+
* <li>If continuation is set, read additional LEB128 bytes and accumulate 7-bit groups.
114+
* <li>If ENTER: emit method entry with decoded {@code methodId}.
115+
* <li>If EXIT: emit {@code value + 1} exits.
116+
* </ol>
117+
*
118+
* @see #writeVarInt(long)
119+
* @see #writeFlagAndVarInt(int, long)
19120
*/
20121
public class BinaryThreadRecorder implements ThreadRecorder {
21122

22-
// === Event flags ===
123+
// ---------- Event flags ----------
23124

24125
/** Highest bit is 1 (0x80). */
25126
public static final byte F_ENTER = (byte) 0x80;
26127

27128
/** Highest bit is 0 (0x00). */
28129
public static final byte F_EXIT = 0x00;
29130

30-
// === Masks ===
131+
// ---------- Masks ----------
31132

32133
/** Flag bit, the highest bit (0x80). */
33134
public static final byte M_FLAG = (byte) 0x80;
@@ -47,14 +148,15 @@ public class BinaryThreadRecorder implements ThreadRecorder {
47148
/** Payload in a packed varint, the lowest 6 bits (0x3F). */
48149
public static final byte M_PACK_PAYLOAD = 0x3F;
49150

50-
// === State ===
151+
// ---------- State ----------
51152

52153
/** Output stream for the binary call tree data of the recorder's thread. */
53154
protected final OutputStream out;
54155

156+
/** The file containing the call tree data of the recorder's thread. */
55157
protected final String fileName;
56158

57-
/** Additional consecutive exits, 0 means exactly one exit. */
159+
/** Pending consecutive exits. */
58160
protected long pendingExits = 0L;
59161

60162
public BinaryThreadRecorder(Path outputDir) throws IOException {
@@ -96,7 +198,7 @@ protected void flushPendingExits() throws IOException {
96198
}
97199

98200
/**
99-
* Write an unsigned variable-length integer (LEB128 style).
201+
* Write an unsigned variable-length integer (LEB128-style).
100202
*
101203
* <pre><code class="language-text">
102204
* bit 7 : continuation (1 if more bytes follow)
@@ -112,12 +214,12 @@ protected void writeVarInt(long value) throws IOException {
112214
}
113215

114216
/**
115-
* Write unsigned variable-length integer with a 2-bit flag packed into the first byte.
217+
* Write unsigned variable-length integer with a 1-bit flag packed into the first byte.
116218
*
117219
* <p>First byte layout:
118220
*
119221
* <pre><code class="language-text">
120-
* bit 7 : flag
222+
* bit 7 : flag
121223
* bit 6 : continuation (1 if more bytes follow)
122224
* bits 5-0 : lowest 6 bits of value
123225
* </code></pre>
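
The format documented in the `BinaryThreadRecorder` Javadoc can be checked with a small round trip. The sketch below is written only from that description and is not the project's code: the class `FlowCodecSketch` and the helpers `encode`, `readEvent`, and `hex` are hypothetical, and `writeFlagAndVarInt` here takes an explicit `OutputStream` so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical encoder/decoder for the documented event layout. */
public class FlowCodecSketch {
  static final int F_ENTER = 0x80;
  static final int F_EXIT = 0x00;

  /** First byte: flag (bit 7), continuation (bit 6), low 6 bits; the rest as LEB128. */
  public static void writeFlagAndVarInt(OutputStream out, int flag, long value)
      throws IOException {
    int first = flag | (int) (value & 0x3F);
    long rest = value >>> 6;
    if (rest != 0) first |= 0x40;
    out.write(first);
    while (rest != 0) {
      int b = (int) (rest & 0x7F);
      rest >>>= 7;
      if (rest != 0) b |= 0x80; // more 7-bit groups follow
      out.write(b);
    }
  }

  /** Reads one event as {flag, value}, or returns null at end of stream. */
  public static long[] readEvent(InputStream in) throws IOException {
    int first = in.read();
    if (first < 0) return null;
    long flag = first & 0x80;
    long value = first & 0x3F;
    boolean more = (first & 0x40) != 0;
    int shift = 6;
    while (more) {
      int b = in.read();
      if (b < 0) throw new EOFException("truncated varint");
      value |= (long) (b & 0x7F) << shift;
      shift += 7;
      more = (b & 0x80) != 0;
    }
    return new long[] {flag, value};
  }

  public static byte[] encode(int flag, long value) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeFlagAndVarInt(out, flag, value);
    return out.toByteArray();
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(hex(encode(F_ENTER, 1)));    // 81
    System.out.println(hex(encode(F_ENTER, 64)));   // C0 01
    System.out.println(hex(encode(F_ENTER, 8192))); // C0 80 01

    // enter(1), enter(64), then two exits collapsed into one EXIT event with value 1.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeFlagAndVarInt(out, F_ENTER, 1);
    writeFlagAndVarInt(out, F_ENTER, 64);
    writeFlagAndVarInt(out, F_EXIT, 1);

    InputStream in = new ByteArrayInputStream(out.toByteArray());
    long[] event;
    while ((event = readEvent(in)) != null) {
      if (event[0] == F_ENTER) System.out.println("ENTER " + event[1]);
      else System.out.println("EXIT x" + (event[1] + 1));
    }
  }
}
```

The first byte holds 6 payload bits and each following byte holds 7, so two bytes cover values up to 8191. A method ID of 8192 therefore needs three bytes under this layout, while 64 is the smallest ID that needs two.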
