Call Graph Basics
This document explains the core concepts behind call graph generation in cg4j.
What is a Call Graph?
A call graph is a directed graph that represents calling relationships between methods in a program.
- Nodes: Methods in your program
- Edges: Method calls (A → B means "method A calls method B")
Example:
Why Are Call Graphs Useful?
- Program understanding: Visualize how code flows through your application
- Impact analysis: Find what breaks when you change a method
- Security analysis: Trace paths to sensitive operations
- Dead code detection: Find unreachable methods
- Testing: Identify test coverage gaps
Types of Method Calls in Java
Understanding how method calls work at the bytecode level is essential for call graph construction.
Static Calls (invokestatic)
Static calls invoke methods declared with the static keyword. These belong to the class itself rather than to instances. The target method is resolved at compile time because static methods cannot be overridden.
Virtual Calls (invokevirtual)
Virtual calls invoke non-static instance methods that can be overridden by subclasses. The JVM determines at runtime which implementation to execute based on the actual object type, not the declared type. This enables polymorphism - all instance methods in Java are virtual by default.
Exception cases (resolved statically):
- Private methods: Cannot be overridden
- Final methods: Cannot be overridden
- Super calls: super.method() always calls the superclass implementation
Special Calls (invokespecial)
Special calls invoke constructors, superclass methods, and private methods. They are resolved statically because the target is unambiguous.
new ArrayList<>(); // Constructor call
super.toString(); // Superclass method call
this.privateMethod(); // Private method call
Interface Calls (invokeinterface)
Interface calls invoke methods declared in interfaces. Similar to virtual calls, the actual implementation is resolved at runtime based on the object's type.
List<String> list = new ArrayList<>();
list.add("hello"); // Interface call - resolved to ArrayList.add()
Dynamic Calls (invokedynamic)
Dynamic calls use the invokedynamic bytecode instruction, introduced for dynamically-typed JVM languages. Java uses it for lambda expressions and method references. The cg4j ASM engine handles lambda metafactory calls by creating synthetic lambda classes.
list.forEach(x -> System.out.println(x)); // Lambda uses invokedynamic
Function<String, Integer> f = String::length; // Method reference uses invokedynamic
How cg4j handles lambdas:
- Detects LambdaMetafactory.metafactory bootstrap methods
- Creates synthetic classes matching WALA's wala/lambda$... naming convention
- Generates two-hop edges: caller → synthetic SAM method → lambda body
Why This Matters for Call Graphs
Static and special calls have a single target - easy to resolve. Virtual and interface calls are the challenge: the target depends on the runtime type. This is why call graph algorithms like CHA and RTA exist - they approximate which implementations might be called at virtual call sites.
Call Graph Construction Algorithms
There are two common algorithms for resolving virtual method calls: CHA and RTA. cg4j uses RTA for its analysis.
Class Hierarchy Analysis (CHA)
CHA is a fast algorithm that determines which methods might be called at a call site.
The Problem
In object-oriented programs, method calls can be dynamic:
How CHA Works
CHA uses the class hierarchy (inheritance tree) to find all possible targets:
- Look at the declared type (
Animal) - Find all subclasses in the hierarchy (
Dog,Cat,Bird) - Include all implementations of the method (
speak())
Result: Conservative approximation - includes all subtypes in the hierarchy, even if they're never instantiated.
CHA Trade-offs
✅ Pros: - Very fast to compute - Simple to understand - Sound (won't miss real calls) - No need to analyze method bodies
❌ Cons: - Over-approximates (includes impossible calls) - Creates edges to methods of classes never instantiated - Less precise than type-sensitive analyses
Rapid Type Analysis (RTA)
RTA is a more precise algorithm that only considers instantiated types.
How RTA Works
RTA tracks which classes are actually instantiated (created with new) during analysis:
- Start with entry points
- Track instantiations: Record every
new ClassName()encountered - Resolve virtual calls: Only consider subtypes that have been instantiated
- Repeat until no new types are discovered
Result: More precise than CHA - only includes methods of classes that are actually created.
RTA Example
public void main() {
Animal a = new Dog();
a.speak(); // Which speak() methods are possible?
}
class Dog extends Animal {
public void speak() { bark(); }
}
class Cat extends Animal {
public void speak() { meow(); }
}
class Bird extends Animal {
public void speak() { chirp(); }
}
CHA result: Includes Dog.speak(), Cat.speak(), Bird.speak() (all subtypes)
RTA result: Only includes Dog.speak() (only instantiated type)
RTA Trade-offs
✅ Pros: - More precise than CHA - Eliminates impossible calls to never-instantiated classes - Still fast and scalable - Sound (won't miss real calls)
❌ Cons: - Slower than CHA (must analyze method bodies) - Requires tracking instantiation sites - Still over-approximates in some cases
CHA vs RTA Comparison
| Aspect | CHA | RTA |
|---|---|---|
| Speed | Fastest | Fast |
| Precision | Lower (all subtypes) | Higher (only instantiated) |
| Analysis cost | Class hierarchy only | Bytecode scanning required |
| Call graph size | Larger | Smaller |
| Use case | Quick overview | More accurate analysis |
When to Use Each Algorithm
Use CHA when: - You need fast results - You want maximum soundness (catch everything) - Precision is less important
Use RTA when: - You need more accurate call graphs - You want to reduce false positives - Analysis time is acceptable
How cg4j Uses These Algorithms
cg4j provides two engines:
WALA Engine
- Uses RTA (Rapid Type Analysis)
- More mature and feature-rich
- Handles complex Java features
ASM Engine
- Uses RTA (Rapid Type Analysis) with fixed-point iteration
- Faster and more lightweight
- Handles lambda expressions and method references
- Good for accurate, efficient call graph construction
Usage:
# ASM engine (uses RTA)
java -jar cg4j.jar -j app.jar --engine=asm
# WALA engine (uses RTA)
java -jar cg4j.jar -j app.jar --engine=wala
Worklist Algorithm
cg4j uses a worklist-based approach for call graph construction:
-
Start with entry points (all public methods in cg4j)
-
Main worklist pass - for each reachable method:
- Scan method bytecode for call sites
- Track instantiations (
newexpressions) - Resolve virtual/interface calls using RTA
- Process lambda
invokedynamicsites (ASM engine) -
Add discovered methods to worklist
-
Fixed-point re-resolution (ASM engine only):
- After the main worklist completes, re-resolve all virtual/interface call sites
- Recovers edges missed due to worklist ordering (e.g., lambda classes registered late)
-
Iterates until no new edges are discovered (typically 1-2 passes)
-
Repeat until convergence
Why Fixed-Point Iteration?
The RTA worklist is single-pass: when method A calls an interface method, only types instantiated before A is processed are found as targets. Types instantiated after (e.g., lambda classes created later in the worklist) are missed because A is never revisited.
Example:
// Method A processes Function1.invoke() call
// Only finds lambda classes already registered
// Method B later creates a new lambda
// A never learns about B's lambda
Solution: After the main worklist, re-resolve all virtual/interface call sites against the complete set of instantiated types. This recovers edges to types that were registered late in the worklist. Converges quickly since types don't recursively create new types.
How cg4j Builds Call Graphs
- Load JAR file and dependencies
- Find entry points (all public methods)
- Build class hierarchy from loaded classes
- Run worklist algorithm:
- Process each reachable method
- Extract call sites from bytecode
- Track instantiations (
newexpressions) - Process lambda
invokedynamicsites (ASM engine) - Resolve virtual/interface calls using RTA
- Expand the call graph iteratively
- Fixed-point re-resolution (ASM engine):
- Re-resolve all virtual/interface calls with complete type set
- Recover edges to late-registered types
- Output CSV with source → target edges
Limitations
What cg4j Handles
✅ Direct method calls
✅ Virtual method calls (using RTA)
✅ Interface method calls
✅ Constructor calls
✅ Static method calls
✅ Lambda expressions and method references
✅ Static initializers (<clinit>)
What cg4j May Miss
❌ Reflection (Method.invoke(), Constructor.newInstance())
❌ Dynamic proxy calls (Proxy.newProxyInstance())
❌ Method handles (MethodHandle.invoke())
❌ Native methods
❌ Custom invokedynamic bootstraps (non-lambda)
Key Terminology
| Term | Meaning |
|---|---|
| Static call | Direct method call, target known at compile-time |
| Virtual call | Method call through object reference, resolved at runtime |
| Entry point | Starting method for analysis (e.g., main(), public methods) |
| Reachable | Method that can be called from an entry point |
| Call site | Location in code where a method call occurs |
| Receiver type | Type of object on which a method is called |
Further Reading
Foundational Papers
CHA (Class Hierarchy Analysis): - Dean, J., Grove, D., and Chambers, C. (1995). Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis. ECOOP'95. - PDF - The original paper introducing CHA for call graph construction
RTA (Rapid Type Analysis): - Bacon, D. F. and Sweeney, P. F. (1996). Fast Static Analysis of C++ Virtual Function Calls. OOPSLA'96. - PDF - Introduces RTA as a more precise alternative to CHA by tracking instantiated types
Additional Resources
- WALA Documentation - Framework used by cg4j
- Call Graph Construction Algorithms - Comprehensive tutorial on points-to analysis
- Static Analysis Overview - Course materials on static program analysis
Quick Reference
| Feature | WALA (RTA) | ASM (RTA) |
|---|---|---|
| Speed | ✅ Fast | ✅ Very fast |
| Call graph coverage | ⚠️ Moderate | ✅ Higher |
| Lambda support | ✅ Yes | ✅ Yes |
| Handle reflection | ❌ Limited | ❌ No |
| Sound results | ✅ Yes | ✅ Yes |
| Large programs | ✅ Yes | ✅ Yes |
Note: Benchmarks show ASM typically finds more edges and nodes than WALA, with significant improvement in lambda and method reference coverage.
Recommendation: Use ASM for most analysis tasks (faster, better coverage). Use WALA if you need specific WALA framework features.