How TSZ Works
An architectural guide for contributors
TSZ (Project Zang) is a TypeScript compiler written in Rust, targeting exact parity
with tsc. This document explains
how source code flows through the compiler, what data structures it produces at each
stage, and how the type system works internally. This document is source-grounded:
struct names, constant values, and function signatures are real. Where implementation
is still migrating toward the target architecture, sections below call out
current state versus target state.
This page describes both the architecture target and the practical state of the codebase. Some checker paths still contain transitional logic that is being moved behind solver query boundaries.
Ongoing cleanup work is tracked in docs/todos/architecture-purity.md.
1. The Pipeline at a Glance
TSZ compiles TypeScript through six sequential stages. Each stage has a single responsibility, consumes the output of the previous stage, and is forbidden from reaching backward into a previous stage's internals.
The arrow between Checker and Solver is bidirectional in practice: the Checker walks the AST and asks the Solver questions ("is type A assignable to type B?"), then the Solver returns answers. The target architecture is that the Checker only decides where to ask and how to report results. Current implementation still has some checker-side semantic fast paths and narrowing logic while migration continues.
Responsibility Matrix
Every piece of code in the compiler should answer exactly one of these questions. If you're unsure where something belongs, this table is the decision guide.
| Question | Stage | What it does |
|---|---|---|
| WHO declares this? | Binder | Creates symbols, resolves scopes, builds the control-flow graph. Does not compute types. |
| WHAT is the type? | Solver | Target owner of all type relations, inference, evaluation, instantiation, and narrowing. Migration in progress for remaining checker-side semantics. |
| WHERE to report? | Checker | Walks the AST, picks the right diagnostic span, asks Solver for answers, emits diagnostics. |
| OUTPUT? | Emitter | Generates JavaScript and .d.ts declaration files. Should not own general semantic type validation. |
2. The Full Trace: let x: string = 42
To understand how the compiler works, let's follow a single line of TypeScript through every stage. This example is simple but exercises the full pipeline: it declares a variable with a type annotation and an initializer that doesn't match, producing a diagnostic.
let x: string = 42;
This should produce: TS2322: Type 'number' is not assignable to type 'string',
with the error span pointing at 42. Here's exactly how that happens.
The scanner (tsz-scanner) reads the source text character by character and
produces a stream of tokens. It is parser-driven: the parser calls
scan() to advance to the next token, rather than the scanner producing all
tokens eagerly.
As it scans, every identifier and string literal is interned into an
Atom(u32). This means the string "x" is stored once in a global
table, and everywhere it appears in the program it's referenced by the same integer. String
comparison becomes integer comparison: O(1).
For our input let x: string = 42;, the scanner produces:
| Source | SyntaxKind | Value (u16) | Atom |
|---|---|---|---|
let | LetKeyword | 121 | — (keyword) |
x | Identifier | 80 | Atom(47)* |
: | ColonToken | 59 | — |
string | StringKeyword | 154 | — (keyword) |
= | EqualsToken | 64 | — |
42 | NumericLiteral | 9 | — |
; | SemicolonToken | 27 | — |
*Atom values are assigned sequentially at runtime and depend on what strings have been
interned before. The value 47 is illustrative. Keywords like let and
string are recognized by the scanner directly and don't need atom interning
for identity—their SyntaxKind already distinguishes them.
The scanner also tracks position spans. Each token carries a start and
end byte offset into the source. For 42, that would be
start=18, end=20. These positions flow through the entire pipeline and
are ultimately used by the Checker to produce the diagnostic's source location.
The parser (tsz-parser) consumes tokens and builds an Abstract Syntax Tree
in a NodeArena. It performs no semantic analysis—it
doesn't know what types are, it doesn't resolve names, and it doesn't report type errors.
It only cares about syntactic structure.
For our input, the parser produces this AST:
The 16-byte node header
Every AST node in the arena has a fixed-size 16-byte header. This is critical for performance: four nodes fit in a single 64-byte cache line, and the header is small enough that the CPU prefetcher works efficiently.
Variable-sized payloads (child node lists, string data, etc.) are stored in
typed side pools indexed by data_index. There are 40+
pool types: one for identifiers, one for binary expressions, one for call expressions,
and so on. This keeps the hot node array compact while allowing heterogeneous payloads.
The binder (tsz-binder) walks the AST and builds three things: a
symbol table, a scope tree, and a control-flow
graph. It answers the question "WHO declares this name?" without doing any
type computation.
For our variable declaration let x, the binder:
-
Creates a new symbol in the symbol arena:
SymbolId(5)(the exact index depends on how many symbols have been created before). -
Sets the symbol's flags to
BLOCK_SCOPED_VARIABLE(bit 1, value0x2) becauseletis block-scoped. -
Records the declaration: the symbol's
value_declarationpoints back to the VariableDeclaration AST node. -
Registers the symbol in the current scope's symbol table under the name
"x"(stored as anAtom). -
Creates a flow node (
FlowNodeId) for the assignment, linking it to the previous flow state via an antecedent chain.
// What the binder produces for `let x: string = 42`
Symbol {
flags: 0x2, // BLOCK_SCOPED_VARIABLE
escaped_name: "x", // Atom-interned
declarations: [NodeIndex(3)], // the VariableDeclaration node
value_declaration: NodeIndex(3),
parent: SymbolId(0), // source file symbol
}
The binder does no type computation. It does not look at the type
annotation string or the initializer 42 for type information.
It only records that a symbol named x exists, where it was declared, and
what scope it belongs to. Types are computed later by the Checker and Solver.
The scope tree
Scopes are stored as a persistent tree (not a stack). Each scope has a parent pointer,
so the compiler can look up names by walking from the current scope toward the root.
For a simple file-level declaration like this one, x lives in the source
file scope directly.
The control-flow graph
The binder also constructs a flow graph used later for type narrowing. For our simple assignment, it creates an ASSIGNMENT flow node. In more complex code (if/else, loops, type guards), the flow graph tracks which branches are reachable and what type narrowings apply.
The checker (tsz-checker) is the orchestrator. It walks the AST top-down,
and when it reaches our VariableDeclaration node, it performs these steps:
-
Resolve the type annotation. The type annotation is a TypeReference node
pointing to
string. The checker resolves this to the built-in typeTypeId::STRING, which is a compile-time constantTypeId(10). -
Get the initializer's type. The initializer is the numeric literal
42. The checker callsget_type_of_node()on it, which (via the lowering layer) interns the literal type into the solver: the number typeTypeId::NUMBER=TypeId(9). -
Check assignability. The checker now asks the Solver: "is
TypeId(9)(number) assignable toTypeId(10)(string)?" This goes through thequery_boundariesassignability gateway. - Handle the result. The Solver returns "not assignable" with a structured failure reason. The checker uses this to produce the diagnostic.
// Pseudocode of what the checker does
fn check_variable_declaration(node: VariableDeclaration) {
let declared_type = resolve_type_annotation(node.type); // TypeId(10) = STRING
let init_type = get_type_of_node(node.initializer); // TypeId(9) = NUMBER
// Ask Solver via query_boundaries gateway
if !is_type_assignable_to(init_type, declared_type) {
let reason = get_assignability_failure(init_type, declared_type);
emit_diagnostic(
TS2322,
node.initializer.span(), // point at `42`
reason,
);
}
}
The checker never matches on TypeKey variants or does structural type
comparison itself. It asks the Solver and trusts the answer. If you find yourself
writing match types.lookup(type_id) { ... } in checker code, that logic
belongs in the Solver.
The solver (tsz-solver) is where the actual type computation happens. When
the checker asks "is TypeId(9) assignable to TypeId(10)?", the
solver runs its assignability algorithm:
-
Identity check. Is
TypeId(9) == TypeId(10)? No (9 ≠ 10). This is a single integer comparison—O(1). -
Top/bottom check. Is the source
never(assignable to everything)? No. Is the targetunknown(accepts everything)? No. -
Structural check. Look up the actual type data.
TypeId(9)resolves toIntrinsic(Number).TypeId(10)resolves toIntrinsic(String). Two different intrinsic types with no subtype relationship. -
Result: not assignable. The solver returns
falseand produces a structured failure reason that the checker will use for the diagnostic message.
// Solver's assignability check (simplified)
fn is_assignable_to(source: TypeId, target: TypeId) -> bool {
// 1. Identity: same TypeId? Done.
if source == target { return true; }
// 2. Top/bottom short-circuits
if source == TypeId::NEVER { return true; }
if target == TypeId::UNKNOWN { return true; }
// 3. Structural comparison
match (lookup(source), lookup(target)) {
(Intrinsic(Number), Intrinsic(String)) => false, // our case
// ... hundreds of other structural rules
}
}
For more complex cases (unions, intersections, generics, conditional types), the solver uses coinductive semantics with a cycle stack to handle recursive types, and memoization to avoid re-checking the same type pair. Depth limits prevent runaway checks: max subtype depth is 100, max total checks per compilation is 100,000.
The checker takes the solver's failure reason and renders the final diagnostic. It
selects the error span (pointing at the initializer 42, not the whole
declaration), formats the type names, and produces:
error TS2322: Type 'number' is not assignable to type 'string'.
let x: string = 42;
~~
The diagnostic includes the error code (TS2322), the file path, the line and column,
and the message. This must match tsc exactly—same code, same span,
same message text—for the conformance test to pass.
TS2322 ("Type 'X' is not assignable to type 'Y'") is the most common diagnostic.
Its path is strictly controlled: relation first (Solver determines
assignability), reason second (Solver explains why), location
last (Checker picks the span and formats the message). New TS2322/TS2345/TS2416
work must go through the query_boundaries assignability gateway, and
remaining legacy call sites are being migrated.
3. Memory Architecture
TSZ uses arena allocation for all major data structures. Instead of
allocating each AST node, symbol, or type on the heap individually, data is stored
in contiguous arrays and referenced by lightweight u32 handles. This
eliminates fragmentation, enables O(1) access, and plays well with CPU caches.
Identity Handles
Every major entity in the compiler has a u32-based identity handle. This
is the fundamental performance trick: comparing two types, two symbols, or two strings
is always a single integer comparison.
| Handle | Definition | Lives in | Purpose |
|---|---|---|---|
TypeId(u32) |
Index into TypeInterner | tsz-solver | Type identity. IDs 0–99 reserved for built-ins. O(1) equality. |
SymbolId(u32) |
Index into SymbolArena | tsz-binder | Symbol identity. Every variable, function, class gets one. |
Atom(u32) |
Index into Interner | tsz-common | Interned string. All identifiers and strings deduplicated. |
FlowNodeId(u32) |
Index into FlowNodeArena | tsz-binder | Control-flow graph node. Used for type narrowing. |
NodeIndex |
Index into NodeArena | tsz-parser | AST node reference. Points to a 16-byte node header. |
Arena Layout
// Memory layout (conceptual)
NodeArena [Node][Node][Node][Node][Node]... // 16 bytes each, contiguous
| | |
IdentifierPool [IdentData][IdentData]... // typed side pool
CallExprPool [CallData][CallData]... // typed side pool
... (40+ pools)
SymbolArena [Symbol][Symbol][Symbol]... // indexed by SymbolId
FlowNodeArena [FlowNode][FlowNode]... // indexed by FlowNodeId
TypeInterner [TypeData][TypeData][TypeData]... // indexed by TypeId
HashMap<TypeData, TypeId> // dedup: same structure = same ID
StringInterner ["x"]["string"]["number"]... // indexed by Atom
HashMap<&str, Atom> // dedup: same string = same Atom
Built-in TypeIds
The first 100 TypeIds are reserved for built-in types. These are compile-time constants,
so checking "is this type string?" is just type_id == TypeId(10).
| TypeId | Value | Type |
|---|---|---|
TypeId(0) | NONE | Sentinel / uninitialized |
TypeId(1) | ERROR | Error recovery type |
TypeId(2) | NEVER | Bottom type (assignable to everything) |
TypeId(3) | UNKNOWN | Top type (accepts everything) |
TypeId(4) | ANY | TypeScript's escape hatch |
TypeId(5) | VOID | void |
TypeId(6) | UNDEFINED | undefined |
TypeId(7) | NULL | null |
TypeId(8) | BOOLEAN | boolean |
TypeId(9) | NUMBER | number |
TypeId(10) | STRING | string |
TypeId(11) | BIGINT | bigint |
TypeId(12) | SYMBOL | symbol |
TypeId(13) | OBJECT | object |
TypeId(14) | BOOLEAN_TRUE | Literal true |
TypeId(15) | BOOLEAN_FALSE | Literal false |
TypeId(16) | FUNCTION | Function |
TypeId(100+) | FIRST_USER | User-defined types start here |
4. The Type System
The Solver represents every TypeScript type as a TypeData variant, interned
into the global TypeInterner. When two different parts of the program create
the same type, the interner returns the same TypeId. This deduplication means
the compiler never has two copies of the same type in memory, and equality is always O(1).
TypeData Variants
The full TypeData (internally TypeKey) enum covers the complete
TypeScript type surface:
| Variant | TypeScript Example | Description |
|---|---|---|
Intrinsic | string, number, any | Primitive and special types |
Literal | "hello", 42, true | Literal types (string/number/boolean/bigint) |
Object | { x: number, y: string } | Structural object types |
Union | string | number | Union of types |
Intersection | A & B | Intersection of types |
Array | number[] | Array types |
Tuple | [string, number] | Tuple types (with optional/rest) |
Function | (x: number) => string | Single-signature callable |
Callable | Overloaded function | Multiple call signatures |
TypeParameter | T extends Constraint | Generic type parameters |
Lazy(DefId) | Forward reference | Deferred resolution; avoids cycles |
Application | Map<string, number> | Generic type application |
Conditional | T extends U ? X : Y | Conditional types |
Mapped | { [K in keyof T]: V } | Mapped types |
IndexAccess | T[K] | Indexed access types |
KeyOf | keyof T | Key-of operator |
TemplateLiteral | `hello ${T}` | Template literal types |
Infer | infer R | Inference variable in conditional types |
Error | — | Error recovery; silences cascading diagnostics |
Concrete Example
Consider this TypeScript:
interface User {
name: string;
age: number;
}
type Admin = User & { role: "admin" };
The solver interns these as:
// User becomes an Object type
TypeId(102) = Object {
properties: [
("name", TypeId::STRING), // TypeId(10)
("age", TypeId::NUMBER), // TypeId(9)
]
}
// { role: "admin" } becomes another Object type
TypeId(103) = Object {
properties: [
("role", TypeId(104)), // Literal("admin")
]
}
// Admin = User & { role: "admin" }
TypeId(105) = Intersection([TypeId(102), TypeId(103)])
Lazy Resolution and DefId
Types can reference each other circularly (e.g., a linked list node that contains a
pointer to another node of the same type). To avoid infinite loops during type construction,
the Solver uses Lazy(DefId) as a placeholder. A DefId is a
stable identifier for a type definition. The TypeEnvironment resolves
DefId → TypeId on demand during relation checks and evaluation.
This is the same pattern as lazy evaluation in functional languages: don't compute the type until someone actually asks for it, and when they do, cache the result.
5. The Judge & Lawyer Model
TypeScript is intentionally unsound. It allows things like bivariant function parameters,
any silencing structural mismatches, and excess property checking that depends
on object literal "freshness." These aren't bugs—they're deliberate design choices
that make the language more ergonomic at the cost of type safety.
TSZ separates these concerns into two layers:
The Judge: SubtypeChecker
The Judge implements strict structural subtyping based on set theory. It knows nothing about TypeScript's legacy quirks. Given two types, it determines whether one is structurally a subtype of the other using only mathematical rules: covariance for properties, contravariance for function parameters, width subtyping for objects.
If the Judge says "not a subtype," it provides a structured failure reason explaining exactly which property or parameter caused the mismatch.
The Lawyer: CompatChecker + AnyPropagationRules
The Lawyer takes the Judge's strict answer and applies TypeScript-specific compatibility
rules. These are the places where tsc intentionally deviates from soundness:
| Quirk | Judge (Sound) | Lawyer (tsc-Compatible) |
|---|---|---|
any assignability |
Treated as strict set | Both top and bottom type (assignable to and from everything) |
| Function params | Contravariant | Bivariant for methods (strictFunctionTypes=false) |
| Object literals | Width subtyping (extra props OK) | Excess property check on "fresh" literals |
| Void returns | Normal checking | () => void accepts any return type |
| Weak types (TS2559) | Normal structural check | Objects with only optional properties require at least one overlap |
This separation means TSZ can reason about actual type safety (what the Judge
thinks) while still producing the same diagnostics as tsc (what the Lawyer
decides after applying compatibility rules). It also makes it possible to implement a
hypothetical "strict mode" in the future that uses only the Judge.
6. Crate Map
TSZ is organized as a Cargo workspace with 12 crates. The dependency direction is strictly one-way: downstream crates depend on upstream ones, never the reverse. This is enforced by CI.
Dependency Graph
Crate Purposes
| Crate | LOC | Purpose |
|---|---|---|
tsz-common | ~20K | Foundation: string interner, diagnostics data, shared types |
tsz-scanner | ~4K | Lexical analysis, tokenization |
tsz-parser | ~24K | Recursive descent parser, AST node arena |
tsz-binder | ~9K | Symbols, scopes, control-flow graph |
tsz-solver | ~70K | Type relations, inference, instantiation, narrowing |
tsz-lowering | ~3K | AST-to-type bridge (converts AST type nodes to TypeIds) |
tsz-checker | ~107K | AST traversal, diagnostics orchestration |
tsz-emitter | ~54K | JavaScript and declaration file generation |
tsz-lsp | ~30K | Language Server Protocol implementation |
tsz-cli | ~26K | Command-line interface, project driver |
tsz-wasm | ~4.5K | WebAssembly bindings |
conformance | — | Conformance test runner and infrastructure |
Boundary Rules
The following cross-layer imports are forbidden and enforced by CI:
- Checker must not import
TypeKeyor use raw interner APIs from Solver - Binder must not import Solver for semantic type decisions
- Emitter must not import Checker internals for semantic validation
- No layer may bypass the canonical query APIs to reach into another layer's private state
7. Design Principles
Solver-First
If code computes type semantics—subtyping, inference, evaluation, narrowing, instantiation—it belongs in the Solver. The Checker is a thin orchestration layer that walks the AST and asks the Solver questions. This is the architectural target; remaining checker-side semantic logic is technical debt being paid down. Keeping semantics in Solver prevents logic duplication and ensures there is one source of truth for "what does this type mean?"
Arena Everything
AST nodes, symbols, flow graph nodes, and types are all arena-allocated and referenced
by u32 handles. No per-operation heap allocation on hot paths. This gives
cache-friendly linear memory layouts, zero fragmentation, and O(1) access by index.
The tradeoff is that individual items can't be freed until the entire arena is dropped,
but compiler passes are batch operations so this is fine.
One Type Universe
TSZ uses one global canonical TypeInterner for persistent semantic types.
The system also supports local ephemeral TypeIds for scoped operations, but
cross-file semantic identity and relation caches are anchored to the global type universe.
This keeps type equality as a fast integer comparison.
tsc Parity Above All
The absolute target is matching tsc behavior exactly. This includes diagnostics
(same error code, same span, same message), inference results, compatibility decisions,
narrowing behavior, and edge cases. When there's a conflict between "what makes sense" and
"what tsc does," tsc wins. Correctness is measured by the conformance test suite, which
compares every diagnostic fingerprint against the official TypeScript output.
Lazy Resolution
TypeData::Lazy(DefId) defers type resolution until someone actually needs the
type. This is essential for handling circular references (e.g., a linked list type that
refers to itself), mutual recursion between type aliases, and cross-file type dependencies.
The TypeEnvironment resolves DefId → TypeId on demand and
caches the result.
No Cross-Layer Leakage
Each pipeline stage has strict boundaries that are enforced by CI. The Binder can't import the Solver. The Emitter can't import the Checker. The Checker can't construct raw solver types. This one-way dependency direction prevents architectural drift: new code must fit into the existing responsibility model rather than taking shortcuts.
8. Glossary
This glossary explains terms used throughout the codebase and this document. If you're new to compilers or to this project, start here when you encounter an unfamiliar term.
Antecedent
An incoming edge in the control-flow graph. Every flow node tracks which flow nodes
come before it—those are its antecedents. A simple assignment has one
antecedent (the previous statement). A merge point after an if/else has
two: one from the true branch and one from the false branch. The checker walks
antecedent chains backward to determine what type narrowings are in effect at any
given point in the code.
Arena (Arena Allocation)
A memory management strategy where objects are allocated sequentially in a large
contiguous block of memory, and freed all at once when the arena is dropped. Instead
of calling malloc for each AST node individually (which fragments memory
and is slow), the parser allocates nodes into a pre-sized Vec and hands
back an index (NodeIndex). This has three benefits:
- Speed: Allocation is just bumping an index. No allocator overhead.
- Cache friendliness: Nodes are contiguous in memory, so the CPU prefetcher loads them efficiently.
- No fragmentation: One big block instead of thousands of small ones scattered across the heap.
The tradeoff is that you can't free individual items—only the whole arena at once. This is fine for compilers, which are batch operations: parse a file, check it, emit it, then throw away all the data.
AST (Abstract Syntax Tree)
A tree representation of the syntactic structure of source code. The parser reads
let x: string = 42 and produces a tree where a VariableStatement
node contains a VariableDeclarationList, which contains a
VariableDeclaration with children for the name, type annotation, and
initializer. "Abstract" means it doesn't include every character (like semicolons or
parentheses used only for grouping)—it captures the structure, not the
exact characters. In TSZ, the AST is stored in a NodeArena as compact
16-byte node headers.
Atom
A u32 handle representing an interned string. When the scanner sees the
identifier x, it doesn't store the string "x" directly—it stores it
once in a global string table and returns Atom(47) (or whatever index it
gets). Every subsequent occurrence of x anywhere in the program gets the
same Atom(47). This means comparing two identifiers for equality is just
comparing two integers, which is O(1) regardless of string length. Defined in
tsz-common/src/interner.rs.
Bivariance
A variance mode where type parameters are checked in both directions. If
Dog extends Animal, then under bivariance, (x: Dog) => void
is assignable to (x: Animal) => void and vice versa. This is
unsound (it allows calling a function with the wrong argument type) but TypeScript uses
it by default for method parameters for backwards compatibility. See also
Variance.
Bottom Type
A type that is a subtype of every other type. In TypeScript, this is never
(TypeId(2)). A value of type never can be assigned to any type,
but no value can actually be of type never. It represents impossible
states, like the return type of a function that always throws. Opposite of
Top Type.
Cache Line
The smallest unit of data the CPU transfers between main memory and the CPU cache. On most modern processors, a cache line is 64 bytes. When the CPU reads one byte, it actually loads the entire 64-byte block containing that byte. This is why TSZ's 16-byte AST node headers matter: four nodes fit in one cache line, so iterating over nodes is very fast because the CPU loads four at a time.
Coinductive Semantics (Greatest Fixed Point)
A strategy for handling recursive types during subtype checking. The problem: when
checking whether interface List { next: List } is a subtype of itself, a
naive algorithm would loop forever (to check List <: List, it needs to
check List <: List, and so on). Coinductive semantics solves this by
keeping a cycle stack of type pairs currently being checked. When the solver
encounters a pair that's already on the stack, it returns "yes" (assumes the check
succeeds). This is mathematically called the Greatest Fixed Point (GFP): assume
everything works until proven otherwise. The alternative (Least Fixed Point, or
inductive) would assume failure, which gives wrong results for recursive types.
Conformance
The measure of how closely TSZ matches tsc's output. A conformance test
passes when TSZ produces exactly the same set of diagnostics as tsc for a
given input file—same error code (e.g., TS2322), same file, same line, same column,
same message text. The official TypeScript conformance test suite has ~12,500 tests.
As of 2026-02-22, TSZ passes 60.8% (7,650/12,574). See the README for the latest figure.
Contravariance
A variance mode where the subtype relationship is reversed. Function parameters
are contravariant: if Dog extends Animal, then (x: Animal) => void
is assignable to (x: Dog) => void (the broader parameter type is
the subtype). This is sound because a function that accepts any Animal can safely be called
where a function accepting only Dogs is expected—it can handle Dogs and more.
See also Variance.
Covariance
A variance mode where the subtype relationship goes in the same direction.
Return types and property types are covariant: if Dog extends Animal, then
() => Dog is assignable to () => Animal. This is sound
because anywhere you expect an Animal, receiving a Dog is fine.
See also Variance.
DefId
A stable identifier for a type definition (not a type value). When the checker
encounters a type reference like User that hasn't been fully resolved yet,
it creates a DefId and the solver stores it as Lazy(DefId). Later,
when someone actually needs the type (e.g., during a subtype check), the
TypeEnvironment resolves the DefId to a concrete TypeId.
This lazy approach prevents infinite loops when types reference each other circularly.
Diagnostic
A compiler message reported to the user. In TypeScript, each diagnostic has an error
code (e.g., TS2322), a severity (error, warning, suggestion), a source location (file,
line, column), and a message. Example: error TS2322: Type 'number' is not assignable
to type 'string'. TSZ must produce diagnostics identical to tsc for
conformance tests to pass.
Discriminated Union
A union type where each member has a shared property with a unique literal type that
identifies it. Example: { kind: "circle", radius: number } | { kind: "square",
side: number }. The kind property is the "discriminant." When you
check if (shape.kind === "circle"), the solver can narrow the type to just
the circle variant. This is a common pattern in TypeScript.
Excess Property Checking
A TypeScript-specific check that reports an error when an object literal has properties not declared in the target type. Normally, structural typing allows extra properties (width subtyping), but when you write a literal directly:
const x: { a: number } = { a: 1, b: 2 }; // Error: 'b' does not exist on type
TypeScript reports the extra b. This only happens when the object is "fresh"
(a literal at the assignment site). If you assign through a variable, the freshness is lost
and extra properties are silently allowed. See also Freshness.
Flow Graph (Control-Flow Graph / CFG)
A graph built by the binder that models how execution flows through the program. Nodes
represent program points (assignments, conditions, function entries). Edges represent
possible execution paths. The checker uses this graph for type
narrowing: after if (x !== null), the true branch knows x
is not null by tracing backward through the flow graph. Each flow node is identified by
a FlowNodeId(u32) and tracks its antecedents.
Freshness
A flag on object types that indicates the type was created from an object literal at the current assignment site. Fresh objects trigger excess property checking. Once the object is assigned to a variable or passed through a function, the freshness flag is removed and excess property checking no longer applies. This is managed by the Lawyer (CompatChecker) layer in the solver.
Hash-Consing
A technique for deduplicating data structures by storing each unique value exactly once.
When you want to create a new value, you first hash it and check if an identical value
already exists. If so, return the existing handle; if not, store the new value and return
a fresh handle. In TSZ, the TypeInterner hash-conses types: creating the type
string | number twice returns the same TypeId both times. The
string interner does the same for identifiers (returning the same Atom).
This is the core mechanism behind O(1) equality checks.
Hoisting
A JavaScript/TypeScript behavior where certain declarations are treated as if they appear
at the top of their scope, even if they're written later in the code. var
declarations and function declarations are hoisted—you can reference
them before their textual position. let and const are not
hoisted (they exist in a "temporal dead zone" until their declaration). The binder handles
this with a multi-pass strategy: first pass collects hoisted declarations, second pass
processes block-scoped declarations.
Instantiation
The process of replacing generic type parameters with concrete types. When you write
Array<string>, the solver takes the generic Array<T>
definition and substitutes T with string throughout the entire
type structure. This happens recursively: if the array's methods reference T,
those get substituted too. The solver caches instantiation results so that
Array<string> used in two different places produces the same
TypeId. Implemented in tsz-solver/src/instantiation/.
Interner (String Interner / Type Interner)
A data structure that stores unique values and returns lightweight handles to them. Think of it as a two-way lookup table:
- Forward: Given a string like
"getElementById", return itsAtom(u32). - Reverse: Given an
Atom(u32), return the original string.
The first time you intern a string, it's stored and you get back an integer handle. Every subsequent time you intern the same string, you get back the same integer without storing it again. TSZ has two interners:
- String interner (
tsz-common): Maps strings toAtom(u32). Used for all identifiers, keywords, and string content. - Type interner (
tsz-solver): Maps type structures (TypeData) toTypeId(u32). Used for all types.
The key benefit: once interned, equality is just integer comparison. Checking
whether two 100-character identifiers are the same string is a single ==
on two u32 values.
IR (Intermediate Representation)
A structured tree used by the emitter as a middle step between the TypeScript AST and
the final JavaScript text output. When the emitter needs to transform TypeScript-specific
syntax into JavaScript (e.g., converting an ES2015+ class to ES5 prototype chains, or
converting async/await to state machines), it doesn't manipulate strings
directly. Instead:
- Transform phase: A transform function reads the AST and produces an IR tree made of
IRNodevariants (FunctionExpr,CallExpr,VarDecl,BinaryExpr,Block, etc.). - Print phase: The
IRPrinterwalks the IR tree and emits JavaScript text with correct formatting, indentation, and semicolons.
This two-phase approach keeps transforms clean (they describe what the output
should look like structurally) and keeps formatting concerns separate (the printer handles
how to render it). The IR lives in tsz-emitter/src/transforms/ir.rs.
Lowering
The process of converting AST type syntax nodes into solver TypeIds. When
the checker sees a TypeReference node for string in the AST,
the lowering layer (tsz-lowering) converts that AST node into
TypeId::STRING. For more complex types like { name: string, age: number },
lowering recursively converts each part and interns the resulting Object type.
This is a bridge between the parser's syntax world and the solver's type world.
Memoization
Caching the results of expensive computations so they don't need to be repeated. The solver maintains several caches:
- Relation cache: "Is
Aassignable toB?" Once checked, the result is cached. The next time the same pair is queried, it's an O(1) lookup. - Evaluation cache: "What does
T extends string ? 'yes' : 'no'evaluate to?" Cached after first evaluation. - Property cache: "What type is the
nameproperty of typeT?" Cached per (type, property-name) pair. - Instantiation cache: "What is
Array<string>?" Cached per (generic, arguments) tuple.
Without memoization, checking a complex program would repeat the same type computations millions of times. With it, each unique computation happens once.
Narrowing (Type Narrowing)
The process of refining a type based on runtime checks. When you write:
function f(x: string | number) {
if (typeof x === "string") {
// Here, x is narrowed from string | number to just string
x.toUpperCase(); // OK: string has toUpperCase
} else {
// Here, x is narrowed to just number
x.toFixed(); // OK: number has toFixed
}
}
The checker extracts "type guards" from the AST (like typeof x === "string")
and passes them to the solver, which computes the narrowed type for each branch. The
solver handles union filtering, instanceof, discriminated unions, truthiness
checks, in operator, and user-defined type predicates. The flow graph built
by the binder tells the checker which guards are active at each point in the code.
NodeArena
The arena that stores all AST nodes. Each node has a 16-byte header (kind,
flags, pos, end, data_index) stored
contiguously. Variable-sized payloads (child lists, string data) are stored in typed
side pools, one per AST node kind. For example, all IdentifierData payloads
are stored together in one pool, all CallExpressionData in another. The
data_index field in the header points into the right pool. This keeps the
main node array compact (4 nodes per cache line).
NodeIndex
A handle (index) into the NodeArena. Instead of using a pointer to reference
an AST node, the parser returns a NodeIndex which is just an offset into the
arena's array. Looking up a node by its index is O(1) (simple array indexing). Using
indices instead of pointers also makes serialization trivial and avoids dangling pointer
bugs.
Query Boundaries
The gateway API in the checker (tsz-checker/src/query_boundaries/) that
routes type-checking questions to the solver. Instead of the checker calling solver
methods directly all over the codebase, assignability checks for TS2322/TS2345/TS2416
diagnostics must go through these boundary functions. This centralization ensures
consistent behavior: the boundary handles relation checking, failure reason extraction,
suppression policy, and diagnostic rendering in one place.
Recursive Descent Parser
A parsing strategy where each grammar rule becomes a function, and parsing works by
calling these functions recursively. For example, parsing an expression calls
parse_binary_expression(), which calls parse_unary_expression(),
which calls parse_primary_expression(). It's called "recursive" because
these functions call each other, and "descent" because it starts from the top-level rule
(source file) and works its way down into nested sub-expressions. This is the same
approach used by tsc itself.
Scope Tree
A tree of lexical scopes built by the binder. Each scope knows its parent scope and
contains a symbol table mapping names to SymbolIds. When the checker needs
to resolve a name like x, it looks in the current scope first, then walks
up the parent chain until it finds it. Unlike a stack-based approach (push scope on
entry, pop on exit), TSZ uses a persistent tree: scopes are never destroyed. This is
important for incremental compilation—you can query a scope later without
re-building it.
Side Pool (Typed Side Pool)
A secondary array in the NodeArena that stores variable-sized data for a
specific AST node kind. The 16-byte node header is too small to hold things like a
function's parameter list or a class's member list. So the header has a
data_index field that points into a typed side pool. There are 40+ pools:
IdentifierData, BinaryExpressionData,
FunctionDeclarationData, etc. Each pool only stores data for its specific
node kind, keeping things tightly packed and cache-friendly.
Structural Typing
TypeScript's approach to type compatibility: types are compatible based on their
structure (what properties and methods they have), not their name
or declaration site. If interface A has { name: string } and
interface B also has { name: string }, they are compatible
even though they're different declarations. This is different from "nominal typing"
(used by Java, C#) where two classes with identical structures are different types if
they have different names.
Symbol / SymbolId
A named entity in the program: a variable, function, class, interface, enum member, etc.
The binder creates a Symbol record for each declaration and assigns it a
SymbolId(u32) handle in the symbol arena. The symbol stores the entity's
name, kind flags, declaration nodes, parent symbol, and optional member/export tables.
The checker uses SymbolId to look up what type corresponds to a given name.
SyntaxKind
A u16 enum that identifies what kind of token or AST node something is.
There are ~400 variants covering all of TypeScript: LetKeyword = 121,
Identifier = 80, VariableDeclaration = 261,
IfStatement = 246, etc. Every node header and every token carries a
SyntaxKind. This is how the compiler knows what it's looking at without needing
inheritance hierarchies or trait objects.
Top Type
A type that is a supertype of every other type. In TypeScript, unknown
(TypeId(3)) is the true top type: any value is assignable to
unknown. The type any (TypeId(4)) is special: it
acts as both a top type and a bottom type (assignable to and from everything),
which is intentionally unsound but serves as TypeScript's escape hatch.
Opposite of Bottom Type.
TypeEnvironment
The runtime context that resolves DefId → TypeId during type evaluation
and relation checks. When the solver encounters a Lazy(DefId) type, it asks
the TypeEnvironment to look up (or compute and cache) the concrete TypeId.
The environment also carries type parameter bindings during generic instantiation, so
that T resolves to the right concrete type within a given context.
TypeId
A u32 handle into the global type interner. The most important type in the
compiler. Every type in the program—from string to
Map<string, Promise<number[]>>—is represented by a
single TypeId. Comparing two types is comparing two integers:
type_a == type_b. The first 100 IDs (0–99) are reserved for built-in
types like string, number, any, never,
etc. User-defined types start at ID 100.
TypeData / TypeKey
The actual structural representation of a type. TypeData is the public name
used at API boundaries; TypeKey is the crate-private internal enum in the
solver. It has variants for every kind of TypeScript type: Intrinsic (primitives),
Object (structural objects), Union, Intersection,
Function, Conditional, Mapped, etc. Checker code
must not match on TypeKey directly—it should use solver query APIs instead.
Type Guard
A runtime expression that the compiler uses to narrow a type. Examples:
typeof x === "string", x instanceof Array,
"prop" in x, or a user-defined predicate function like
function isString(x: unknown): x is string. The checker extracts type guards
from the AST and converts them to solver TypeGuard structs, then the solver
computes the narrowed type.
Variance
How a type parameter's subtype relationship affects the subtype relationship of the containing type. There are four variance modes:
- Covariant: Same direction.
Dog extends AnimalimpliesArray<Dog> extends Array<Animal>. - Contravariant: Opposite direction. Function parameters go this way for soundness.
- Bivariant: Both directions. TypeScript's (unsound) default for method parameters.
- Invariant: Neither direction. Mutable references need this for soundness.
In TSZ, the Judge (SubtypeChecker) uses strict variance rules, and the Lawyer (CompatChecker) overrides them where TypeScript's legacy behavior requires it (e.g., bivariant method parameters).
Visitor (TypeVisitor)
A design pattern for traversing the type graph systematically. Instead of writing a
match on all 20+ TypeKey variants at every call site, the
solver provides a TypeVisitor trait. You implement a few methods (e.g.,
"what to do when visiting a Union"), and the visitor handles walking the complete
type structure, including recursion into nested types, cycle detection, and so on. This
prevents bugs where a new type variant is added but some call sites forget to handle it.
Width Subtyping
The rule that a type with more properties is a subtype of a type with
fewer properties. { a: number, b: string } is assignable to
{ a: number } because it has everything { a: number } requires
(the a property) plus an extra b. This is the default behavior
of structural typing. Excess property checking
(triggered by freshness) is the exception: it rejects the
extra properties for object literals to catch typos.