How TSZ Works

An architectural guide for contributors

TSZ (Project Zang) is a TypeScript compiler written in Rust, targeting exact parity with tsc. This document explains how source code flows through the compiler, what data structures it produces at each stage, and how the type system works internally. This document is source-grounded: struct names, constant values, and function signatures are real. Where implementation is still migrating toward the target architecture, sections below call out current state versus target state.

Current State vs Target State

This page describes both the architecture target and the practical state of the codebase. Some checker paths still contain transitional logic that is being moved behind solver query boundaries.

Ongoing cleanup work is tracked in docs/todos/architecture-purity.md.

1. The Pipeline at a Glance

TSZ compiles TypeScript through six sequential stages. Each stage has a single responsibility, consumes the output of the previous stage, and is forbidden from reaching backward into a previous stage's internals.

The arrow between Checker and Solver is bidirectional in practice: the Checker walks the AST and asks the Solver questions ("is type A assignable to type B?"), then the Solver returns answers. The target architecture is that the Checker only decides where to ask and how to report results. Current implementation still has some checker-side semantic fast paths and narrowing logic while migration continues.

Responsibility Matrix

Every piece of code in the compiler should answer exactly one of these questions. If you're unsure where something belongs, this table is the decision guide.

Question	Stage	What it does
WHO declares this?	Binder	Creates symbols, resolves scopes, builds the control-flow graph. Does not compute types.
WHAT is the type?	Solver	Target owner of all type relations, inference, evaluation, instantiation, and narrowing. Migration in progress for remaining checker-side semantics.
WHERE to report?	Checker	Walks the AST, picks the right diagnostic span, asks Solver for answers, emits diagnostics.
OUTPUT?	Emitter	Generates JavaScript and `.d.ts` declaration files. Should not own general semantic type validation.

2. The Full Trace: `let x: string = 42`

To understand how the compiler works, let's follow a single line of TypeScript through every stage. This example is simple but exercises the full pipeline: it declares a variable with a type annotation and an initializer that doesn't match, producing a diagnostic.

let x: string = 42;

This should produce: TS2322: Type 'number' is not assignable to type 'string', with the error span pointing at 42. Here's exactly how that happens.

Step 1 — Scanner

The scanner (tsz-scanner) reads the source text character by character and produces a stream of tokens. It is parser-driven: the parser calls scan() to advance to the next token, rather than the scanner producing all tokens eagerly.

As it scans, every identifier and string literal is interned into an Atom(u32). This means the string "x" is stored once in a global table, and everywhere it appears in the program it's referenced by the same integer. String comparison becomes integer comparison: O(1).

For our input let x: string = 42;, the scanner produces:

Source	SyntaxKind	Value (u16)	Atom
`let`	LetKeyword	121	— (keyword)
`x`	Identifier	80	Atom(47)*
`:`	ColonToken	59	—
`string`	StringKeyword	154	— (keyword)
`=`	EqualsToken	64	—
`42`	NumericLiteral	9	—
`;`	SemicolonToken	27	—

Note

*Atom values are assigned sequentially at runtime and depend on what strings have been interned before. The value 47 is illustrative. Keywords like let and string are recognized by the scanner directly and don't need atom interning for identity—their SyntaxKind already distinguishes them.

The scanner also tracks position spans. Each token carries a start and end byte offset into the source. For 42, that would be start=18, end=20. These positions flow through the entire pipeline and are ultimately used by the Checker to produce the diagnostic's source location.

Step 2 — Parser

The parser (tsz-parser) consumes tokens and builds an Abstract Syntax Tree in a NodeArena. It performs no semantic analysis—it doesn't know what types are, it doesn't resolve names, and it doesn't report type errors. It only cares about syntactic structure.

For our input, the parser produces this AST:

The 16-byte node header

Every AST node in the arena has a fixed-size 16-byte header. This is critical for performance: four nodes fit in a single 64-byte cache line, and the header is small enough that the CPU prefetcher works efficiently.

kindu16 · 2B

flagsu16 · 2B

posu32 · 4B

endu32 · 4B

data_indexu32 · 4B — index into typed side pool (u32::MAX = no data)

Variable-sized payloads (child node lists, string data, etc.) are stored in typed side pools indexed by data_index. There are 40+ pool types: one for identifiers, one for binary expressions, one for call expressions, and so on. This keeps the hot node array compact while allowing heterogeneous payloads.

Step 3 — Binder

The binder (tsz-binder) walks the AST and builds three things: a symbol table, a scope tree, and a control-flow graph. It answers the question "WHO declares this name?" without doing any type computation.

For our variable declaration let x, the binder:

Creates a new symbol in the symbol arena: SymbolId(5) (the exact index depends on how many symbols have been created before).
Sets the symbol's flags to BLOCK_SCOPED_VARIABLE (bit 1, value 0x2) because let is block-scoped.
Records the declaration: the symbol's value_declaration points back to the VariableDeclaration AST node.
Registers the symbol in the current scope's symbol table under the name "x" (stored as an Atom).
Creates a flow node (FlowNodeId) for the assignment, linking it to the previous flow state via an antecedent chain.

// What the binder produces for `let x: string = 42`
Symbol {
    flags: 0x2,              // BLOCK_SCOPED_VARIABLE
    escaped_name: "x",       // Atom-interned
    declarations: [NodeIndex(3)],  // the VariableDeclaration node
    value_declaration: NodeIndex(3),
    parent: SymbolId(0),     // source file symbol
}

Key rule

The binder does no type computation. It does not look at the type annotation string or the initializer 42 for type information. It only records that a symbol named x exists, where it was declared, and what scope it belongs to. Types are computed later by the Checker and Solver.

The scope tree

Scopes are stored as a persistent tree (not a stack). Each scope has a parent pointer, so the compiler can look up names by walking from the current scope toward the root. For a simple file-level declaration like this one, x lives in the source file scope directly.

The control-flow graph

The binder also constructs a flow graph used later for type narrowing. For our simple assignment, it creates an ASSIGNMENT flow node. In more complex code (if/else, loops, type guards), the flow graph tracks which branches are reachable and what type narrowings apply.

Step 4 — Checker

The checker (tsz-checker) is the orchestrator. It walks the AST top-down, and when it reaches our VariableDeclaration node, it performs these steps:

Resolve the type annotation. The type annotation is a TypeReference node pointing to string. The checker resolves this to the built-in type TypeId::STRING, which is a compile-time constant TypeId(10).
Get the initializer's type. The initializer is the numeric literal 42. The checker calls get_type_of_node() on it, which (via the lowering layer) interns the literal type into the solver: the number type TypeId::NUMBER = TypeId(9).
Check assignability. The checker now asks the Solver: "is TypeId(9) (number) assignable to TypeId(10) (string)?" This goes through the query_boundaries assignability gateway.
Handle the result. The Solver returns "not assignable" with a structured failure reason. The checker uses this to produce the diagnostic.

// Pseudocode of what the checker does
fn check_variable_declaration(node: VariableDeclaration) {
    let declared_type = resolve_type_annotation(node.type);  // TypeId(10) = STRING
    let init_type = get_type_of_node(node.initializer);     // TypeId(9)  = NUMBER

    // Ask Solver via query_boundaries gateway
    if !is_type_assignable_to(init_type, declared_type) {
        let reason = get_assignability_failure(init_type, declared_type);
        emit_diagnostic(
            TS2322,
            node.initializer.span(),  // point at `42`
            reason,
        );
    }
}

Architecture rule

The checker never matches on TypeKey variants or does structural type comparison itself. It asks the Solver and trusts the answer. If you find yourself writing match types.lookup(type_id) { ... } in checker code, that logic belongs in the Solver.

Step 5 — Solver

The solver (tsz-solver) is where the actual type computation happens. When the checker asks "is TypeId(9) assignable to TypeId(10)?", the solver runs its assignability algorithm:

Identity check. Is TypeId(9) == TypeId(10)? No (9 ≠ 10). This is a single integer comparison—O(1).
Top/bottom check. Is the source never (assignable to everything)? No. Is the target unknown (accepts everything)? No.
Structural check. Look up the actual type data. TypeId(9) resolves to Intrinsic(Number). TypeId(10) resolves to Intrinsic(String). Two different intrinsic types with no subtype relationship.
Result: not assignable. The solver returns false and produces a structured failure reason that the checker will use for the diagnostic message.

// Solver's assignability check (simplified)
fn is_assignable_to(source: TypeId, target: TypeId) -> bool {
    // 1. Identity: same TypeId? Done.
    if source == target { return true; }

    // 2. Top/bottom short-circuits
    if source == TypeId::NEVER { return true; }
    if target == TypeId::UNKNOWN { return true; }

    // 3. Structural comparison
    match (lookup(source), lookup(target)) {
        (Intrinsic(Number), Intrinsic(String)) => false,  // our case
        // ... hundreds of other structural rules
    }
}

For more complex cases (unions, intersections, generics, conditional types), the solver uses coinductive semantics with a cycle stack to handle recursive types, and memoization to avoid re-checking the same type pair. Depth limits prevent runaway checks: max subtype depth is 100, max total checks per compilation is 100,000.

Step 6 — Diagnostic Output

The checker takes the solver's failure reason and renders the final diagnostic. It selects the error span (pointing at the initializer 42, not the whole declaration), formats the type names, and produces:

error TS2322: Type 'number' is not assignable to type 'string'.

  let x: string = 42;
                    ~~

The diagnostic includes the error code (TS2322), the file path, the line and column, and the message. This must match tsc exactly—same code, same span, same message text—for the conformance test to pass.

The TS2322 pipeline

TS2322 ("Type 'X' is not assignable to type 'Y'") is the most common diagnostic. Its path is strictly controlled: relation first (Solver determines assignability), reason second (Solver explains why), location last (Checker picks the span and formats the message). New TS2322/TS2345/TS2416 work must go through the query_boundaries assignability gateway, and remaining legacy call sites are being migrated.

3. Memory Architecture

TSZ uses arena allocation for all major data structures. Instead of allocating each AST node, symbol, or type on the heap individually, data is stored in contiguous arrays and referenced by lightweight u32 handles. This eliminates fragmentation, enables O(1) access, and plays well with CPU caches.

Identity Handles

Every major entity in the compiler has a u32-based identity handle. This is the fundamental performance trick: comparing two types, two symbols, or two strings is always a single integer comparison.

Handle	Definition	Lives in	Purpose
`TypeId(u32)`	Index into TypeInterner	tsz-solver	Type identity. IDs 0–99 reserved for built-ins. O(1) equality.
`SymbolId(u32)`	Index into SymbolArena	tsz-binder	Symbol identity. Every variable, function, class gets one.
`Atom(u32)`	Index into Interner	tsz-common	Interned string. All identifiers and strings deduplicated.
`FlowNodeId(u32)`	Index into FlowNodeArena	tsz-binder	Control-flow graph node. Used for type narrowing.
`NodeIndex`	Index into NodeArena	tsz-parser	AST node reference. Points to a 16-byte node header.

Arena Layout

// Memory layout (conceptual)

NodeArena         [Node][Node][Node][Node][Node]...     // 16 bytes each, contiguous
                      |     |     |
IdentifierPool    [IdentData][IdentData]...             // typed side pool
CallExprPool      [CallData][CallData]...               // typed side pool
  ...               (40+ pools)

SymbolArena       [Symbol][Symbol][Symbol]...            // indexed by SymbolId
FlowNodeArena     [FlowNode][FlowNode]...               // indexed by FlowNodeId

TypeInterner      [TypeData][TypeData][TypeData]...      // indexed by TypeId
                  HashMap<TypeData, TypeId>              // dedup: same structure = same ID

StringInterner    ["x"]["string"]["number"]...         // indexed by Atom
                  HashMap<&str, Atom>                    // dedup: same string = same Atom

Built-in TypeIds

The first 100 TypeIds are reserved for built-in types. These are compile-time constants, so checking "is this type string?" is just type_id == TypeId(10).

TypeId	Value	Type
`TypeId(0)`	NONE	Sentinel / uninitialized
`TypeId(1)`	ERROR	Error recovery type
`TypeId(2)`	NEVER	Bottom type (assignable to everything)
`TypeId(3)`	UNKNOWN	Top type (accepts everything)
`TypeId(4)`	ANY	TypeScript's escape hatch
`TypeId(5)`	VOID	void
`TypeId(6)`	UNDEFINED	undefined
`TypeId(7)`	NULL	null
`TypeId(8)`	BOOLEAN	boolean
`TypeId(9)`	NUMBER	number
`TypeId(10)`	STRING	string
`TypeId(11)`	BIGINT	bigint
`TypeId(12)`	SYMBOL	symbol
`TypeId(13)`	OBJECT	object
`TypeId(14)`	BOOLEAN_TRUE	Literal `true`
`TypeId(15)`	BOOLEAN_FALSE	Literal `false`
`TypeId(16)`	FUNCTION	Function
`TypeId(100+)`	FIRST_USER	User-defined types start here

4. The Type System

The Solver represents every TypeScript type as a TypeData variant, interned into the global TypeInterner. When two different parts of the program create the same type, the interner returns the same TypeId. This deduplication means the compiler never has two copies of the same type in memory, and equality is always O(1).

TypeData Variants

The full TypeData (internally TypeKey) enum covers the complete TypeScript type surface:

Variant	TypeScript Example	Description
`Intrinsic`	`string`, `number`, `any`	Primitive and special types
`Literal`	`"hello"`, `42`, `true`	Literal types (string/number/boolean/bigint)
`Object`	`{ x: number, y: string }`	Structural object types
`Union`	`string \| number`	Union of types
`Intersection`	`A & B`	Intersection of types
`Array`	`number[]`	Array types
`Tuple`	`[string, number]`	Tuple types (with optional/rest)
`Function`	`(x: number) => string`	Single-signature callable
`Callable`	Overloaded function	Multiple call signatures
`TypeParameter`	`T extends Constraint`	Generic type parameters
`Lazy(DefId)`	Forward reference	Deferred resolution; avoids cycles
`Application`	`Map<string, number>`	Generic type application
`Conditional`	`T extends U ? X : Y`	Conditional types
`Mapped`	`{ [K in keyof T]: V }`	Mapped types
`IndexAccess`	`T[K]`	Indexed access types
`KeyOf`	`keyof T`	Key-of operator
`TemplateLiteral`	`hello ${T}`	Template literal types
`Infer`	`infer R`	Inference variable in conditional types
`Error`	—	Error recovery; silences cascading diagnostics

Concrete Example

Consider this TypeScript:

interface User {
  name: string;
  age: number;
}
type Admin = User & { role: "admin" };

The solver interns these as:

// User becomes an Object type
TypeId(102) = Object {
    properties: [
        ("name", TypeId::STRING),    // TypeId(10)
        ("age",  TypeId::NUMBER),    // TypeId(9)
    ]
}

// { role: "admin" } becomes another Object type
TypeId(103) = Object {
    properties: [
        ("role", TypeId(104)),       // Literal("admin")
    ]
}

// Admin = User & { role: "admin" }
TypeId(105) = Intersection([TypeId(102), TypeId(103)])

Lazy Resolution and DefId

Types can reference each other circularly (e.g., a linked list node that contains a pointer to another node of the same type). To avoid infinite loops during type construction, the Solver uses Lazy(DefId) as a placeholder. A DefId is a stable identifier for a type definition. The TypeEnvironment resolves DefId → TypeId on demand during relation checks and evaluation.

This is the same pattern as lazy evaluation in functional languages: don't compute the type until someone actually asks for it, and when they do, cache the result.

5. The Judge & Lawyer Model

TypeScript is intentionally unsound. It allows things like bivariant function parameters, any silencing structural mismatches, and excess property checking that depends on object literal "freshness." These aren't bugs—they're deliberate design choices that make the language more ergonomic at the cost of type safety.

TSZ separates these concerns into two layers:

The Judge: `SubtypeChecker`

The Judge implements strict structural subtyping based on set theory. It knows nothing about TypeScript's legacy quirks. Given two types, it determines whether one is structurally a subtype of the other using only mathematical rules: covariance for properties, contravariance for function parameters, width subtyping for objects.

If the Judge says "not a subtype," it provides a structured failure reason explaining exactly which property or parameter caused the mismatch.

The Lawyer: `CompatChecker` + `AnyPropagationRules`

The Lawyer takes the Judge's strict answer and applies TypeScript-specific compatibility rules. These are the places where tsc intentionally deviates from soundness:

Quirk	Judge (Sound)	Lawyer (tsc-Compatible)
`any` assignability	Treated as strict set	Both top and bottom type (assignable to and from everything)
Function params	Contravariant	Bivariant for methods (strictFunctionTypes=false)
Object literals	Width subtyping (extra props OK)	Excess property check on "fresh" literals
Void returns	Normal checking	`() => void` accepts any return type
Weak types (TS2559)	Normal structural check	Objects with only optional properties require at least one overlap

This separation means TSZ can reason about actual type safety (what the Judge thinks) while still producing the same diagnostics as tsc (what the Lawyer decides after applying compatibility rules). It also makes it possible to implement a hypothetical "strict mode" in the future that uses only the Judge.

6. Crate Map

TSZ is organized as a Cargo workspace with 12 crates. The dependency direction is strictly one-way: downstream crates depend on upstream ones, never the reverse. This is enforced by CI.

Dependency Graph

Crate Purposes

Crate	LOC	Purpose
`tsz-common`	~20K	Foundation: string interner, diagnostics data, shared types
`tsz-scanner`	~4K	Lexical analysis, tokenization
`tsz-parser`	~24K	Recursive descent parser, AST node arena
`tsz-binder`	~9K	Symbols, scopes, control-flow graph
`tsz-solver`	~70K	Type relations, inference, instantiation, narrowing
`tsz-lowering`	~3K	AST-to-type bridge (converts AST type nodes to TypeIds)
`tsz-checker`	~107K	AST traversal, diagnostics orchestration
`tsz-emitter`	~54K	JavaScript and declaration file generation
`tsz-lsp`	~30K	Language Server Protocol implementation
`tsz-cli`	~26K	Command-line interface, project driver
`tsz-wasm`	~4.5K	WebAssembly bindings
`conformance`	—	Conformance test runner and infrastructure

Boundary Rules

The following cross-layer imports are forbidden and enforced by CI:

Checker must not import TypeKey or use raw interner APIs from Solver
Binder must not import Solver for semantic type decisions
Emitter must not import Checker internals for semantic validation
No layer may bypass the canonical query APIs to reach into another layer's private state

7. Design Principles

Solver-First

If code computes type semantics—subtyping, inference, evaluation, narrowing, instantiation—it belongs in the Solver. The Checker is a thin orchestration layer that walks the AST and asks the Solver questions. This is the architectural target; remaining checker-side semantic logic is technical debt being paid down. Keeping semantics in Solver prevents logic duplication and ensures there is one source of truth for "what does this type mean?"

Arena Everything

AST nodes, symbols, flow graph nodes, and types are all arena-allocated and referenced by u32 handles. No per-operation heap allocation on hot paths. This gives cache-friendly linear memory layouts, zero fragmentation, and O(1) access by index. The tradeoff is that individual items can't be freed until the entire arena is dropped, but compiler passes are batch operations so this is fine.

One Type Universe

TSZ uses one global canonical TypeInterner for persistent semantic types. The system also supports local ephemeral TypeIds for scoped operations, but cross-file semantic identity and relation caches are anchored to the global type universe. This keeps type equality as a fast integer comparison.

tsc Parity Above All

The absolute target is matching tsc behavior exactly. This includes diagnostics (same error code, same span, same message), inference results, compatibility decisions, narrowing behavior, and edge cases. When there's a conflict between "what makes sense" and "what tsc does," tsc wins. Correctness is measured by the conformance test suite, which compares every diagnostic fingerprint against the official TypeScript output.

Lazy Resolution

TypeData::Lazy(DefId) defers type resolution until someone actually needs the type. This is essential for handling circular references (e.g., a linked list type that refers to itself), mutual recursion between type aliases, and cross-file type dependencies. The TypeEnvironment resolves DefId → TypeId on demand and caches the result.

No Cross-Layer Leakage

Each pipeline stage has strict boundaries that are enforced by CI. The Binder can't import the Solver. The Emitter can't import the Checker. The Checker can't construct raw solver types. This one-way dependency direction prevents architectural drift: new code must fit into the existing responsibility model rather than taking shortcuts.

8. Glossary

This glossary explains terms used throughout the codebase and this document. If you're new to compilers or to this project, start here when you encounter an unfamiliar term.

Antecedent

An incoming edge in the control-flow graph. Every flow node tracks which flow nodes come before it—those are its antecedents. A simple assignment has one antecedent (the previous statement). A merge point after an if/else has two: one from the true branch and one from the false branch. The checker walks antecedent chains backward to determine what type narrowings are in effect at any given point in the code.

Arena (Arena Allocation)

A memory management strategy where objects are allocated sequentially in a large contiguous block of memory, and freed all at once when the arena is dropped. Instead of calling malloc for each AST node individually (which fragments memory and is slow), the parser allocates nodes into a pre-sized Vec and hands back an index (NodeIndex). This has three benefits:

Speed: Allocation is just bumping an index. No allocator overhead.
Cache friendliness: Nodes are contiguous in memory, so the CPU prefetcher loads them efficiently.
No fragmentation: One big block instead of thousands of small ones scattered across the heap.

The tradeoff is that you can't free individual items—only the whole arena at once. This is fine for compilers, which are batch operations: parse a file, check it, emit it, then throw away all the data.

AST (Abstract Syntax Tree)

A tree representation of the syntactic structure of source code. The parser reads let x: string = 42 and produces a tree where a VariableStatement node contains a VariableDeclarationList, which contains a VariableDeclaration with children for the name, type annotation, and initializer. "Abstract" means it doesn't include every character (like semicolons or parentheses used only for grouping)—it captures the structure, not the exact characters. In TSZ, the AST is stored in a NodeArena as compact 16-byte node headers.

Atom

A u32 handle representing an interned string. When the scanner sees the identifier x, it doesn't store the string "x" directly—it stores it once in a global string table and returns Atom(47) (or whatever index it gets). Every subsequent occurrence of x anywhere in the program gets the same Atom(47). This means comparing two identifiers for equality is just comparing two integers, which is O(1) regardless of string length. Defined in tsz-common/src/interner.rs.

Bivariance

A variance mode where type parameters are checked in both directions. If Dog extends Animal, then under bivariance, (x: Dog) => void is assignable to (x: Animal) => void and vice versa. This is unsound (it allows calling a function with the wrong argument type) but TypeScript uses it by default for method parameters for backwards compatibility. See also Variance.

Bottom Type

A type that is a subtype of every other type. In TypeScript, this is never (TypeId(2)). A value of type never can be assigned to any type, but no value can actually be of type never. It represents impossible states, like the return type of a function that always throws. Opposite of Top Type.

Cache Line

The smallest unit of data the CPU transfers between main memory and the CPU cache. On most modern processors, a cache line is 64 bytes. When the CPU reads one byte, it actually loads the entire 64-byte block containing that byte. This is why TSZ's 16-byte AST node headers matter: four nodes fit in one cache line, so iterating over nodes is very fast because the CPU loads four at a time.

Coinductive Semantics (Greatest Fixed Point)

A strategy for handling recursive types during subtype checking. The problem: when checking whether interface List { next: List } is a subtype of itself, a naive algorithm would loop forever (to check List <: List, it needs to check List <: List, and so on). Coinductive semantics solves this by keeping a cycle stack of type pairs currently being checked. When the solver encounters a pair that's already on the stack, it returns "yes" (assumes the check succeeds). This is mathematically called the Greatest Fixed Point (GFP): assume everything works until proven otherwise. The alternative (Least Fixed Point, or inductive) would assume failure, which gives wrong results for recursive types.

Conformance

The measure of how closely TSZ matches tsc's output. A conformance test passes when TSZ produces exactly the same set of diagnostics as tsc for a given input file—same error code (e.g., TS2322), same file, same line, same column, same message text. The official TypeScript conformance test suite has ~12,500 tests. As of 2026-02-22, TSZ passes 60.8% (7,650/12,574). See the README for the latest figure.

Contravariance

A variance mode where the subtype relationship is reversed. Function parameters are contravariant: if Dog extends Animal, then (x: Animal) => void is assignable to (x: Dog) => void (the broader parameter type is the subtype). This is sound because a function that accepts any Animal can safely be called where a function accepting only Dogs is expected—it can handle Dogs and more. See also Variance.

Covariance

A variance mode where the subtype relationship goes in the same direction. Return types and property types are covariant: if Dog extends Animal, then () => Dog is assignable to () => Animal. This is sound because anywhere you expect an Animal, receiving a Dog is fine. See also Variance.

DefId

A stable identifier for a type definition (not a type value). When the checker encounters a type reference like User that hasn't been fully resolved yet, it creates a DefId and the solver stores it as Lazy(DefId). Later, when someone actually needs the type (e.g., during a subtype check), the TypeEnvironment resolves the DefId to a concrete TypeId. This lazy approach prevents infinite loops when types reference each other circularly.

Diagnostic

A compiler message reported to the user. In TypeScript, each diagnostic has an error code (e.g., TS2322), a severity (error, warning, suggestion), a source location (file, line, column), and a message. Example: error TS2322: Type 'number' is not assignable to type 'string'. TSZ must produce diagnostics identical to tsc for conformance tests to pass.

Discriminated Union

A union type where each member has a shared property with a unique literal type that identifies it. Example: { kind: "circle", radius: number } | { kind: "square", side: number }. The kind property is the "discriminant." When you check if (shape.kind === "circle"), the solver can narrow the type to just the circle variant. This is a common pattern in TypeScript.

Excess Property Checking

A TypeScript-specific check that reports an error when an object literal has properties not declared in the target type. Normally, structural typing allows extra properties (width subtyping), but when you write a literal directly:

const x: { a: number } = { a: 1, b: 2 };  // Error: 'b' does not exist on type

TypeScript reports the extra b. This only happens when the object is "fresh" (a literal at the assignment site). If you assign through a variable, the freshness is lost and extra properties are silently allowed. See also Freshness.

Flow Graph (Control-Flow Graph / CFG)

A graph built by the binder that models how execution flows through the program. Nodes represent program points (assignments, conditions, function entries). Edges represent possible execution paths. The checker uses this graph for type narrowing: after if (x !== null), the true branch knows x is not null by tracing backward through the flow graph. Each flow node is identified by a FlowNodeId(u32) and tracks its antecedents.

Freshness

A flag on object types that indicates the type was created from an object literal at the current assignment site. Fresh objects trigger excess property checking. Once the object is assigned to a variable or passed through a function, the freshness flag is removed and excess property checking no longer applies. This is managed by the Lawyer (CompatChecker) layer in the solver.

Hash-Consing

A technique for deduplicating data structures by storing each unique value exactly once. When you want to create a new value, you first hash it and check if an identical value already exists. If so, return the existing handle; if not, store the new value and return a fresh handle. In TSZ, the TypeInterner hash-conses types: creating the type string | number twice returns the same TypeId both times. The string interner does the same for identifiers (returning the same Atom). This is the core mechanism behind O(1) equality checks.

Hoisting

A JavaScript/TypeScript behavior where certain declarations are treated as if they appear at the top of their scope, even if they're written later in the code. var declarations and function declarations are hoisted—you can reference them before their textual position. let and const are not hoisted (they exist in a "temporal dead zone" until their declaration). The binder handles this with a multi-pass strategy: first pass collects hoisted declarations, second pass processes block-scoped declarations.

Instantiation

The process of replacing generic type parameters with concrete types. When you write Array<string>, the solver takes the generic Array<T> definition and substitutes T with string throughout the entire type structure. This happens recursively: if the array's methods reference T, those get substituted too. The solver caches instantiation results so that Array<string> used in two different places produces the same TypeId. Implemented in tsz-solver/src/instantiation/.

Interner (String Interner / Type Interner)

A data structure that stores unique values and returns lightweight handles to them. Think of it as a two-way lookup table:

Forward: Given a string like "getElementById", return its Atom(u32).
Reverse: Given an Atom(u32), return the original string.

The first time you intern a string, it's stored and you get back an integer handle. Every subsequent time you intern the same string, you get back the same integer without storing it again. TSZ has two interners:

String interner (tsz-common): Maps strings to Atom(u32). Used for all identifiers, keywords, and string content.
Type interner (tsz-solver): Maps type structures (TypeData) to TypeId(u32). Used for all types.

The key benefit: once interned, equality is just integer comparison. Checking whether two 100-character identifiers are the same string is a single == on two u32 values.

IR (Intermediate Representation)

A structured tree used by the emitter as a middle step between the TypeScript AST and the final JavaScript text output. When the emitter needs to transform TypeScript-specific syntax into JavaScript (e.g., converting an ES2015+ class to ES5 prototype chains, or converting async/await to state machines), it doesn't manipulate strings directly. Instead:

Transform phase: A transform function reads the AST and produces an IR tree made of IRNode variants (FunctionExpr, CallExpr, VarDecl, BinaryExpr, Block, etc.).
Print phase: The IRPrinter walks the IR tree and emits JavaScript text with correct formatting, indentation, and semicolons.

This two-phase approach keeps transforms clean (they describe what the output should look like structurally) and keeps formatting concerns separate (the printer handles how to render it). The IR lives in tsz-emitter/src/transforms/ir.rs.

Lowering

The process of converting AST type syntax nodes into solver TypeIds. When the checker sees a TypeReference node for string in the AST, the lowering layer (tsz-lowering) converts that AST node into TypeId::STRING. For more complex types like { name: string, age: number }, lowering recursively converts each part and interns the resulting Object type. This is a bridge between the parser's syntax world and the solver's type world.

Memoization

Caching the results of expensive computations so they don't need to be repeated. The solver maintains several caches:

Relation cache: "Is A assignable to B?" Once checked, the result is cached. The next time the same pair is queried, it's an O(1) lookup.
Evaluation cache: "What does T extends string ? 'yes' : 'no' evaluate to?" Cached after first evaluation.
Property cache: "What type is the name property of type T?" Cached per (type, property-name) pair.
Instantiation cache: "What is Array<string>?" Cached per (generic, arguments) tuple.

Without memoization, checking a complex program would repeat the same type computations millions of times. With it, each unique computation happens once.

Narrowing (Type Narrowing)

The process of refining a type based on runtime checks. When you write:

function f(x: string | number) {
  if (typeof x === "string") {
    // Here, x is narrowed from string | number to just string
    x.toUpperCase();  // OK: string has toUpperCase
  } else {
    // Here, x is narrowed to just number
    x.toFixed();      // OK: number has toFixed
  }
}

The checker extracts "type guards" from the AST (like typeof x === "string") and passes them to the solver, which computes the narrowed type for each branch. The solver handles union filtering, instanceof, discriminated unions, truthiness checks, in operator, and user-defined type predicates. The flow graph built by the binder tells the checker which guards are active at each point in the code.

NodeArena

The arena that stores all AST nodes. Each node has a 16-byte header (kind, flags, pos, end, data_index) stored contiguously. Variable-sized payloads (child lists, string data) are stored in typed side pools, one per AST node kind. For example, all IdentifierData payloads are stored together in one pool, all CallExpressionData in another. The data_index field in the header points into the right pool. This keeps the main node array compact (4 nodes per cache line).

NodeIndex

A handle (index) into the NodeArena. Instead of using a pointer to reference an AST node, the parser returns a NodeIndex which is just an offset into the arena's array. Looking up a node by its index is O(1) (simple array indexing). Using indices instead of pointers also makes serialization trivial and avoids dangling pointer bugs.

Query Boundaries

The gateway API in the checker (tsz-checker/src/query_boundaries/) that routes type-checking questions to the solver. Instead of the checker calling solver methods directly all over the codebase, assignability checks for TS2322/TS2345/TS2416 diagnostics must go through these boundary functions. This centralization ensures consistent behavior: the boundary handles relation checking, failure reason extraction, suppression policy, and diagnostic rendering in one place.

Recursive Descent Parser

A parsing strategy where each grammar rule becomes a function, and parsing works by calling these functions recursively. For example, parsing an expression calls parse_binary_expression(), which calls parse_unary_expression(), which calls parse_primary_expression(). It's called "recursive" because these functions call each other, and "descent" because it starts from the top-level rule (source file) and works its way down into nested sub-expressions. This is the same approach used by tsc itself.

Scope Tree

A tree of lexical scopes built by the binder. Each scope knows its parent scope and contains a symbol table mapping names to SymbolIds. When the checker needs to resolve a name like x, it looks in the current scope first, then walks up the parent chain until it finds it. Unlike a stack-based approach (push scope on entry, pop on exit), TSZ uses a persistent tree: scopes are never destroyed. This is important for incremental compilation—you can query a scope later without re-building it.

Side Pool (Typed Side Pool)

A secondary array in the NodeArena that stores variable-sized data for a specific AST node kind. The 16-byte node header is too small to hold things like a function's parameter list or a class's member list. So the header has a data_index field that points into a typed side pool. There are 40+ pools: IdentifierData, BinaryExpressionData, FunctionDeclarationData, etc. Each pool only stores data for its specific node kind, keeping things tightly packed and cache-friendly.

Structural Typing

TypeScript's approach to type compatibility: types are compatible based on their structure (what properties and methods they have), not their name or declaration site. If interface A has { name: string } and interface B also has { name: string }, they are compatible even though they're different declarations. This is different from "nominal typing" (used by Java, C#) where two classes with identical structures are different types if they have different names.

Symbol / SymbolId

A named entity in the program: a variable, function, class, interface, enum member, etc. The binder creates a Symbol record for each declaration and assigns it a SymbolId(u32) handle in the symbol arena. The symbol stores the entity's name, kind flags, declaration nodes, parent symbol, and optional member/export tables. The checker uses SymbolId to look up what type corresponds to a given name.

SyntaxKind

A u16 enum that identifies what kind of token or AST node something is. There are ~400 variants covering all of TypeScript: LetKeyword = 121, Identifier = 80, VariableDeclaration = 261, IfStatement = 246, etc. Every node header and every token carries a SyntaxKind. This is how the compiler knows what it's looking at without needing inheritance hierarchies or trait objects.

Top Type

A type that is a supertype of every other type. In TypeScript, unknown (TypeId(3)) is the true top type: any value is assignable to unknown. The type any (TypeId(4)) is special: it acts as both a top type and a bottom type (assignable to and from everything), which is intentionally unsound but serves as TypeScript's escape hatch. Opposite of Bottom Type.

TypeEnvironment

The runtime context that resolves DefId → TypeId during type evaluation and relation checks. When the solver encounters a Lazy(DefId) type, it asks the TypeEnvironment to look up (or compute and cache) the concrete TypeId. The environment also carries type parameter bindings during generic instantiation, so that T resolves to the right concrete type within a given context.

TypeId

A u32 handle into the global type interner. The most important type in the compiler. Every type in the program—from string to Map<string, Promise<number[]>>—is represented by a single TypeId. Comparing two types is comparing two integers: type_a == type_b. The first 100 IDs (0–99) are reserved for built-in types like string, number, any, never, etc. User-defined types start at ID 100.

TypeData / TypeKey

The actual structural representation of a type. TypeData is the public name used at API boundaries; TypeKey is the crate-private internal enum in the solver. It has variants for every kind of TypeScript type: Intrinsic (primitives), Object (structural objects), Union, Intersection, Function, Conditional, Mapped, etc. Checker code must not match on TypeKey directly—it should use solver query APIs instead.

Type Guard

A runtime expression that the compiler uses to narrow a type. Examples: typeof x === "string", x instanceof Array, "prop" in x, or a user-defined predicate function like function isString(x: unknown): x is string. The checker extracts type guards from the AST and converts them to solver TypeGuard structs, then the solver computes the narrowed type.

Variance

How a type parameter's subtype relationship affects the subtype relationship of the containing type. There are four variance modes:

Covariant: Same direction. Dog extends Animal implies Array<Dog> extends Array<Animal>.
Contravariant: Opposite direction. Function parameters go this way for soundness.
Bivariant: Both directions. TypeScript's (unsound) default for method parameters.
Invariant: Neither direction. Mutable references need this for soundness.

In TSZ, the Judge (SubtypeChecker) uses strict variance rules, and the Lawyer (CompatChecker) overrides them where TypeScript's legacy behavior requires it (e.g., bivariant method parameters).

Visitor (TypeVisitor)

A design pattern for traversing the type graph systematically. Instead of writing a match on all 20+ TypeKey variants at every call site, the solver provides a TypeVisitor trait. You implement a few methods (e.g., "what to do when visiting a Union"), and the visitor handles walking the complete type structure, including recursion into nested types, cycle detection, and so on. This prevents bugs where a new type variant is added but some call sites forget to handle it.

Width Subtyping

The rule that a type with more properties is a subtype of a type with fewer properties. { a: number, b: string } is assignable to { a: number } because it has everything { a: number } requires (the a property) plus an extra b. This is the default behavior of structural typing. Excess property checking (triggered by freshness) is the exception: it rejects the extra properties for object literals to catch typos.