Compare commits

...

2 Commits

Author SHA1 Message Date
dl92
01d5532823 Add expression-evaluator: DAGs & state machines tutorial project
Educational calculator teaching FSMs (explicit transition table tokenizer)
and DAGs (recursive descent parser with AST evaluation). Includes CLI with
REPL, graphviz visualization, and 61 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 18:09:42 +00:00
dl92
3a8705ece8 Fix bugs, N+1 queries, and wire settings in persian-tutor
- Replace inline __import__("datetime").timedelta hack with proper import
- Remove unused import random in anki_export.py
- Add error handling for Claude CLI subprocess failures in ai.py
- Fix hardcoded absolute path in stt.py with relative Path resolution
- Fix N+1 DB queries in vocab.get_flashcard_batch and dashboard.get_category_breakdown
  by adding db.get_all_word_progress() batch query
- Wire Ollama model and Whisper size settings to actually update config
  via ai.set_ollama_model() and stt.set_whisper_size()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 15:40:24 +00:00
18 changed files with 1614 additions and 12 deletions

View File

@@ -0,0 +1,42 @@
# Expression Evaluator
## Overview
Educational project teaching DAGs and state machines through a calculator.
Pure Python, no external dependencies.
## Running
```bash
python main.py "3 + 4 * 2" # single expression
python main.py # REPL mode
python main.py --show-tokens --show-ast --trace "expr" # show internals
python main.py --dot "3+4*2" | dot -Tpng -o ast.png # AST diagram
python main.py --dot-fsm | dot -Tpng -o fsm.png # FSM diagram
```
## Testing
```bash
python -m pytest tests/ -v
```
## Architecture
- `tokenizer.py` -- Explicit finite state machine (Mealy machine) tokenizer
- `parser.py` -- Recursive descent parser building an AST (DAG)
- `evaluator.py` -- Post-order tree walker (topological sort evaluation)
- `visualize.py` -- Graphviz dot generation for AST and FSM diagrams
- `main.py` -- CLI entry point with argparse, REPL mode
## Key Design Decisions
- State machine uses an explicit transition table (dict), not implicit if/else
- Unary minus resolved by examining previous token context
- Power operator (`^`) is right-associative (grammar uses right-recursion)
- AST nodes are dataclasses; evaluation uses structural pattern matching
- Graphviz output is raw dot strings (no graphviz Python package needed)
## Grammar
```
expression ::= term ((PLUS | MINUS) term)*
term ::= unary ((MULTIPLY | DIVIDE) unary)*
unary ::= UNARY_MINUS unary | power
power ::= atom (POWER power)?
atom ::= NUMBER | LPAREN expression RPAREN
```

View File

@@ -0,0 +1,87 @@
# Expression Evaluator -- DAGs & State Machines Tutorial
A calculator that teaches two fundamental CS patterns by building them from scratch:
1. **Finite State Machine** -- the tokenizer processes input character-by-character using an explicit transition table
2. **Directed Acyclic Graph (DAG)** -- the parser builds an expression tree, evaluated bottom-up in topological order
## What You'll Learn
| File | CS Concept | What it does |
|------|-----------|-------------|
| `tokenizer.py` | **State Machine** (Mealy machine) | Converts `"3 + 4 * 2"` into tokens using a transition table |
| `parser.py` | **DAG construction** | Builds an expression tree with operator precedence |
| `evaluator.py` | **Topological evaluation** | Walks the tree bottom-up (leaves before parents) |
| `visualize.py` | **Visualization** | Generates graphviz diagrams of both the FSM and AST |
## Quick Start
```bash
# Evaluate an expression
python main.py "3 + 4 * 2"
# => 11
# Interactive REPL
python main.py
# See how the state machine tokenizes
python main.py --show-tokens "(2 + 3) * -4"
# See the expression tree (DAG)
python main.py --show-ast "(2 + 3) * 4"
# *
# +-- +
# | +-- 2
# | `-- 3
# `-- 4
# Watch evaluation in topological order
python main.py --trace "(2 + 3) * 4"
# Step 1: 2 => 2
# Step 2: 3 => 3
# Step 3: 2 + 3 => 5
# Step 4: 4 => 4
# Step 5: 5 * 4 => 20
# Generate graphviz diagrams
python main.py --dot "(2 + 3) * 4" | dot -Tpng -o ast.png
python main.py --dot-fsm | dot -Tpng -o fsm.png
```
## Features
- Arithmetic: `+`, `-`, `*`, `/`, `^` (power)
- Parentheses: `(2 + 3) * 4`
- Unary minus: `-3`, `-(2 + 1)`, `2 * -3`
- Decimals: `3.14`, `.5`
- Standard precedence: parens > `^` > `*`/`/` > `+`/`-`
- Right-associative power: `2^3^4` = `2^(3^4)`
- Correct unary minus: `-3^2` = `-(3^2)` = `-9`
## Running Tests
```bash
python -m pytest tests/ -v
```
## How the State Machine Works
The tokenizer in `tokenizer.py` uses an **explicit transition table** -- a dictionary mapping `(current_state, character_class)` to `(next_state, action)`. This is the same pattern used in network protocol parsers, regex engines, and compiler lexers.
The three states are:
- `START` -- between tokens, dispatching based on the next character
- `INTEGER` -- accumulating digits (e.g., `"12"` so far)
- `DECIMAL` -- accumulating digits after a decimal point (e.g., `"12.3"`)
Use `--dot-fsm` to generate a visual diagram of the state machine.
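The table-driven loop can be condensed to a few lines (illustrative names only; the real table in `tokenizer.py` also tracks positions and distinguishes more character classes):

```python
from enum import Enum, auto

class S(Enum):       # condensed state set: just START and INTEGER
    START = auto()
    INTEGER = auto()

def classify(ch):
    # the table is keyed on character classes, not raw characters
    return "DIGIT" if ch.isdigit() else "OTHER"

# (current_state, char_class) -> (next_state, action)
TABLE = {
    (S.START,   "DIGIT"): (S.INTEGER, "accumulate"),
    (S.START,   "OTHER"): (S.START,   "skip"),
    (S.INTEGER, "DIGIT"): (S.INTEGER, "accumulate"),
    (S.INTEGER, "OTHER"): (S.START,   "emit"),
}

def run(text):
    state, buf, out = S.START, "", []
    for ch in text + "\0":          # sentinel character forces a final emit
        state, action = TABLE[(state, classify(ch))]
        if action == "accumulate":
            buf += ch
        elif action == "emit":
            out.append(buf)
            buf = ""
    return out

print(run("12 34"))  # ['12', '34']
```

Because the rules live in a dictionary rather than in control flow, the same data can drive both the tokenizer loop and the `--dot-fsm` diagram generator.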
## How the DAG Works
The parser in `parser.py` builds an **expression tree** (AST) where:
- **Leaf nodes** are numbers (no dependencies)
- **Interior nodes** are operators with edges to their operands
- **Edges** represent "depends on" relationships
Evaluation in `evaluator.py` walks this tree **bottom-up** -- children before parents. This is exactly a **topological sort** of the DAG: you can only compute a node after all its dependencies are resolved.
Use `--show-ast` to see the tree structure, or `--dot` to generate a graphviz diagram.
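The same "dependencies before dependents" rule drives any DAG evaluation, not just expression trees. As a generic sketch (hypothetical node names, not this project's AST types):

```python
deps = {             # node -> the nodes it depends on (DAG edges)
    "a": [],         # leaf holding 2
    "b": [],         # leaf holding 3
    "c": [],         # leaf holding 4
    "sum": ["a", "b"],
    "prod": ["sum", "c"],
}
values = {"a": 2, "b": 3, "c": 4}
ops = {"sum": lambda x, y: x + y, "prod": lambda x, y: x * y}

def topo_eval(node):
    # post-order: resolve every dependency before the node itself
    args = [topo_eval(d) for d in deps[node]]
    return values[node] if not args else ops[node](*args)

print(topo_eval("prod"))  # (2 + 3) * 4 = 20
```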

View File

@@ -0,0 +1,147 @@
"""
Part 3: DAG Evaluation -- Tree Walker
=======================================
Evaluating the AST bottom-up is equivalent to topological-sort
evaluation of a DAG. We must evaluate a node's children before
the node itself -- just like in any dependency graph.
For a tree, post-order traversal gives a topological ordering.
The recursive evaluate() function naturally does this:
1. Recursively evaluate all children (dependencies)
2. Combine the results (compute this node's value)
3. Return the result (make it available to the parent)
This is the same pattern as:
- make: build dependencies before the target
- pip/npm install: install dependencies before the package
- Spreadsheet recalculation: compute referenced cells first
"""
from parser import NumberNode, BinOpNode, UnaryOpNode, Node
from tokenizer import TokenType
# ---------- Errors ----------
class EvalError(Exception):
pass
# ---------- Evaluator ----------
OP_SYMBOLS = {
TokenType.PLUS: '+',
TokenType.MINUS: '-',
TokenType.MULTIPLY: '*',
TokenType.DIVIDE: '/',
TokenType.POWER: '^',
TokenType.UNARY_MINUS: 'neg',
}
def evaluate(node):
"""
Evaluate an AST by walking it bottom-up (post-order traversal).
This is a recursive function that mirrors the DAG structure:
each recursive call follows a DAG edge to a child node.
Children are evaluated before parents -- topological order.
"""
match node:
case NumberNode(value=v):
return v
case UnaryOpNode(op=TokenType.UNARY_MINUS, operand=child):
return -evaluate(child)
case BinOpNode(op=op, left=left, right=right):
left_val = evaluate(left)
right_val = evaluate(right)
match op:
case TokenType.PLUS:
return left_val + right_val
case TokenType.MINUS:
return left_val - right_val
case TokenType.MULTIPLY:
return left_val * right_val
case TokenType.DIVIDE:
if right_val == 0:
raise EvalError("division by zero")
return left_val / right_val
case TokenType.POWER:
return left_val ** right_val
raise EvalError(f"unknown node type: {type(node)}")
def evaluate_traced(node):
"""
Like evaluate(), but records each step for educational display.
Returns (result, list_of_trace_lines).
The trace shows the topological evaluation order -- how the DAG
is evaluated from leaves to root. Each step shows a node being
evaluated after all its dependencies are resolved.
"""
steps = []
counter = [0] # mutable counter for step numbering
def _walk(node, depth):
indent = " " * depth
        match node:
            case NumberNode(value=v):
                counter[0] += 1  # assign the step number at emit time so numbering stays sequential
                step = counter[0]
                result = v
                display = _format_number(v)
steps.append(f"{indent}Step {step}: {display} => {_format_number(result)}")
return result
case UnaryOpNode(op=TokenType.UNARY_MINUS, operand=child):
child_val = _walk(child, depth + 1)
result = -child_val
counter[0] += 1
step = counter[0]
steps.append(
f"{indent}Step {step}: neg({_format_number(child_val)}) "
f"=> {_format_number(result)}"
)
return result
case BinOpNode(op=op, left=left, right=right):
left_val = _walk(left, depth + 1)
right_val = _walk(right, depth + 1)
sym = OP_SYMBOLS[op]
match op:
case TokenType.PLUS:
result = left_val + right_val
case TokenType.MINUS:
result = left_val - right_val
case TokenType.MULTIPLY:
result = left_val * right_val
case TokenType.DIVIDE:
if right_val == 0:
raise EvalError("division by zero")
result = left_val / right_val
case TokenType.POWER:
result = left_val ** right_val
counter[0] += 1
step = counter[0]
steps.append(
f"{indent}Step {step}: {_format_number(left_val)} {sym} "
f"{_format_number(right_val)} => {_format_number(result)}"
)
return result
raise EvalError(f"unknown node type: {type(node)}")
result = _walk(node, 0)
return result, steps
def _format_number(v):
"""Display a number as integer when possible."""
if isinstance(v, float) and v == int(v):
return str(int(v))
return str(v)

View File

@@ -0,0 +1,163 @@
"""
Expression Evaluator -- Learn DAGs & State Machines
====================================================
CLI entry point and interactive REPL.
Usage:
python main.py "3 + 4 * 2" # evaluate
python main.py # REPL mode
python main.py --show-tokens --show-ast --trace "expr" # show internals
python main.py --dot "3 + 4 * 2" | dot -Tpng -o ast.png
python main.py --dot-fsm | dot -Tpng -o fsm.png
"""
import argparse
import sys
from tokenizer import tokenize, TokenError
from parser import Parser, ParseError
from evaluator import evaluate, evaluate_traced, EvalError
from visualize import ast_to_dot, fsm_to_dot, ast_to_text
def process_expression(expr, args):
"""Tokenize, parse, and evaluate a single expression."""
try:
tokens = tokenize(expr)
except TokenError as e:
_print_error(expr, e)
return
if args.show_tokens:
print("\nTokens:")
for tok in tokens:
print(f" {tok}")
try:
ast = Parser(tokens).parse()
except ParseError as e:
_print_error(expr, e)
return
if args.show_ast:
print("\nAST (text tree):")
print(ast_to_text(ast))
if args.dot:
print(ast_to_dot(ast))
return # dot output goes to stdout, skip numeric result
if args.trace:
try:
result, steps = evaluate_traced(ast)
except EvalError as e:
print(f"Eval error: {e}")
return
print("\nEvaluation trace (topological order):")
for step in steps:
print(step)
print(f"\nResult: {_format_result(result)}")
else:
try:
result = evaluate(ast)
except EvalError as e:
print(f"Eval error: {e}")
return
print(_format_result(result))
def repl(args):
"""Interactive read-eval-print loop."""
print("Expression Evaluator REPL")
print("Type an expression, or 'quit' to exit.")
flags = []
if args.show_tokens:
flags.append("--show-tokens")
if args.show_ast:
flags.append("--show-ast")
if args.trace:
flags.append("--trace")
if flags:
print(f"Active flags: {' '.join(flags)}")
print()
while True:
try:
line = input(">>> ").strip()
except (EOFError, KeyboardInterrupt):
print()
break
if line.lower() in ("quit", "exit", "q"):
break
if not line:
continue
process_expression(line, args)
print()
def _print_error(expr, error):
"""Print an error with a caret pointing to the position."""
print(f"Error: {error}")
if hasattr(error, 'position') and error.position is not None:
print(f" {expr}")
print(f" {' ' * error.position}^")
def _format_result(v):
"""Format a numeric result: show as int when possible."""
if isinstance(v, float) and v == int(v) and abs(v) < 1e15:
return str(int(v))
return str(v)
def main():
arg_parser = argparse.ArgumentParser(
description="Expression Evaluator -- learn DAGs and state machines",
epilog="Examples:\n"
" python main.py '3 + 4 * 2'\n"
" python main.py --show-tokens --trace '-(3 + 4) ^ 2'\n"
" python main.py --dot '(2+3)*4' | dot -Tpng -o ast.png\n"
" python main.py --dot-fsm | dot -Tpng -o fsm.png",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
arg_parser.add_argument(
"expression", nargs="?",
help="Expression to evaluate (omit for REPL mode)",
)
arg_parser.add_argument(
"--show-tokens", action="store_true",
help="Display tokenizer output",
)
arg_parser.add_argument(
"--show-ast", action="store_true",
help="Display AST as indented text tree",
)
arg_parser.add_argument(
"--trace", action="store_true",
help="Show step-by-step evaluation trace",
)
arg_parser.add_argument(
"--dot", action="store_true",
help="Output AST as graphviz dot (pipe to: dot -Tpng -o ast.png)",
)
arg_parser.add_argument(
"--dot-fsm", action="store_true",
help="Output tokenizer FSM as graphviz dot",
)
args = arg_parser.parse_args()
# Special mode: just print the FSM diagram and exit
if args.dot_fsm:
print(fsm_to_dot())
return
# REPL mode if no expression given
if args.expression is None:
repl(args)
else:
process_expression(args.expression, args)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,217 @@
"""
Part 2: DAG Construction -- Recursive Descent Parser
=====================================================
A parser converts a flat list of tokens into a tree structure (AST).
The AST is a DAG (Directed Acyclic Graph) where:
- Nodes are operations (BinOpNode) or values (NumberNode)
- Edges point from parent operations to their operands
- The graph is acyclic because an operation's inputs are always
"simpler" sub-expressions (no circular dependencies)
- It is a tree (a special case of DAG) because no node is shared
This is the same structure as:
- Spreadsheet dependency graphs (cell A1 depends on B1, B2...)
- Build systems (Makefile targets depend on other targets)
- Task scheduling (some tasks must finish before others start)
- Neural network computation graphs (forward pass is a DAG)
Key DAG concepts demonstrated:
- Nodes: operations and values
- Directed edges: from operation to its inputs (dependencies)
- Acyclic: no circular dependencies
- Topological ordering: natural evaluation order (leaves first)
Grammar (BNF) -- precedence is encoded by nesting depth:
expression ::= term ((PLUS | MINUS) term)* # lowest precedence
term ::= unary ((MULTIPLY | DIVIDE) unary)*
unary ::= UNARY_MINUS unary | power
power ::= atom (POWER power)? # right-associative
atom ::= NUMBER | LPAREN expression RPAREN # highest precedence
Call chain: expression -> term -> unary -> power -> atom
This means: + and - bind loosest, then * and /, then unary -, then ^, then parens.
So -3^2 = -(3^2) = -9, matching standard math convention.
"""
from dataclasses import dataclass
from tokenizer import Token, TokenType
# ---------- AST node types ----------
# These are the nodes of our DAG. Each node is either a leaf (NumberNode)
# or an interior node with edges pointing to its children (operands).
@dataclass
class NumberNode:
"""Leaf node: a numeric literal. In DAG terms, a node with no outgoing edges."""
value: float
def __repr__(self):
if self.value == int(self.value):
return f"NumberNode({int(self.value)})"
return f"NumberNode({self.value})"
@dataclass
class BinOpNode:
"""
Interior node: a binary operation with two children.
DAG edges: this node -> left, this node -> right
The edges represent "depends on": to compute this node's value,
we must first compute left and right.
"""
op: TokenType
left: 'NumberNode | BinOpNode | UnaryOpNode'
right: 'NumberNode | BinOpNode | UnaryOpNode'
def __repr__(self):
return f"BinOpNode({self.op.name}, {self.left}, {self.right})"
@dataclass
class UnaryOpNode:
"""Interior node: a unary operation (negation) with one child."""
op: TokenType
operand: 'NumberNode | BinOpNode | UnaryOpNode'
def __repr__(self):
return f"UnaryOpNode({self.op.name}, {self.operand})"
# Union type for any AST node
Node = NumberNode | BinOpNode | UnaryOpNode
# ---------- Errors ----------
class ParseError(Exception):
def __init__(self, message, position=None):
self.position = position
pos_info = f" at position {position}" if position is not None else ""
super().__init__(f"Parse error{pos_info}: {message}")
# ---------- Recursive descent parser ----------
class Parser:
"""
Converts a list of tokens into an AST (expression tree / DAG).
Each grammar rule becomes a method. The call tree mirrors the shape
of the AST being built. When a deeper method returns a node, it
becomes a child of the node built by the caller -- this is how
the DAG edges form.
Precedence is encoded by nesting: lower-precedence operators are
parsed at higher (outer) levels, so they become closer to the root
of the tree and are evaluated last.
"""
def __init__(self, tokens):
self.tokens = tokens
self.pos = 0
def peek(self):
"""Look at the current token without consuming it."""
return self.tokens[self.pos]
def consume(self, expected=None):
"""Consume and return the current token, optionally asserting its type."""
token = self.tokens[self.pos]
if expected is not None and token.type != expected:
raise ParseError(
f"expected {expected.name}, got {token.type.name}",
token.position,
)
self.pos += 1
return token
def parse(self):
"""Entry point: parse the full expression and verify we consumed everything."""
if self.peek().type == TokenType.EOF:
raise ParseError("empty expression")
node = self.expression()
self.consume(TokenType.EOF)
return node
# --- Grammar rules ---
# Each method corresponds to one production in the grammar.
# The nesting encodes operator precedence.
def expression(self):
"""expression ::= term ((PLUS | MINUS) term)*"""
node = self.term()
while self.peek().type in (TokenType.PLUS, TokenType.MINUS):
op_token = self.consume()
right = self.term()
# Build a new BinOpNode -- this creates a DAG edge from
# the new node to both 'node' (left) and 'right'
node = BinOpNode(op_token.type, node, right)
return node
def term(self):
"""term ::= unary ((MULTIPLY | DIVIDE) unary)*"""
node = self.unary()
while self.peek().type in (TokenType.MULTIPLY, TokenType.DIVIDE):
op_token = self.consume()
right = self.unary()
node = BinOpNode(op_token.type, node, right)
return node
def unary(self):
"""
unary ::= UNARY_MINUS unary | power
Unary minus is parsed here, between term and power, so it binds
looser than ^ but tighter than * and /. This gives the standard
math behavior: -3^2 = -(3^2) = -9.
The recursion (unary calls itself) handles double negation: --3 = 3.
"""
if self.peek().type == TokenType.UNARY_MINUS:
op_token = self.consume()
operand = self.unary()
return UnaryOpNode(op_token.type, operand)
return self.power()
def power(self):
"""
power ::= atom (POWER power)?
Right-recursive for right-associativity: 2^3^4 = 2^(3^4) = 2^81.
Compare with term() which uses a while loop for LEFT-associativity.
"""
node = self.atom()
if self.peek().type == TokenType.POWER:
op_token = self.consume()
right = self.power() # recurse (not loop) for right-associativity
node = BinOpNode(op_token.type, node, right)
return node
def atom(self):
"""
atom ::= NUMBER | LPAREN expression RPAREN
The base case: either a literal number or a parenthesized
sub-expression. Parentheses work by recursing back to
expression(), which restarts precedence parsing from the top.
"""
token = self.peek()
if token.type == TokenType.NUMBER:
self.consume()
return NumberNode(float(token.value))
if token.type == TokenType.LPAREN:
self.consume()
node = self.expression()
self.consume(TokenType.RPAREN)
return node
raise ParseError(
f"expected number or '(', got {token.type.name}",
token.position,
)

View File

@@ -0,0 +1,120 @@
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
import pytest
from tokenizer import tokenize
from parser import Parser
from evaluator import evaluate, evaluate_traced, EvalError
def eval_expr(expr):
"""Helper: tokenize -> parse -> evaluate in one step."""
tokens = tokenize(expr)
ast = Parser(tokens).parse()
return evaluate(ast)
# ---------- Basic arithmetic ----------
def test_addition():
assert eval_expr("3 + 4") == 7.0
def test_subtraction():
assert eval_expr("10 - 3") == 7.0
def test_multiplication():
assert eval_expr("3 * 4") == 12.0
def test_division():
assert eval_expr("10 / 4") == 2.5
def test_power():
assert eval_expr("2 ^ 10") == 1024.0
# ---------- Precedence ----------
def test_standard_precedence():
assert eval_expr("3 + 4 * 2") == 11.0
def test_parentheses():
assert eval_expr("(3 + 4) * 2") == 14.0
def test_power_precedence():
assert eval_expr("2 * 3 ^ 2") == 18.0
def test_right_associative_power():
# 2^(2^3) = 2^8 = 256
assert eval_expr("2 ^ 2 ^ 3") == 256.0
# ---------- Unary minus ----------
def test_negation():
assert eval_expr("-5") == -5.0
def test_double_negation():
assert eval_expr("--5") == 5.0
def test_negation_with_power():
# -(3^2) = -9, not (-3)^2 = 9
assert eval_expr("-3 ^ 2") == -9.0
def test_negation_in_parens():
assert eval_expr("(-3) ^ 2") == 9.0
# ---------- Decimals ----------
def test_decimal_addition():
assert eval_expr("0.1 + 0.2") == pytest.approx(0.3)
def test_leading_dot():
assert eval_expr(".5 + .5") == 1.0
# ---------- Edge cases ----------
def test_nested_parens():
assert eval_expr("((((3))))") == 3.0
def test_complex_expression():
assert eval_expr("(2 + 3) * (7 - 2) / 5 ^ 1") == 5.0
def test_long_chain():
assert eval_expr("1 + 2 + 3 + 4 + 5") == 15.0
def test_mixed_operations():
assert eval_expr("2 + 3 * 4 - 6 / 2") == 11.0
# ---------- Division by zero ----------
def test_division_by_zero():
with pytest.raises(EvalError):
eval_expr("1 / 0")
def test_division_by_zero_in_expression():
with pytest.raises(EvalError):
eval_expr("5 + 3 / (2 - 2)")
# ---------- Traced evaluation ----------
def test_traced_returns_correct_result():
tokens = tokenize("3 + 4 * 2")
ast = Parser(tokens).parse()
result, steps = evaluate_traced(ast)
assert result == 11.0
assert len(steps) > 0
def test_traced_step_count():
"""A simple binary op has 3 evaluation events: left, right, combine."""
tokens = tokenize("3 + 4")
ast = Parser(tokens).parse()
result, steps = evaluate_traced(ast)
assert result == 7.0
# NumberNode(3), NumberNode(4), BinOp(+)
assert len(steps) == 3

View File

@@ -0,0 +1,136 @@
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
import pytest
from tokenizer import tokenize, TokenType
from parser import Parser, ParseError, NumberNode, BinOpNode, UnaryOpNode
def parse(expr):
"""Helper: tokenize and parse in one step."""
return Parser(tokenize(expr)).parse()
# ---------- Basic parsing ----------
def test_parse_number():
ast = parse("42")
assert isinstance(ast, NumberNode)
assert ast.value == 42.0
def test_parse_decimal():
ast = parse("3.14")
assert isinstance(ast, NumberNode)
assert ast.value == 3.14
def test_parse_addition():
ast = parse("3 + 4")
assert isinstance(ast, BinOpNode)
assert ast.op == TokenType.PLUS
assert isinstance(ast.left, NumberNode)
assert isinstance(ast.right, NumberNode)
# ---------- Precedence ----------
def test_multiply_before_add():
"""3 + 4 * 2 should parse as 3 + (4 * 2)."""
ast = parse("3 + 4 * 2")
assert ast.op == TokenType.PLUS
assert isinstance(ast.right, BinOpNode)
assert ast.right.op == TokenType.MULTIPLY
def test_power_before_multiply():
"""2 * 3 ^ 4 should parse as 2 * (3 ^ 4)."""
ast = parse("2 * 3 ^ 4")
assert ast.op == TokenType.MULTIPLY
assert isinstance(ast.right, BinOpNode)
assert ast.right.op == TokenType.POWER
def test_parentheses_override_precedence():
"""(3 + 4) * 2 should parse as (3 + 4) * 2."""
ast = parse("(3 + 4) * 2")
assert ast.op == TokenType.MULTIPLY
assert isinstance(ast.left, BinOpNode)
assert ast.left.op == TokenType.PLUS
# ---------- Associativity ----------
def test_left_associative_subtraction():
"""10 - 3 - 2 should parse as (10 - 3) - 2."""
ast = parse("10 - 3 - 2")
assert ast.op == TokenType.MINUS
assert isinstance(ast.left, BinOpNode)
assert ast.left.op == TokenType.MINUS
assert isinstance(ast.right, NumberNode)
def test_power_right_associative():
"""2 ^ 3 ^ 4 should parse as 2 ^ (3 ^ 4)."""
ast = parse("2 ^ 3 ^ 4")
assert ast.op == TokenType.POWER
assert isinstance(ast.left, NumberNode)
assert isinstance(ast.right, BinOpNode)
assert ast.right.op == TokenType.POWER
# ---------- Unary minus ----------
def test_unary_minus():
ast = parse("-3")
assert isinstance(ast, UnaryOpNode)
assert ast.operand.value == 3.0
def test_double_negation():
ast = parse("--3")
assert isinstance(ast, UnaryOpNode)
assert isinstance(ast.operand, UnaryOpNode)
assert ast.operand.operand.value == 3.0
def test_unary_minus_precedence():
"""-3^2 should parse as -(3^2), not (-3)^2."""
ast = parse("-3 ^ 2")
assert isinstance(ast, UnaryOpNode)
assert isinstance(ast.operand, BinOpNode)
assert ast.operand.op == TokenType.POWER
def test_unary_minus_in_expression():
"""2 * -3 should parse as 2 * (-(3))."""
ast = parse("2 * -3")
assert ast.op == TokenType.MULTIPLY
assert isinstance(ast.right, UnaryOpNode)
# ---------- Nested parentheses ----------
def test_nested_parens():
ast = parse("((3))")
assert isinstance(ast, NumberNode)
assert ast.value == 3.0
def test_complex_nesting():
"""((2 + 3) * (7 - 2))"""
ast = parse("((2 + 3) * (7 - 2))")
assert isinstance(ast, BinOpNode)
assert ast.op == TokenType.MULTIPLY
# ---------- Errors ----------
def test_missing_rparen():
with pytest.raises(ParseError):
parse("(3 + 4")
def test_empty_expression():
with pytest.raises(ParseError):
parse("")
def test_trailing_operator():
with pytest.raises(ParseError):
parse("3 +")
def test_empty_parens():
with pytest.raises(ParseError):
parse("()")

View File

@@ -0,0 +1,139 @@
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
import pytest
from tokenizer import tokenize, TokenType, Token, TokenError
# ---------- Basic tokens ----------
def test_single_integer():
tokens = tokenize("42")
assert tokens[0].type == TokenType.NUMBER
assert tokens[0].value == "42"
def test_decimal_number():
tokens = tokenize("3.14")
assert tokens[0].type == TokenType.NUMBER
assert tokens[0].value == "3.14"
def test_leading_dot():
tokens = tokenize(".5")
assert tokens[0].type == TokenType.NUMBER
assert tokens[0].value == ".5"
def test_all_operators():
"""Operators between numbers are all binary."""
tokens = tokenize("1 + 1 - 1 * 1 / 1 ^ 1")
ops = [t.type for t in tokens if t.type not in (TokenType.NUMBER, TokenType.EOF)]
assert ops == [
TokenType.PLUS, TokenType.MINUS, TokenType.MULTIPLY,
TokenType.DIVIDE, TokenType.POWER,
]
def test_operators_between_numbers():
tokens = tokenize("1 + 2 - 3 * 4 / 5 ^ 6")
ops = [t.type for t in tokens if t.type not in (TokenType.NUMBER, TokenType.EOF)]
assert ops == [
TokenType.PLUS, TokenType.MINUS, TokenType.MULTIPLY,
TokenType.DIVIDE, TokenType.POWER,
]
def test_parentheses():
tokens = tokenize("()")
assert tokens[0].type == TokenType.LPAREN
assert tokens[1].type == TokenType.RPAREN
# ---------- Unary minus ----------
def test_unary_minus_at_start():
tokens = tokenize("-3")
assert tokens[0].type == TokenType.UNARY_MINUS
assert tokens[1].type == TokenType.NUMBER
def test_unary_minus_after_lparen():
tokens = tokenize("(-3)")
assert tokens[1].type == TokenType.UNARY_MINUS
def test_unary_minus_after_operator():
tokens = tokenize("2 * -3")
assert tokens[2].type == TokenType.UNARY_MINUS
def test_binary_minus():
tokens = tokenize("5 - 3")
assert tokens[1].type == TokenType.MINUS
def test_double_unary_minus():
tokens = tokenize("--3")
assert tokens[0].type == TokenType.UNARY_MINUS
assert tokens[1].type == TokenType.UNARY_MINUS
assert tokens[2].type == TokenType.NUMBER
# ---------- Whitespace handling ----------
def test_no_spaces():
tokens = tokenize("3+4")
non_eof = [t for t in tokens if t.type != TokenType.EOF]
assert len(non_eof) == 3
def test_extra_spaces():
tokens = tokenize(" 3 + 4 ")
non_eof = [t for t in tokens if t.type != TokenType.EOF]
assert len(non_eof) == 3
# ---------- Position tracking ----------
def test_positions():
tokens = tokenize("3 + 4")
assert tokens[0].position == 0 # '3'
assert tokens[1].position == 2 # '+'
assert tokens[2].position == 4 # '4'
# ---------- Errors ----------
def test_invalid_character():
with pytest.raises(TokenError):
tokenize("3 & 4")
def test_double_dot():
with pytest.raises(TokenError):
tokenize("3.14.15")
# ---------- EOF token ----------
def test_eof_always_present():
tokens = tokenize("42")
assert tokens[-1].type == TokenType.EOF
def test_empty_input():
tokens = tokenize("")
assert len(tokens) == 1
assert tokens[0].type == TokenType.EOF
# ---------- Complex expressions ----------
def test_complex_expression():
tokens = tokenize("(3 + 4.5) * -2 ^ 3")
types = [t.type for t in tokens if t.type != TokenType.EOF]
assert types == [
TokenType.LPAREN, TokenType.NUMBER, TokenType.PLUS,
TokenType.NUMBER, TokenType.RPAREN, TokenType.MULTIPLY,
TokenType.UNARY_MINUS, TokenType.NUMBER, TokenType.POWER,
TokenType.NUMBER,
]
def test_adjacent_parens():
tokens = tokenize("(3)(4)")
types = [t.type for t in tokens if t.type != TokenType.EOF]
assert types == [
TokenType.LPAREN, TokenType.NUMBER, TokenType.RPAREN,
TokenType.LPAREN, TokenType.NUMBER, TokenType.RPAREN,
]

View File

@@ -0,0 +1,306 @@
"""
Part 1: State Machine Tokenizer
================================
A tokenizer (lexer) converts raw text into a stream of tokens.
This implementation uses an EXPLICIT finite state machine (FSM):
- States are named values (an enum), not implicit control flow
- A transition table maps (current_state, input_class) -> (next_state, action)
- The main loop reads one character at a time and consults the table
This is the same pattern used in:
- Network protocol parsers (HTTP, TCP state machines)
- Regular expression engines
- Compiler front-ends (lexers for C, Python, etc.)
- Game AI (enemy behavior states)
Key FSM concepts demonstrated:
- States: the "memory" of what we're currently building
- Transitions: rules for moving between states based on input
- Actions: side effects (emit a token, accumulate a character)
- Mealy machine: outputs depend on both state AND input
"""
from dataclasses import dataclass
from enum import Enum
# ---------- Token types ----------
class TokenType(Enum):
NUMBER = "NUMBER"
PLUS = "PLUS"
MINUS = "MINUS"
MULTIPLY = "MULTIPLY"
DIVIDE = "DIVIDE"
POWER = "POWER"
LPAREN = "LPAREN"
RPAREN = "RPAREN"
UNARY_MINUS = "UNARY_MINUS"
EOF = "EOF"
@dataclass
class Token:
type: TokenType
value: str # raw text: "42", "+", "(", etc.
position: int # character offset in original expression
def __repr__(self):
return f"Token({self.type.name}, {self.value!r}, pos={self.position})"
OPERATOR_MAP = {
'+': TokenType.PLUS,
'-': TokenType.MINUS,
'*': TokenType.MULTIPLY,
'/': TokenType.DIVIDE,
'^': TokenType.POWER,
}
# ---------- FSM state definitions ----------
class State(Enum):
"""
The tokenizer's finite set of states.
START -- idle / between tokens, deciding what comes next
INTEGER -- accumulating digits of an integer (e.g. "12" so far)
DECIMAL -- accumulating digits after a decimal point (e.g. "12.3" so far)
"""
START = "START"
INTEGER = "INTEGER"
DECIMAL = "DECIMAL"
class CharClass(Enum):
"""
Character classification -- groups raw characters into categories
so the transition table stays small and readable.
"""
DIGIT = "DIGIT"
DOT = "DOT"
OPERATOR = "OPERATOR"
LPAREN = "LPAREN"
RPAREN = "RPAREN"
SPACE = "SPACE"
EOF = "EOF"
UNKNOWN = "UNKNOWN"
class Action(Enum):
"""
What the FSM does on a transition. In a Mealy machine, the output
(action) depends on both the current state AND the input.
"""
ACCUMULATE = "ACCUMULATE"
EMIT_NUMBER = "EMIT_NUMBER"
EMIT_OPERATOR = "EMIT_OPERATOR"
EMIT_LPAREN = "EMIT_LPAREN"
EMIT_RPAREN = "EMIT_RPAREN"
EMIT_NUMBER_THEN_OP = "EMIT_NUMBER_THEN_OP"
EMIT_NUMBER_THEN_LPAREN = "EMIT_NUMBER_THEN_LPAREN"
EMIT_NUMBER_THEN_RPAREN = "EMIT_NUMBER_THEN_RPAREN"
EMIT_NUMBER_THEN_DONE = "EMIT_NUMBER_THEN_DONE"
SKIP = "SKIP"
DONE = "DONE"
ERROR = "ERROR"
@dataclass(frozen=True)
class Transition:
next_state: State
action: Action
# ---------- Transition table ----------
# This is the heart of the state machine. Every (state, char_class) pair
# maps to exactly one transition: a next state and an action to perform.
# Making this a data structure (not nested if/else) means we can:
# 1. Inspect it programmatically (e.g. to generate a diagram)
# 2. Verify completeness (every combination is covered)
# 3. Understand the FSM at a glance
TRANSITIONS = {
# --- START: between tokens, dispatch based on character class ---
(State.START, CharClass.DIGIT): Transition(State.INTEGER, Action.ACCUMULATE),
(State.START, CharClass.DOT): Transition(State.DECIMAL, Action.ACCUMULATE),
(State.START, CharClass.OPERATOR): Transition(State.START, Action.EMIT_OPERATOR),
(State.START, CharClass.LPAREN): Transition(State.START, Action.EMIT_LPAREN),
(State.START, CharClass.RPAREN): Transition(State.START, Action.EMIT_RPAREN),
(State.START, CharClass.SPACE): Transition(State.START, Action.SKIP),
(State.START, CharClass.EOF): Transition(State.START, Action.DONE),
# --- INTEGER: accumulating digits like "123" ---
(State.INTEGER, CharClass.DIGIT): Transition(State.INTEGER, Action.ACCUMULATE),
(State.INTEGER, CharClass.DOT): Transition(State.DECIMAL, Action.ACCUMULATE),
(State.INTEGER, CharClass.OPERATOR): Transition(State.START, Action.EMIT_NUMBER_THEN_OP),
(State.INTEGER, CharClass.LPAREN): Transition(State.START, Action.EMIT_NUMBER_THEN_LPAREN),
(State.INTEGER, CharClass.RPAREN): Transition(State.START, Action.EMIT_NUMBER_THEN_RPAREN),
(State.INTEGER, CharClass.SPACE): Transition(State.START, Action.EMIT_NUMBER),
(State.INTEGER, CharClass.EOF): Transition(State.START, Action.EMIT_NUMBER_THEN_DONE),
# --- DECIMAL: accumulating digits after "." like "123.45" ---
(State.DECIMAL, CharClass.DIGIT): Transition(State.DECIMAL, Action.ACCUMULATE),
(State.DECIMAL, CharClass.DOT): Transition(State.START, Action.ERROR),
(State.DECIMAL, CharClass.OPERATOR): Transition(State.START, Action.EMIT_NUMBER_THEN_OP),
(State.DECIMAL, CharClass.LPAREN): Transition(State.START, Action.EMIT_NUMBER_THEN_LPAREN),
(State.DECIMAL, CharClass.RPAREN): Transition(State.START, Action.EMIT_NUMBER_THEN_RPAREN),
(State.DECIMAL, CharClass.SPACE): Transition(State.START, Action.EMIT_NUMBER),
(State.DECIMAL, CharClass.EOF): Transition(State.START, Action.EMIT_NUMBER_THEN_DONE),
}
# ---------- Errors ----------
class TokenError(Exception):
def __init__(self, message, position):
self.position = position
super().__init__(f"Token error at position {position}: {message}")
# ---------- Character classification ----------
def classify(ch):
"""Map a single character to its CharClass."""
if ch.isdigit():
return CharClass.DIGIT
if ch == '.':
return CharClass.DOT
if ch in OPERATOR_MAP:
return CharClass.OPERATOR
if ch == '(':
return CharClass.LPAREN
if ch == ')':
return CharClass.RPAREN
if ch.isspace():
return CharClass.SPACE
return CharClass.UNKNOWN
# ---------- Main tokenize function ----------
def tokenize(expression):
"""
Process an expression string through the state machine, producing tokens.
The main loop:
1. Classify the current character
2. Look up (state, char_class) in the transition table
3. Execute the action (accumulate, emit, skip, etc.)
4. Move to the next state
5. Advance to the next character
After all tokens are emitted, a post-processing step resolves
unary minus: if a MINUS token appears at the start, after an operator,
or after LPAREN, it is re-classified as UNARY_MINUS.
"""
state = State.START
buffer = [] # characters accumulated for the current token
buffer_start = 0 # position where the current buffer started
tokens = []
pos = 0
# Append a sentinel so EOF is handled uniformly in the loop
chars = expression + '\0'
while pos <= len(expression):
ch = chars[pos]
char_class = CharClass.EOF if pos == len(expression) else classify(ch)
if char_class == CharClass.UNKNOWN:
raise TokenError(f"unexpected character {ch!r}", pos)
# Look up the transition
key = (state, char_class)
transition = TRANSITIONS.get(key)
if transition is None:
raise TokenError(f"no transition for state={state.name}, input={char_class.name}", pos)
action = transition.action
next_state = transition.next_state
# --- Execute the action ---
if action == Action.ACCUMULATE:
if not buffer:
buffer_start = pos
buffer.append(ch)
elif action == Action.EMIT_NUMBER:
tokens.append(Token(TokenType.NUMBER, ''.join(buffer), buffer_start))
buffer.clear()
elif action == Action.EMIT_OPERATOR:
tokens.append(Token(OPERATOR_MAP[ch], ch, pos))
elif action == Action.EMIT_LPAREN:
tokens.append(Token(TokenType.LPAREN, ch, pos))
elif action == Action.EMIT_RPAREN:
tokens.append(Token(TokenType.RPAREN, ch, pos))
elif action == Action.EMIT_NUMBER_THEN_OP:
tokens.append(Token(TokenType.NUMBER, ''.join(buffer), buffer_start))
buffer.clear()
tokens.append(Token(OPERATOR_MAP[ch], ch, pos))
elif action == Action.EMIT_NUMBER_THEN_LPAREN:
tokens.append(Token(TokenType.NUMBER, ''.join(buffer), buffer_start))
buffer.clear()
tokens.append(Token(TokenType.LPAREN, ch, pos))
elif action == Action.EMIT_NUMBER_THEN_RPAREN:
tokens.append(Token(TokenType.NUMBER, ''.join(buffer), buffer_start))
buffer.clear()
tokens.append(Token(TokenType.RPAREN, ch, pos))
elif action == Action.EMIT_NUMBER_THEN_DONE:
tokens.append(Token(TokenType.NUMBER, ''.join(buffer), buffer_start))
buffer.clear()
elif action == Action.SKIP:
pass
elif action == Action.DONE:
pass
elif action == Action.ERROR:
raise TokenError(f"unexpected {ch!r} in state {state.name}", pos)
state = next_state
pos += 1
# --- Post-processing: resolve unary minus ---
# A MINUS is unary if it appears:
# - at the very start of the token stream
# - immediately after an operator (+, -, *, /, ^) or LPAREN
    # This context is not tracked by the character-level FSM (its state only
    # records what kind of token is being built), so a post-pass over the
    # emitted tokens resolves it.
_resolve_unary_minus(tokens)
tokens.append(Token(TokenType.EOF, '', len(expression)))
return tokens
def _resolve_unary_minus(tokens):
"""
Convert binary MINUS tokens to UNARY_MINUS where appropriate.
Why this isn't in the FSM: the FSM processes characters one at a time
and only tracks what kind of token it's currently building (its state).
But whether '-' is unary or binary depends on the PREVIOUS TOKEN --
information the FSM doesn't track. This is a common real-world pattern:
the lexer handles most work, then a lightweight post-pass adds context.
"""
unary_predecessor = {
TokenType.PLUS, TokenType.MINUS, TokenType.MULTIPLY,
TokenType.DIVIDE, TokenType.POWER, TokenType.LPAREN,
TokenType.UNARY_MINUS,
}
for i, token in enumerate(tokens):
if token.type != TokenType.MINUS:
continue
if i == 0 or tokens[i - 1].type in unary_predecessor:
tokens[i] = Token(TokenType.UNARY_MINUS, token.value, token.position)
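The pattern above (a pure-data transition table driven by a small loop) transfers directly to other tokenizing problems. A self-contained sketch with a toy two-state table, not the tokenizer's own, shows the whole mechanism:

```python
from enum import Enum, auto

class S(Enum):
    # Toy machine that extracts runs of digits from arbitrary text.
    START = auto()
    NUM = auto()

# (state, input_is_digit) -> (next_state, action), mirroring TRANSITIONS
TOY = {
    (S.START, True):  (S.NUM,   "accumulate"),
    (S.START, False): (S.START, "skip"),
    (S.NUM,   True):  (S.NUM,   "accumulate"),
    (S.NUM,   False): (S.START, "emit"),
}

def runs_of_digits(text):
    """Drive the toy Mealy machine over text, emitting runs of digits."""
    state, buf, out = S.START, [], []
    for ch in text + "\0":  # sentinel, as in tokenize() above
        state, action = TOY[(state, ch.isdigit())]
        if action == "accumulate":
            buf.append(ch)
        elif action == "emit":
            out.append("".join(buf))
            buf.clear()
    return out

print(runs_of_digits("ab12 345"))  # ['12', '345']
```

Because TOY is plain data, the same table could be checked for completeness or rendered as a diagram, just as fsm_to_dot does for TRANSITIONS.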

View File

@@ -0,0 +1,200 @@
"""
Part 4: Visualization -- Graphviz Dot Output
==============================================
Generate graphviz dot-format strings for:
1. The tokenizer's finite state machine (FSM)
2. Any expression's AST (DAG)
3. Text-based tree rendering for the terminal
No external dependencies -- outputs raw dot strings that can be piped
to the 'dot' command: python main.py --dot "3+4*2" | dot -Tpng -o ast.png
"""
from parser import NumberNode, BinOpNode, UnaryOpNode, Node
from tokenizer import TRANSITIONS, State, CharClass, Action, TokenType
# ---------- FSM diagram ----------
# Human-readable labels for character classes
_CHAR_LABELS = {
CharClass.DIGIT: "digit",
CharClass.DOT: "'.'",
CharClass.OPERATOR: "op",
CharClass.LPAREN: "'('",
CharClass.RPAREN: "')'",
CharClass.SPACE: "space",
CharClass.EOF: "EOF",
}
# Short labels for actions
_ACTION_LABELS = {
Action.ACCUMULATE: "accum",
Action.EMIT_NUMBER: "emit num",
Action.EMIT_OPERATOR: "emit op",
Action.EMIT_LPAREN: "emit '('",
Action.EMIT_RPAREN: "emit ')'",
Action.EMIT_NUMBER_THEN_OP: "emit num+op",
Action.EMIT_NUMBER_THEN_LPAREN: "emit num+'('",
Action.EMIT_NUMBER_THEN_RPAREN: "emit num+')'",
Action.EMIT_NUMBER_THEN_DONE: "emit num, done",
Action.SKIP: "skip",
Action.DONE: "done",
Action.ERROR: "ERROR",
}
def fsm_to_dot():
"""
Generate a graphviz dot diagram of the tokenizer's state machine.
Reads the TRANSITIONS table directly -- because the FSM is data (a dict),
we can programmatically inspect and visualize it. This is a key advantage
of explicit state machines over implicit if/else control flow.
"""
lines = [
'digraph FSM {',
' rankdir=LR;',
' node [shape=circle, fontname="Helvetica"];',
' edge [fontname="Helvetica", fontsize=10];',
'',
' // Start indicator',
' __start__ [shape=point, width=0.2];',
' __start__ -> START;',
'',
]
# Collect edges grouped by (src, dst) to merge labels
edge_labels = {}
for (state, char_class), transition in TRANSITIONS.items():
src = state.name
dst = transition.next_state.name
char_label = _CHAR_LABELS.get(char_class, char_class.name)
action_label = _ACTION_LABELS.get(transition.action, transition.action.name)
label = f"{char_label} / {action_label}"
edge_labels.setdefault((src, dst), []).append(label)
# Emit edges
for (src, dst), labels in sorted(edge_labels.items()):
combined = "\\n".join(labels)
lines.append(f' {src} -> {dst} [label="{combined}"];')
lines.append('}')
return '\n'.join(lines)
# ---------- AST diagram ----------
_OP_LABELS = {
TokenType.PLUS: '+',
TokenType.MINUS: '-',
TokenType.MULTIPLY: '*',
TokenType.DIVIDE: '/',
TokenType.POWER: '^',
TokenType.UNARY_MINUS: 'neg',
}
def ast_to_dot(node):
"""
Generate a graphviz dot diagram of an AST (expression tree / DAG).
Each node gets a unique ID. Edges go from parent to children,
showing the directed acyclic structure. Leaves are boxed,
operators are ellipses.
"""
lines = [
'digraph AST {',
' node [fontname="Helvetica"];',
' edge [fontname="Helvetica"];',
'',
]
counter = [0]
def _visit(node):
nid = f"n{counter[0]}"
counter[0] += 1
match node:
case NumberNode(value=v):
label = _format_number(v)
lines.append(f' {nid} [label="{label}", shape=box, style=rounded];')
return nid
case UnaryOpNode(op=op, operand=child):
label = _OP_LABELS.get(op, op.name)
lines.append(f' {nid} [label="{label}", shape=ellipse];')
child_id = _visit(child)
lines.append(f' {nid} -> {child_id};')
return nid
case BinOpNode(op=op, left=left, right=right):
label = _OP_LABELS.get(op, op.name)
lines.append(f' {nid} [label="{label}", shape=ellipse];')
left_id = _visit(left)
right_id = _visit(right)
lines.append(f' {nid} -> {left_id} [label="L"];')
lines.append(f' {nid} -> {right_id} [label="R"];')
return nid
_visit(node)
lines.append('}')
return '\n'.join(lines)
# ---------- Text-based tree ----------
def ast_to_text(node, prefix="", connector=""):
"""
Render the AST as an indented text tree for terminal display.
Example output for (2 + 3) * 4:
        *
        +-- +
        |   +-- 2
        |   `-- 3
        `-- 4
"""
match node:
case NumberNode(value=v):
label = _format_number(v)
case UnaryOpNode(op=op):
label = _OP_LABELS.get(op, op.name)
case BinOpNode(op=op):
label = _OP_LABELS.get(op, op.name)
lines = [f"{prefix}{connector}{label}"]
children = _get_children(node)
for i, child in enumerate(children):
is_last_child = (i == len(children) - 1)
        if connector:
            # Extend the prefix: children of a "+-- " node see "|   "
            # (the vertical bar continues); children of a "`-- " (last)
            # node see plain spaces.
            child_prefix = prefix + ("|   " if connector == "+-- " else "    ")
        else:
            child_prefix = prefix
        child_connector = "`-- " if is_last_child else "+-- "
child_lines = ast_to_text(child, child_prefix, child_connector)
lines.append(child_lines)
return '\n'.join(lines)
def _get_children(node):
match node:
case NumberNode():
return []
case UnaryOpNode(operand=child):
return [child]
case BinOpNode(left=left, right=right):
return [left, right]
return []
def _format_number(v):
if isinstance(v, float) and v == int(v):
return str(int(v))
return str(v)
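The counter-in-a-list trick in ast_to_dot is worth seeing in isolation: a closure cannot rebind an outer local without `nonlocal`, but it can mutate a list. This sketch applies the same pattern to a plain nested-tuple tree, not the module's Node classes:

```python
def tree_to_dot(tree):
    """Render a nested ('op', left, right) / leaf tree as graphviz dot text."""
    lines = ["digraph AST {"]
    counter = [0]  # mutable cell: the closure increments it for unique IDs

    def visit(node):
        nid = f"n{counter[0]}"
        counter[0] += 1
        if isinstance(node, tuple):
            op, left, right = node
            lines.append(f'  {nid} [label="{op}"];')
            lines.append(f'  {nid} -> {visit(left)} [label="L"];')
            lines.append(f'  {nid} -> {visit(right)} [label="R"];')
        else:
            lines.append(f'  {nid} [label="{node}", shape=box];')
        return nid

    visit(tree)
    lines.append("}")
    return "\n".join(lines)

dot = tree_to_dot(("+", 2, ("*", 3, 4)))
# Graphviz ignores statement order, so emitting a child's node line
# before the edge that points at it is fine.
```

In modern Python a `nonlocal` integer works just as well; the one-element list form predates it and still reads clearly.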

View File

@@ -6,9 +6,18 @@ import ollama
 DEFAULT_OLLAMA_MODEL = "qwen2.5:7b"
+_ollama_model = DEFAULT_OLLAMA_MODEL
 
-def ask_ollama(prompt, system=None, model=DEFAULT_OLLAMA_MODEL):
+def set_ollama_model(model):
+    """Change the Ollama model used for fast queries."""
+    global _ollama_model
+    _ollama_model = model
+
+def ask_ollama(prompt, system=None, model=None):
     """Query Ollama with an optional system prompt."""
+    model = model or _ollama_model
     messages = []
     if system:
         messages.append({"role": "system", "content": system})
@@ -24,6 +33,8 @@ def ask_claude(prompt):
         capture_output=True,
         text=True,
     )
+    if result.returncode != 0:
+        raise RuntimeError(f"Claude CLI failed (exit {result.returncode}): {result.stderr.strip()}")
     return result.stdout.strip()
@@ -34,8 +45,9 @@ def ask(prompt, system=None, quality="fast"):
         return ask_ollama(prompt, system=system)
 
-def chat_ollama(messages, system=None, model=DEFAULT_OLLAMA_MODEL):
+def chat_ollama(messages, system=None, model=None):
     """Multi-turn conversation with Ollama."""
+    model = model or _ollama_model
     all_messages = []
     if system:
         all_messages.append({"role": "system", "content": system})
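The change boils down to one small pattern: a module-level current setting that an explicit argument can still override. In isolation (names illustrative, not the project's):

```python
_DEFAULT_MODEL = "qwen2.5:7b"
_current_model = _DEFAULT_MODEL

def set_model(name):
    """Point subsequent calls at a different model."""
    global _current_model
    _current_model = name

def ask(prompt, model=None):
    # `model=None` (rather than a default baked into the signature) is what
    # lets set_model() take effect: defaults are evaluated once, at import.
    model = model or _current_model
    return f"[{model}] {prompt}"

assert ask("hi") == "[qwen2.5:7b] hi"
set_model("llama3")
assert ask("hi") == "[llama3] hi"              # setter now respected
assert ask("hi", model="phi3") == "[phi3] hi"  # explicit arg still wins
```

This is exactly why the old `model=DEFAULT_OLLAMA_MODEL` signature could never pick up a changed setting.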

View File

@@ -1,7 +1,6 @@
 """Generate Anki .apkg decks from vocabulary data."""
 import genanki
-import random
 
 # Stable model/deck IDs (generated once, kept constant)
 _MODEL_ID = 1607392319

View File

@@ -7,6 +7,7 @@ import time
 import gradio as gr
 
 import ai
+import db
 from modules import vocab, dashboard, essay, tutor, idioms
 from modules.essay import GCSE_THEMES
@@ -214,6 +215,15 @@ def do_anki_export(cats_selected):
     return path
 
+def update_ollama_model(model):
+    ai.set_ollama_model(model)
+
+def update_whisper_size(size):
+    from stt import set_whisper_size
+    set_whisper_size(size)
+
 def reset_progress():
     conn = db.get_connection()
     conn.execute("DELETE FROM word_progress")
@@ -491,6 +501,10 @@ with gr.Blocks(title="Persian Language Tutor") as app:
     export_btn.click(fn=do_anki_export, inputs=[export_cats], outputs=[export_file])
 
+    # Wire model settings
+    ollama_model.change(fn=update_ollama_model, inputs=[ollama_model])
+    whisper_size.change(fn=update_whisper_size, inputs=[whisper_size])
+
     gr.Markdown("### Reset")
     reset_btn = gr.Button("Reset All Progress", variant="stop")
     reset_status = gr.Markdown("")

View File

@@ -2,7 +2,7 @@
 import json
 import sqlite3
-from datetime import datetime, timezone
+from datetime import datetime, timedelta, timezone
 from pathlib import Path
 
 import fsrs
@@ -148,6 +148,13 @@ def get_word_counts(total_vocab_size=0):
     }
 
+def get_all_word_progress():
+    """Return all word progress as a dict of word_id -> progress dict."""
+    conn = get_connection()
+    rows = conn.execute("SELECT * FROM word_progress").fetchall()
+    return {row["word_id"]: dict(row) for row in rows}
+
 def record_quiz_session(category, total_questions, correct, duration_seconds):
     """Log a completed flashcard session."""
     conn = get_connection()
@@ -203,7 +210,7 @@ def get_stats():
     today = datetime.now(timezone.utc).date()
     for i, row in enumerate(days):
         day = datetime.fromisoformat(row["d"]).date() if isinstance(row["d"], str) else row["d"]
-        expected = today - __import__("datetime").timedelta(days=i)
+        expected = today - timedelta(days=i)
         if day == expected:
             streak += 1
         else:
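The get_all_word_progress addition is the classic N+1 fix: one SELECT builds a dict, and per-word database calls become dictionary lookups. A self-contained illustration (toy schema, not the app's full table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows indexable by column name, as in db.py
conn.execute("CREATE TABLE word_progress (word_id TEXT PRIMARY KEY, stability REAL)")
conn.executemany("INSERT INTO word_progress VALUES (?, ?)",
                 [("w1", 12.0), ("w2", 3.5)])

# One query up front instead of one query per word:
all_progress = {row["word_id"]: dict(row)
                for row in conn.execute("SELECT * FROM word_progress")}

assert all_progress["w1"]["stability"] == 12.0
assert all_progress.get("w99") is None  # unseen words simply miss the dict
```

For N words the callers now do one query and N O(1) lookups instead of N separate queries.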

View File

@@ -19,17 +19,17 @@ def get_category_breakdown():
     """Return progress per category as list of dicts."""
     vocab = load_vocab()
     categories = get_categories()
+    all_progress = db.get_all_word_progress()
 
     breakdown = []
     for cat in categories:
         cat_words = [e for e in vocab if e["category"] == cat]
-        cat_ids = {e["id"] for e in cat_words}
         total = len(cat_words)
         seen = 0
         mastered = 0
-        for wid in cat_ids:
-            progress = db.get_word_progress(wid)
+        for e in cat_words:
+            progress = all_progress.get(e["id"])
             if progress:
                 seen += 1
                 if progress["stability"] and progress["stability"] > 10:

View File

@@ -84,8 +84,9 @@ def get_flashcard_batch(count=10, category=None):
     remaining = count - len(due_entries)
     if remaining > 0:
         seen_ids = {e["id"] for e in due_entries}
+        all_progress = db.get_all_word_progress()
         # Prefer unseen words
-        unseen = [e for e in pool if e["id"] not in seen_ids and not db.get_word_progress(e["id"])]
+        unseen = [e for e in pool if e["id"] not in seen_ids and e["id"] not in all_progress]
         if len(unseen) >= remaining:
             fill = random.sample(unseen, remaining)
         else:
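With the batch dict in hand, "unseen" becomes a pure membership test, and the surrounding selection logic is plain Python. A small standalone trace with made-up IDs:

```python
import random

random.seed(0)  # deterministic sampling for the example

pool = [{"id": f"w{i}"} for i in range(6)]
seen_ids = {"w0", "w1"}                     # already queued as due cards
all_progress = {"w2": {"stability": 4.2}}   # words with any review history

# Prefer words never studied: not already queued and absent from progress
unseen = [e for e in pool
          if e["id"] not in seen_ids and e["id"] not in all_progress]
assert [e["id"] for e in unseen] == ["w3", "w4", "w5"]

remaining = 2
fill = random.sample(unseen, remaining) if len(unseen) >= remaining else unseen
assert len(fill) == remaining
```

Membership tests against a dict (or set) are O(1), so this also scales better than the per-word queries it replaced.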

View File

@@ -1,13 +1,17 @@
 """Persian speech-to-text wrapper using sttlib."""
 import sys
+from pathlib import Path
 
 import numpy as np
 
-sys.path.insert(0, "/home/ys/family-repo/Code/python/tool-speechtotext")
+# sttlib lives in sibling project tool-speechtotext
+_sttlib_path = str(Path(__file__).resolve().parent.parent / "tool-speechtotext")
+sys.path.insert(0, _sttlib_path)
 from sttlib import load_whisper_model, transcribe, is_hallucination
 
 _model = None
+_whisper_size = "medium"
 
 # Common Whisper hallucinations in Persian/silence
 PERSIAN_HALLUCINATIONS = [
@@ -18,11 +22,19 @@ PERSIAN_HALLUCINATIONS = [
 ]
 
-def get_model(size="medium"):
+def set_whisper_size(size):
+    """Change the Whisper model size. Reloads on next transcription."""
+    global _whisper_size, _model
+    if size != _whisper_size:
+        _whisper_size = size
+        _model = None
+
+def get_model():
     """Load Whisper model (cached singleton)."""
     global _model
     if _model is None:
-        _model = load_whisper_model(size)
+        _model = load_whisper_model(_whisper_size)
     return _model