# Expression Evaluator -- DAGs & State Machines Tutorial A calculator that teaches two fundamental CS patterns by building them from scratch: 1. **Finite State Machine** -- the tokenizer processes input character-by-character using an explicit transition table 2. **Directed Acyclic Graph (DAG)** -- the parser builds an expression tree, evaluated bottom-up in topological order ## What You'll Learn | File | CS Concept | What it does | |------|-----------|-------------| | `tokenizer.py` | **State Machine** (Mealy machine) | Converts `"3 + 4 * 2"` into tokens using a transition table | | `parser.py` | **DAG construction** | Builds an expression tree with operator precedence | | `evaluator.py` | **Topological evaluation** | Walks the tree bottom-up (leaves before parents) | | `visualize.py` | **Visualization** | Generates graphviz diagrams of both the FSM and AST | ## Quick Start ```bash # Evaluate an expression python main.py "3 + 4 * 2" # => 11 # Interactive REPL python main.py # See how the state machine tokenizes python main.py --show-tokens "(2 + 3) * -4" # See the expression tree (DAG) python main.py --show-ast "(2 + 3) * 4" # * # +-- + # | +-- 2 # | `-- 3 # `-- 4 # Watch evaluation in topological order python main.py --trace "(2 + 3) * 4" # Step 1: 2 => 2 # Step 2: 3 => 3 # Step 3: 2 + 3 => 5 # Step 4: 4 => 4 # Step 5: 5 * 4 => 20 # Generate graphviz diagrams python main.py --dot "(2 + 3) * 4" | dot -Tpng -o ast.png python main.py --dot-fsm | dot -Tpng -o fsm.png ``` ## Features - Arithmetic: `+`, `-`, `*`, `/`, `^` (power) - Parentheses: `(2 + 3) * 4` - Unary minus: `-3`, `-(2 + 1)`, `2 * -3` - Decimals: `3.14`, `.5` - Standard precedence: parens > `^` > `*`/`/` > `+`/`-` - Right-associative power: `2^3^4` = `2^(3^4)` - Correct unary minus: `-3^2` = `-(3^2)` = `-9` ## Running Tests ```bash python -m pytest tests/ -v ``` ## How the State Machine Works The tokenizer in `tokenizer.py` uses an **explicit transition table** -- a dictionary mapping `(current_state, character_class)` to `(next_state, action)`. This is the same pattern used in network protocol parsers, regex engines, and compiler lexers. The three states are: - `START` -- between tokens, dispatching based on the next character - `INTEGER` -- accumulating digits (e.g., `"12"` so far) - `DECIMAL` -- accumulating digits after a decimal point (e.g., `"12.3"`) Use `--dot-fsm` to generate a visual diagram of the state machine. ## How the DAG Works The parser in `parser.py` builds an **expression tree** (AST) where: - **Leaf nodes** are numbers (no dependencies) - **Interior nodes** are operators with edges to their operands - **Edges** represent "depends on" relationships Evaluation in `evaluator.py` walks this tree **bottom-up** -- children before parents. This is exactly a **topological sort** of the DAG: you can only compute a node after all its dependencies are resolved. Use `--show-ast` to see the tree structure, or `--dot` to generate a graphviz diagram.