Add expression-evaluator: DAGs & state machines tutorial project

Educational calculator teaching FSMs (explicit transition table tokenizer)
and DAGs (recursive descent parser with AST evaluation). Includes CLI with
REPL, graphviz visualization, and 61 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
dl92
2026-02-08 18:09:42 +00:00
parent 3a8705ece8
commit 01d5532823
11 changed files with 1557 additions and 0 deletions

View File

@@ -0,0 +1,87 @@
# Expression Evaluator -- DAGs & State Machines Tutorial
A calculator that teaches two fundamental CS patterns by building them from scratch:
1. **Finite State Machine** -- the tokenizer processes input character-by-character using an explicit transition table
2. **Directed Acyclic Graph (DAG)** -- the parser builds an expression tree, evaluated bottom-up in topological order
## What You'll Learn
| File | CS Concept | What it does |
|------|-----------|-------------|
| `tokenizer.py` | **State Machine** (Mealy machine) | Converts `"3 + 4 * 2"` into tokens using a transition table |
| `parser.py` | **DAG construction** | Builds an expression tree with operator precedence |
| `evaluator.py` | **Topological evaluation** | Walks the tree bottom-up (leaves before parents) |
| `visualize.py` | **Visualization** | Generates graphviz diagrams of both the FSM and AST |
## Quick Start
```bash
# Evaluate an expression
python main.py "3 + 4 * 2"
# => 11
# Interactive REPL
python main.py
# See how the state machine tokenizes
python main.py --show-tokens "(2 + 3) * -4"
# See the expression tree (DAG)
python main.py --show-ast "(2 + 3) * 4"
# *
# +-- +
# | +-- 2
# | `-- 3
# `-- 4
# Watch evaluation in topological order
python main.py --trace "(2 + 3) * 4"
# Step 1: 2 => 2
# Step 2: 3 => 3
# Step 3: 2 + 3 => 5
# Step 4: 4 => 4
# Step 5: 5 * 4 => 20
# Generate graphviz diagrams
python main.py --dot "(2 + 3) * 4" | dot -Tpng -o ast.png
python main.py --dot-fsm | dot -Tpng -o fsm.png
```
## Features
- Arithmetic: `+`, `-`, `*`, `/`, `^` (power)
- Parentheses: `(2 + 3) * 4`
- Unary minus: `-3`, `-(2 + 1)`, `2 * -3`
- Decimals: `3.14`, `.5`
- Standard precedence: parens > `^` > `*`/`/` > `+`/`-`
- Right-associative power: `2^3^4` = `2^(3^4)`
- Correct unary minus: `-3^2` = `-(3^2)` = `-9`
## Running Tests
```bash
python -m pytest tests/ -v
```
## How the State Machine Works
The tokenizer in `tokenizer.py` uses an **explicit transition table** -- a dictionary mapping `(current_state, character_class)` to `(next_state, action)`. This is the same pattern used in network protocol parsers, regex engines, and compiler lexers.
The three states are:
- `START` -- between tokens, dispatching based on the next character
- `INTEGER` -- accumulating digits (e.g., `"12"` so far)
- `DECIMAL` -- accumulating digits after a decimal point (e.g., `"12.3"`)
Use `--dot-fsm` to generate a visual diagram of the state machine.
## How the DAG Works
The parser in `parser.py` builds an **expression tree** (AST) where:
- **Leaf nodes** are numbers (no dependencies)
- **Interior nodes** are operators with edges to their operands
- **Edges** represent "depends on" relationships
Evaluation in `evaluator.py` walks this tree **bottom-up** -- children before parents. This is exactly a **topological sort** of the DAG: you can only compute a node after all its dependencies are resolved.
Use `--show-ast` to see the tree structure, or `--dot` to generate a graphviz diagram.