Methodology
How the AdventureWorks chat-based reporting engine works, what design choices it made, and which ones were forced by the subscription constraints it was built within.
1. The shape of the problem
Translating natural language to executable SQL is the "white whale" of analytics: it's been demoed for two decades and not generalised because real enterprise schemas are messy, dirty, and unstable. AdventureWorksDW is the easy case — a fixed, well-documented star schema that every frontier model has seen in training. That makes it the right scope for a portfolio piece: the architecture is the artefact, not the SQL accuracy.
2. Architecture
A TypeScript Azure Function on Flex Consumption orchestrates two stages per chat turn. Stage A generates T-SQL from a ~2K-token schema digest plus five canonical few-shot examples; stage B re-prompts the same model with the result rows for a narrative and a Plotly figure spec. The Function's system-assigned Managed Identity is the principal everywhere: it authenticates to Azure SQL (added as a contained db_datareader user), to Key Vault (which holds the Anthropic API key), and to Azure Table Storage (chat logs + per-IP rate limit counters). Nothing is a long-lived secret in code.
3. SQL safety
Even with a read-only database user, every generated query goes through an AST check before execution: the statement must parse as a single SELECT against the AdventureWorksDW allowlist of dimension and fact tables, must include a TOP/LIMIT cap of 500 rows, and is denied if it touches xp_*, sys.*, OPENROWSET, or chains multiple statements.node-sql-parserwith the T-SQL dialect powers the check. If execution still fails (e.g. a CTE alias mistake — see §6), the Function sends the error back to the model and asks for a one-shot repair before surfacing the failure.
4. Visualization
On the second prompt the model emits a Plotly figure spec — a JSON object with data traces and a layout — chosen for the column types of the result set. The React UI renders it with react-plotly.js, which gives hover, click-zoom, pan, and legend toggling for free. The table beside it is sortable and free-text filterable. The chart prompt constrains the model to a small palette and forbids dual y-axes with mismatched scales (see §6 for why), so the visual story stays legible even on adversarial result sets.
5. Cost engineering
Azure SQL is on the Serverless GP tier with a 60-minute auto-pause: idle storage costs ~$2.50/mo, compute is billed only while the warehouse is responding to queries. Functions are Flex Consumption — scale-to-zero with working HTTP streaming. The total monthly bill at portfolio traffic (~20 queries/day) lands around $5–8, covered for ~12 months by the $100 Azure for Students credit. A budget alert at $20/mo is wired through Action Groups as a kill-switch trigger.
6. Bug taxonomy from the build
Three real classes of LLM-emitted SQL bug surfaced during development and were fixed via prompt + validator changes:
- CTE name confused with allowlist.
node-sql-parser'stable extractor returns CTE references with the same shape as real tables. The validator now extracts CTE names from the AST and exempts them from the allowlist check. - CTE column reference broken downstream. The model would write
WITH yearly AS (SELECT d.CalendarYear AS Year ...)and then a subquery referring toCalendarYearon the CTE — which has no such column, onlyYear. Hardened the SQL prompt with an explicit CTE-alias-discipline section + a right/wrong example. - Outlier-distorted chart.Year-over-year queries on a partial first year (~$43K) made the 2011 YoY growth +16,000% and dominated the dual-axis chart. The chart prompt now forbids dual y-axes with >30:1 scale mismatches, and instructs the model to either drop the percentage trace or filter incomplete-period rows.
7. Two-model A/B (planned)
The project was scoped for a head-to-head between Azure OpenAI and Anthropic Claude on the same workload. Azure OpenAI quota also came back as 0 across every gpt-4o-mini SKU on the Students subscription, so the live build runs Claude Sonnet 4.6 only for now. Every chat turn still logs model id, latency, token counts, cost estimate, validation outcome, row count, and success flag to Azure Table Storage — once the Azure OpenAI quota lifts and the second model is wired in, the methodology page will publish:
- SQL execution success rate, by model
- Schema-grounded vs. hallucinated columns, by model
- Cost-per-correct-answer (tokens × $/M ÷ success rate)
- End-to-end p50/p95 latency, by model
- Failure taxonomy: wrong table, missing JOIN, missing date filter, missing GROUP BY, syntax error, timeout
8. Why Azure, not AWS
The rest of this portfolio runs on AWS — Bedrock, Amplify, CDK Python. This project is deliberately Microsoft-native end to end, because clients are multi-cloud and the credible argument is which workload belongs where, not which cloud is "best." Azure SQL is the right home for tabular relational analytics; Azure Functions with Managed Identity is a clean fit for a thin orchestration layer with zero-secret data-plane access. The frontend continues to live on AWS Amplify (same Next.js monorepo as the rest of the site) and calls the Azure Function URL cross-origin. Two clouds, one portfolio.
9. Stack
Azure SQL Database Serverless · Azure Functions (Flex Consumption, Node 22, TypeScript) · Anthropic Claude Sonnet 4.6 (via SDK from a Key-Vault-managed key) · Azure Key Vault · Azure Table Storage · Bicep · mssql · node-sql-parser · Plotly.js via react-plotly.js · Cloudflare Turnstile. Frontend: Next.js 16 on AWS Amplify, calling the Azure Function URL cross-origin.