Try Text-to-SQL on Real Data - Multi-Million Rows & GB+ Sizes

Amar Harolikar
Dec 5, 2025

Two Clicks. Zero Setup. No Database, No Server, No Login needed. With 9 LLM options, Python & SQL to torture your data till it confesses.

App live here: app.tigzig.com/analyzer

See PDF for step-by-step guide

What's New

I've enhanced the sample datasets in my Database AI app (DATS-4). Previously, the test files were tiny, 50-60 rows. Now there's a full range: 64 rows to 11.8 million rows. File sizes from 14 KB to 1.6 GB.

For the 1.6 GB file, setup takes around 9 minutes. Fully automated: database creation, file upload, agent ready.

The Datasets:

▸ RBI Cards & ATM Statistics: 14 KB, 64 rows. July 2025 data covering 60-70 banks.

▸ Tour de France - Riders History: 974 KB, 10K rows. Race rankings from 1903 to 2025. Over 120 years of cycling history.

▸ IPL - Indian Premier League: 41 MB, 278K rows. Ball-by-ball data from 2008 (the first IPL season) to Sep 2025.

▸ ODI - One Day International: 206 MB, 1.6 million rows. Ball-by-ball records, 2003 to Sep 2025.

▸ Cricket Combined (ODI, T20, County, IPL): 697 MB, 5.2 million rows. 2003 to Sep 2025.

▸ Cricket Extended (all formats including Test, T20 Blast): 1.6 GB, 11.8 million rows. 2003 to Sep 2025.

Two Clicks to Analytics-Ready

Setup is two clicks:

Go to Datasets. Pick one.
Select 'Use Temporary Database'

That's it. The app creates a temporary database, uploads your data, extracts the schema, and connects it to the AI agent. You're ready to query. For small files, setup takes 20-30 seconds. For the largest file, 2-3 minutes. The backend is Neon (neon.com), which provisions a Postgres database in less than a second via an API call.
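To make the "database via an API call" idea concrete, here is a minimal sketch of what such a request could look like. It assumes Neon's public v2 REST API with a create-project endpoint at /projects and bearer-token auth; the base URL, payload shape, and the helper name build_create_project_request are my assumptions for illustration, not the app's actual code, so check them against Neon's API reference before use. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: Neon's v2 API base URL. Verify against Neon's API reference.
NEON_API_BASE = "https://console.neon.tech/api/v2"


def build_create_project_request(api_key: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks Neon for a new project.

    Hypothetical helper: the payload shape {"project": {"name": ...}} is an
    assumption based on Neon's public API, not taken from the app's source.
    """
    body = json.dumps({"project": {"name": name}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{NEON_API_BASE}/projects",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Placeholder key and name; nothing is sent over the network here.
req = build_create_project_request("YOUR_NEON_API_KEY", "temp-analysis-db")
print(req.get_method(), req.full_url)

# To actually send it, something like urllib.request.urlopen(req) would be
# needed, followed by reading connection details from the JSON response.
```

In a real flow the response would carry the connection details that get handed to the agent. The app may well use a different endpoint or client library.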

How to Explore

Once setup completes, you're in the chat interface. Use the pre-built prompts. Each dataset has a sample prompt. Hit the copy icon, paste, run. These are structured queries: ranking systems, derived metrics, comparisons. The more specific the better. Avoid generic 'analyze this'. AI can't read your mind yet.

Or explore data with:

"Show 5 sample rows in table format"
"Have advanced analyst run EDA: univariates and categorical freqs, share results as table and charts"

Check Agent Reasoning

Click to see the SQL the agent generated. Useful for validation and learning.
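One way to use the generated SQL for validation is to rerun it on a tiny sample where you can check the answer by hand. The sketch below does that with Python's built-in sqlite3. The table name, column names, and rows are made up for illustration and are not the app's real schema, and the query is only an example of the kind of SQL an agent might produce for "rank batters by strike rate".

```python
import sqlite3

# Hypothetical ball-by-ball sample: (batter, runs scored off that ball).
rows = [
    ("batter_a", 4), ("batter_a", 1), ("batter_a", 6),
    ("batter_b", 0), ("batter_b", 4),
    ("batter_c", 6), ("batter_c", 6), ("batter_c", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deliveries (batter TEXT, runs INTEGER)")
conn.executemany("INSERT INTO deliveries VALUES (?, ?)", rows)

# Example of agent-style SQL: a derived metric plus a ranking.
agent_sql = """
SELECT batter,
       SUM(runs)                              AS total_runs,
       COUNT(*)                               AS balls,
       ROUND(100.0 * SUM(runs) / COUNT(*), 1) AS strike_rate
FROM deliveries
GROUP BY batter
ORDER BY strike_rate DESC
"""
ranking = conn.execute(agent_sql).fetchall()

# Independent check in plain Python: total runs per batter.
expected_runs = {}
for batter, runs in rows:
    expected_runs[batter] = expected_runs.get(batter, 0) + runs

for batter, total_runs, balls, strike_rate in ranking:
    assert total_runs == expected_runs[batter]
    print(batter, total_runs, balls, strike_rate)
```

If the hand-checked totals and the SQL output disagree, the query (or your reading of the schema) needs another look before you trust it on millions of rows.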

LLM Options

9 models available for advanced analysis. Choose based on quality needs and cost tolerance.

Model	Type	Quality	Cost
Gemini 2.0 Flash	Best Value	75	Lowest
Qwen3 Max	Good	80	Low
Gemini 2.5 Flash	Good	85	Low
KIMI K2 Thinking	High Variance	85	High
DeepSeek-R1-0528	Great Quality	90	Med
GPT-4.1	Great Quality	90	Med
Gemini 3 Pro	Good	95	High
GPT-5.1	Top Quality	100	High
Claude Sonnet 4.5	Topmost Quality	115	High

For detailed cost and quality comparisons based on live testing, see: Gemini 3 Pro Added to Database AI Suite. Tested Against Claude Sonnet 4.5 and GPT-5.1.

Summary: Claude still leads. GPT-5.1 is solid. Gemini 3 Pro lands third.

What Else Can the App Do

The sample dataset feature is just one entry point. DATS-4 is a full database AI suite. Here's what's available:

Database Connections

Connect to any remote Postgres or MySQL database with your own credentials
Or use the on-the-fly temporary database for quick tests
Paste credentials in any format (URI, table, plain text). AI parses it.
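For the URI form of credentials, the pieces can be pulled out deterministically. The sketch below uses Python's standard urllib.parse. It is only an illustration of what "parsing credentials" has to produce: the app itself uses AI so it can also handle tables and plain text, and the function name parse_db_uri and the example URI are hypothetical.

```python
from urllib.parse import urlparse, unquote


def parse_db_uri(uri: str) -> dict:
    """Split a database URI into the fields a connector typically needs.

    Hypothetical helper covering only the URI format; free-form text or
    tables would need a different approach (the app uses an LLM for that).
    """
    parts = urlparse(uri)
    return {
        "engine": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Percent-encoded characters (e.g. %40 for @) are decoded here.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "database": parts.path.lstrip("/") or None,
    }


# Made-up credentials for illustration only.
creds = parse_db_uri("postgresql://demo_user:p%40ss@db.example.com:5432/analytics")
print(creds)
```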

Two Agents

General Analyst: fast execution for direct queries, data pulls, standard charts. Powered by GPT-4.1-mini.
Advanced Analyst: multi-step reasoning for complex analysis. Choice of 9 LLMs for the reasoning step. Execution by GPT-4.1.

File Uploads

Upload CSV or tab-delimited files directly
Upload to temporary database or your own database
AI-powered schema detection. You don't define columns. It figures it out.

Working Tables & Export

Create derived tables, run transformations, merge datasets
Export any table to CSV or pipe-delimited file
Download for offline analysis in Excel or other tools

Table Viewer

Interactive data grid for all uploaded files
Filter, sort, drill down to record level
On-the-fly descriptive statistics and data quality metrics

PDF Output

Agent can convert analysis output to PDF (text only, charts not yet supported)
Structure and content customizable via natural language instructions

Python Charts & Stats

Integrated Python sandbox (e2b Code Interpreter)
Generate charts: bar, line, scatter, heatmap, violin, radar, box plots
Run statistical analysis: Chi-square, ANOVA, correlation matrices, distributions
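As a reference point for what a Chi-square test computes, here is a minimal pure-Python sketch of the Pearson statistic for a contingency table. In the sandbox the agent would more likely call a library such as SciPy (my assumption, not confirmed by the app's docs). Note this version applies no continuity correction, so a library that corrects 2x2 tables by default will report a different value. The counts are made up.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a 2D table.

    observed: list of rows, each a list of non-negative counts.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up 2x2 table, e.g. outcome (rows) by group (columns).
table = [[30, 10], [20, 40]]
stat, dof = chi_square(table)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

A large statistic relative to the degrees of freedom suggests the row and column variables are not independent. Turning it into a p-value needs the chi-square distribution, which is where a stats library earns its keep.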

Logs

Detailed logging of API calls and agent actions
First line of debugging for when things go wrong

Technical note on file uploads

Download: App downloads compressed file from GitHub repo.
Compression (frontend): Uncompressed CSV or TXT uploads are compressed using the browser CompressionStream API without loading the full file into memory.
Temporary database provisioning: A temporary Postgres database is created via Neon with automatic role setup and unique credentials.
File upload to backend: Compressed file is sent to the FastAPI SQL connector.
Memory-efficient file handling: Backend streams file to disk in 32MB chunks to prevent RAM bloat.
Decompression: Backend decompresses .gz files when needed, streaming to disk in 32MB chunks.
AI-powered schema detection: Backend samples first 5 lines, detects delimiter, and sends data to OpenAI for schema inference.
Table creation: Empty table is created using the detected schema.
Smart upload path selection: Postgres uses in-memory COPY for uncompressed files under 100MB and streamed COPY from a temp file for larger or compressed files. MySQL always streams in 100K row batches using Polars or Pandas with executemany inserts.
Agent handoff: After upload, schema plus credentials and sample rows are handed to the Database Agent.
Confirmation: App confirms environment readiness and the Agent confirms schema receipt.
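Three of the steps above (chunked decompression, delimiter detection from the first 5 lines, and upload path selection) can be sketched with the standard library alone. This is a simplified illustration under my own assumptions, not the backend's actual code: the function names are hypothetical, the delimiter check is a naive count that ignores quoted fields, and the demo uses a tiny made-up file with a small chunk size so it runs quickly.

```python
import gzip
import os
import tempfile
from itertools import islice

CHUNK_BYTES = 32 * 1024 * 1024        # 32MB chunks, as in the steps above
IN_MEMORY_LIMIT = 100 * 1024 * 1024   # 100MB threshold for in-memory COPY


def decompress_to_disk(gz_path, out_path, chunk_bytes=CHUNK_BYTES):
    """Stream a .gz file to disk chunk by chunk; returns bytes written."""
    written = 0
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


def detect_delimiter(lines, candidates=(",", "\t", "|", ";")):
    """Pick the candidate that appears the same non-zero number of times
    on every sampled line. Naive: does not handle quoted fields."""
    best, best_count = None, 0
    for cand in candidates:
        counts = {line.count(cand) for line in lines}
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = cand, count
    return best


def choose_upload_path(engine, size_bytes, compressed):
    """Mirror the routing rule described above (labels are hypothetical)."""
    if engine == "mysql":
        return "batched_insert"      # always 100K-row batches
    if not compressed and size_bytes < IN_MEMORY_LIMIT:
        return "in_memory_copy"
    return "streamed_copy"


# Demo on a small made-up tab-delimited file.
raw = b"match_id\tbatter\truns\n" + b"1\tbatter_a\t4\n" * 1000
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.txt.gz")
    out_path = os.path.join(tmp, "sample.txt")
    with gzip.open(gz_path, "wb") as f:
        f.write(raw)

    written = decompress_to_disk(gz_path, out_path, chunk_bytes=1024)
    with open(out_path, "r", encoding="utf-8") as f:
        head = list(islice(f, 5))    # first 5 lines only
    delimiter = detect_delimiter(head)
    route = choose_upload_path("postgres", os.path.getsize(out_path), compressed=True)

print(written, repr(delimiter), route)
```

The point of reading in fixed-size chunks is that memory use stays bounded by the chunk size no matter how large the file is, which is what makes a 1.6 GB upload workable on a modest server.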

Open Source

All open source. Docs and source code accessible from the app (hit Docs in top nav). Guides and posts at tigzig.com. The app has 7 major components, each with its own GitHub repo:

Main App (React UI)
FastAPI Server: Database Connector
FastAPI Server: Neon DB Creation
Flowise Agent Schemas
Proxy Server
MCP Server: Markdown to PDF
Quant Agent Backend

Full build guide and architecture docs available in the Docs section.

Links

▸ App: app.tigzig.com/analyzer

▸ LLM Cost & Quality Assessment: Gemini 3 Pro Test Results

▸ Field Guide (PDF): DATS-4 Database AI Suite

▸ Guides & Posts: tigzig.com
