Where Sift Fits
The idea of querying text files with SQL is not new. A small ecosystem of tools has grown up around this premise, each carving out its own niche. Some target data analysts who live in CSVs and JSON exports. Others aim at system administrators parsing logs. When I released Sift, several people asked the obvious question: why build another one? The answer lies in a gap I kept bumping into—a missing link between Unix text processors and database engines that becomes visible only when you try to treat a codebase as queryable data.
The Structured Data Camp
The most mature SQL-on-text tools assume your data has columns. This is a reasonable assumption if you're a data scientist working with exports, but it falls apart the moment you point them at a source file or a config file.
q is perhaps the most widely adopted. It parses delimited files—CSVs, TSVs, anything with a consistent separator—and lets you query them with familiar SQL syntax. For its intended use case, it works beautifully:
q -H -d, "SELECT product, SUM(revenue) FROM sales.csv GROUP BY product"
The -H flag tells it the first row is a header; -d, sets the delimiter to a comma. But try to use it on a Python file and you're stuck: there's no delimiter to speak of, no columns to select. The tool simply isn't designed for that.
TextQL occupies similar territory. It wraps SQLite, ingesting structured files into temporary tables. dsq extends this pattern to JSON, Parquet, and Excel files. These are excellent tools for what they do, but "what they do" is data analysis, not code exploration.
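For flavor, the invocations look roughly like this (from memory, so check each tool's README for the exact flags):
textql -header -sql "SELECT product, SUM(revenue) FROM sales GROUP BY product" sales.csv
dsq sales.json "SELECT product, SUM(revenue) FROM {} GROUP BY product"
Both ingest the input into SQLite behind the scenes; dsq's {} is a placeholder for the table created from the file.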
At the heavyweight end sits DuckDB, an embeddable analytical database that runs as a CLI binary. It can query enormous CSV and Parquet files with remarkable speed, supporting window functions, CTEs, and the full arsenal of modern SQL. For data work, it's extraordinary. For finding where a function is called in a codebase, it feels like deploying a fighter jet to fetch groceries.
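A taste of the CLI (recent DuckDB releases accept a query with -c and can read a CSV or Parquet file by path):
duckdb -c "SELECT product, SUM(revenue) AS total FROM 'sales.parquet' GROUP BY product ORDER BY total DESC"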
The Log Watchers
System administrators face a different problem: logs are text, but they're not tabular. Each line is a discrete event, often with embedded structure (timestamps, log levels, JSON payloads) but no global schema. The tools built for this domain come closest to what Sift attempts.
lnav (the Log File Navigator) is the most sophisticated. It parses common log formats automatically, building virtual tables that you can query with SQL:
:filter-in error
;SELECT log_time, log_body FROM syslog_log WHERE log_level = 'error'
The colon prefix issues interactive commands; the semicolon prefix enters SQL mode. The tool understands syslog, Apache logs, and dozens of other formats out of the box. It even provides a curses interface for interactive exploration.
But lnav is an observer, not an editor. It's designed to help you understand what happened in a system, not to change files. There's no equivalent to Sift's --refine or --pick. You can't use it to fix the malformed log entries you've identified.
The Holy Trinity
Before any of these tools existed, Unix provided its own answer: grep for searching, sed for editing, awk for transforming. This trio has served developers for five decades, and for simple tasks, they remain unmatched in terseness:
grep -n "TODO" *.py # Find all TODOs
sed -i 's/foo/bar/g' config.yaml # Replace foo with bar
awk '{sum += $2} END {print sum}' # Sum the second column
The friction appears when tasks grow complex. Suppose you want to find all Python files where a function is defined but never called. Or replace a variable name, but only in lines that match a certain pattern, and only in files modified in the last week. Suddenly you're chaining four tools together, debugging quoting issues, and wondering if there's a simpler way. I've been there more times than I can count.
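To make that concrete, here is roughly how the "defined but never called" check goes with the classic tools. This is a sketch that assumes GNU grep with PCRE support, and it already ignores methods, decorators, and anything dynamic:
# List every function name, then look for a call site that isn't the definition itself
grep -rhoP 'def \K\w+' --include='*.py' . | sort -u | while read -r fn; do
  grep -rqP --include='*.py' "(?<!def )\b${fn}\s*\(" . || echo "$fn: defined but apparently never called"
done
Two greps, a sort, a shell loop, and a lookbehind, and it still misses plenty of cases.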
There's a reason "write a Python script" becomes the default answer for anything beyond trivial text manipulation. The cognitive overhead of combining grep, sed, and awk correctly exceeds the overhead of just writing procedural code. That realization is what pushed me to build something different.
The Gap
If you plot these tools on two axes—structured versus unstructured input, read-only versus read-write—you'll notice a quadrant that's mostly empty:
| Tool | Primary Use | Raw Lines? | Edits Files? |
|---|---|---|---|
| Sift | Codebases, raw text | Yes | Yes |
| q / TextQL | CSV, tabular data | No | No |
| DuckDB | Analytics at scale | No | No |
| lnav | Log analysis | Yes | No |
| ripgrep | Fast search | Yes | No |
| sed | Stream editing | Yes | Yes |
The intersection of "works on raw lines" and "can edit files" contains only sed—and sed, for all its power, doesn't speak SQL. It speaks its own cryptic language of addresses and commands, a language that most developers learn just enough of to be dangerous.
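One example makes the point. In sed, addresses pick lines and one-letter commands act on them, so a script like
sed '10,20{/debug/d;}' app.log
reads as "between lines 10 and 20, delete any line containing debug". Compact, yes, but not something you arrive at by intuition.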
This is the gap I designed Sift to fill. It treats every line of every file as a row in a table, exposes that table to standard SQL, and can write the results back. The mental model I wanted was simple: your codebase is a database; queries return data; some queries also change it.
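In SQL terms, using the same lines and files tables that appear in the comparison below, "show me every TODO with its location" is nothing more than:
SELECT f.filepath, l.line_number, l.content
FROM lines l
JOIN files f USING (file_id)
WHERE l.content LIKE '%TODO%'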
A Practical Comparison
Let me walk through a concrete task: finding all JavaScript files that import a deprecated module, then adding a comment flagging them for migration. Here's how you might approach it with different tools.
With grep and sed:
# Find the files
grep -rl "from 'old-module'" --include="*.js" src/
# Add a comment (but only after the import line)
# This is where it gets complicated...
find src -name "*.js" -exec grep -l "from 'old-module'" {} \; | \
xargs -I{} sed -i "/from 'old-module'/a // TODO: Migrate away from old-module" {}
The sed incantation works, but you need to remember that /pattern/a appends after matching lines, that -i edits in place, and that the whole thing assumes GNU sed (BSD sed on macOS behaves differently).
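For the record, the BSD flavor wants an explicit (possibly empty) backup suffix after -i and a literal newline after a\. Something like this should work on macOS, though I haven't tested every variant, and the quoting is half the battle:
find src -name "*.js" -exec grep -l "from 'old-module'" {} \; | \
  xargs -I{} sed -i '' "/from 'old-module'/a\\
// TODO: Migrate away from old-module" {}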
With Sift:
find src -name "*.js" | sift --sweep --for "
  SELECT f.filepath, l.line_number,
         l.content || char(10) || '// TODO: Migrate away from old-module' AS content
  FROM lines l
  JOIN files f USING (file_id)
  WHERE l.content LIKE '%from ''old-module''%'
"
The SQL is longer, but it's also readable. You can see exactly what's happening: join lines to files, filter for the import statement, append the comment. The char(10) is a newline. Preview it with --shake first, see the diff with --diff, then run it for real.
With a Python script:
import os

for root, dirs, files in os.walk('src'):
    for f in files:
        if f.endswith('.js'):
            path = os.path.join(root, f)
            with open(path, 'r') as file:
                lines = file.readlines()
            modified = False
            new_lines = []
            for line in lines:
                new_lines.append(line)
                # Append a migration note right after each matching import line
                if "from 'old-module'" in line:
                    new_lines.append('// TODO: Migrate away from old-module\n')
                    modified = True
            if modified:
                with open(path, 'w') as file:
                    file.writelines(new_lines)
This is clear and correct, but it's twenty lines of code for a one-off task. You'll write it, run it, and never use it again. I wanted Sift to make the one-liner viable for cases that currently push you toward scripts.
When to Use What
No tool is universal. Here's how I think about choosing:
Use q or TextQL when you have clean CSVs and want quick aggregations. They're fast to invoke and require no setup. If your data has headers and delimiters, these tools will feel natural.
Use DuckDB when you're doing serious data analysis—joins across multiple large files, window functions, analytical queries that would choke lighter tools. It's also the right choice when you need to query Parquet or other columnar formats.
Use lnav when you're investigating incidents in log files. Its format detection and interactive interface are purpose-built for that workflow. The SQL capability is a bonus on top of excellent log parsing.
Use ripgrep when you just need to find something fast. Nothing beats rg pattern for speed. If you don't need to transform or aggregate, don't reach for SQL.
Use Sift when you need to query or transform unstructured text with more logic than grep allows, especially if you want to edit files based on what you find. It's the right tool when you catch yourself chaining three utilities together and wishing for a single coherent language.
The Underlying Philosophy
Each of these tools embodies a philosophy. The structured-data tools believe your text should be organized before you query it. The Unix tools believe small, composable programs are always better than large, integrated ones. lnav believes logs deserve special treatment.
When I built Sift, I started from different assumptions: that developers already know SQL, that raw text is a valid thing to query, and that the gap between "find" and "fix" should be smaller. I don't see it as a replacement for any of the tools above—it's a complement, filling a niche they don't address.
Whether that niche matters to you depends on your work. If you spend your days in Jupyter notebooks processing clean datasets, you may never need it. If you maintain a large codebase and regularly find yourself writing throwaway scripts to refactor code or clean up data files, Sift might become indispensable. It has for me.
The source is on GitHub. I'd suggest trying it on a real task—not a toy example, but something you'd otherwise solve with grep-sed-awk or a Python script. That's the only honest way to evaluate whether it earns a place in your toolkit.