Introducing Sift

SQL meets the command line

There's a certain elegance to Unix pipes. Data flows from one program to the next, transformed at each step, until it emerges in the shape you need. But anyone who has wrestled with grep, sed, and awk in combination knows the friction: each tool speaks its own dialect, and complex queries become inscrutable incantations. I kept asking myself: what if there were a single language—one most developers already know—that could handle searching, filtering, and transforming text in one coherent syntax?

That question led me to build Sift, an open-source command-line tool that lets you query text files using SQL. I compiled SQLite and the PCRE2 regex library directly into a single binary, which exposes your text as database tables that you can search, join, and transform with familiar SQL statements. The result is a tool that feels natural to anyone who has written a SELECT query, yet remains fully compatible with Unix pipelines.

The Core Idea

When you pipe text into Sift, each line becomes a row in a virtual table called lines: the content column holds the text, and line_number records where it came from. From there, you can do anything SQL allows: filter with WHERE, aggregate with GROUP BY, or transform with built-in functions. When you point Sift at multiple files, it builds a full-text search index using SQLite's FTS5 engine, enabling fast keyword and proximity queries across entire codebases.
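To make the table model concrete, here is a minimal Python sketch that mimics it with the standard sqlite3 module. It is not Sift's implementation: Sift exposes a virtual table, whereas this sketch copies each line into an ordinary in-memory table with the same two column names, then runs a WHERE filter and a GROUP BY aggregate over it. The helper name load_lines is hypothetical.

```python
import sqlite3


def load_lines(text: str) -> sqlite3.Connection:
    """Copy each line of `text` into an in-memory table named `lines`.

    Hypothetical helper: it mimics the lines(line_number, content) shape
    described above with an ordinary table, not a virtual table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO lines (line_number, content) VALUES (?, ?)",
        enumerate(text.splitlines(), start=1),  # 1-based line numbers
    )
    return conn


if __name__ == "__main__":
    log = "INFO start\nERROR disk full\nINFO retry\nERROR disk full\n"
    conn = load_lines(log)

    # Filter with WHERE: which lines mention ERROR, and where are they?
    for row in conn.execute(
        "SELECT line_number, content FROM lines WHERE content LIKE '%ERROR%'"
    ):
        print(row)

    # Aggregate with GROUP BY: count lines per leading word.
    for row in conn.execute(
        "SELECT substr(content, 1, instr(content, ' ') - 1) AS level, count(*) "
        "FROM lines GROUP BY level ORDER BY level"
    ):
        print(row)
```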

Consider a common task: finding all lines in a C file that mention memory allocation. With traditional tools, you might reach for grep. But what if you want to find places where malloc and free appear near each other—potential memory management patterns? Sift makes this trivial:

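find . -name "*.c" | sift --dig --for \
  "SELECT filepath, content FROM search_fts WHERE content MATCH 'NEAR(malloc free)'"

The MATCH clause uses FTS5 query syntax, which supports proximity searches out of the box: NEAR(malloc free) matches when the two terms occur within 10 tokens of each other, and an explicit limit such as NEAR(malloc free, 5) tightens the window. (In FTS5, a bare NEAR written between two words is treated as an ordinary search term, not as an operator.) The query returns file paths and matching content in a single pass, with no chaining required.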
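For experimenting with the query syntax outside Sift, the sketch below reproduces the idea with Python's sqlite3 module, assuming its SQLite library was built with FTS5 (most Python builds are). The table name search_fts and its two columns are borrowed from the command above; marking filepath UNINDEXED so that only content is searched is my assumption, not a documented detail of Sift's schema, and near_search is a hypothetical helper.

```python
import sqlite3


def near_search(files: dict[str, str], a: str, b: str, max_gap: int = 10) -> list[str]:
    """Return paths whose content has terms `a` and `b` within `max_gap` tokens.

    `a` and `b` must be plain words (FTS5 barewords); they are interpolated
    into the query string. Requires an SQLite build with FTS5 enabled.
    """
    conn = sqlite3.connect(":memory:")
    # Assumed schema: filepath is stored but not indexed, so only content is searched.
    conn.execute(
        "CREATE VIRTUAL TABLE search_fts USING fts5(filepath UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO search_fts (filepath, content) VALUES (?, ?)", files.items()
    )
    # FTS5 proximity syntax: NEAR(phrase phrase, N); N defaults to 10.
    query = f"NEAR({a} {b}, {max_gap})"
    rows = conn.execute(
        "SELECT filepath FROM search_fts WHERE search_fts MATCH ? ORDER BY filepath",
        (query,),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    sources = {
        "pool.c": "char *p = malloc(16); use(p); free(p);",
        "far.c": "void *q = malloc(8);" + " x" * 20 + " free(q);",
    }
    print(near_search(sources, "malloc", "free"))
```

Because the tokenizer splits on punctuation, malloc(16) contributes the tokens malloc and 16, which is why a call and its matching free a few statements later still count as near each other.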

Surgical Editing

I didn't want Sift to be limited to search. It can modify files in place, using SQL to specify exactly which lines to change and how. The --pick mode targets specific lines, while --refine transforms entire files. For simple insertions, such as adding a header guard or injecting a license block, the --drop-before and --drop-after options insert content at precise line numbers without requiring any pattern matching.
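One way to picture SQL-specified edits is to load a file into a lines table, change rows with UPDATE, and write the rows back in line_number order. The Python sketch below does exactly that; it is a hypothetical illustration of the idea, not how --pick or --drop-before work internally, and the helper names edit_with_sql and insert_before are invented for the example.

```python
import sqlite3


def edit_with_sql(text: str, update_sql: str, params: tuple = ()) -> str:
    """Run an UPDATE against a lines(line_number, content) table, then rebuild the text.

    Hypothetical helper; the rebuilt text always ends with a newline.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO lines VALUES (?, ?)", enumerate(text.splitlines(), start=1)
    )
    # The WHERE clause of update_sql decides which lines change; SET decides how.
    conn.execute(update_sql, params)
    rows = conn.execute("SELECT content FROM lines ORDER BY line_number")
    return "\n".join(content for (content,) in rows) + "\n"


def insert_before(text: str, line_number: int, new_text: str) -> str:
    """Insert `new_text` (one or more lines) before 1-based `line_number`.

    Hypothetical helper: no pattern matching, only a position. Passing
    len(lines) + 1 appends at the end, i.e. "after" the last line.
    """
    lines = text.splitlines()
    index = line_number - 1
    lines[index:index] = new_text.splitlines()
    return "\n".join(lines) + "\n"
```

For example, edit_with_sql(src, "UPDATE lines SET content = replace(content, 'int', 'long') WHERE line_number BETWEEN 2 AND 3") rewrites only lines 2 and 3, and insert_before(src, 1, "/* license */") prepends a header without searching for anything.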

Safety is built in. By default, Sift creates .bak backups before modifying anything. The --shake flag previews changes without writing them, and --diff shows a unified diff of what would change. These guardrails make Sift suitable for automated refactoring—even in scripts that run unattended.
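The backup-then-write and preview-only behaviors can be sketched with Python's difflib and shutil. This is a loose model of the guardrails described above, not Sift's code: the function safe_rewrite is hypothetical, and naming the backup file.c.bak (original name plus .bak) is an assumption.

```python
import difflib
import shutil
from pathlib import Path


def safe_rewrite(path: Path, new_text: str, dry_run: bool = False) -> str:
    """Return a unified diff of the change; unless dry_run, back up and write.

    Hypothetical helper loosely modeled on a preview flag, a diff flag,
    and .bak backups. The backup name <name>.bak is an assumption.
    """
    old_text = path.read_text()
    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=str(path),
            tofile=str(path),
        )
    )
    if not dry_run and new_text != old_text:
        # Copy the original (with metadata) before overwriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(new_text)
    return diff
```

Running with dry_run=True first and inspecting the diff mirrors the preview-then-apply workflow that makes unattended scripts safer.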

Regex Without the Pain

I bundled PCRE2, the Perl-compatible regex library that ripgrep also uses for its optional --pcre2 mode. Three SQL functions expose its power: regex_match tests for a pattern, regex_replace performs substitutions, and regex_extract pulls out capture groups. All three accept flags for case-insensitivity, multiline mode, and more.
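To show how regex functions slot into SQL, here is a sketch that registers three same-named functions on a Python sqlite3 connection. It substitutes Python's re module for PCRE2, so syntax details differ. The argument orders (pattern first, then text) follow the regex_extract example in this post, while the regex_replace argument order and the single-letter flag strings ("i", "m", "s") are assumptions, not Sift's documented signatures.

```python
import re
import sqlite3
from functools import lru_cache

# Assumed flag letters for this sketch; Sift's own flag syntax may differ.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


@lru_cache(maxsize=256)
def _compile(pattern: str, flags: str = "") -> re.Pattern:
    """Compile once per (pattern, flags) pair, since SQLite calls us once per row."""
    bits = 0
    for letter in flags:
        bits |= _FLAGS[letter]
    return re.compile(pattern, bits)


def regex_match(pattern, text, flags=""):
    """1 if the pattern occurs anywhere in text, else 0; NULL text stays NULL."""
    if text is None:
        return None
    return 1 if _compile(pattern, flags).search(text) else 0


def regex_replace(pattern, text, replacement, flags=""):
    """Replace every match; the replacement follows re.sub conventions."""
    if text is None:
        return None
    return _compile(pattern, flags).sub(replacement, text)


def regex_extract(pattern, text, group=0, flags=""):
    """Return the requested capture group of the first match, or NULL."""
    if text is None:
        return None
    found = _compile(pattern, flags).search(text)
    return found.group(group) if found else None


def register_regex_functions(conn: sqlite3.Connection) -> None:
    # narg=-1 lets SQL callers omit the optional trailing arguments.
    for name, func in (
        ("regex_match", regex_match),
        ("regex_replace", regex_replace),
        ("regex_extract", regex_extract),
    ):
        conn.create_function(name, -1, func, deterministic=True)
```

Once registered, the functions compose with ordinary SQL: a WHERE regex_match(...) clause can drop lines with no match before regex_extract runs in the SELECT list. Note that SQL string literals do not process backslashes, so a pattern like '[\w.-]+' reaches the regex engine unchanged.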

Extracting data becomes a one-liner. To pull email addresses from a file:

sift --for "SELECT regex_extract('[\w\.-]+@[\w\.-]+', content, 0) FROM lines" < users.txt

No need to remember sed substitution syntax or awk field semantics. If you know regex and SQL, you know Sift.

Built for AI Agents

One of Sift's more unusual features is native support for the Model Context Protocol (MCP), a standard for connecting AI assistants to external tools. Running sift --mcp starts an MCP server that exposes Sift's capabilities as structured function calls. AI agents like Claude Code can then search codebases, read specific line ranges, and perform edits—all through a well-defined API rather than fragile shell commands.

This matters because large language models work better with clean, predictable tool interfaces. Rather than generating grep and sed commands that might fail on edge cases, an AI can call Sift's sift_search or sift_edit functions with typed arguments. The result is more reliable automation and fewer token-wasting error loops.
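What "typed arguments" means on the wire: MCP messages are JSON-RPC 2.0, tools are invoked with a tools/call request naming the tool and passing an arguments object, and over the stdio transport each message is a single line of JSON. The Python sketch below builds such a request. The tool name sift_search comes from this post, but the argument names query and path are hypothetical; a real client discovers each tool's input schema via tools/list after the initialize handshake, which this sketch omits.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # Without indent, json.dumps escapes newlines inside strings, so the
    # message never contains a raw newline (required by the stdio transport).
    return json.dumps(request, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    # Hypothetical arguments; the real names come from the server's tools/list schema.
    print(tools_call("sift_search", {"query": "malloc", "path": "src/"}), end="")
```

The contrast with a generated shell pipeline is that quoting, escaping, and argument boundaries are handled by the JSON encoder rather than by the model.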

Getting Started

Sift is distributed as source and compiles on any Unix-like system. Dependencies—SQLite and PCRE2—are bundled, so installation is straightforward:

git clone https://github.com/edwardedmonds/sift.git
cd sift
make
sudo make install

From there, the --help flag and the project's README cover the full command set. For AI integration, running claude mcp add sift -- sift --mcp registers the tool with Claude Code.

I've found Sift most valuable when tasks fall between "simple grep" and "write a Python script." It occupies that middle ground where you need more expressiveness than traditional Unix tools provide, but don't want the overhead of a full programming language. The learning curve is gentle if you already know SQL, and the payoff is immediate: fewer tools to juggle, fewer syntax errors to debug, and a single coherent language for text processing.

The source is available under GPL-2.0 on GitHub. Give it a try the next time you find yourself chaining three utilities together and wishing there were a simpler way.
