Full Text Search (FTS) Implementation Status
Overview
This document describes the current implementation of Full Text Search (FTS) capabilities in ProxySQL MCP. The FTS system enables AI agents to quickly search indexed database metadata and LLM-generated artifacts using SQLite's FTS5 extension.
Status: IMPLEMENTED ✅
Requirements
- Indexing Strategy: Optional WHERE clauses, no incremental updates (full rebuild on reindex)
- Search Scope: Agent decides - single table or cross-table search
- Storage: All rows (no limits)
- Catalog Integration: Cross-reference between FTS and catalog - agent can use FTS to get top N IDs, then query real database
- Use Case: FTS as another tool in the agent's toolkit
Architecture
Components
MCP Query Endpoint
↓
Query_Tool_Handler (routes tool calls)
↓
Discovery_Schema (manages FTS database)
↓
SQLite FTS5 (mcp_catalog.db)
Database Design
Integrated with Discovery Schema: FTS functionality is built into the existing mcp_catalog.db database.
FTS Tables:
- fts_objects - FTS5 index over database objects (contentless)
- fts_llm - FTS5 index over LLM-generated artifacts (with content)
Tools (Integrated with Discovery Tools)
1. catalog_search
Search indexed data using FTS5 across both database objects and LLM artifacts.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | FTS5 search query |
| include_objects | boolean | No | Include detailed object information (default: false) |
| object_limit | integer | No | Max objects to return when include_objects=true (default: 50) |
Response:
{
  "success": true,
  "query": "customer order",
  "results": [
    {
      "kind": "table",
      "key": "sales.orders",
      "schema_name": "sales",
      "object_name": "orders",
      "content": "orders table with columns: order_id, customer_id, order_date, total_amount",
      "rank": 0.5
    }
  ]
}
Implementation Logic:
- Search both fts_objects and fts_llm tables using FTS5
- Combine results with ranking
- Optionally fetch detailed object information
- Return ranked results
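The steps above can be sketched in Python with the standard sqlite3 module. This is a minimal illustration, not the ProxySQL implementation: the helper names make_demo_db and combined_search, the in-memory database, and the sample rows are all assumptions, and the sketch requires a SQLite build with FTS5. Because fts_objects is contentless, only the rowid and rank come back from it.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for mcp_catalog.db (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE VIRTUAL TABLE fts_objects USING fts5(
            schema_name, object_name, object_type, content, content='');
        CREATE VIRTUAL TABLE fts_llm USING fts5(kind, key, content);
    """)
    conn.execute(
        "INSERT INTO fts_objects(rowid, schema_name, object_name, object_type, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (123, "sales", "orders", "table",
         "orders table with columns: order_id, customer_id, order_date"),
    )
    conn.execute(
        "INSERT INTO fts_llm(kind, key, content) VALUES (?, ?, ?)",
        ("domain", "customer_segmentation",
         "Customer segmentation based on purchase behavior"),
    )
    return conn

def combined_search(conn: sqlite3.Connection, query: str) -> list:
    """Query both FTS tables and merge the hits by rank (hypothetical helper)."""
    hits = []
    # Contentless table: column values are not stored, so read rowid and rank only.
    for object_id, rank in conn.execute(
        "SELECT rowid, rank FROM fts_objects WHERE fts_objects MATCH ?", (query,)
    ):
        hits.append({"source": "fts_objects", "object_id": object_id, "rank": rank})
    for kind, key, content, rank in conn.execute(
        "SELECT kind, key, content, rank FROM fts_llm WHERE fts_llm MATCH ?", (query,)
    ):
        hits.append({"source": "fts_llm", "kind": kind, "key": key,
                     "content": content, "rank": rank})
    # FTS5 rank is bm25-based; smaller (more negative) values are better matches.
    hits.sort(key=lambda h: h["rank"])
    return hits

hits = combined_search(make_demo_db(), "customer")
```

Scores from two separate FTS5 tables are computed against different corpora, so sorting them together is only an approximation of relevance.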
2. llm.search
Search LLM-generated content and insights using FTS5.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | FTS5 search query |
| type | string | No | Content type to search ("summary", "relationship", "domain", "metric", "note") |
| schema | string | No | Filter by schema |
| limit | integer | No | Maximum results (default: 10) |
Response:
{
  "success": true,
  "query": "customer segmentation",
  "results": [
    {
      "kind": "domain",
      "key": "customer_segmentation",
      "content": "Customer segmentation based on purchase behavior and demographics",
      "rank": 0.8
    }
  ]
}
Implementation Logic:
- Search fts_llm table using FTS5
- Apply filters if specified
- Return ranked results with content
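A rough Python sketch of this logic follows. It assumes the type parameter maps onto the kind column of fts_llm, which the source does not state explicitly. The function name search_llm and the sample rows are illustrative, and the schema filter is left out because the fts_llm definition shown in this document has no schema column.

```python
import sqlite3

def search_llm(conn: sqlite3.Connection, query: str, kind: str = "", limit: int = 10) -> list:
    """Search fts_llm, optionally restricted to one artifact kind (hypothetical helper)."""
    sql = "SELECT kind, key, content, rank FROM fts_llm WHERE fts_llm MATCH ?"
    params = [query]
    if kind:
        # Ordinary equality filter applied on top of the full-text match.
        sql += " AND kind = ?"
        params.append(kind)
    sql += " ORDER BY rank LIMIT ?"
    params.append(limit)
    return [
        {"kind": k, "key": key, "content": content, "rank": rank}
        for k, key, content, rank in conn.execute(sql, params)
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_llm USING fts5(kind, key, content)")
conn.executemany(
    "INSERT INTO fts_llm(kind, key, content) VALUES (?, ?, ?)",
    [
        ("domain", "customer_segmentation",
         "Customer segmentation based on purchase behavior and demographics"),
        ("note", "segmentation_todo", "Revisit customer segmentation thresholds"),
    ],
)

all_hits = search_llm(conn, "customer segmentation")
domain_hits = search_llm(conn, "customer segmentation", kind="domain")
```

In FTS5 query syntax, two bare terms are combined with an implicit AND, so both rows match the unfiltered query and only the domain row survives the filter.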
3. catalog_search (Detailed)
Search indexed data using FTS5 across both database objects and LLM artifacts with detailed object information.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | FTS5 search query |
| include_objects | boolean | No | Include detailed object information (default: false) |
| object_limit | integer | No | Max objects to return when include_objects=true (default: 50) |
Response:
{
  "success": true,
  "query": "customer order",
  "results": [
    {
      "kind": "table",
      "key": "sales.orders",
      "schema_name": "sales",
      "object_name": "orders",
      "content": "orders table with columns: order_id, customer_id, order_date, total_amount",
      "rank": 0.5,
      "details": {
        "object_id": 123,
        "object_type": "table",
        "schema_name": "sales",
        "object_name": "orders",
        "row_count_estimate": 15000,
        "has_primary_key": true,
        "has_foreign_keys": true,
        "has_time_column": true,
        "columns": [
          {
            "column_name": "order_id",
            "data_type": "int",
            "is_nullable": false,
            "is_primary_key": true
          }
        ]
      }
    }
  ]
}
Implementation Logic:
- Search both fts_objects and fts_llm tables using FTS5
- Combine results with ranking
- Optionally fetch detailed object information from objects, columns, indexes, foreign_keys tables
- Return ranked results with detailed information when requested
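The detail-fetch step might look roughly like the following Python sketch. The objects and columns table names come from the list above, but their column layouts here are assumptions inferred from the response example, and fetch_object_details is a hypothetical helper. The indexes and foreign_keys lookups are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table layouts; the real catalog schema may differ.
conn.executescript("""
    CREATE TABLE objects (
        object_id INTEGER PRIMARY KEY, object_type TEXT, schema_name TEXT,
        object_name TEXT, row_count_estimate INTEGER);
    CREATE TABLE columns (
        object_id INTEGER, ordinal INTEGER, column_name TEXT, data_type TEXT,
        is_nullable INTEGER, is_primary_key INTEGER);
    INSERT INTO objects VALUES (123, 'table', 'sales', 'orders', 15000);
    INSERT INTO columns VALUES (123, 1, 'order_id', 'int', 0, 1);
    INSERT INTO columns VALUES (123, 2, 'customer_id', 'int', 0, 0);
""")

def fetch_object_details(conn: sqlite3.Connection, object_id: int):
    """Build a 'details' dict for one object, or None if the id is unknown."""
    fields = ("object_id", "object_type", "schema_name", "object_name",
              "row_count_estimate")
    row = conn.execute(
        "SELECT object_id, object_type, schema_name, object_name, row_count_estimate "
        "FROM objects WHERE object_id = ?",
        (object_id,),
    ).fetchone()
    if row is None:
        return None
    details = dict(zip(fields, row))
    details["columns"] = [
        {"column_name": name, "data_type": dtype,
         "is_nullable": bool(nullable), "is_primary_key": bool(pk)}
        for name, dtype, nullable, pk in conn.execute(
            "SELECT column_name, data_type, is_nullable, is_primary_key "
            "FROM columns WHERE object_id = ? ORDER BY ordinal",
            (object_id,),
        )
    ]
    return details

details = fetch_object_details(conn, 123)
```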
Database Schema
fts_objects (contentless FTS5 table)
CREATE VIRTUAL TABLE fts_objects USING fts5(
    schema_name,
    object_name,
    object_type,
    content,
    content='',
    content_rowid='object_id'
);
fts_llm (FTS5 table with content)
CREATE VIRTUAL TABLE fts_llm USING fts5(
    kind,
    key,
    content
);
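To illustrate how the two table styles behave differently, here is a small Python sketch using the standard sqlite3 module. It is a simplified stand-in: the objects table layout, the index_object and top_object_ids helpers, and the sample data are assumptions, and the sketch omits the content_rowid option and instead supplies the rowid explicitly on insert. A contentless table stores only the index, so a search yields rowids that must be resolved against the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        object_id INTEGER PRIMARY KEY, schema_name TEXT,
        object_name TEXT, object_type TEXT);
    CREATE VIRTUAL TABLE fts_objects USING fts5(
        schema_name, object_name, object_type, content, content='');
""")

def index_object(conn, object_id, schema, name, otype, text):
    """Store the object in the catalog and index its text under the same id."""
    conn.execute("INSERT INTO objects VALUES (?, ?, ?, ?)",
                 (object_id, schema, name, otype))
    conn.execute(
        "INSERT INTO fts_objects(rowid, schema_name, object_name, object_type, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (object_id, schema, name, otype, text),
    )

def top_object_ids(conn, query, limit):
    """Return the ids of the best-matching objects, best match first."""
    return [row[0] for row in conn.execute(
        "SELECT rowid FROM fts_objects WHERE fts_objects MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )]

index_object(conn, 123, "sales", "orders", "table",
             "orders table with columns: order_id, customer_id, order_date")
index_object(conn, 124, "sales", "customers", "table",
             "customers table with columns: customer_id, name, email")

ids = top_object_ids(conn, "orders", 5)
# Resolve the ids against the catalog table to recover names.
names = [conn.execute("SELECT schema_name, object_name FROM objects WHERE object_id = ?",
                      (i,)).fetchone() for i in ids]
# Reading a column from the contentless table yields NULL (None in Python).
stored = conn.execute(
    "SELECT content FROM fts_objects WHERE fts_objects MATCH 'orders'").fetchone()
```

This is the same pattern the requirements describe: use FTS to get the top N ids, then query the real tables for the data.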
Implementation Status
Phase 1: Foundation ✅ COMPLETED
Step 1: Integrate FTS into Discovery_Schema
- FTS functionality built into lib/Discovery_Schema.cpp
- Uses existing mcp_catalog.db database
- No separate configuration variable needed
Step 2: Create FTS tables
- fts_objects for database objects (contentless)
- fts_llm for LLM artifacts (with content)
Phase 2: Core Indexing ✅ COMPLETED
Step 3: Implement automatic indexing
- Objects automatically indexed during static harvest
- LLM artifacts automatically indexed during upsert operations
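Since the requirements call for a full rebuild rather than incremental updates, a reindex of the contentless table could be sketched as below. This is an assumption-laden illustration in Python, not the code in Discovery_Schema: the objects table layout (including a description column) and the rebuild_fts_objects helper are hypothetical. The 'delete-all' command is the FTS5 way to clear a contentless index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        object_id INTEGER PRIMARY KEY, schema_name TEXT, object_name TEXT,
        object_type TEXT, description TEXT);
    CREATE VIRTUAL TABLE fts_objects USING fts5(
        schema_name, object_name, object_type, content, content='');
    INSERT INTO objects VALUES
        (1, 'sales', 'orders', 'table', 'orders table with order_id and customer_id'),
        (2, 'sales', 'customers', 'table', 'customers table with name and email');
""")

def rebuild_fts_objects(conn: sqlite3.Connection) -> None:
    """Drop every index entry, then reindex all catalog rows in one transaction."""
    with conn:
        # Special FTS5 command: clears the whole index of a contentless table.
        conn.execute("INSERT INTO fts_objects(fts_objects) VALUES('delete-all')")
        conn.execute(
            "INSERT INTO fts_objects(rowid, schema_name, object_name, object_type, content) "
            "SELECT object_id, schema_name, object_name, object_type, description "
            "FROM objects"
        )

def count_matches(conn: sqlite3.Connection, query: str) -> int:
    return conn.execute(
        "SELECT count(*) FROM fts_objects WHERE fts_objects MATCH ?", (query,)
    ).fetchone()[0]

rebuild_fts_objects(conn)
rebuild_fts_objects(conn)  # a second rebuild must not duplicate entries
orders_after_two_rebuilds = count_matches(conn, "orders")

conn.execute("INSERT INTO objects VALUES (3, 'sales', 'invoices', 'table', 'invoices table')")
rebuild_fts_objects(conn)
invoices_after_rebuild = count_matches(conn, "invoices")
```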
Phase 3: Search Functionality ✅ COMPLETED
Step 4: Implement search tools
- catalog_search tool in Query_Tool_Handler
- llm.search tool in Query_Tool_Handler
Phase 4: Tool Registration ✅ COMPLETED
Step 5: Register tools
- Tools registered in Query_Tool_Handler::get_tool_list()
- Tools routed in Query_Tool_Handler::execute_tool()
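The registration and routing split can be pictured with a small Python analogue. This is only a conceptual sketch: the ToolRouter class and its handlers are hypothetical, and the real Query_Tool_Handler is C++ with its own signatures. The method names get_tool_list and execute_tool mirror the ones mentioned above.

```python
from typing import Any, Callable, Dict, List

class ToolRouter:
    """Toy registry that maps tool names to handler callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {}

    def register(self, name: str, handler: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._tools[name] = handler

    def get_tool_list(self) -> List[str]:
        # Names an agent may call, in a stable order.
        return sorted(self._tools)

    def execute_tool(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._tools.get(name)
        if handler is None:
            return {"success": False, "error": f"Unknown tool: {name}"}
        return handler(arguments)

router = ToolRouter()
# Placeholder handlers that just echo the query back.
router.register("catalog_search",
                lambda args: {"success": True, "query": args["query"], "results": []})
router.register("llm.search",
                lambda args: {"success": True, "query": args["query"], "results": []})

listed = router.get_tool_list()
ok = router.execute_tool("llm.search", {"query": "customer segmentation"})
missing = router.execute_tool("no_such_tool", {})
```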
Critical Files
Files Modified
- include/Discovery_Schema.h - Added FTS methods
- lib/Discovery_Schema.cpp - Implemented FTS functionality
- lib/Query_Tool_Handler.cpp - Added FTS tool routing
- include/Query_Tool_Handler.h - Added FTS tool declarations
Current Implementation Details
FTS Integration Pattern
class Discovery_Schema {
private:
    // FTS methods
    int create_fts_tables();
    int rebuild_fts_index(int run_id);
    json search_fts(const std::string& query, bool include_objects = false, int object_limit = 50);
    json search_llm_fts(const std::string& query, const std::string& type = "",
                        const std::string& schema = "", int limit = 10);
public:
    // FTS is automatically maintained during:
    // - Object insertion (static harvest)
    // - LLM artifact upsertion
    // - Catalog rebuild operations
};
Error Handling Pattern
json result;
result["success"] = false;
result["error"] = "Descriptive error message";
return result;
// Logging
proxy_error("FTS error: %s\n", error_msg);
proxy_info("FTS search completed: %zu results\n", result_count);
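One error case worth handling is a malformed FTS5 query string, which SQLite rejects at execution time. The Python sketch below mirrors the success/error result shape shown above. The safe_search helper and the sample data are assumptions for illustration, and the exact error text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_llm USING fts5(kind, key, content)")
conn.execute(
    "INSERT INTO fts_llm(kind, key, content) VALUES (?, ?, ?)",
    ("domain", "customer_segmentation", "Customer segmentation based on purchases"),
)

def safe_search(conn: sqlite3.Connection, query: str) -> dict:
    """Run an FTS5 search and convert SQLite errors into an error result."""
    try:
        rows = conn.execute(
            "SELECT kind, key, content, rank FROM fts_llm "
            "WHERE fts_llm MATCH ? ORDER BY rank",
            (query,),
        ).fetchall()
    except sqlite3.Error as exc:
        # Invalid FTS5 syntax (for example an unterminated quote) lands here.
        return {"success": False, "error": f"FTS search failed: {exc}"}
    return {
        "success": True,
        "query": query,
        "results": [
            {"kind": kind, "key": key, "content": content, "rank": rank}
            for kind, key, content, rank in rows
        ],
    }

good = safe_search(conn, "customer")
bad = safe_search(conn, '"unterminated')
```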
SQLite Operations Pattern
db->wrlock();
// Write operations (indexing)
db->wrunlock();
db->rdlock();
// Read operations (search)
db->rdunlock();
// Prepared statements
sqlite3_stmt* stmt = NULL;
db->prepare_v2(sql, &stmt);
(*proxy_sqlite3_bind_text)(stmt, 1, value.c_str(), -1, SQLITE_TRANSIENT);
SAFE_SQLITE3_STEP2(stmt);
(*proxy_sqlite3_finalize)(stmt);
Agent Workflow Example
# Agent searches for relevant objects
search_results = call_tool("catalog_search", {
    "query": "customer orders with high value",
    "include_objects": True,
    "object_limit": 20
})
# Agent searches for LLM insights
llm_results = call_tool("llm.search", {
    "query": "customer segmentation",
    "type": "domain"
})
# Agent uses results to build understanding
for result in search_results["results"]:
    if result["kind"] == "table":
        # Get detailed table information
        table_details = call_tool("catalog_get_object", {
            "schema": result["schema_name"],
            "object": result["object_name"]
        })
Performance Considerations
- Contentless FTS: fts_objects uses contentless indexing for performance
- Automatic Maintenance: FTS indexes automatically maintained during operations
- Ranking: Results ranked using FTS5 bm25 algorithm
- Pagination: Large result sets automatically paginated
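The source does not say how pagination is implemented. One common approach, shown here purely as an assumption, is LIMIT/OFFSET over rank-ordered results with rowid as a tie-breaker so pages stay stable. The search_page helper and the sample notes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_llm USING fts5(kind, key, content)")
conn.executemany(
    "INSERT INTO fts_llm(kind, key, content) VALUES ('note', ?, ?)",
    [
        ("n1", "customer churn"),
        ("n2", "customer lifetime value by region"),
        ("n3", "customer support tickets"),
        ("n4", "monthly customer acquisition cost trends"),
        ("n5", "customer"),
    ],
)

def search_page(conn: sqlite3.Connection, query: str, page: int, page_size: int = 2) -> list:
    """Return the keys on one zero-based page of rank-ordered results."""
    rows = conn.execute(
        "SELECT key FROM fts_llm WHERE fts_llm MATCH ? "
        "ORDER BY rank, rowid LIMIT ? OFFSET ?",
        (query, page_size, page * page_size),
    ).fetchall()
    return [key for (key,) in rows]

pages = [search_page(conn, "customer", p) for p in range(4)]
```

With five matching rows and a page size of two, the first three pages hold all results and the fourth is empty.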
Testing Status ✅ COMPLETED
- Search database objects using FTS
- Search LLM artifacts using FTS
- Combined search with ranking
- Detailed object information retrieval
- Filter by content type
- Filter by schema
- Performance with large catalogs
- Error handling
Notes
- FTS5 requires SQLite with FTS5 extension enabled
- Contentless FTS for objects provides fast search without duplicating data
- LLM artifacts stored directly in FTS table for full content search
- Automatic FTS maintenance ensures indexes are always current
- Ranking uses FTS5's built-in bm25 algorithm for relevance scoring
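Regarding the first note, a quick way to check whether a given SQLite build has FTS5 is to try creating an FTS5 table in a throwaway database. The Python sketch below does this with the standard sqlite3 module. The fts5_available function is an illustrative helper, not part of ProxySQL.

```python
import sqlite3

def fts5_available() -> bool:
    """Return True if the linked SQLite library can create FTS5 tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(content)")
        return True
    except sqlite3.OperationalError:
        # Typically "no such module: fts5" when the extension is not compiled in.
        return False
    finally:
        conn.close()

available = fts5_available()
```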
Version
- Last Updated: 2026-01-19
- Implementation Date: January 2026
- Status: Fully implemented and tested