# Full Text Search (FTS) Implementation Status

## Overview

This document describes the current implementation of Full Text Search (FTS) capabilities in ProxySQL MCP. The FTS system enables AI agents to quickly search indexed database metadata and LLM-generated artifacts using SQLite's FTS5 extension.

**Status: IMPLEMENTED** ✅

## Requirements

1. **Indexing Strategy**: Optional WHERE clauses; no incremental updates (a reindex is a full rebuild)
2. **Search Scope**: The agent decides whether to search a single table or across tables
3. **Storage**: All rows (no limits)
4. **Catalog Integration**: FTS and the catalog are cross-referenced, so the agent can use FTS to get the top N IDs and then query the real database
5. **Use Case**: FTS as another tool in the agent's toolkit

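Requirement 1 (full rebuild with an optional WHERE clause) can be sketched in Python with the standard `sqlite3` module. This is an illustration only, not the code in `lib/Discovery_Schema.cpp`: the layout of the `objects` table, its `definition` column, and the helpers `rebuild_fts_objects` and `matching_ids` are assumptions made for the example. `'delete-all'` is the FTS5 command for clearing a contentless index; the sketch omits `content_rowid` and writes the object id as the FTS rowid directly.

```python
import sqlite3

def rebuild_fts_objects(conn, where=None):
    """Clear the contentless index, then re-insert every selected object."""
    sql = ("SELECT object_id, schema_name, object_name, object_type, definition "
           "FROM objects")
    if where:
        # Assumption: `where` is a trusted, developer-supplied SQL fragment.
        sql += " WHERE " + where
    rows = conn.execute(sql).fetchall()
    with conn:  # single transaction, so a failed rebuild rolls back
        conn.execute("INSERT INTO fts_objects(fts_objects) VALUES ('delete-all')")
        conn.executemany(
            "INSERT INTO fts_objects(rowid, schema_name, object_name, object_type, content) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)

def matching_ids(conn, query):
    """Return the rowids (object ids) whose indexed text matches `query`."""
    rows = conn.execute(
        "SELECT rowid FROM fts_objects WHERE fts_objects MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]

# Hypothetical catalog with two objects (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (object_id INTEGER PRIMARY KEY, schema_name TEXT, "
             "object_name TEXT, object_type TEXT, definition TEXT)")
conn.execute("CREATE VIRTUAL TABLE fts_objects USING fts5("
             "schema_name, object_name, object_type, content, content='')")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", [
    (1, "sales", "orders", "table", "orders table with customer and total columns"),
    (2, "hr", "employees", "table", "employees table with name and salary columns"),
])

indexed = rebuild_fts_objects(conn)                        # index everything
hr_only = rebuild_fts_objects(conn, "schema_name = 'hr'")  # rebuild with a filter
```

Because every rebuild starts from an empty index, rows excluded by the filter disappear from search results instead of lingering as stale entries.
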
## Architecture

### Components

```
MCP Query Endpoint
    ↓
Query_Tool_Handler (routes tool calls)
    ↓
Discovery_Schema (manages FTS database)
    ↓
SQLite FTS5 (mcp_catalog.db)
```

### Database Design

**Integrated with Discovery Schema**: FTS functionality is built into the existing `mcp_catalog.db` database.

**FTS Tables**:
- `fts_objects` - FTS5 index over database objects (contentless)
- `fts_llm` - FTS5 index over LLM-generated artifacts (with content)

## Tools (Integrated with Discovery Tools)

### 1. catalog_search

Search indexed data using FTS5 across both database objects and LLM artifacts.

**Parameters**:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| query | string | Yes | FTS5 search query |
| include_objects | boolean | No | Include detailed object information (default: false) |
| object_limit | integer | No | Max objects to return when include_objects=true (default: 50) |

**Response**:
```json
{
  "success": true,
  "query": "customer order",
  "results": [
    {
      "kind": "table",
      "key": "sales.orders",
      "schema_name": "sales",
      "object_name": "orders",
      "content": "orders table with columns: order_id, customer_id, order_date, total_amount",
      "rank": 0.5
    }
  ]
}
```

**Implementation Logic**:
1. Search both `fts_objects` and `fts_llm` tables using FTS5
2. Combine results with ranking
3. Optionally fetch detailed object information
4. Return ranked results

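Step 2 (combining the two result sets by rank) might look like the following Python sketch. The helper `merge_ranked` and its `lower_is_better` flag are hypothetical. The default assumes raw FTS5 `rank` values, where a lower (more negative) score is a better match; if the implementation normalizes ranks so that higher is better, the flag would be flipped.

```python
def merge_ranked(object_hits, llm_hits, limit=50, lower_is_better=True):
    """Merge hits from both FTS tables into one list, best match first.

    Each hit is assumed to be a dict carrying a numeric "rank" field.
    """
    combined = sorted(
        object_hits + llm_hits,
        key=lambda hit: hit["rank"],
        reverse=not lower_is_better,
    )
    return combined[:limit]

# Illustrative hits only; the rank values are made up for the example.
object_hits = [{"kind": "table", "key": "sales.orders", "rank": -2.5}]
llm_hits = [
    {"kind": "domain", "key": "customer_segmentation", "rank": -3.1},
    {"kind": "note", "key": "refund_policy", "rank": -0.4},
]
merged = merge_ranked(object_hits, llm_hits)
```

Sorting a single combined list keeps the merge independent of which table a hit came from, so the `kind` field alone tells the agent what each entry is.
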
### 2. llm.search

Search LLM-generated content and insights using FTS5.

**Parameters**:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| query | string | Yes | FTS5 search query |
| type | string | No | Content type to search ("summary", "relationship", "domain", "metric", "note") |
| schema | string | No | Filter by schema |
| limit | integer | No | Maximum results (default: 10) |

**Response**:
```json
{
  "success": true,
  "query": "customer segmentation",
  "results": [
    {
      "kind": "domain",
      "key": "customer_segmentation",
      "content": "Customer segmentation based on purchase behavior and demographics",
      "rank": 0.8
    }
  ]
}
```

**Implementation Logic**:
1. Search `fts_llm` table using FTS5
2. Apply filters if specified
3. Return ranked results with content

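The three steps above can be sketched in Python against an in-memory copy of the `fts_llm` table. The function `search_llm` and the sample rows are assumptions for illustration; the real logic lives in the C++ `Discovery_Schema` class. The `schema` filter is left out because the `fts_llm` definition in this document has no schema column.

```python
import sqlite3

def search_llm(conn, query, kind=None, limit=10):
    """Run an FTS5 query on fts_llm, optionally restricted to one kind."""
    sql = 'SELECT kind, "key", content, rank FROM fts_llm WHERE fts_llm MATCH ?'
    params = [query]
    if kind:
        sql += " AND kind = ?"       # plain equality filter on top of MATCH
        params.append(kind)
    sql += " ORDER BY rank LIMIT ?"  # FTS5 rank: best match sorts first
    params.append(limit)
    columns = ("kind", "key", "content", "rank")
    return [dict(zip(columns, row)) for row in conn.execute(sql, params)]

# Illustrative artifacts only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_llm USING fts5(kind, key, content)")
conn.executemany("INSERT INTO fts_llm VALUES (?, ?, ?)", [
    ("domain", "customer_segmentation",
     "Customer segmentation based on purchase behavior and demographics"),
    ("summary", "sales.orders", "Orders placed by each customer, with totals"),
    ("note", "refund_policy", "Refund policy notes"),
])

all_hits = search_llm(conn, "customer")
domain_hits = search_llm(conn, "customer", kind="domain")
```

Binding the query, the kind, and the limit as parameters keeps caller input out of the SQL text; only the FTS5 query syntax itself still has to be valid.
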
### 3. catalog_search (Detailed)

Search indexed data using FTS5 across both database objects and LLM artifacts with detailed object information.

**Parameters**:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| query | string | Yes | FTS5 search query |
| include_objects | boolean | No | Include detailed object information (default: false) |
| object_limit | integer | No | Max objects to return when include_objects=true (default: 50) |

**Response**:
```json
{
  "success": true,
  "query": "customer order",
  "results": [
    {
      "kind": "table",
      "key": "sales.orders",
      "schema_name": "sales",
      "object_name": "orders",
      "content": "orders table with columns: order_id, customer_id, order_date, total_amount",
      "rank": 0.5,
      "details": {
        "object_id": 123,
        "object_type": "table",
        "schema_name": "sales",
        "object_name": "orders",
        "row_count_estimate": 15000,
        "has_primary_key": true,
        "has_foreign_keys": true,
        "has_time_column": true,
        "columns": [
          {
            "column_name": "order_id",
            "data_type": "int",
            "is_nullable": false,
            "is_primary_key": true
          }
        ]
      }
    }
  ]
}
```

**Implementation Logic**:
1. Search both `fts_objects` and `fts_llm` tables using FTS5
2. Combine results with ranking
3. Optionally fetch detailed object information from `objects`, `columns`, `indexes`, `foreign_keys` tables
4. Return ranked results with detailed information when requested

## Database Schema

### fts_objects (contentless FTS5 table)
```sql
CREATE VIRTUAL TABLE fts_objects USING fts5(
    schema_name,
    object_name,
    object_type,
    content,
    content='',
    content_rowid='object_id'
);
```

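A contentless FTS5 table stores only the index, so reading any column other than `rowid` from it yields NULL. The following Python sketch shows the resulting two-step pattern: FTS returns the top N ids, and a second query resolves them in a regular table. The `objects` table layout and the helpers `top_object_ids` and `resolve` are assumptions for the example, and the sketch leaves out `content_rowid`, writing the object id as the rowid directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (object_id INTEGER PRIMARY KEY, schema_name TEXT, object_name TEXT);
CREATE VIRTUAL TABLE fts_objects USING fts5(
    schema_name, object_name, object_type, content, content='');
""")
conn.execute("INSERT INTO objects VALUES (123, 'sales', 'orders')")
conn.execute(
    "INSERT INTO fts_objects(rowid, schema_name, object_name, object_type, content) "
    "VALUES (?, ?, ?, ?, ?)",
    (123, "sales", "orders", "table",
     "orders table with columns: order_id, customer_id, order_date, total_amount"),
)

def top_object_ids(conn, query, limit=50):
    """Step 1: ask the FTS index for the best-matching rowids only."""
    rows = conn.execute(
        "SELECT rowid FROM fts_objects WHERE fts_objects MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]

def resolve(conn, object_ids):
    """Step 2: look the ids up in the regular catalog table."""
    found = []
    for object_id in object_ids:
        row = conn.execute(
            "SELECT schema_name, object_name FROM objects WHERE object_id = ?",
            (object_id,),
        ).fetchone()
        if row:
            found.append({"object_id": object_id,
                          "schema_name": row[0], "object_name": row[1]})
    return found

ids = top_object_ids(conn, "orders table")
details = resolve(conn, ids)
```

This mirrors the catalog integration requirement: the index answers "which objects match", while names and other attributes come from the catalog tables.
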
### fts_llm (FTS5 table with content)
```sql
CREATE VIRTUAL TABLE fts_llm USING fts5(
    kind,
    key,
    content
);
```

## Implementation Status

### Phase 1: Foundation ✅ COMPLETED

**Step 1: Integrate FTS into Discovery_Schema**
- FTS functionality built into `lib/Discovery_Schema.cpp`
- Uses existing `mcp_catalog.db` database
- No separate configuration variable needed

**Step 2: Create FTS tables**
- `fts_objects` for database objects (contentless)
- `fts_llm` for LLM artifacts (with content)

### Phase 2: Core Indexing ✅ COMPLETED

**Step 3: Implement automatic indexing**
- Objects automatically indexed during static harvest
- LLM artifacts automatically indexed during upsert operations

### Phase 3: Search Functionality ✅ COMPLETED

**Step 4: Implement search tools**
- `catalog_search` tool in Query_Tool_Handler
- `llm.search` tool in Query_Tool_Handler

### Phase 4: Tool Registration ✅ COMPLETED

**Step 5: Register tools**
- Tools registered in `Query_Tool_Handler::get_tool_list()`
- Tools routed in `Query_Tool_Handler::execute_tool()`

## Critical Files

### Files Modified
- `include/Discovery_Schema.h` - Added FTS methods
- `lib/Discovery_Schema.cpp` - Implemented FTS functionality
- `lib/Query_Tool_Handler.cpp` - Added FTS tool routing
- `include/Query_Tool_Handler.h` - Added FTS tool declarations

## Current Implementation Details

### FTS Integration Pattern

```cpp
class Discovery_Schema {
private:
    // FTS methods
    int create_fts_tables();
    int rebuild_fts_index(int run_id);
    json search_fts(const std::string& query, bool include_objects = false, int object_limit = 50);
    json search_llm_fts(const std::string& query, const std::string& type = "",
                        const std::string& schema = "", int limit = 10);

public:
    // FTS is automatically maintained during:
    // - Object insertion (static harvest)
    // - LLM artifact upsertion
    // - Catalog rebuild operations
};
```

### Error Handling Pattern

```cpp
json result;
result["success"] = false;
result["error"] = "Descriptive error message";
return result;

// Logging
proxy_error("FTS error: %s\n", error_msg);
proxy_info("FTS search completed: %zu results\n", result_count);
```

### SQLite Operations Pattern

```cpp
db->wrlock();
// Write operations (indexing)
db->wrunlock();

db->rdlock();
// Read operations (search)
db->rdunlock();

// Prepared statements
sqlite3_stmt* stmt = NULL;
db->prepare_v2(sql, &stmt);
(*proxy_sqlite3_bind_text)(stmt, 1, value.c_str(), -1, SQLITE_TRANSIENT);
SAFE_SQLITE3_STEP2(stmt);
(*proxy_sqlite3_finalize)(stmt);
```

## Agent Workflow Example

```python
# `call_tool` stands for the agent's MCP client call; it is not defined here.

# Agent searches for relevant objects
search_results = call_tool("catalog_search", {
    "query": "customer orders with high value",
    "include_objects": True,
    "object_limit": 20
})

# Agent searches for LLM insights
llm_results = call_tool("llm.search", {
    "query": "customer segmentation",
    "type": "domain"
})

# Agent uses results to build understanding
for result in search_results["results"]:
    if result["kind"] == "table":
        # Get detailed table information
        table_details = call_tool("catalog_get_object", {
            "schema": result["schema_name"],
            "object": result["object_name"]
        })
```

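The `query` parameter is passed to FTS5, which has its own query syntax: bare terms are combined with an implicit AND, and characters such as `.` or `-` outside a quoted string can be rejected or reinterpreted. An agent may therefore want to quote free text before calling the tools. The helper below is a hypothetical client-side sketch, not part of ProxySQL.

```python
import re

def to_fts5_query(text, operator="AND"):
    """Turn free text into an FTS5 query of double-quoted terms.

    Quoting each term keeps punctuation and FTS5 keywords in the input
    from being parsed as query syntax.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    terms = re.findall(r"\w+", text)          # letters, digits, underscore
    quoted = ['"{}"'.format(term) for term in terms]
    return " {} ".format(operator).join(quoted)

strict = to_fts5_query("customer orders (high-value)")
loose = to_fts5_query("sales.orders", operator="OR")
```

An empty or punctuation-only input produces an empty string, which the caller should treat as "no search" rather than send to the tool.
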
## Performance Considerations

1. **Contentless FTS**: `fts_objects` uses contentless indexing, which avoids storing a second copy of the indexed text
2. **Automatic Maintenance**: FTS indexes are maintained automatically during harvest, upsert, and rebuild operations
3. **Ranking**: Results are ranked using the FTS5 bm25 algorithm
4. **Pagination**: Large result sets are automatically paginated

## Testing Status ✅ COMPLETED

- [x] Search database objects using FTS
- [x] Search LLM artifacts using FTS
- [x] Combined search with ranking
- [x] Detailed object information retrieval
- [x] Filter by content type
- [x] Filter by schema
- [x] Performance with large catalogs
- [x] Error handling

## Notes

- FTS5 requires SQLite with FTS5 extension enabled
- Contentless FTS for objects provides fast search without duplicating data
- LLM artifacts stored directly in FTS table for full content search
- Automatic FTS maintenance ensures indexes are always current
- Ranking uses FTS5's built-in bm25 algorithm for relevance scoring

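Whether a given SQLite build includes FTS5 can be probed by trying to create a throwaway FTS5 table. The sketch below does this from Python; the function name `fts5_available` is an assumption, and the check reflects the SQLite library linked into the Python interpreter, not the one built into ProxySQL.

```python
import sqlite3

def fts5_available():
    """Return True if this SQLite build can create an FTS5 virtual table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(x)")
        return True
    except sqlite3.OperationalError:
        # Raised when the fts5 module is unknown (e.g. "no such module: fts5").
        return False
    finally:
        conn.close()

has_fts5 = fts5_available()
```
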
## Version

- **Last Updated:** 2026-01-19
- **Implementation Date:** January 2026
- **Status:** Fully implemented and tested