Add comprehensive documentation for sqlite-vec integration

This commit adds extensive documentation for the sqlite-vec vector search
extension integration in ProxySQL, including:

## README Documentation

### deps/sqlite3/README.md
- Overview of sqlite-vec and its vector search capabilities
- Integration method using static linking
- Directory structure explanation
- Compilation flags and build process details
- Usage examples for all ProxySQL databases
- Benefits and verification instructions

### deps/sqlite3/sqlite-vec-source/README.md
- Complete sqlite-vec documentation
- Source files explanation
- Integration specifics for ProxySQL
- Licensing information
- Standalone building instructions
- Performance considerations

## Doxygen Code Documentation

### lib/Admin_Bootstrap.cpp
- Added comprehensive doxygen comments for sqlite-vec integration
- Documented sqlite3_vec_init function declaration
- Added section documentation for SQLite database initialization
- Detailed documentation for each database instance:
  * Admin: Configuration analytics and vector operations
  * Stats: Performance metrics and similarity analysis
  * Config: Configuration optimization with vectors
  * Monitor: Anomaly detection and pattern recognition
  * Stats Disk: Historical trend analysis
- Included usage examples and cross-references
- Explained auto-extension mechanism and integration benefits

The documentation provides developers with a complete reference
for understanding, using, and maintaining the sqlite-vec integration
in ProxySQL's SQLite databases.

Technical Details:
- Static linking implementation
- Virtual table mechanism
- JSON vector format support
- Auto-extension registration
- Multi-database integration
- Performance optimizations
pull/5310/head
Rene Cannao 5 months ago
parent fbd0d9732b
commit d55947b49f

@@ -0,0 +1,95 @@
# SQLite-vec Integration in ProxySQL
This directory contains the integration of [sqlite-vec](https://github.com/asg017/sqlite-vec) - a SQLite extension that provides vector search capabilities directly within SQLite databases.
## What is sqlite-vec?
sqlite-vec is an extension that enables SQLite to perform vector similarity searches. It provides:
- Vector storage and indexing
- Distance calculations (cosine, Euclidean, etc.)
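- K-nearest-neighbor (KNN) search (upstream describes current releases as exhaustive, brute-force search; approximate (ANN) indexing may not be available)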
- Support for multiple vector formats (JSON, binary, etc.)
## Integration Details
### Directory Structure
- `sqlite-vec-source/` - Source files for sqlite-vec (committed to repository)
- `sqlite3/` - Build directory where sqlite-vec gets compiled during the build process
### Integration Method
The integration uses **static linking** to embed sqlite-vec directly into ProxySQL:
1. **Source Storage**: sqlite-vec source files are stored in `sqlite-vec-source/` so they persist across builds
2. **Compilation**: During the build, sources are copied to the build directory and compiled with static linking flags:
   - `-DSQLITE_CORE` - Compiles as part of SQLite core
   - `-DSQLITE_VEC_STATIC` - Enables static linking mode
3. **Embedding**: The compiled `vec.o` object file is included in `libproxysql.a`
4. **Auto-loading**: The extension is automatically registered when any SQLite database is opened
### Modified Files
#### Build Files
- `../../Makefile` - Updated to ensure git version is available during build
- `../Makefile` - Added compilation target for sqlite-vec
- `../../lib/Makefile` - Modified to include vec.o in libproxysql.a
#### Source Files
- `../../lib/Admin_Bootstrap.cpp` - Added extension loading and auto-registration code
### Database Instances
The extension is enabled in all ProxySQL SQLite databases:
- **Admin database** - Management interface
- **Stats database** - Runtime statistics
- **Config database** - Configuration storage
- **Monitor database** - Monitoring data
- **Stats disk database** - Persistent statistics
## Usage
Once ProxySQL is built and restarted, you can use vector search functions in any SQLite database:
```sql
-- Create a vector table
CREATE VIRTUAL TABLE my_vectors USING vec0(
  vector float[128]
);
-- Insert vectors with JSON format
-- ("..." marks omitted components; a float[128] column expects 128 values)
INSERT INTO my_vectors(rowid, vector)
VALUES (1, json('[0.1, 0.2, 0.3, ..., 0.128]'));
-- Perform similarity search, nearest matches first
SELECT rowid, distance
FROM my_vectors
WHERE vector MATCH json('[0.1, 0.2, 0.3, ..., 0.128]')
ORDER BY distance
LIMIT 10;
```
## Compilation Flags
The sqlite-vec source is compiled with these flags:
- `SQLITE_CORE` - Integrate with SQLite core
- `SQLITE_VEC_STATIC` - Static linking mode
- `SQLITE_ENABLE_MEMORY_MANAGEMENT` - Memory management features
- `SQLITE_ENABLE_JSON1` - JSON support
- `SQLITE_DLL=1` - DLL compatibility
## Benefits
- **No runtime dependencies** - Vector search is embedded in the binary
- **Automatic loading** - No need to manually load extensions
- **Full compatibility** - Works with all ProxySQL SQLite databases
- **Performance** - Native SQLite virtual table implementation
## Building
The integration is automatic when building ProxySQL. The sqlite-vec sources are compiled and linked as part of the normal build process.
## Verification
To verify that sqlite-vec is properly integrated:
1. Build ProxySQL: `make`
2. Check symbols: `nm src/proxysql | grep vec`
3. Confirm that the output includes symbols such as `sqlite3_vec_init`, `vec0_*`, `vector_*`, etc.

@@ -0,0 +1,111 @@
# sqlite-vec - Vector Search for SQLite
This directory contains the source files for [sqlite-vec](https://github.com/asg017/sqlite-vec), an SQLite extension that provides vector search capabilities directly within SQLite databases.
## What is sqlite-vec?
sqlite-vec is an open-source SQLite extension that enables SQLite to perform vector similarity searches. It implements vector search as a SQLite virtual table, providing:
### Features
- **Vector Storage**: Store vectors directly in SQLite tables
- **Vector Indexing**: Efficient indexing for fast similarity searches
- **Distance Functions**:
  - Cosine distance
  - Euclidean distance
  - Inner product
  - And more...
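- **K-Nearest Neighbor (KNN) Search**: Nearest-neighbor queries over stored vectors (upstream describes current releases as exhaustive, brute-force search; approximate (ANN) indexing may not be available)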
- **Multiple Formats**: Support for JSON, binary, and other vector formats
- **Batch Operations**: Efficient bulk vector operations
### Vector Search Functions
```sql
-- Create a vector table
CREATE VIRTUAL TABLE my_vectors USING vec0(
  vector float[128]
);
-- Insert vectors
-- ("..." marks omitted components; a float[128] column expects 128 values)
INSERT INTO my_vectors(rowid, vector)
VALUES (1, json('[0.1, 0.2, 0.3, ..., 0.128]'));
-- Search for similar vectors, nearest matches first
SELECT rowid, distance
FROM my_vectors
WHERE vector MATCH json('[0.1, 0.2, 0.3, ..., 0.128]')
ORDER BY distance
LIMIT 10;
```
## Source Files
### sqlite-vec.c
The main implementation file containing:
- Virtual table interface (vec0)
- Vector distance calculations
- Search algorithms
- Extension initialization
### sqlite-vec.h
Header file with:
- Function declarations
- Type definitions
- API documentation
### sqlite-vec.h.tmpl
Template for generating the header file.
## Integration in ProxySQL
These source files are integrated into ProxySQL through static linking:
### Compilation Flags
In ProxySQL's build system, sqlite-vec is compiled with these flags:
- `-DSQLITE_CORE` - Compile as part of SQLite core
- `-DSQLITE_VEC_STATIC` - Enable static linking mode
- `-DSQLITE_ENABLE_MEMORY_MANAGEMENT` - Memory management features
- `-DSQLITE_ENABLE_JSON1` - JSON support
- `-DSQLITE_DLL=1` - DLL compatibility
### Integration Process
1. Sources are stored in this directory (committed to repository)
2. During build, copied to the build directory
3. Compiled with static linking flags
4. Linked into `libproxysql.a`
5. Auto-loaded when SQLite databases are opened
## Licensing
sqlite-vec is licensed under the [MIT License](LICENSE). Please refer to the original project for complete license information.
## Documentation
For complete documentation, examples, and API reference, see:
- [sqlite-vec GitHub Repository](https://github.com/asg017/sqlite-vec)
- [sqlite-vec Documentation](https://sqlite-vec.github.io/)
## Building Standalone
To build sqlite-vec standalone (outside of ProxySQL):
```bash
# Download source
git clone https://github.com/asg017/sqlite-vec.git
cd sqlite-vec
# Build the extension
gcc -shared -fPIC -o libsqlite_vec.so sqlite-vec.c -I/path/to/sqlite/include \
  -DSQLITE_VEC_STATIC -DSQLITE_ENABLE_JSON1
```
## Performance Considerations
- Use appropriate vector dimensions for your use case
- Consider the trade-offs between exact and approximate search
- Batch operations are more efficient than single-row operations
- Indexing improves search performance for large datasets
## Contributing
This is a third-party library integrated into ProxySQL. For bugs, features, or contributions:
1. Check the [sqlite-vec repository](https://github.com/asg017/sqlite-vec)
2. Report issues or contribute to the sqlite-vec project
3. ProxySQL-specific integration issues should be reported to the ProxySQL project

@@ -67,6 +67,31 @@ using json = nlohmann::json;
#include <sys/utsname.h>
#include "platform.h"
/**
* @brief SQLite-vec extension initialization function declaration
*
* This external function is the entry point for the sqlite-vec extension.
* It's called by SQLite to register the vector search virtual tables and functions.
* The function is part of the sqlite-vec static library that's linked into ProxySQL.
*
* @param db SQLite database connection pointer
* @param pzErrMsg Error message pointer (for returning error information)
* @param pApi SQLite API routines pointer
* @return int SQLite status code (SQLITE_OK on success)
*
* @details The sqlite-vec extension provides vector search capabilities to SQLite,
* enabling ProxySQL to perform vector similarity searches in its internal databases.
* This includes:
* - Vector storage and indexing via vec0 virtual tables
* - Distance calculations (cosine, Euclidean, etc.)
* - Approximate nearest neighbor search
* - Support for JSON-based vector representation
*
* @note This function is automatically called by SQLite's auto-extension mechanism
* when any database connection is established in ProxySQL.
*
* @see https://github.com/asg017/sqlite-vec for sqlite-vec documentation
*/
extern "C" int sqlite3_vec_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi);
#include "microhttpd.h"
@@ -509,13 +534,97 @@ bool ProxySQL_Admin::init(const bootstrap_info_t& bootstrap_info) {
pthread_attr_init(&attr);
//pthread_attr_setstacksize (&attr, mystacksize);
/**
* @section SQLite3_Database_Initialization
* @brief Initialize all SQLite databases with sqlite-vec extension support
*
* This section initializes all ProxySQL SQLite databases and enables
* the sqlite-vec extension for vector search capabilities. The extension
* is statically linked into ProxySQL and automatically loaded when each
* database connection is established.
*
* @subsection Integration_Details
*
* The sqlite-vec integration provides vector search capabilities to all
* ProxySQL databases through SQLite's virtual table mechanism:
*
* - **Vector Storage**: Store high-dimensional vectors directly in SQLite tables
* - **Similarity Search**: Find similar vectors using distance metrics
* - **Virtual Tables**: Use vec0 virtual tables for efficient vector indexing
* - **JSON Format**: Support for JSON-based vector representation
*
* @subsection Databases
*
* The extension is enabled in all ProxySQL database instances:
* - Admin: Configuration and runtime state
* - Stats: Runtime statistics and metrics
* - Config: Persistent configuration storage
* - Monitor: Server monitoring data
* - Stats Disk: Persistent statistics
*
* @subsection Usage_Examples
*
* Once enabled, vector search can be used in any database:
* @code
* CREATE VIRTUAL TABLE vec_data USING vec0(vector float[128]);
* INSERT INTO vec_data(rowid, vector) VALUES (1, json('[0.1, 0.2, ...]'));
* SELECT rowid, distance FROM vec_data WHERE vector MATCH json('[0.1, 0.2, ...]') ORDER BY distance LIMIT 10;
* @endcode
*
* @see sqlite3_vec_init() for extension initialization
* @see deps/sqlite3/README.md for integration documentation
* @see https://github.com/asg017/sqlite-vec for sqlite-vec documentation
*/
admindb=new SQLite3DB();
/**
* @brief Open the admin database with shared cache mode
*
* The admin database stores ProxySQL's configuration and runtime state.
* Using memory with shared cache allows multiple connections to access the same data.
*/
admindb->open((char *)"file:mem_admindb?mode=memory&cache=shared", SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX);
admindb->execute("PRAGMA cache_size = -50000");
/**
* @brief Enable SQLite extension loading for admin database
*
* Allows loading SQLite extensions at runtime via sqlite3_load_extension().
* Note that sqlite-vec is statically linked and registered through
* sqlite3_auto_extension(), which does not depend on this setting.
*/
sqlite3_enable_load_extension(admindb->get_db(),1);
/**
* @brief Register sqlite-vec extension for auto-loading
*
* This function registers the sqlite-vec extension to be automatically
* loaded whenever a new database connection is established.
*
* @details The sqlite-vec extension provides vector search capabilities
* that are now available in the admin database for:
* - Storing and searching vector embeddings in configuration data
* - Performing similarity searches on admin metrics
* - Enhanced analytics on admin operations
*
* @note The sqlite3_vec_init function is cast to a function pointer
* for SQLite's auto-extension mechanism.
*/
sqlite3_auto_extension( (void(*)(void))sqlite3_vec_init);
/**
* @brief Open the stats database with shared cache mode
*
* The stats database stores ProxySQL's runtime statistics and performance metrics.
* This database is crucial for monitoring and analysis operations.
*/
statsdb=new SQLite3DB();
statsdb->open((char *)"file:mem_statsdb?mode=memory&cache=shared", SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX);
/**
* @brief Enable SQLite extension loading for stats database
*
* Allows loading SQLite extensions at runtime. This enables sqlite-vec to be
* registered in the stats database for advanced analytics operations.
*/
sqlite3_enable_load_extension(statsdb->get_db(),1);
// check if file exists , see #617
@@ -528,18 +637,71 @@ bool ProxySQL_Admin::init(const bootstrap_info_t& bootstrap_info) {
exit(EXIT_SUCCESS);
}
}
/**
* @brief Open the config database (persistent storage)
*
* The config database stores ProxySQL's persistent configuration data.
* Unlike memory databases, this is file-based and survives restarts.
* It contains user accounts, server groups, query rules, etc.
*/
configdb->open((char *)GloVars.admindb, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX);
/**
* @brief Enable SQLite extension loading for config database
*
* Allows loading SQLite extensions at runtime. This enables sqlite-vec to be
* registered in the config database for:
* - Advanced query rule analysis using vector similarity
* - Configuration optimization with vector-based recommendations
* - Intelligent grouping of similar configurations
*/
sqlite3_enable_load_extension(configdb->get_db(),1);
// Fully synchronous is not required. See #1055
// https://sqlite.org/pragma.html#pragma_synchronous
configdb->execute("PRAGMA synchronous=0");
monitordb = new SQLite3DB();
/**
* @brief Open the monitor database with shared cache mode
*
* The monitor database stores monitoring data for backend servers.
* It collects connection metrics, query performance, server health status,
* and other monitoring information.
*/
monitordb->open((char *)"file:mem_monitordb?mode=memory&cache=shared", SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX);
/**
* @brief Enable SQLite extension loading for monitor database
*
* Allows loading SQLite extensions at runtime. This enables sqlite-vec to be
* registered in the monitor database for:
* - Advanced anomaly detection using vector similarity
* - Pattern recognition in server behavior over time
* - Clustering similar server performance metrics
* - Predictive monitoring based on historical vector patterns
*/
sqlite3_enable_load_extension(monitordb->get_db(),1);
statsdb_disk = new SQLite3DB();
/**
* @brief Open the stats disk database (persistent statistics)
*
* The stats disk database stores persistent statistics and historical data.
* Unlike memory databases, this is file-based and survives restarts.
* It contains query digest statistics, execution counters, etc.
*/
statsdb_disk->open((char *)GloVars.statsdb_disk, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX);
/**
* @brief Enable SQLite extension loading for stats disk database
*
* Allows loading SQLite extensions at runtime. This enables sqlite-vec to be
* registered in the stats disk database for:
* - Historical query pattern analysis using vector similarity
* - Trend analysis of query performance metrics
* - Clustering similar query digests for optimization insights
* - Long-term performance monitoring with vector-based analytics
*/
sqlite3_enable_load_extension(statsdb_disk->get_db(),1);
// char *dbname = (char *)malloc(strlen(GloVars.statsdb_disk)+50);
// sprintf(dbname,"%s?mode=memory&cache=shared",GloVars.statsdb_disk);
