From 5fd779464d55f3e0eeef41b5975b3f76cc221331 Mon Sep 17 00:00:00 2001
From: Rene Cannao
Date: Fri, 23 Jan 2026 18:00:53 +0000
Subject: [PATCH] Add embedding testing plan documentation

Document how to test RAG ingestion with embeddings using external
OpenAI-compatible services. Users only need to set 4 environment variables
(OPENAI_API_BASE, OPENAI_API_KEY, OPENAI_MODEL, OPENAI_EMBEDDING_DIM)
and run the test script.

Related to issue #5320 (Batch embedding generation)
---
 RAG_POC/EMBEDDING_TEST_PLAN.md | 130 +++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 RAG_POC/EMBEDDING_TEST_PLAN.md

diff --git a/RAG_POC/EMBEDDING_TEST_PLAN.md b/RAG_POC/EMBEDDING_TEST_PLAN.md
new file mode 100644
index 000000000..304d71cd5
--- /dev/null
+++ b/RAG_POC/EMBEDDING_TEST_PLAN.md
@@ -0,0 +1,130 @@
+# Embedding Testing Plan
+
+## Prerequisites
+
+1. MySQL server running with test database
+2. OpenAI-compatible embedding service accessible
+
+## Quick Start
+
+```bash
+cd /home/rene/pr5312/proxysql/RAG_POC
+
+# Step 1: Set your embedding service credentials
+export OPENAI_API_BASE="https://your-embedding-service.com/v1"
+export OPENAI_API_KEY="your-api-key-here"
+export OPENAI_MODEL="your-model-name"
+export OPENAI_EMBEDDING_DIM=1536  # Adjust based on your model
+
+# Step 2: Run the test
+./test_rag_ingest.sh
+```
+
+---
+
+## Configuration Options
+
+### OpenAI API
+```bash
+export OPENAI_API_BASE="https://api.openai.com/v1"
+export OPENAI_API_KEY="sk-your-openai-key"
+export OPENAI_MODEL="text-embedding-3-small"
+export OPENAI_EMBEDDING_DIM=1536
+```
+
+### Azure OpenAI
+```bash
+export OPENAI_API_BASE="https://your-resource.openai.azure.com/openai/deployments/your-deployment"
+export OPENAI_API_KEY="your-azure-key"
+export OPENAI_MODEL="text-embedding-ada-002"  # Your deployment name
+export OPENAI_EMBEDDING_DIM=1536
+```
+
+### Other OpenAI-compatible services
+```bash
+# Any service with OpenAI-compatible API
+export OPENAI_API_BASE="https://your-service.com/v1"
+export OPENAI_API_KEY="your-key"
+export OPENAI_MODEL="model-name"
+export OPENAI_EMBEDDING_DIM=dim  # e.g., 768, 1536, 3072
+```
+
+---
+
+## What the Test Does
+
+**Phase 4** (runs automatically with OPENAI_ variables set):
+1. Creates RAG database with schema
+2. Configures embedding with your credentials
+3. Ingests 10 documents from MySQL
+4. Generates embeddings via your service
+5. Verifies:
+   - 10 documents created
+   - 10 chunks created
+   - **10 embeddings created**
+   - Vector self-match works (search finds itself)
+
+---
+
+## Expected Output
+
+```
+==> embedding_json: {"enabled":true,"provider":"openai","api_base":"https://...","api_key":"***","model":"...","dim":1536,"input":{"concat":[{"col":"Title"},{"lit":"\n"},{"chunk_body":true}]}}
+Ingesting source_id=1 name=test_source backend=mysql table=posts
+Done source test_source ingested_docs=10 skipped_docs=0
+OK: rag_documents (embeddings enabled) = 10
+OK: rag_chunks (embeddings enabled) = 10
+OK: rag_vec_chunks (embeddings enabled) = 10
+OK: vec self-match (posts:1#0) = posts:1#0
+```
+
+---
+
+## Verification Queries
+
+After the test, manually verify:
+
+```bash
+sqlite3 rag_ingest_test_openai.db <