Retrieving Vectors from Vector Store Files

Ragwalla supports retrieving the actual vector embeddings for files in your vector stores. This feature gives you direct access to the processed embeddings, allowing you to examine how your documents were chunked and embedded, or export vectors for use in other systems.

Overview

When you upload files to a vector store, Ragwalla processes them by:
1. Extracting text from your documents
2. Chunking the text into smaller segments
3. Generating embeddings for each chunk using your chosen embedding model
4. Storing the vectors in Cloudflare Vectorize

The vector retrieval API lets you access these stored embeddings along with their metadata, giving you full visibility into how your content was processed.

API Endpoints

List Vectors for a File

GET /v1/vector_stores/{vectorStoreId}/files/{fileId}/vectors

Parameters:
- {vectorStoreId} - The ID of your vector store
- {fileId} - The ID of the file whose vectors you want to retrieve

Get a Single Vector

GET /v1/vector_stores/{vectorStoreId}/vectors/{vectorId}

Parameters:
- {vectorStoreId} - The ID of your vector store
- {vectorId} - The specific vector/chunk ID you want to retrieve
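
As a minimal sketch, the two endpoint paths above can be assembled with small helpers. The function names `fileVectorsUrl` and `singleVectorUrl` are hypothetical (they are not part of any Ragwalla SDK), and the base URL is the placeholder host used throughout this guide.

```javascript
// Hypothetical helpers that build the two endpoint URLs described above.
// DEFAULT_BASE_URL is a placeholder; substitute your own instance host.
const DEFAULT_BASE_URL = 'https://your-instance.ragwalla.com';

// URL for listing all vectors that belong to one file.
function fileVectorsUrl(vectorStoreId, fileId, baseUrl = DEFAULT_BASE_URL) {
  // encodeURIComponent guards against IDs that contain reserved characters.
  return `${baseUrl}/v1/vector_stores/${encodeURIComponent(vectorStoreId)}` +
    `/files/${encodeURIComponent(fileId)}/vectors`;
}

// URL for fetching a single vector (chunk) by its ID.
function singleVectorUrl(vectorStoreId, vectorId, baseUrl = DEFAULT_BASE_URL) {
  return `${baseUrl}/v1/vector_stores/${encodeURIComponent(vectorStoreId)}` +
    `/vectors/${encodeURIComponent(vectorId)}`;
}
```

Note that the single-vector path does not include the file ID; the vector is addressed directly under the vector store.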

Request Examples

Basic Request

curl -X GET "https://your-instance.ragwalla.com/v1/vector_stores/vs_abc123/files/file_xyz789/vectors" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Site-Name: your-site-name"

Request with Parameters

curl -X GET "https://your-instance.ragwalla.com/v1/vector_stores/vs_abc123/files/file_xyz789/vectors?limit=50&include_values=false" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Site-Name: your-site-name"

Single Vector Request

curl -X GET "https://your-instance.ragwalla.com/v1/vector_stores/vs_abc123/vectors/chunk_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Site-Name: your-site-name"

Query Parameters

Parameter      | Type    | Default | Description
include_values | boolean | true    | Whether to include the actual vector values in the response
limit          | integer | 100     | Number of vectors to return (max: 1000)
cursor         | string  | -       | Pagination cursor from previous response
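
The parameters above can be turned into a query string with a small helper. This is a sketch only: `buildVectorQuery` is a hypothetical name, and the lower bound of 1 for `limit` is an assumption (only the maximum of 1000 is documented). How the server treats out-of-range values is not specified here, so the sketch fails early instead of sending them.

```javascript
// Hypothetical helper: builds the query string from the documented parameters.
// Parameters left undefined are omitted so the server defaults apply.
function buildVectorQuery({ includeValues, limit, cursor } = {}) {
  const params = new URLSearchParams();
  if (includeValues !== undefined) {
    params.set('include_values', String(Boolean(includeValues)));
  }
  if (limit !== undefined) {
    // Upper bound is the documented max; the lower bound of 1 is assumed.
    if (!Number.isInteger(limit) || limit < 1 || limit > 1000) {
      throw new RangeError('limit must be an integer between 1 and 1000');
    }
    params.set('limit', String(limit));
  }
  if (cursor) {
    params.set('cursor', cursor);
  }
  return params.toString();
}
```

For example, `buildVectorQuery({ includeValues: false, limit: 50 })` yields the same query string used in the "Request with Parameters" example, with the parameters in a different order.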

Response Examples

Successful Response

When the file has been fully processed and vectors are available:

{
  "object": "list",
  "data": [
    {
      "object": "vector",
      "id": "chunk_abc123",
      "values": [0.1234, -0.5678, 0.9012, ...],
      "metadata": {
        "resource_type": "file",
        "fileId": "file_xyz789",
        "chunkId": "chunk_abc123",
        "chunk_number": 0,
        "chunk_path": "content.text",
        "chunk_start": 0,
        "chunk_end": 512,
        "file_id": "file_xyz789",
        "filename": "document.pdf",
        "page": 1
      },
      "created_at": 1754400370
    },
    {
      "object": "vector",
      "id": "chunk_def456",
      "values": [0.2345, -0.6789, 0.0123, ...],
      "metadata": {
        "resource_type": "file",
        "fileId": "file_xyz789",
        "chunkId": "chunk_def456",
        "chunk_number": 1,
        "chunk_path": "content.text",  
        "chunk_start": 512,
        "chunk_end": 1024,
        "file_id": "file_xyz789",
        "filename": "document.pdf",
        "page": 2
      },
      "created_at": 1754400371
    }
  ],
  "has_more": true,
  "next_cursor": "2"
}

File Still Processing

If the file is still being processed, you'll get a 202 status:

{
  "object": "list",
  "data": [],
  "has_more": false,
  "status": "in_progress",
  "message": "Vectors are still being generated. Please try again later.",
  "file_status": "in_progress",
  "retry_after": 30
}
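
One way to handle the 202 case is to poll until the vectors are ready. The sketch below assumes that `retry_after` is expressed in seconds (the response does not state the unit) and falls back to 30 seconds when the field is missing. The function name `waitForVectors`, the `maxAttempts` cap, and the injectable `fetchFn` and `sleepFn` options are all hypothetical conveniences, not part of the API.

```javascript
// Sketch: poll an endpoint until it stops returning 202.
// fetchFn and sleepFn are injectable so the loop can run without a network.
async function waitForVectors(url, headers, {
  fetchFn = fetch,
  sleepFn = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  maxAttempts = 10, // assumed safety cap, not an API setting
} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchFn(url, { headers });
    const body = await response.json();

    if (response.status !== 202) {
      if (!response.ok) {
        throw new Error(body.error || `Request failed with status ${response.status}`);
      }
      return body; // processing finished; body is the vector list
    }

    // Still processing: wait before the next attempt (assumes seconds).
    if (attempt < maxAttempts) {
      await sleepFn((body.retry_after ?? 30) * 1000);
    }
  }
  throw new Error(`Vectors not ready after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck file from causing an endless loop; adjust the cap to match how long your files usually take to process.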

Processing Failed

If file processing failed, you'll get a 400 status:

{
  "error": "File processing failed",
  "file_id": "file_xyz789", 
  "status": "failed",
  "error_message": "PDF extraction failed: Invalid PDF format"
}

Without Vector Values

When include_values=false, each vector omits the values field and returns only its id, metadata, and created_at:

{
  "object": "list",
  "data": [
    {
      "object": "vector",
      "id": "chunk_abc123",
      "metadata": {
        "resource_type": "file",
        "fileId": "file_xyz789",
        "chunkId": "chunk_abc123",
        "chunk_number": 0,
        "chunk_path": "content.text",
        "chunk_start": 0,
        "chunk_end": 512,
        "file_id": "file_xyz789",
        "filename": "document.pdf"
      },
      "created_at": 1754400370
    }
  ],
  "has_more": false
}

Understanding Vector Metadata

Each vector includes detailed metadata about the chunk it represents:

Field        | Description
chunk_number | Sequential number of this chunk within the file
chunk_path   | Path to the content (usually "content.text")
chunk_start  | Character position where this chunk starts
chunk_end    | Character position where this chunk ends
filename     | Original filename
page         | Page number (for PDFs and similar documents)
fileId       | The original file ID
chunkId      | Unique identifier for this chunk
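
These fields are enough to check how consecutive chunks line up. The sketch below reports gaps and overlaps using only `chunk_number`, `chunk_start`, and `chunk_end`. It assumes `chunk_end` is exclusive, which the sample response suggests (0-512 followed by 512-1024) but does not confirm. The function name `findChunkBoundaryIssues` is hypothetical.

```javascript
// Sketch: compare each chunk's start with the previous chunk's end.
// Assumes chunk_end is exclusive, as the sample response suggests.
function findChunkBoundaryIssues(vectors) {
  // Sort a copy so the caller's array order is left untouched.
  const sorted = [...vectors].sort(
    (a, b) => a.metadata.chunk_number - b.metadata.chunk_number
  );
  const issues = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1].metadata;
    const curr = sorted[i].metadata;
    const delta = curr.chunk_start - prev.chunk_end;
    if (delta > 0) {
      issues.push({ type: 'gap', after: prev.chunk_number, size: delta });
    } else if (delta < 0) {
      issues.push({ type: 'overlap', after: prev.chunk_number, size: -delta });
    }
  }
  return issues;
}
```

An overlap is not necessarily a problem, since some chunking strategies overlap chunks on purpose; treat the output as a description of the boundaries rather than a list of errors.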

Pagination

When working with large files that have many chunks, use pagination:

let cursor = null;
let allVectors = [];

do {
  const params = new URLSearchParams({
    limit: '100',
    ...(cursor && { cursor })
  });

  const response = await fetch(
    `https://your-instance.ragwalla.com/v1/vector_stores/vs_abc123/files/file_xyz789/vectors?${params}`,
    {
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'X-Site-Name': 'your-site-name'
      }
    }
  );

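  // Error bodies have no `data` array, so fail fast on non-2xx responses.
  if (!response.ok) {
    throw new Error(`Vector request failed with status ${response.status}`);
  }

  const data = await response.json();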
  allVectors.push(...data.data);
  cursor = data.next_cursor;
} while (cursor);

Common Use Cases

Debugging Chunking Strategy

Examine how your documents were split into chunks:

// Get vectors without values to see chunk boundaries
const response = await fetch(
  'https://your-instance.ragwalla.com/v1/vector_stores/vs_abc123/files/file_xyz789/vectors?include_values=false',
  { /* headers */ }
);

const data = await response.json();
data.data.forEach(vector => {
  console.log(`Chunk ${vector.metadata.chunk_number}: chars ${vector.metadata.chunk_start}-${vector.metadata.chunk_end}`);
});

Exporting Vectors for Analysis

Export embeddings for use in other systems:

const response = await fetch(
  'https://your-instance.ragwalla.com/v1/vector_stores/vs_abc123/files/file_xyz789/vectors',
  { /* headers */ }
);

const data = await response.json();
const embeddings = data.data.map(vector => ({
  id: vector.id,
  // The response shown above carries no chunk text, so this is a descriptive label only
  label: `${vector.metadata.filename} chunk ${vector.metadata.chunk_number}`,
  embedding: vector.values,
  metadata: vector.metadata
}));

// Now you can use embeddings in your own analysis
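
As a sketch of what such analysis might look like, the helpers below serialize vectors as JSON Lines for import into other tools and compute cosine similarity between two embeddings. Both function names are hypothetical, and the output field names (`id`, `embedding`, `metadata`) simply mirror the export shape used above.

```javascript
// Serialize vectors as JSON Lines: one JSON object per line.
function toJsonLines(vectors) {
  return vectors
    .map((v) => JSON.stringify({ id: v.id, embedding: v.values, metadata: v.metadata }))
    .join('\n');
}

// Cosine similarity between two embeddings of equal length.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Cosine similarity is only meaningful between embeddings produced by the same model, so compare vectors from the same vector store configuration.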

Quality Assurance

Check that file processing completed successfully:

async function checkFileProcessing(vectorStoreId, fileId) {
  const response = await fetch(
    `https://your-instance.ragwalla.com/v1/vector_stores/${vectorStoreId}/files/${fileId}/vectors?limit=1`,
    { /* headers */ }
  );

  if (response.status === 202) {
    console.log('File still processing...');
    return false;
  } else if (response.status === 400) {
    const error = await response.json();
    console.log('Processing failed:', error.error_message);
    return false;
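  } else if (!response.ok) {
    // Other non-2xx statuses (for example auth or not-found errors) are not successes
    console.log(`Unexpected status: ${response.status}`);
    return false;
  } else {
    const data = await response.json();
    // limit=1 returns at most one vector, so report presence rather than a chunk count
    console.log(`File processed successfully: ${data.data.length > 0 ? 'vectors found' : 'no vectors returned'}`);
    return true;
  }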
}

Important Notes

  • Vector dimensions depend on your embedding model (e.g., 1536 for text-embedding-3-small)
  • Large files may have hundreds or thousands of chunks - use pagination appropriately
  • Processing time varies based on file size and complexity
  • Rate limits apply - avoid making too many concurrent requests
  • Vector values are the actual embeddings generated by your chosen model
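
Because the dimension count depends on the embedding model, a quick consistency check can catch mismatches before vectors are loaded into another system. This is a sketch: `findDimensionMismatches` is a hypothetical name, and the expected dimension is supplied by the caller (for example 1536 for text-embedding-3-small).

```javascript
// Sketch: list vectors whose values array is missing or has the wrong length.
// Vectors fetched with include_values=false have no values and are reported
// with actual: null.
function findDimensionMismatches(vectors, expectedDim) {
  return vectors
    .filter((v) => !Array.isArray(v.values) || v.values.length !== expectedDim)
    .map((v) => ({
      id: v.id,
      actual: Array.isArray(v.values) ? v.values.length : null,
    }));
}
```

An empty result means every vector in the batch has the expected number of dimensions.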

Error Handling

Always handle potential errors when retrieving vectors:

async function getVectors(vectorStoreId, fileId) {
  try {
    const response = await fetch(
      `https://your-instance.ragwalla.com/v1/vector_stores/${vectorStoreId}/files/${fileId}/vectors`,
      {
        headers: {
          'Authorization': 'Bearer YOUR_API_KEY',
          'X-Site-Name': 'your-site-name'
        }
      }
    );

    if (response.status === 202) {
      // Still processing
      const data = await response.json();
      console.log(data.message);
      return { status: 'processing', retryAfter: data.retry_after };
    } else if (!response.ok) {
      // Error occurred
      const error = await response.json();
      throw new Error(error.error || 'Failed to retrieve vectors');
    }

    return await response.json();
  } catch (error) {
    console.error('Error retrieving vectors:', error.message);
    throw error;
  }
}

Next Steps

Now that you can retrieve vectors from your files, you might want to:

  • Analyze the quality of your chunking strategy
  • Export embeddings for use in custom similarity search
  • Debug issues with document processing
  • Integrate vectors with external analytics tools

For more advanced vector operations, see our Vector Search Guide and Custom Embedding Models documentation.
