Turn documents into structured, queryable knowledge. Upload files, define a schema, and let your agents traverse entities and relationships — not just match keywords.

What is a knowledge graph?

A knowledge graph extracts structured entities and relationships from your documents and stores them in a queryable graph. Instead of plain vector search ("find chunks that sound similar to my question"), your agent can follow connections: Who founded Acme Corp? → What products does Acme make? → Which customers use those products?

You define what matters by providing a schema — the entity types and relationship types you care about. The system uses an LLM to extract matching entities and relationships from every file you add, then indexes them for both semantic search and graph traversal.

When to use a knowledge graph vs. plain vector search

Use a knowledge graph when:

  • Your data has meaningful relationships (people → companies, products → features, papers → citations)
  • Your agents need to answer multi-hop questions ("Who manages the team that built Feature X?")
  • You want structured extraction, not just fuzzy retrieval
  • You need to browse, filter, and inspect what was extracted

Use plain vector search when:

  • You just need "find the most relevant passage" for a question
  • Your documents are homogeneous and don't have relational structure
  • Speed matters more than precision

Quick start

1. Create a knowledge graph with a schema

The schema tells the extraction model what entity types and relationship types to look for.

curl -X POST https://api.ragwalla.com/knowledge_graphs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Org",
    "description": "People, teams, and projects across engineering",
    "embedding_settings": {
      "model": "text-embedding-3-small"
    },
    "extraction_schema": {
      "type": "object",
      "properties": {
        "entities": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "entity_type": { "enum": ["Person", "Team", "Project", "Technology"] },
              "properties": { "type": "object" }
            },
            "additionalProperties": false,
            "required": ["name", "entity_type"]
          }
        },
        "relationships": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "from_entity": { "type": "string" },
              "to_entity": { "type": "string" },
              "relationship_type": { "enum": ["manages", "member_of", "works_on", "uses"] },
              "properties": { "type": "object" }
            },
            "additionalProperties": false,
            "required": ["from_entity", "to_entity", "relationship_type"]
          }
        }
      },
      "additionalProperties": false,
      "required": ["entities", "relationships"]
    }
  }'
{
  "id": "kg_abc123",
  "object": "knowledge_graph",
  "name": "Engineering Org",
  "description": "People, teams, and projects across engineering",
  "project_id": "proj_xyz",
  "embedding_model": "text-embedding-3-small",
  "extraction_model": "google/gemini-3-flash-preview",
  "extraction_schema": { "..." },
  "extraction_prompt": null,
  "document_schema": null,
  "dimensions": 1536,
  "metric": "cosine",
  "status": "active",
  "entity_count": 0,
  "relationship_count": 0,
  "file_count": 0,
  "created_at": 1710000000
}

2. Upload a file and add it to the graph

Files are uploaded through the Files API first, then associated with a knowledge graph.

# Upload the file
curl -X POST https://api.ragwalla.com/files \
  -H "Authorization: Bearer $API_KEY" \
  -F purpose=knowledge_graph \
  -F file=@team-roster.pdf

# Add it to the knowledge graph
curl -X POST https://api.ragwalla.com/knowledge_graphs/kg_abc123/files \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "file_id": "file_roster456" }'
{
  "knowledge_base_id": "kg_abc123",
  "file_id": "file_roster456",
  "status": "pending",
  "entity_count": 0,
  "relationship_count": 0,
  "created_at": 1710000000
}

The file is now queued for processing. The system will:

  1. Extract text (with OCR for PDFs)
  2. Chunk the content
  3. Generate embeddings
  4. Run the extraction model against your schema to find entities and relationships
  5. Index everything for search and traversal

Processing is asynchronous. Poll the file status to track progress:

curl https://api.ragwalla.com/knowledge_graphs/kg_abc123/files/file_roster456 \
  -H "Authorization: Bearer $API_KEY"
{
  "knowledge_base_id": "kg_abc123",
  "file_id": "file_roster456",
  "filename": "team-roster.pdf",
  "content_type": "application/pdf",
  "bytes": 245000,
  "status": "active",
  "entity_count": 34,
  "relationship_count": 47,
  "chunks_extracted": 12,
  "total_chunks": 12,
  "created_at": 1710000000,
  "updated_at": 1710000060
}

File status progresses through: pending → processing → active (or failed).
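The polling step can be sketched in Python as follows. This is a minimal sketch, not an official client: `wait_for_file`, its parameters, and the poll interval are illustrative names and values, and `fetch_status` stands for any function of yours that performs the GET request above and returns the parsed JSON body.

```python
import time


def wait_for_file(fetch_status, poll_seconds=5.0, timeout_seconds=600.0,
                  sleep=time.sleep):
    """Poll until the file reaches a terminal status ("active" or "failed").

    fetch_status: zero-argument callable returning the parsed JSON body of
    GET /knowledge_graphs/{id}/files/{file_id} (hypothetical helper you supply).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_status()
        if body["status"] in ("active", "failed"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"file still {body['status']!r} after {timeout_seconds}s"
            )
        sleep(poll_seconds)


# Simulated responses stand in for real HTTP calls.
responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "active", "entity_count": 34, "relationship_count": 47},
])
final = wait_for_file(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"], final["entity_count"])  # active 34
```

Passing the fetch function in keeps the loop independent of any HTTP library, and the timeout prevents a stuck file from blocking the caller indefinitely.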

3. Query the graph

curl -X POST https://api.ragwalla.com/knowledge_graphs/kg_abc123/query \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who works on the payments project?",
    "max_hops": 2,
    "top_k": 10
  }'
{
  "entities": [
    {
      "entity_id": "ent_pay001",
      "name": "Payments",
      "entity_type": "Project",
      "properties_json": "{\"status\": \"active\", \"started\": \"2024-Q1\"}"
    }
  ],
  "neighbors": [
    {
      "entity_id": "ent_per042",
      "name": "Alice Chen",
      "entity_type": "Person",
      "properties_json": "{\"role\": \"Tech Lead\"}"
    },
    {
      "entity_id": "ent_team03",
      "name": "Platform Team",
      "entity_type": "Team",
      "properties_json": null
    }
  ],
  "relationships": [
    {
      "from_entity": "ent_per042",
      "to_entity": "ent_pay001",
      "relationship_type": "works_on"
    },
    {
      "from_entity": "ent_per042",
      "to_entity": "ent_team03",
      "relationship_type": "member_of"
    }
  ]
}

The query endpoint combines semantic vector search with keyword matching, then traverses the graph outward from the matched entities. max_hops controls how far to follow relationships — 1 returns direct connections, 2 returns connections of connections.
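In the response above, relationships refer to entities by ID and properties_json is a JSON-encoded string (or null). The sketch below shows one way to turn such a response into readable triples. The function name `summarize_query_result` is illustrative, and the sample data is a trimmed copy of the response shown above.

```python
import json


def summarize_query_result(result):
    """Index entities by ID and return (from_name, type, to_name) triples."""
    by_id = {}
    for entity in result.get("entities", []) + result.get("neighbors", []):
        raw = entity.get("properties_json")
        # properties_json is a JSON string, or null when nothing was captured.
        props = json.loads(raw) if raw else {}
        by_id[entity["entity_id"]] = {**entity, "properties": props}

    triples = []
    for rel in result.get("relationships", []):
        # Fall back to the raw ID if an endpoint is missing from the response.
        src = by_id.get(rel["from_entity"], {}).get("name", rel["from_entity"])
        dst = by_id.get(rel["to_entity"], {}).get("name", rel["to_entity"])
        triples.append((src, rel["relationship_type"], dst))
    return by_id, triples


sample = {
    "entities": [
        {"entity_id": "ent_pay001", "name": "Payments", "entity_type": "Project",
         "properties_json": '{"status": "active", "started": "2024-Q1"}'},
    ],
    "neighbors": [
        {"entity_id": "ent_per042", "name": "Alice Chen", "entity_type": "Person",
         "properties_json": '{"role": "Tech Lead"}'},
        {"entity_id": "ent_team03", "name": "Platform Team", "entity_type": "Team",
         "properties_json": None},
    ],
    "relationships": [
        {"from_entity": "ent_per042", "to_entity": "ent_pay001",
         "relationship_type": "works_on"},
        {"from_entity": "ent_per042", "to_entity": "ent_team03",
         "relationship_type": "member_of"},
    ],
}

by_id, triples = summarize_query_result(sample)
print(triples)
# [('Alice Chen', 'works_on', 'Payments'), ('Alice Chen', 'member_of', 'Platform Team')]
print(by_id["ent_per042"]["properties"]["role"])  # Tech Lead
```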

4. Attach the graph to an agent

curl -X POST https://api.ragwalla.com/agents/ag_myagent/knowledge_graphs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "knowledge_base_id": "kg_abc123" }'
{
  "agent_id": "ag_myagent",
  "knowledge_base_id": "kg_abc123",
  "created_at": 1710000000
}

Once attached, the agent gets a search_knowledge_graph tool. When users ask questions, the agent can query the graph, follow relationships, and include the results in its response — without you writing any tool code.
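The quick-start calls all share the same shape: a bearer token, a JSON body, and a path under the API base URL. If you are calling from Python rather than curl, a request can be prepared with the standard library alone. This is a sketch under assumptions: `build_request` and `API_BASE` are illustrative names, not part of any Ragwalla SDK, and the function only builds the request so you can inspect it before sending.

```python
import json
import urllib.request

API_BASE = "https://api.ragwalla.com"  # base URL taken from the curl examples


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request.

    Hypothetical helper that mirrors the headers used in the curl examples.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Prepare the "attach the graph to an agent" call from step 4.
req = build_request(
    "POST",
    "/agents/ag_myagent/knowledge_graphs",
    "YOUR_API_KEY",
    {"knowledge_base_id": "kg_abc123"},
)
print(req.get_method(), req.full_url)
# POST https://api.ragwalla.com/agents/ag_myagent/knowledge_graphs
```

To send the request, pass it to `urllib.request.urlopen(req)` and decode the JSON response body.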


Schemas

Extraction schema

The extraction schema is a JSON Schema that defines what the extraction model looks for in your documents. It must follow a specific structure:

{
  "type": "object",
  "properties": {
    "entities": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "entity_type": { "enum": ["Person", "Company", "Product"] },
          "properties": { "type": "object" }
        },
        "additionalProperties": false,
        "required": ["name", "entity_type"]
      }
    },
    "relationships": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "from_entity": { "type": "string" },
          "to_entity": { "type": "string" },
          "relationship_type": { "enum": ["founded_by", "works_at", "produces"] },
          "properties": { "type": "object" }
        },
        "additionalProperties": false,
        "required": ["from_entity", "to_entity", "relationship_type"]
      }
    }
  },
  "additionalProperties": false,
  "required": ["entities", "relationships"]
}

Key rules:

  • entity_type must be an enum — the extraction model picks from a fixed list of types, not free text
  • relationship_type must also be an enum
  • additionalProperties: false is required on the root object and on the entity and relationship item objects, as in the example above
  • The optional properties object on entities and relationships lets you capture extra attributes (role, date, amount, etc.) without constraining them to a schema
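Because only the two enum lists change between schemas, the structure above can be generated rather than written by hand. The following is a minimal sketch; `build_extraction_schema` is an illustrative helper, not an API of the service, and it simply reproduces the structure shown in this section.

```python
import json


def build_extraction_schema(entity_types, relationship_types):
    """Return an extraction schema with the given enum values."""
    if not entity_types or not relationship_types:
        raise ValueError("at least one entity type and one relationship type are needed")

    def closed_object(properties, required):
        # Fixed-structure objects set additionalProperties to false.
        return {
            "type": "object",
            "properties": properties,
            "additionalProperties": False,
            "required": required,
        }

    entity = closed_object(
        {
            "name": {"type": "string"},
            "entity_type": {"enum": list(entity_types)},
            "properties": {"type": "object"},  # free-form extra attributes
        },
        ["name", "entity_type"],
    )
    relationship = closed_object(
        {
            "from_entity": {"type": "string"},
            "to_entity": {"type": "string"},
            "relationship_type": {"enum": list(relationship_types)},
            "properties": {"type": "object"},  # free-form extra attributes
        },
        ["from_entity", "to_entity", "relationship_type"],
    )
    return closed_object(
        {
            "entities": {"type": "array", "items": entity},
            "relationships": {"type": "array", "items": relationship},
        },
        ["entities", "relationships"],
    )


schema = build_extraction_schema(
    ["Person", "Company", "Product"],
    ["founded_by", "works_at", "produces"],
)
print(json.dumps(schema, indent=2))
```

The printed JSON can be pasted into the extraction_schema field of a create request.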

Document schema

If you already have a JSON Schema that describes the structure of your documents (fields, nested objects, etc.), you can send it as document_schema and the system will infer an appropriate extraction schema from it using an LLM:

curl -X POST https://api.ragwalla.com/knowledge_graphs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Invoice Graph",
    "embedding_settings": { "model": "text-embedding-3-small" },
    "document_schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "vendor": { "type": "string" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "amount": { "type": "number" }
            }
          }
        }
      }
    }
  }'

You cannot provide both extraction_schema and document_schema in the same request.
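A client can catch this mistake before the request is sent. The check below is a small sketch with a hypothetical helper name, `schema_source`; how the API itself reports the conflict is not covered here.

```python
def schema_source(payload):
    """Return which schema field a create payload uses, or None for neither.

    Raises ValueError when both fields are set, since the API accepts only one.
    """
    has_extraction = payload.get("extraction_schema") is not None
    has_document = payload.get("document_schema") is not None
    if has_extraction and has_document:
        raise ValueError("provide extraction_schema or document_schema, not both")
    if has_extraction:
        return "extraction_schema"
    if has_document:
        return "document_schema"
    return None


payload = {
    "name": "Invoice Graph",
    "embedding_settings": {"model": "text-embedding-3-small"},
    "document_schema": {"type": "object", "properties": {}},  # stand-in schema
}
print(schema_source(payload))  # document_schema
```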

Schema suggestion

Not sure what schema to use? Upload a few files first, then ask the system to suggest one:

curl -X POST https://api.ragwalla.com/knowledge_graphs/kg_abc123/schema/suggest \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "max_files": 5,
    "max_chunks_per_file": 3
  }'
{
  "object": "knowledge_graph.schema_suggestion",
  "knowledge_base_id": "kg_abc123",
  "model": "google/gemini-3-flash-preview",
  "sampled_files": 3,
  "sampled_chunks": 9,
  "source_file_ids": ["file_a", "file_b", "file_c"],
  "extraction_schema": {
    "type": "object",
    "properties": {
      "entities": { "..." },
      "relationships": { "..." }
    }
  },
  "summary": "Identified Person, Department, and Policy entity types with manages, belongs_to, and governs relationships.",
  "assumptions": [
    "Documents are HR policy files",
    "Department names are unique identifiers"
  ]
}

The suggestion samples content from your uploaded files and generates a schema using the extraction model. You can scope the suggestion to specific files with file_ids, or let it sample automatically.

If the suggested schema is too weak (not enough distinct types or relationships), the endpoint returns 422 with an issues array explaining what's wrong.
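One way to handle both outcomes in client code is sketched below. The helper name `handle_suggestion` is illustrative. It assumes the issues array sits at the top level of the 422 body under the key "issues", and it treats each issue as opaque because the item shape is not described here.

```python
def handle_suggestion(status_code, body):
    """Return (extraction_schema, assumptions) or raise with the reported issues."""
    if status_code == 422:
        # Assumption: the issues array is a top-level "issues" key.
        issues = body.get("issues", [])
        raise ValueError(f"suggested schema was rejected: {issues}")
    if not 200 <= status_code < 300:
        raise RuntimeError(f"unexpected status {status_code}")
    return body["extraction_schema"], body.get("assumptions", [])


# A trimmed success body; the schema here is a stand-in, not real output.
ok_body = {
    "object": "knowledge_graph.schema_suggestion",
    "extraction_schema": {"type": "object", "properties": {}},
    "assumptions": ["Documents are HR policy files"],
}
schema, assumptions = handle_suggestion(200, ok_body)
print(assumptions)  # ['Documents are HR policy files']
```

Reviewing the assumptions list before adopting a suggestion is worthwhile, since it shows what the model inferred about your documents. The returned schema can then be sent as extraction_schema in an update call.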


Browsing entities and relationships

After files are processed, you can inspect what was extracted.

List entities

# All entities
curl "https://api.ragwalla.com/knowledge_graphs/kg_abc123/entities?limit=20" \
  -H "Authorization: Bearer $API_KEY"

# Filter by type
curl "https://api.ragwalla.com/knowledge_graphs/kg_abc123/entities?entity_type=Person&limit=20" \
  -H "Authorization: Bearer $API_KEY"

Get a specific entity

curl https://api.ragwalla.com/knowledge_graphs/kg_abc123/entities/ent_per042 \
  -H "Authorization: Bearer $API_KEY"

List relationships

# All relationships
curl "https://api.ragwalla.com/knowledge_graphs/kg_abc123/relationships?limit=20" \
  -H "Authorization: Bearer $API_KEY"

# Relationships for a specific entity
curl "https://api.ragwalla.com/knowledge_graphs/kg_abc123/relationships?entity_id=ent_per042" \
  -H "Authorization: Bearer $API_KEY"

# Filter by relationship type
curl "https://api.ragwalla.com/knowledge_graphs/kg_abc123/relationships?relationship_type=manages" \
  -H "Authorization: Bearer $API_KEY"

# Filter by source file
curl "https://api.ragwalla.com/knowledge_graphs/kg_abc123/relationships?source_file_id=file_roster456" \
  -H "Authorization: Bearer $API_KEY"
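The relationship filters above are ordinary query parameters, so the URL can be assembled programmatically. This sketch uses only the parameter names that appear in the examples. The helper `relationships_url` is illustrative, and whether the API accepts several filters in one request is an assumption.

```python
from urllib.parse import urlencode

API_BASE = "https://api.ragwalla.com"  # base URL taken from the curl examples

# Parameter names seen in the examples above.
RELATIONSHIP_FILTERS = ("entity_id", "relationship_type", "source_file_id", "limit")


def relationships_url(kg_id, **filters):
    """Build the list-relationships URL with URL-encoded filters."""
    unknown = sorted(set(filters) - set(RELATIONSHIP_FILTERS))
    if unknown:
        raise ValueError(f"unsupported filter(s): {', '.join(unknown)}")
    # Drop filters left as None so they do not appear in the query string.
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params)
    base = f"{API_BASE}/knowledge_graphs/{kg_id}/relationships"
    return f"{base}?{query}" if query else base


print(relationships_url("kg_abc123", entity_id="ent_per042", relationship_type="manages"))
# https://api.ragwalla.com/knowledge_graphs/kg_abc123/relationships?entity_id=ent_per042&relationship_type=manages
```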

Delete an entity

curl -X DELETE https://api.ragwalla.com/knowledge_graphs/kg_abc123/entities/ent_per042 \
  -H "Authorization: Bearer $API_KEY"

Search vs. query

There are two ways to retrieve information from a knowledge graph:

Search — flat semantic match

curl -X POST https://api.ragwalla.com/knowledge_graphs/kg_abc123/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning engineers",
    "top_k": 10,
    "entity_type": "Person"
  }'

Search finds entities whose names or descriptions are semantically similar to your query. It returns a flat list — no graph traversal, no relationships. Use it when you want a quick lookup.

Query — semantic match + graph traversal

curl -X POST https://api.ragwalla.com/knowledge_graphs/kg_abc123/query \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning engineers",
    "max_hops": 2,
    "top_k": 10
  }'

Query starts with the same semantic match, then walks outward through relationships up to max_hops levels. It returns the matched entities, their neighbors, and the relationships connecting them. Use it when you need context — not just "who matches?" but "who are they connected to, and how?"
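To build intuition for what max_hops includes, the traversal can be imitated locally with a breadth-first search. This is an illustration only, not the service's implementation. It assumes relationships can be followed from either end, which the quick-start response suggests (Payments reaches Alice Chen through a works_on edge that points at Payments). The function name `within_hops` is hypothetical.

```python
from collections import deque


def within_hops(relationships, seeds, max_hops):
    """Return {entity_id: hop_distance} for entities within max_hops of a seed."""
    adjacency = {}
    for rel in relationships:
        # Assumption: an edge can be walked in both directions.
        adjacency.setdefault(rel["from_entity"], set()).add(rel["to_entity"])
        adjacency.setdefault(rel["to_entity"], set()).add(rel["from_entity"])

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


edges = [
    {"from_entity": "ent_per042", "to_entity": "ent_pay001", "relationship_type": "works_on"},
    {"from_entity": "ent_per042", "to_entity": "ent_team03", "relationship_type": "member_of"},
]
print(within_hops(edges, ["ent_pay001"], max_hops=1))
# {'ent_pay001': 0, 'ent_per042': 1}
print(within_hops(edges, ["ent_pay001"], max_hops=2))
# {'ent_pay001': 0, 'ent_per042': 1, 'ent_team03': 2}
```

With one hop, only the person working on the project is reached; the second hop adds that person's team.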


Managing knowledge graphs

Update a knowledge graph

curl -X POST https://api.ragwalla.com/knowledge_graphs/kg_abc123 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Org v2",
    "extraction_prompt": "Focus on reporting relationships and project ownership."
  }'

You can update name, description, extraction_model, extraction_schema, extraction_prompt, and document_schema after creation. The embedding_settings are immutable — they're locked at creation because existing vectors were generated with that model.

Updating the extraction schema or prompt does not retroactively re-extract existing files. It applies to files added after the change.
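A client-side guard can keep immutable settings out of update requests. The sketch below takes its field list from the paragraph above; `build_update_payload` is a hypothetical helper, and the server's own validation may differ.

```python
# Fields documented as updatable after creation.
MUTABLE_FIELDS = {
    "name",
    "description",
    "extraction_model",
    "extraction_schema",
    "extraction_prompt",
    "document_schema",
}


def build_update_payload(**changes):
    """Return the update body, rejecting fields that are locked at creation."""
    rejected = sorted(set(changes) - MUTABLE_FIELDS)
    if rejected:
        raise ValueError(f"not updatable after creation: {', '.join(rejected)}")
    return dict(changes)


body = build_update_payload(
    name="Engineering Org v2",
    extraction_prompt="Focus on reporting relationships and project ownership.",
)
print(sorted(body))  # ['extraction_prompt', 'name']
```

Calling `build_update_payload(embedding_settings={...})` raises ValueError, which surfaces the mistake before any request is made.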

Remove a file

curl -X DELETE https://api.ragwalla.com/knowledge_graphs/kg_abc123/files/file_roster456 \
  -H "Authorization: Bearer $API_KEY"
{
  "knowledge_base_id": "kg_abc123",
  "file_id": "file_roster456",
  "deleted": false,
  "cleanup_queued": true,
  "cleanup_status": "queued"
}

File removal is asynchronous. The response returns 202 immediately. Entities, relationships, vectors, and chunks associated with the file are cleaned up in the background.

List an agent's knowledge graphs

curl https://api.ragwalla.com/agents/ag_myagent/knowledge_graphs \
  -H "Authorization: Bearer $API_KEY"
{
  "object": "list",
  "data": [
    {
      "agent_id": "ag_myagent",
      "knowledge_base_id": "kg_abc123",
      "name": "Engineering Org",
      "description": "People, teams, and projects across engineering",
      "embedding_model": "text-embedding-3-small",
      "status": "active",
      "created_at": 1710000000
    }
  ]
}

Detach a knowledge graph from an agent

curl -X DELETE https://api.ragwalla.com/agents/ag_myagent/knowledge_graphs/kg_abc123 \
  -H "Authorization: Bearer $API_KEY"

Delete a knowledge graph

curl -X DELETE https://api.ragwalla.com/knowledge_graphs/kg_abc123 \
  -H "Authorization: Bearer $API_KEY"
{
  "id": "kg_abc123",
  "object": "knowledge_graph",
  "deleted": true
}

Deleting a knowledge graph removes all entities, relationships, file associations, and vector indexes.


Use cases

Internal knowledge base

Upload company wikis, onboarding docs, and org charts. Define entity types like Person, Team, Policy, System with relationships like manages, owns, depends_on. Your support agents can then answer questions like "Who owns the billing system?" by traversing the graph rather than hoping a relevant text chunk appears in search results.

Research corpus

Upload academic papers with entity types like Author, Paper, Institution, Method and relationships like authored_by, cites, affiliated_with. Query with max_hops: 3 to discover citation chains and collaboration networks.

Product catalog

Upload product specs and datasheets. Extract Product, Feature, Component, Specification entities with has_feature, contains, compatible_with relationships. Sales agents can answer "Which products support feature X and are compatible with System Y?" with graph traversal instead of keyword guessing.

Compliance and policy

Upload regulatory documents. Extract Regulation, Requirement, Department, Process entities with subject_to, responsible_for, references relationships. Compliance agents can trace which departments are affected by a regulation change by following the graph.


Configuration reference

Field                      Set at    Mutable  Description
name                       creation  yes      Display name
description                creation  yes      Human-readable description
embedding_settings.model   creation  no       Embedding model for vector search
embedding_settings.metric  creation  no       Distance metric (default: cosine)
extraction_model           creation  yes      LLM used for entity/relationship extraction
extraction_schema          creation  yes      JSON Schema defining entity and relationship types
extraction_prompt          creation  yes      Custom instructions for the extraction model
document_schema            creation  yes      Document structure schema (auto-compiles to extraction schema)

API reference summary

Method  Endpoint                                  Description
POST    /knowledge_graphs                         Create a knowledge graph
GET     /knowledge_graphs                         List knowledge graphs
GET     /knowledge_graphs/:id                     Get a knowledge graph
POST    /knowledge_graphs/:id                     Update a knowledge graph
DELETE  /knowledge_graphs/:id                     Delete a knowledge graph
POST    /knowledge_graphs/:id/files               Add a file
GET     /knowledge_graphs/:id/files               List files
GET     /knowledge_graphs/:id/files/:fileId       Get file status
DELETE  /knowledge_graphs/:id/files/:fileId       Remove a file
GET     /knowledge_graphs/:id/entities            List entities
GET     /knowledge_graphs/:id/entities/:entityId  Get an entity
DELETE  /knowledge_graphs/:id/entities/:entityId  Delete an entity
GET     /knowledge_graphs/:id/relationships       List relationships
POST    /knowledge_graphs/:id/search              Semantic entity search
POST    /knowledge_graphs/:id/query               Semantic search + graph traversal
POST    /knowledge_graphs/:id/schema/suggest      Suggest an extraction schema
POST    /agents/:id/knowledge_graphs              Attach graph to agent
GET     /agents/:id/knowledge_graphs              List agent's graphs
DELETE  /agents/:id/knowledge_graphs/:kgId        Detach graph from agent