RaaS (RAG-as-a-Service) API Documentation

🔌 API Endpoints

Method	Endpoint	Description
`GET`	`/`	Service dashboard (HTML)
`GET`	`/health`	Health check
`POST`	`/chat`	Chat with RAG context (non-streaming)
`POST`	`/chat/stream`	Chat with RAG context (Server-Sent Events)
`GET`	`/v1/models`	OpenAI-compatible model list
`POST`	`/v1/chat/completions`	OpenAI-compatible chat completions (supports `stream`)
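The response shape of `GET /v1/models` is not shown in this document. Since the endpoint is described as OpenAI-compatible, this sketch assumes the OpenAI list format (`{"object": "list", "data": [{"id": ...}]}`). The helper names `parse_model_ids` and `list_model_ids` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style list response.

    Assumes the shape {"object": "list", "data": [{"id": "..."}, ...]};
    the RaaS docs do not spell this out.
    """
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str = "http://localhost:5001") -> List[str]:
    """Call GET /v1/models and return the model IDs (needs a running service)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))


# Usage against a running service: print(list_model_ids())
```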

📋 Request & Response Format

POST /chat: Non-streaming chat with RAG context

Request Body

Field	Type	Description
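`messages`	array	Required, non-empty. Array of message objects, each with `role` (`system`, `user`, or `assistant`) and `content` (string). A `system` message, if present, overrides the default system prompt.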

Example Request

{
  "messages": [
    { "role": "system", "content": "Optional system prompt override" },
    { "role": "user", "content": "What does the invoice say about payment terms?" }
  ]
}
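A minimal Python sketch for building a request body and catching the documented 400 condition (missing or empty `messages`) before sending. The helpers `validate_messages` and `build_chat_payload` are hypothetical, and the allowed roles are simply the three that appear in these examples; the service may accept or reject others.

```python
from typing import Optional

ALLOWED_ROLES = {"system", "user", "assistant"}  # roles that appear in these docs


def validate_messages(messages) -> None:
    """Raise ValueError for bodies the service would likely reject with 400."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("Messages array is required")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"messages[{i}] must be an object")
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}] has unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}] content must be a string")


def build_chat_payload(question: str, system_prompt: Optional[str] = None) -> dict:
    """Build a /chat or /chat/stream body, optionally overriding the system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    validate_messages(messages)
    return {"messages": messages}
```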

Response

200 OK

{
  "model": "kairos-raas",
  "content": "Based on the document, the payment terms are Net 30...",
  "token_count": 42
}
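The documented response fields map directly onto a small typed wrapper. This `ChatResponse` dataclass is an illustrative assumption, not part of any published client; the document also does not define exactly what `token_count` counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatResponse:
    """Typed view of the documented /chat response body."""

    model: str
    content: str
    token_count: int

    @classmethod
    def from_json(cls, data: dict) -> "ChatResponse":
        # The snake_case wire fields map one-to-one onto attributes.
        return cls(
            model=data["model"],
            content=data["content"],
            token_count=int(data["token_count"]),
        )
```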

POST /chat/stream: Streaming chat (Server-Sent Events)

Request Body

Same as /chat

Response (SSE Stream)

200 OK

data: {"content": "Based"}

data: {"content": " on"}

data: {"content": " the"}

...

data: [DONE]
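Each event is a `data:` line carrying a JSON object with a `content` fragment, and the stream ends with `data: [DONE]`. A sketch of a parser over already-decoded lines (the function name `iter_sse_content` is hypothetical):

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "content" field of each /chat/stream event until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; ignore anything after the sentinel
        yield json.loads(data).get("content", "")
```

With requests, one way to feed it is `iter_sse_content(line.decode("utf-8") for line in response.iter_lines())` on a response opened with `stream=True`; decoding explicitly as UTF-8 avoids relying on whatever charset requests infers for `text/event-stream`.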

POST /v1/chat/completions: OpenAI-compatible chat completions

Example Request

{
  "model": "kairos-raas",
  "stream": true,
  "messages": [
    { "role": "user", "content": "Summarize this KB" }
  ]
}

Streaming Response (SSE)

200 OK

data: {"object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: {"object":"chat.completion.chunk","choices":[{"finish_reason":"stop","delta":{}}]}

data: [DONE]
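Assembling the streamed OpenAI-style chunks into a full reply means concatenating each `choices[].delta.content` and noting the final `finish_reason`. A minimal sketch over decoded SSE lines (`collect_completion` is a hypothetical name):

```python
import json
from typing import Iterable, Optional, Tuple


def collect_completion(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join delta.content from OpenAI-style chunks; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role; later chunks carry text.
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

Because the endpoint is described as OpenAI-compatible, the official `openai` Python package with `base_url="http://localhost:5001/v1"` may also work, though that is not confirmed here.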

GET /health: Health check

Response

200 OK

{
  "status": "ok",
  "service": "My Knowledge Base",
  "port": 5001
}
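A common use of `/health` is waiting for the service to come up, for example before a test run. This sketch polls until `status` is `"ok"`; `fetch_health`, `wait_until_healthy`, and the retry defaults are illustrative assumptions. The fetch function is passed in so the polling logic can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:5001") -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict], attempts: int = 10, delay: float = 1.0) -> dict:
    """Poll until the payload reports status "ok"; raise TimeoutError after `attempts` tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            payload = fetch()
            if payload.get("status") == "ok":
                return payload
            last_error = RuntimeError(f"unhealthy: {payload!r}")
        except OSError as exc:  # connection refused, timeouts, urllib HTTP errors
            last_error = exc
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"service not healthy after {attempts} attempts") from last_error
```

Against a running service: `wait_until_healthy(fetch_health)`.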

💻 Code Examples

All examples use http://localhost:5001. Replace 5001 with your service's configured port.
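Rather than hard-coding the port, a client can read the base URL from configuration. A minimal sketch; the `RAAS_BASE_URL` environment variable is a hypothetical name, not one the service defines:

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "http://localhost:5001"


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the service base URL; the hypothetical RAAS_BASE_URL overrides the default."""
    env = os.environ if env is None else env
    # Strip a trailing slash so f"{base}/chat" never produces "//chat".
    return env.get("RAAS_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
```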

Python: Non-Streaming

import requests

BASE_URL = "http://localhost:5001"

response = requests.post(f"{BASE_URL}/chat", json={
    "messages": [
        {"role": "user", "content": "Summarize the uploaded document"}
    ]
}, timeout=60)
response.raise_for_status()  # raise on 400/404/500 instead of failing on a missing key

data = response.json()
print(data["content"])

Python: Streaming (SSE)

import json
import requests

# stream=True keeps the connection open so events are read as they arrive
with requests.post(
    "http://localhost:5001/chat/stream",
    json={"messages": [{"role": "user", "content": "What are the key findings?"}]},
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # blank lines separate SSE events
        text = line.decode("utf-8")
        if text == "data: [DONE]":
            break
        if text.startswith("data: "):
            chunk = json.loads(text[6:])
            print(chunk["content"], end="", flush=True)

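Python: Using httpx

import httpx

# httpx defaults to a 5-second timeout, which is often too short for generation
with httpx.Client(base_url="http://localhost:5001", timeout=60.0) as client:
    r = client.post("/chat", json={
        "messages": [{"role": "user", "content": "List all action items from the document"}]
    })
    r.raise_for_status()
    print(r.json()["content"])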

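C# (.NET): Non-Streaming

using System.Net.Http.Json;
using System.Text.Json.Serialization;

var client = new HttpClient { BaseAddress = new Uri("http://localhost:5001") };

var request = new
{
    messages = new[]
    {
        new { role = "user", content = "What is the total amount on this invoice?" }
    }
};

var response = await client.PostAsJsonAsync("/chat", request);
response.EnsureSuccessStatusCode();
var result = await response.Content.ReadFromJsonAsync<ChatResponse>();
Console.WriteLine(result?.Content);

// Response model. The web defaults match "model" and "content" case-insensitively,
// but the snake_case "token_count" field needs an explicit JSON name.
record ChatResponse(
    string Model,
    string Content,
    [property: JsonPropertyName("token_count")] int TokenCount);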

C# (.NET): Streaming (SSE)

using System.Net.Http.Json;
using System.Text.Json;

var client = new HttpClient { BaseAddress = new Uri("http://localhost:5001") };

var request = new
{
    messages = new[] { new { role = "user", content = "Explain the contract terms" } }
};

var httpRequest = new HttpRequestMessage(HttpMethod.Post, "/chat/stream")
{
    Content = JsonContent.Create(request)
};

// ResponseHeadersRead returns once headers arrive, so the body can be read incrementally
using var response = await client.SendAsync(httpRequest, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

// ReadLineAsync returns null at end of stream, avoiding the synchronous EndOfStream check
string? line;
while ((line = await reader.ReadLineAsync()) is not null)
{
    if (line.Length == 0) continue; // blank lines separate SSE events
    if (line == "data: [DONE]") break;
    if (line.StartsWith("data: "))
    {
        var chunk = JsonSerializer.Deserialize<JsonElement>(line[6..]);
        Console.Write(chunk.GetProperty("content").GetString());
    }
}

JavaScript: Non-Streaming (fetch API)

// Node.js 18+ / Browser (fetch API); top-level await requires an ES module
const response = await fetch("http://localhost:5001/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "What does this document say about deadlines?" }]
  })
});
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const data = await response.json();
console.log(data.content);

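JavaScript: Streaming (SSE)

// Node.js 18+ (uses process.stdout); in a browser, append to the page instead
const response = await fetch("http://localhost:5001/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "List the key points" }]
  })
});
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

// Network chunks can end mid-line or mid-character, so buffer until a full line arrives
outer: while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep the trailing partial line for the next read

  for (const raw of lines) {
    const line = raw.trimEnd(); // tolerate CRLF line endings
    if (line === "data: [DONE]") break outer;
    if (line.startsWith("data: ")) {
      const chunk = JSON.parse(line.slice(6));
      process.stdout.write(chunk.content);
    }
  }
}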

Java: Non-Streaming

// Requires Java 15+ (text blocks) and the Gson library (com.google.code.gson:gson)
import java.net.URI;
import java.net.http.*;
import com.google.gson.JsonParser;

public class KairosRaasClient {
    private static final String BASE_URL = "http://localhost:5001";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        String body = """
            {
                "messages": [
                    {"role": "user", "content": "What are the payment terms?"}
                ]
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(BASE_URL + "/chat"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = client.send(request,
            HttpResponse.BodyHandlers.ofString());

        // Fail loudly on 400/404/500 instead of parsing an error body as a reply
        if (response.statusCode() != 200) {
            throw new RuntimeException("HTTP " + response.statusCode() + ": " + response.body());
        }

        var json = JsonParser.parseString(response.body()).getAsJsonObject();
        System.out.println(json.get("content").getAsString());
    }
}

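Java: Streaming (SSE)

// Requires Java 15+ (text blocks) and the Gson library (com.google.code.gson:gson)
import java.net.URI;
import java.net.http.*;
import java.util.stream.Stream;
import com.google.gson.JsonParser;

public class KairosRaasStreamClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        String body = """
            {"messages": [{"role": "user", "content": "Summarize the report"}]}
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:5001/chat/stream"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // ofLines() exposes the body as a lazy Stream<String>, one element per line
        HttpResponse<Stream<String>> response = client.send(request,
            HttpResponse.BodyHandlers.ofLines());

        // Closing the stream releases the connection; stop reading at [DONE]
        try (Stream<String> lines = response.body()) {
            lines.takeWhile(line -> !line.equals("data: [DONE]"))
                 .filter(line -> line.startsWith("data: "))
                 .forEach(line -> {
                     var json = JsonParser.parseString(line.substring(6)).getAsJsonObject();
                     System.out.print(json.get("content").getAsString());
                 });
        }
    }
}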

Rust: Non-Streaming

// Cargo.toml dependencies:
// reqwest = { version = "0.12", features = ["json"] }
// serde_json = "1"
// tokio = { version = "1", features = ["full"] }
use reqwest::Client;
use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();

    let response = client
        .post("http://localhost:5001/chat")
        .json(&json!({
            "messages": [
                {"role": "user", "content": "What is the summary?"}
            ]
        }))
        .send()
        .await?
        .error_for_status()?; // turn 4xx/5xx responses into an Err

    let data: Value = response.json().await?;
    println!("{}", data["content"].as_str().unwrap_or_default());

    Ok(())
}

cURL: Health Check

curl http://localhost:5001/health

cURL: Non-Streaming Chat

curl -X POST http://localhost:5001/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize the uploaded document"}
    ]
  }'

cURL: Streaming Chat (SSE)

curl -N -X POST http://localhost:5001/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the key findings?"}
    ]
  }'

PowerShell: Non-Streaming

$body = @{
    messages = @(
        @{ role = "user"; content = "What does the document say about pricing?" }
    )
} | ConvertTo-Json -Depth 3

$response = Invoke-RestMethod -Uri "http://localhost:5001/chat" `
    -Method Post -ContentType "application/json" -Body $body

Write-Host $response.content

📝 Multi-turn Conversation

All chat endpoints (/chat, /chat/stream, and /v1/chat/completions) support multi-turn conversations. Include the full message history in each request:

{
  "messages": [
    { "role": "system", "content": "You are a legal assistant. Answer based only on the provided documents." },
    { "role": "user", "content": "What is the contract duration?" },
    { "role": "assistant", "content": "The contract duration is 12 months from the signing date." },
    { "role": "user", "content": "What happens if either party wants to terminate early?" }
  ]
}
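Clients keep the history themselves and resend it with each request. A sketch of a small history holder that also caps how many user/assistant pairs are resent; the `Conversation` class and the `max_turns` cap are hypothetical, and capping is one way to keep requests within a model's context limit, not documented service behavior.

```python
from typing import Dict, List

Message = Dict[str, str]


class Conversation:
    """Client-side chat history, resent in full on every request."""

    def __init__(self, system_prompt: str = "", max_turns: int = 10):
        self.system: List[Message] = (
            [{"role": "system", "content": system_prompt}] if system_prompt else []
        )
        self.turns: List[Message] = []
        self.max_turns = max_turns  # hypothetical cap on user/assistant pairs

    def ask(self, question: str) -> dict:
        """Record a user message and return the request body to send."""
        self.turns.append({"role": "user", "content": question})
        return {"messages": self.system + self.turns}

    def record_answer(self, content: str) -> None:
        """Store the assistant reply, then drop the oldest pairs beyond max_turns."""
        self.turns.append({"role": "assistant", "content": content})
        excess = len(self.turns) - 2 * self.max_turns
        if excess > 0:
            del self.turns[:excess]  # excess is even, so user/assistant pairs stay aligned
```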

⚠️ Error Handling

HTTP Status Codes

Code	Meaning
`200`	Success
`400`	Bad request (missing/empty messages array)
`404`	Unknown endpoint
`500`	Server error (model not loaded, internal failure)

Example Error Response

400 Bad Request

{
  "error": {
    "message": "Messages array is required",
    "type": "invalid_request_error"
  }
}
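Error bodies use an `error` object with `message` and `type`. A sketch of turning that envelope into a Python exception; `RaaSError`, `raise_for_error`, and the assumption that only 5xx errors are worth retrying are illustrative, not part of the service. Bodies for 404 and 500 responses are not documented, so the sketch falls back to placeholder values when the `error` object is missing.

```python
class RaaSError(Exception):
    """Hypothetical client exception carrying the documented error fields."""

    def __init__(self, status: int, message: str, error_type: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type

    @property
    def retryable(self) -> bool:
        # Assumption: 500s (e.g. model not loaded) may be transient; 4xx are not.
        return self.status >= 500


def raise_for_error(status: int, body) -> None:
    """Raise RaaSError for non-2xx responses, using the error envelope if present."""
    if 200 <= status < 300:
        return
    err = body.get("error", {}) if isinstance(body, dict) else {}
    if not isinstance(err, dict):
        err = {}
    raise RaaSError(status, err.get("message", "unknown error"), err.get("type", "unknown"))
```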
