API Reference

Complete technical reference for all TABS API endpoints. This guide provides detailed documentation for each endpoint, including parameters, response formats, and usage examples.

Base URL and Authentication

Base URL: https://api.tabstack.ai

Authentication: Include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY
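
As a minimal sketch, the header can be attached in Python with the standard library alone. The helper name `authed_request` and the `BASE_URL` constant are illustrative, not part of the API; the request object is built but never sent.

```python
import urllib.request

BASE_URL = "https://api.tabstack.ai"  # base URL from this reference


def authed_request(api_key: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = authed_request("YOUR_API_KEY", "/fetch")
print(req.full_url)
print(req.get_header("Authorization"))
```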

Common Error Responses

All endpoints return errors in a consistent format:

{
  "error": "Error description"
}

HTTP Status Codes:

  • 400 - Bad Request (invalid parameters)
  • 401 - Unauthorized (invalid API key)
  • 404 - Not Found (URL not accessible)
  • 429 - Rate Limit Exceeded
  • 500 - Internal Server Error

Endpoints

GET /fetch

Retrieve raw HTML content from any URL with intelligent fetching strategies.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Target URL to fetch |

Example Request

curl -X GET "https://api.tabstack.ai/fetch?url=https://example.com/article" \
-H "Authorization: Bearer YOUR_API_KEY"
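
The curl example passes the target URL unencoded, which works for simple URLs. If the target has its own query string, its `&` and `?` characters would otherwise be read as part of the API request, so percent-encoding the `url` parameter is the safer habit. A small sketch (the helper name `fetch_endpoint_url` is hypothetical):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def fetch_endpoint_url(target: str) -> str:
    """Return the /fetch URL with the target percent-encoded as one parameter."""
    return "https://api.tabstack.ai/fetch?" + urlencode({"url": target})


built = fetch_endpoint_url("https://example.com/search?q=tabs&page=2")
print(built)

# Round trip: the whole target comes back as a single "url" value.
print(parse_qs(urlsplit(built).query)["url"][0])
```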

Example Response

{
  "url": "https://example.com/article",
  "statusCode": 200,
  "body": "<!DOCTYPE html><html><head><title>Article Title</title>...",
  "headers": {
    "content-type": "text/html; charset=UTF-8",
    "server": "nginx/1.18.0",
    "last-modified": "Mon, 15 Jan 2024 10:30:00 GMT"
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| url | string | Originally requested URL |
| statusCode | integer | HTTP response status code |
| body | string | Raw HTML content |
| headers | object | HTTP response headers |

GET /markdown

Convert a page's HTML to clean, structured Markdown suitable for LLM processing.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Target URL to convert |
| metadata | boolean | No | Include metadata extraction (default: false) |
| nocache | boolean | No | Skip cached content (default: false) |

Example Request

curl -X GET "https://api.tabstack.ai/markdown?url=https://blog.example.com/post&metadata=true" \
-H "Authorization: Bearer YOUR_API_KEY"

Example Response

{
  "url": "https://blog.example.com/post",
  "content": "# Article Title\n\nThis is the main content converted to clean markdown...\n\n## Section Header\n\nMore content here with [links](https://example.com) preserved.",
  "metadata": {
    "title": "Article Title",
    "description": "Article description from meta tags",
    "author": "Jane Smith",
    "publishedTime": "2024-01-15T10:30:00Z",
    "keywords": ["technology", "AI", "development"],
    "image": "https://blog.example.com/image.jpg",
    "canonicalUrl": "https://blog.example.com/post"
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| url | string | Source URL |
| content | string | Clean markdown content |
| metadata | object | Extracted metadata (if requested) |

Metadata Fields

| Field | Type | Description |
| --- | --- | --- |
| title | string | Page title |
| description | string | Meta description |
| author | string | Article author |
| publishedTime | string | Publication date (ISO 8601) |
| keywords | array | Keywords and tags |
| image | string | Featured image URL |
| canonicalUrl | string | Canonical URL |
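
Because `metadata` is only present when requested, client code should treat it as optional. The sketch below is one possible way to combine the fields listed above into a single text document; the function name `render_page` and the header layout are illustrative choices, not something the API prescribes.

```python
def render_page(response: dict) -> str:
    """Prefix the markdown content with a short header built from metadata, if any."""
    meta = response.get("metadata") or {}  # absent unless metadata=true was sent
    header = []
    for key in ("title", "author", "publishedTime", "canonicalUrl"):
        if meta.get(key):
            header.append(f"{key}: {meta[key]}")
    if meta.get("keywords"):
        header.append("keywords: " + ", ".join(meta["keywords"]))
    body = response.get("content", "")
    return "\n".join(header) + "\n\n" + body if header else body


print(render_page({
    "url": "https://blog.example.com/post",
    "content": "# Article Title",
    "metadata": {"title": "Article Title", "keywords": ["technology", "AI"]},
}))
```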

GET /schema

Generate JSON schemas by analyzing web page structure with AI.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Target URL to analyze |
| instructions | string | No | Custom analysis instructions |
| nocache | boolean | No | Skip cached schema (default: false) |

Example Request

curl -X GET "https://api.tabstack.ai/schema?url=https://store.example.com/product/123&instructions=Focus%20on%20product%20information%20and%20pricing" \
-H "Authorization: Bearer YOUR_API_KEY"

Example Response

{
  "type": "object",
  "properties": {
    "product_name": {
      "type": "string",
      "description": "Product title"
    },
    "price": {
      "type": "object",
      "properties": {
        "current_price": {"type": "number"},
        "original_price": {"type": "number"},
        "currency": {"type": "string"}
      }
    },
    "availability": {
      "type": "string",
      "description": "Stock status"
    },
    "reviews": {
      "type": "object",
      "properties": {
        "average_rating": {"type": "number"},
        "review_count": {"type": "integer"}
      }
    },
    "features": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Product features list"
    }
  },
  "required": ["product_name", "price", "availability"]
}

Schema Generation Tips

  • Use the instructions parameter to focus on specific data types
  • Generated schemas follow the JSON Schema specification
  • Schemas can be passed to the /json endpoint, provided they meet its Schema Requirements
  • More specific instructions yield better schemas
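
The example schema above marks only some properties as required and does not set additionalProperties, while the /json endpoint's Schema Requirements ask for both on every object. One way to bridge the gap is a small normalizing step before reuse. This is a sketch under that reading of the rules; `make_strict` is a hypothetical helper, and whether the service already normalizes schemas for you is not stated in this reference.

```python
import copy


def make_strict(schema: dict) -> dict:
    """Return a copy in which every object requires all of its properties
    and forbids additional ones. The input schema is left unchanged."""
    result = copy.deepcopy(schema)

    def visit(node):
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            props = node.get("properties", {})
            node["required"] = list(props)
            node["additionalProperties"] = False
            for child in props.values():
                visit(child)
        elif node.get("type") == "array":
            visit(node.get("items"))

    visit(result)
    return result


generated = {
    "type": "object",
    "properties": {
        "price": {"type": "object", "properties": {"currency": {"type": "string"}}},
        "features": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["price"],
}
strict = make_strict(generated)
print(strict["required"], strict["properties"]["price"]["required"])
```

Note that promoting optional properties to required changes what the schema asks for, so review the result before sending it.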

POST /json

Extract structured data from web pages using custom JSON schemas.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Target URL |
| json_schema | object | Yes | JSON schema for extraction |
| nocache | boolean | No | Skip cached data (default: false) |

Schema Requirements

The json_schema parameter must follow OpenAI Structured Outputs specifications:

Required Rules

  1. The root level must be an object: { "type": "object", ... }
  2. Every property must be listed in required, e.g. "required": ["title", "author", "date"]
  3. Every object must forbid additional properties: "additionalProperties": false
  4. Only simple types are allowed: string, number, boolean, array, object
  5. No format specifiers (e.g., format: "uri", format: "date")

Best Practices

  • Add description fields to each property
  • Arrays should include maxItems (≤40 recommended)
  • Keep nesting depth under 5 levels
  • Nested objects also need required and additionalProperties: false

Example Schema

{
  "url": "https://example.com/article",
  "json_schema": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "Article title"
      },
      "author": {
        "type": "string",
        "description": "Author name"
      },
      "tags": {
        "type": "array",
        "items": { "type": "string" },
        "maxItems": 20,
        "description": "Article tags"
      }
    },
    "required": ["title", "author", "tags"],
    "additionalProperties": false
  }
}
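
The five required rules can be checked locally before a request is sent. The sketch below applies them literally as listed; `check_schema` is a hypothetical helper, not part of the API, and because it follows rule 4 word for word it will also flag "integer", which is an assumption about how strictly the service enforces that list.

```python
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}


def check_schema(schema: dict, path: str = "$") -> list:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    node_type = schema.get("type")
    if path == "$" and node_type != "object":
        problems.append("$: root must be an object")
    if node_type not in ALLOWED_TYPES:
        problems.append(f"{path}: unsupported type {node_type!r}")
    if "format" in schema:
        problems.append(f"{path}: format specifiers are not allowed")
    if node_type == "object":
        props = schema.get("properties", {})
        if set(schema.get("required", [])) != set(props):
            problems.append(f"{path}: every property must be listed in required")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, child in props.items():
            problems.extend(check_schema(child, f"{path}.{name}"))
    elif node_type == "array" and isinstance(schema.get("items"), dict):
        problems.extend(check_schema(schema["items"], f"{path}[]"))
    return problems


bad = {
    "type": "object",
    "properties": {
        "link": {"type": "string", "format": "uri"},
        "name": {"type": "string"},
    },
    "required": ["link"],
}
for problem in check_schema(bad):
    print(problem)
```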

Example Request

curl -X POST "https://api.tabstack.ai/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "url": "https://example.com/article",
  "json_schema": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "Article title"
      },
      "author": {
        "type": "string",
        "description": "Author name"
      },
      "tags": {
        "type": "array",
        "items": { "type": "string" },
        "maxItems": 20,
        "description": "Article tags"
      }
    },
    "required": ["title", "author", "tags"],
    "additionalProperties": false
  }
}'

Example Response

{
  "title": "Breaking: Major Technology Breakthrough Announced",
  "author": "Sarah Johnson",
  "tags": ["technology", "AI", "research", "breakthrough"]
}

Schema Design Best Practices

  • Use descriptive property names
  • List every property in required (see Schema Requirements), and limit the schema to the data you actually need
  • Use appropriate data types (string, number, boolean, array, object)
  • Provide descriptions for complex properties
  • Test schemas with /schema endpoint first

POST /transform

Transform and process web content using AI with custom instructions and schemas.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Target URL |
| instructions | string | Yes | Processing instructions |
| json_schema | object | No | Output format schema |
| nocache | boolean | No | Skip cached results (default: false) |

Example Request

curl -X POST "https://api.tabstack.ai/transform" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "url": "https://research.example.com/paper",
  "instructions": "Analyze this research paper and extract key findings, methodology, and implications for practical applications.",
  "json_schema": {
    "type": "object",
    "properties": {
      "research_summary": {"type": "string"},
      "key_findings": {
        "type": "array",
        "items": {"type": "string"}
      },
      "methodology": {"type": "string"},
      "practical_applications": {
        "type": "array",
        "items": {"type": "string"}
      },
      "limitations": {"type": "string"},
      "future_research": {"type": "string"}
    },
    "required": ["research_summary", "key_findings", "methodology"]
  }
}'

Example Response

{
  "research_summary": "This study investigates the effectiveness of transformer models in multilingual text classification tasks across 12 languages.",
  "key_findings": [
    "Transformer models outperform traditional approaches by 15-20% across all tested languages",
    "Performance degrades significantly for low-resource languages with <10k training examples",
    "Cross-lingual transfer learning improves results by up to 25% for related language families"
  ],
  "methodology": "The researchers used a dataset of 500k labeled examples across 12 languages, employing BERT and XLM-R models with fine-tuning approaches.",
  "practical_applications": [
    "Multilingual customer support systems",
    "Cross-border content moderation",
    "International market sentiment analysis"
  ],
  "limitations": "The study was limited to text classification tasks and did not explore generative capabilities.",
  "future_research": "Future work should investigate multilingual generation tasks and the development of more efficient cross-lingual architectures."
}

Instruction Writing Tips

  • Be specific about desired output format
  • Include examples when helpful
  • Specify analysis depth and focus areas
  • Use clear, actionable language
  • Test instructions iteratively for best results

POST /automate

Execute AI-powered web automation tasks using natural language. This endpoint always streams responses using Server-Sent Events (SSE).

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| task | string | Yes | Natural language task description |
| url | string | No | Starting URL for the task |
| data | object | No | JSON data for form filling or context |
| guardrails | string | No | Safety constraints for execution |
| maxIterations | number | No | Maximum task iterations (1-100, default: 50) |
| maxValidationAttempts | number | No | Maximum validation attempts (1-10, default: 3) |
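
The documented ranges can be enforced client-side before a request is made. The following is a sketch only: `automate_payload` is a hypothetical helper, and omitting unset optional fields (so the server applies its defaults) is an assumption about how the endpoint treats missing keys.

```python
def automate_payload(task, url=None, data=None, guardrails=None,
                     max_iterations=None, max_validation_attempts=None):
    """Build the /automate request body, validating the documented ranges."""
    if not task or not task.strip():
        raise ValueError("task is required")
    if max_iterations is not None and not 1 <= max_iterations <= 100:
        raise ValueError("maxIterations must be between 1 and 100")
    if max_validation_attempts is not None and not 1 <= max_validation_attempts <= 10:
        raise ValueError("maxValidationAttempts must be between 1 and 10")
    optional = {
        "url": url,
        "data": data,
        "guardrails": guardrails,
        "maxIterations": max_iterations,
        "maxValidationAttempts": max_validation_attempts,
    }
    payload = {"task": task}
    # Leave out anything the caller did not set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


print(automate_payload("search for flights and compare prices",
                       url="https://kayak.com", max_iterations=75))
```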

Use Cases

  • Web scraping and data extraction
  • Form filling and interaction
  • Navigation and information gathering
  • Multi-step web workflows
  • Content analysis from web pages

Example Request

curl -X POST "https://api.tabstack.ai/automate" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "task": "find product information and pricing",
  "url": "https://example-store.com/product/123",
  "guardrails": "extract only product details, don'\''t add to cart"
}'

Streaming Response Format

The /automate endpoint always returns a streaming response using Server-Sent Events (SSE). The response content type is text/event-stream.

Event Format:

event: <event_type>
data: <JSON_data>

Each event consists of an event: line naming the event type, followed by a data: line carrying a JSON payload. Events are separated by blank lines.
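
The framing described above can be parsed without a network connection, which makes it easy to test. The sketch below works on already-received text; `parse_sse` is an illustrative name. It follows the general SSE convention that a blank line ends an event and that an event without an explicit type defaults to "message".

```python
import json


def parse_sse(text: str) -> list:
    """Parse SSE text into a list of (event_type, data) tuples."""
    events = []
    event_type, data_lines = None, []
    # The trailing "" flushes a final event that lacks a closing blank line.
    for line in text.replace("\r\n", "\n").split("\n") + [""]:
        if line == "":
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                events.append((event_type or "message", payload))
            event_type, data_lines = None, []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    return events


sample = 'event: start\ndata: {"task": "find the weather"}\n\nevent: done\ndata: {}\n'
for event_type, data in parse_sse(sample):
    print(event_type, data)
```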

Event Types

Task Events:

  • start - Task initialization
  • task:setup - Task configuration
  • task:started - Task execution begins
  • task:completed - Task finished successfully
  • task:aborted - Task was terminated
  • task:validated - Task completion validation
  • task:validation_error - Validation failed

Agent Events:

  • agent:processing - Agent thinking/planning
  • agent:status - Status updates and plans
  • agent:step - Processing step iterations
  • agent:action - Actions being performed
  • agent:reasoned - Agent reasoning output
  • agent:extracted - Data extraction results
  • agent:waiting - Agent waiting for operations

Browser Events:

  • browser:navigated - Page navigation events
  • browser:action_started - Browser action initiated
  • browser:action_completed - Browser action finished
  • browser:screenshot_captured - Screenshot taken

System Events:

  • system:debug_compression - Debug compression info
  • system:debug_message - Debug messages

Stream Control:

  • complete - End of stream with results
  • done - Stream termination
  • error - Error occurred
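
Since most event names carry their group as a prefix, a client can route events by category rather than listing every name. A small sketch, where `event_category` and the category labels are illustrative choices:

```python
STREAM_CONTROL = {"complete", "done", "error"}
PREFIXES = {"task", "agent", "browser", "system"}


def event_category(event_type: str) -> str:
    """Map an event name to its group from the lists above."""
    if event_type in STREAM_CONTROL:
        return "stream"
    if event_type == "start":  # listed under Task Events despite having no prefix
        return "task"
    prefix, sep, _ = event_type.partition(":")
    return prefix if sep and prefix in PREFIXES else "unknown"


for name in ("start", "agent:step", "browser:navigated", "done"):
    print(name, "->", event_category(name))
```

Returning "unknown" for unrecognized names keeps the client tolerant of event types added later.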

Example Streaming Response

event: start
data: {"task": "what is the temperature in Tokyo?", "url": "https://weather.com"}

event: agent:processing
data: {"operation": "Creating task plan", "hasScreenshot": false}

event: agent:status
data: {"message": "Task plan created", "plan": "Navigate to weather site and search for Tokyo temperature"}

event: browser:navigated
data: {"title": "Weather.com", "url": "https://weather.com"}

event: task:started
data: {"task": "what is the temperature in Tokyo?", "url": "https://weather.com"}

event: agent:step
data: {"iterationId": "abc123", "currentIteration": 0}

event: agent:action
data: {"action": "fill_and_enter", "ref": "search", "value": "Tokyo"}

event: browser:action_completed
data: {"success": true, "action": "fill_and_enter"}

event: agent:extracted
data: {"extractedData": "Temperature: 23°C (73°F)"}

event: task:completed
data: {"success": true, "finalAnswer": "Current temperature in Tokyo is 23°C (73°F)"}

event: complete
data: {"success": true, "result": {"finalAnswer": "Current temperature in Tokyo is 23°C (73°F)"}}

event: done
data: {}
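
In the stream above, the `complete` event carries the final result. A consumer that only needs that result can scan the parsed events for it, as in this sketch. `final_result` is a hypothetical helper; the `result` key is taken from the example, and the shape of an `error` payload is not documented here, so it is passed through unchanged.

```python
def final_result(events):
    """Return the result carried by the 'complete' event.

    events is an iterable of (event_type, data) tuples. Raises RuntimeError
    if an 'error' event arrives first; returns None if neither is seen.
    """
    for event_type, data in events:
        if event_type == "error":
            raise RuntimeError(f"automation failed: {data}")
        if event_type == "complete":
            return data.get("result")
    return None


events = [
    ("start", {"task": "what is the temperature in Tokyo?"}),
    ("agent:extracted", {"extractedData": "Temperature: 23°C (73°F)"}),
    ("complete", {"success": True, "result": {"finalAnswer": "Current temperature in Tokyo is 23°C (73°F)"}}),
    ("done", {}),
]
print(final_result(events))
```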

Handling Streaming Responses

Python Example:

import json

import requests


def stream_automate(api_key, task, url=None, guardrails=None):
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }

    # Only include optional fields that were provided
    payload = {'task': task}
    if url:
        payload['url'] = url
    if guardrails:
        payload['guardrails'] = guardrails

    response = requests.post(
        'https://api.tabstack.ai/automate',
        headers=headers,
        json=payload,
        stream=True
    )
    # Error responses are plain JSON, not an event stream
    response.raise_for_status()

    # Set by each "event:" line and used by the "data:" line that follows it
    event_type = None
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('event:'):
                event_type = line[6:].strip()
            elif line.startswith('data:'):
                data = json.loads(line[5:].strip())
                print(f"[{event_type}] {data}")


# Usage
stream_automate(
    api_key='YOUR_API_KEY',
    task='find the weather in Tokyo',
    url='https://weather.com',
    guardrails='browse only, no purchases'
)

JavaScript/Node.js Example:

async function streamAutomate(apiKey, task, options = {}) {
  const response = await fetch('https://api.tabstack.ai/automate', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      task,
      ...options
    })
  });

  // Error responses are plain JSON, not an event stream
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let currentEvent = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Keep any trailing partial line in the buffer until the next chunk arrives
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('event:')) {
        currentEvent = line.substring(6).trim();
      } else if (line.startsWith('data:')) {
        const data = JSON.parse(line.substring(5).trim());
        console.log(`[${currentEvent}]`, data);
      }
    }
  }
}

// Usage
streamAutomate('YOUR_API_KEY', 'find the weather in Tokyo', {
  url: 'https://weather.com',
  guardrails: 'browse only, no purchases'
});

Additional Examples

Form Filling:

{
  "task": "submit the contact form with my information",
  "url": "https://company.com/contact",
  "data": {
    "name": "John Doe",
    "email": "john@example.com",
    "message": "Interested in your services"
  }
}

Complex Navigation:

{
  "task": "search for flights and compare prices",
  "url": "https://kayak.com",
  "data": {
    "from": "NYC",
    "to": "LAX",
    "date": "2024-12-25"
  },
  "guardrails": "search and compare only, don't book anything",
  "maxIterations": 75
}

Error Responses

400 Bad Request:

{
  "error": "task is required"
}

401 Unauthorized:

{
  "error": "Unauthorized - Invalid token"
}

500 Internal Server Error:

{
  "error": "failed to call automate server"
}

503 Service Unavailable:

{
  "error": "automate service not available"
}

Best Practices

  • Be specific with tasks: Clearly describe what you want the automation to accomplish
  • Use guardrails: Set safety constraints to prevent unintended actions
  • Handle streaming events: Process events in real-time to track progress
  • Set appropriate limits: Adjust maxIterations based on task complexity
  • Provide context data: Include relevant data for form filling or search queries
  • Monitor for errors: Watch for error events and handle them gracefully

Error Handling

Common Error Scenarios

URL Not Accessible (404)

{
  "error": "failed to fetch url"
}

Invalid JSON Schema (400)

{
  "error": "invalid json schema format"
}

Rate Limit Exceeded (429)

{
  "error": "rate limit exceeded"
}

Authentication Failed (401)

{
  "error": "unauthorized"
}

Retry Logic Recommendations

  • Implement exponential backoff for rate limit errors
  • Retry failed requests up to 3 times
  • Handle timeout errors gracefully
  • Cache successful responses when appropriate
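
The first two recommendations can be combined into one wrapper. The sketch below is one possible shape, not an official client: `call_with_retry` and its parameters are hypothetical, the `send` callable stands in for whatever performs the HTTP request, and treating 500 and 503 as retryable alongside 429 is an assumption.

```python
import random
import time

RETRYABLE = {429, 500, 503}  # assumption: only 429 is named explicitly above


def call_with_retry(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or max_retries retries are used up.

    send() must return a (status_code, body) tuple. Delays double on each
    retry (1s, 2s, 4s, ...) with random jitter added on top.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))


# Usage with a stand-in for a real request: two rate-limit responses, then success.
responses = iter([(429, {"error": "rate limit exceeded"}),
                  (429, {"error": "rate limit exceeded"}),
                  (200, {"content": "# Article Title"})])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result[0], len(waits))
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting.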

Error Monitoring

Monitor these metrics for production systems:

  • Error rate by endpoint
  • Response time percentiles
  • Rate limit hit frequency
  • Failed URL patterns
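
A minimal in-process sketch of the first two metrics is shown below, assuming each request is recorded with its endpoint, status code, and duration. `EndpointStats` is a hypothetical class; a production system would more likely export these values to a metrics service.

```python
import math
from collections import defaultdict


class EndpointStats:
    """Track error rate and latency percentiles per endpoint."""

    def __init__(self):
        self.latencies = defaultdict(list)
        self.errors = defaultdict(int)

    def record(self, endpoint: str, status: int, seconds: float) -> None:
        self.latencies[endpoint].append(seconds)
        if status >= 400:
            self.errors[endpoint] += 1

    def error_rate(self, endpoint: str) -> float:
        total = len(self.latencies[endpoint])
        return self.errors[endpoint] / total if total else 0.0

    def percentile(self, endpoint: str, pct: float):
        """Nearest-rank percentile of recorded latencies, or None if empty."""
        values = sorted(self.latencies[endpoint])
        if not values:
            return None
        rank = max(1, math.ceil(pct / 100 * len(values)))
        return values[rank - 1]


stats = EndpointStats()
for status, seconds in [(200, 0.1), (200, 0.2), (429, 0.3), (200, 0.4)]:
    stats.record("/markdown", status, seconds)
print(stats.error_rate("/markdown"), stats.percentile("/markdown", 95))
```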

Performance Optimization

Caching Strategy

  • Use caching (nocache=false) for stable content
  • Skip cache (nocache=true) for frequently updated pages
  • Implement client-side caching for repeated requests
  • Monitor cache hit rates for optimization
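
Client-side caching can be as simple as a dictionary with expiry times. The sketch below assumes a caller-supplied `fetch` function that performs the real request; `TTLCache`, its default TTL, and the injected clock are illustrative choices rather than anything the API defines.

```python
import time


class TTLCache:
    """Cache results per URL for ttl_seconds; nocache=True forces a refresh."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (stored_at, value)

    def get(self, url, nocache=False):
        now = self._clock()
        entry = self._entries.get(url)
        if not nocache and entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(url)
        self._entries[url] = (now, value)
        return value


# Usage with a stand-in fetch function that counts how often it is called.
calls = []

def fake_fetch(url):
    calls.append(url)
    return f"content #{len(calls)}"

cache = TTLCache(fake_fetch, ttl_seconds=60)
print(cache.get("https://example.com"), cache.get("https://example.com"))
```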

Request Batching

For processing multiple URLs:

import asyncio
import aiohttp


async def batch_process_urls(api_key, urls, endpoint='markdown'):
    # Share one session so connections are reused across requests
    async with aiohttp.ClientSession() as session:
        tasks = []

        for url in urls:
            task = process_single_url(session, api_key, url, endpoint)
            tasks.append(task)

        # return_exceptions=True keeps one failed URL from cancelling the rest;
        # failures appear in the results list as exception objects
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results


async def process_single_url(session, api_key, url, endpoint):
    headers = {"Authorization": f"Bearer {api_key}"}
    params = {"url": url}

    async with session.get(f"https://api.tabstack.ai/{endpoint}",
                           headers=headers, params=params) as response:
        return await response.json()
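
The batch above starts every request at once, which can trip the rate limit for long URL lists. One common remedy is to cap concurrency with a semaphore, sketched here. `gather_limited` and the default limit of 5 are illustrative; `worker` stands in for any coroutine function that processes a single URL.

```python
import asyncio


async def gather_limited(worker, urls, limit=5):
    """Run worker(url) for every URL with at most `limit` running at a time.

    Results come back in input order; failures are returned as exception
    objects rather than raised.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(url):
        async with semaphore:
            return await worker(url)

    return await asyncio.gather(*(run(url) for url in urls), return_exceptions=True)


# Usage with a stand-in worker; a real one would call the API.
async def fake_worker(url):
    await asyncio.sleep(0.01)
    return url.upper()

print(asyncio.run(gather_limited(fake_worker, ["a", "b", "c"], limit=2)))
```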

Monitoring and Alerting

Track these metrics for production use:

  • Response Times: Monitor 95th percentile latency
  • Error Rates: Track 4xx and 5xx responses
  • Rate Limit Usage: Monitor approaching limits
  • Cache Hit Rates: Optimize caching strategy

For more examples and advanced usage patterns, see our Research Assistant Example.