API Reference
Complete technical reference for all TABS API endpoints. This guide provides detailed documentation for each endpoint, including parameters, response formats, and usage examples.
Base URL and Authentication
Base URL: https://api.tabstack.ai
Authentication: Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
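As a minimal sketch, the header above could be built in Python like this. The environment variable name `TABS_API_KEY` and the helper name `auth_headers` are assumptions made for illustration; the API itself does not define them.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for a request.

    TABS_API_KEY is a hypothetical environment variable name used only
    in this sketch; pass api_key explicitly to bypass it.
    """
    key = api_key or os.environ.get("TABS_API_KEY")
    if not key:
        raise ValueError("no API key provided")
    return {"Authorization": f"Bearer {key}"}
```

Keeping the key out of source code and reading it at call time makes it easier to rotate without touching the client.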
Common Error Responses
All endpoints return errors in a consistent format:
{
"error": "Error description"
}
HTTP Status Codes:
- 400: Bad Request (invalid parameters)
- 401: Unauthorized (invalid API key)
- 404: Not Found (URL not accessible)
- 429: Rate Limit Exceeded
- 500: Internal Server Error
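One way a client might turn these responses into exceptions is sketched below. `TabsApiError` and `raise_for_error` are hypothetical names, and the fallback to the status-code meaning when the body is not the documented JSON shape is an assumption of this sketch.

```python
import json

# Status codes and meanings copied from the list above.
STATUS_MEANINGS = {
    400: "Bad Request (invalid parameters)",
    401: "Unauthorized (invalid API key)",
    404: "Not Found (URL not accessible)",
    429: "Rate Limit Exceeded",
    500: "Internal Server Error",
}


class TabsApiError(Exception):
    """Hypothetical exception type carrying the status and error text."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def raise_for_error(status, body_text):
    """Raise TabsApiError for 4xx/5xx responses; return None otherwise."""
    if status < 400:
        return None
    try:
        # Documented shape: {"error": "Error description"}
        message = json.loads(body_text)["error"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON object; use the generic meaning.
        message = STATUS_MEANINGS.get(status, "Unknown error")
    raise TabsApiError(status, message)
```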
Endpoints
GET /fetch
Retrieve raw HTML content from any URL with intelligent fetching strategies.
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | Target URL to fetch |
Example Request
curl -X GET "https://api.tabstack.ai/fetch?url=https://example.com/article" \
-H "Authorization: Bearer YOUR_API_KEY"
Example Response
{
"url": "https://example.com/article",
"statusCode": 200,
"body": "<!DOCTYPE html><html><head><title>Article Title</title>...",
"headers": {
"content-type": "text/html; charset=UTF-8",
"server": "nginx/1.18.0",
"last-modified": "Wed, 15 Jan 2024 10:30:00 GMT"
}
}
Response Fields
Field | Type | Description |
---|---|---|
url | string | Originally requested URL |
statusCode | integer | HTTP response status code |
body | string | Raw HTML content |
headers | object | HTTP response headers |
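The curl example passes the target URL unencoded, which stops working as soon as the target has its own query string. The sketch below percent-encodes the parameters with the standard library. `build_get_url` is a hypothetical helper, and rendering booleans as lowercase `true`/`false` is an assumption based on the curl examples in this reference.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.tabstack.ai"


def build_get_url(endpoint, **params):
    """Return a full GET URL with percent-encoded query parameters.

    Booleans become "true"/"false"; parameters set to None are omitted.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


# The target's own "?id=1&lang=en" is encoded so it stays inside the url parameter.
fetch_url = build_get_url("fetch", url="https://example.com/article?id=1&lang=en")
```

The same helper should work for the other GET endpoints, since they take their parameters in the query string as well.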
GET /markdown
Convert HTML content to clean, structured markdown perfect for LLM processing.
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | Target URL to convert |
metadata | boolean | No | Include metadata extraction (default: false) |
nocache | boolean | No | Skip cached content (default: false) |
Example Request
curl -X GET "https://api.tabstack.ai/markdown?url=https://blog.example.com/post&metadata=true" \
-H "Authorization: Bearer YOUR_API_KEY"
Example Response
{
"url": "https://blog.example.com/post",
"content": "# Article Title\n\nThis is the main content converted to clean markdown...\n\n## Section Header\n\nMore content here with [links](https://example.com) preserved.",
"metadata": {
"title": "Article Title",
"description": "Article description from meta tags",
"author": "Jane Smith",
"publishedTime": "2024-01-15T10:30:00Z",
"keywords": ["technology", "AI", "development"],
"image": "https://blog.example.com/image.jpg",
"canonicalUrl": "https://blog.example.com/post"
}
}
Response Fields
Field | Type | Description |
---|---|---|
url | string | Source URL |
content | string | Clean markdown content |
metadata | object | Extracted metadata (if requested) |
Metadata Fields
Field | Type | Description |
---|---|---|
title | string | Page title |
description | string | Meta description |
author | string | Article author |
publishedTime | string | Publication date (ISO 8601) |
keywords | array | Keywords and tags |
image | string | Featured image URL |
canonicalUrl | string | Canonical URL |
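Because `metadata` is only present when requested, client code needs to tolerate its absence. The following sketch maps a decoded response onto a small dataclass. `MarkdownResult` and `parse_markdown_response` are hypothetical names, and treating individual metadata fields as optional is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MarkdownResult:
    url: str
    content: str
    title: Optional[str] = None
    author: Optional[str] = None
    published_time: Optional[str] = None
    keywords: List[str] = field(default_factory=list)


def parse_markdown_response(payload):
    """Map a decoded /markdown response (a dict) onto MarkdownResult.

    Missing metadata, or missing fields inside it, fall back to defaults.
    """
    meta = payload.get("metadata") or {}
    return MarkdownResult(
        url=payload["url"],
        content=payload["content"],
        title=meta.get("title"),
        author=meta.get("author"),
        published_time=meta.get("publishedTime"),
        keywords=list(meta.get("keywords") or []),
    )
```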
GET /schema
Generate JSON schemas by analyzing web page structure with AI.
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | Target URL to analyze |
instructions | string | No | Custom analysis instructions |
nocache | boolean | No | Skip cached schema (default: false) |
Example Request
curl -X GET "https://api.tabstack.ai/schema?url=https://store.example.com/product/123&instructions=Focus%20on%20product%20information%20and%20pricing" \
-H "Authorization: Bearer YOUR_API_KEY"
Example Response
{
"type": "object",
"properties": {
"product_name": {
"type": "string",
"description": "Product title"
},
"price": {
"type": "object",
"properties": {
"current_price": {"type": "number"},
"original_price": {"type": "number"},
"currency": {"type": "string"}
}
},
"availability": {
"type": "string",
"description": "Stock status"
},
"reviews": {
"type": "object",
"properties": {
"average_rating": {"type": "number"},
"review_count": {"type": "integer"}
}
},
"features": {
"type": "array",
"items": {"type": "string"},
"description": "Product features list"
}
},
"required": ["product_name", "price", "availability"]
}
Schema Generation Tips
- Use the instructions parameter to focus on specific data types
- Generated schemas follow the JSON Schema specification
- Schemas can be used directly with the /json endpoint
- More specific instructions yield better schemas
POST /json
Extract structured data from web pages using custom JSON schemas.
Request Body
Field | Type | Required | Description |
---|---|---|---|
url | string | Yes | Target URL |
json_schema | object | Yes | JSON schema for extraction |
nocache | boolean | No | Skip cached data (default: false) |
Schema Requirements
The json_schema parameter must follow OpenAI Structured Outputs specifications:
Required Rules
- Root level must be an object: { "type": "object", ... }
- All properties must be required: "required": ["title", "author", "date"]
- All objects must forbid additional properties: "additionalProperties": false
- Only simple types allowed: string, number, boolean, array, object
- No format specifiers (e.g., format: "uri", format: "date")
Best Practices
- Add description fields to each property
- Arrays should include maxItems (≤40 recommended)
- Keep nesting depth under 5 levels
- Nested objects also need required and additionalProperties: false
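The required rules above are mechanical enough to check before sending a request. The sketch below is a hypothetical client-side check, not part of the API; it only covers the rules listed here and may be stricter or looser than the server's own validation.

```python
ALLOWED_TYPES = {"string", "number", "boolean", "array", "object"}


def check_schema(schema, path="$"):
    """Return a list of rule violations for a json_schema (empty if none)."""
    problems = []
    node_type = schema.get("type")
    if path == "$" and node_type != "object":
        problems.append("$: root must have type 'object'")
    if node_type not in ALLOWED_TYPES:
        problems.append(f"{path}: unsupported type {node_type!r}")
    if "format" in schema:
        problems.append(f"{path}: 'format' is not allowed")
    if node_type == "object":
        props = schema.get("properties", {})
        if set(schema.get("required", [])) != set(props):
            problems.append(f"{path}: every property must be listed in 'required'")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' must be false")
        # Nested objects must satisfy the same rules.
        for name, sub in props.items():
            problems += check_schema(sub, f"{path}.{name}")
    elif node_type == "array" and isinstance(schema.get("items"), dict):
        problems += check_schema(schema["items"], f"{path}[]")
    return problems
```

Running the check locally gives a specific message per offending property instead of a single 400 response.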
Example Schema
{
"url": "https://example.com/article",
"json_schema": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Article title"
},
"author": {
"type": "string",
"description": "Author name"
},
"tags": {
"type": "array",
"items": { "type": "string" },
"maxItems": 20,
"description": "Article tags"
}
},
"required": ["title", "author", "tags"],
"additionalProperties": false
}
}
Example Request
curl -X POST "https://api.tabstack.ai/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/article",
"json_schema": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Article title"
},
"author": {
"type": "string",
"description": "Author name"
},
"tags": {
"type": "array",
"items": { "type": "string" },
"maxItems": 20,
"description": "Article tags"
}
},
"required": ["title", "author", "tags"],
"additionalProperties": false
}
}'
Example Response
{
"title": "Breaking: Major Technology Breakthrough Announced",
"author": "Sarah Johnson",
"tags": ["technology", "AI", "research", "breakthrough"]
}
Schema Design Best Practices
- Use descriptive property names
- List every property in required, as the Required Rules above demand
- Use appropriate data types (string, number, boolean, array, object)
- Provide descriptions for complex properties
- Test schemas with the /schema endpoint first
POST /transform
Transform and process web content using AI with custom instructions and schemas.
Request Body
Field | Type | Required | Description |
---|---|---|---|
url | string | Yes | Target URL |
instructions | string | Yes | Processing instructions |
json_schema | object | No | Output format schema |
nocache | boolean | No | Skip cached results (default: false) |
Example Request
curl -X POST "https://api.tabstack.ai/transform" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://research.example.com/paper",
"instructions": "Analyze this research paper and extract key findings, methodology, and implications for practical applications.",
"json_schema": {
"type": "object",
"properties": {
"research_summary": {"type": "string"},
"key_findings": {
"type": "array",
"items": {"type": "string"}
},
"methodology": {"type": "string"},
"practical_applications": {
"type": "array",
"items": {"type": "string"}
},
"limitations": {"type": "string"},
"future_research": {"type": "string"}
},
"required": ["research_summary", "key_findings", "methodology"]
}
}'
Example Response
{
"research_summary": "This study investigates the effectiveness of transformer models in multilingual text classification tasks across 12 languages.",
"key_findings": [
"Transformer models outperform traditional approaches by 15-20% across all tested languages",
"Performance degrades significantly for low-resource languages with <10k training examples",
"Cross-lingual transfer learning improves results by up to 25% for related language families"
],
"methodology": "The researchers used a dataset of 500k labeled examples across 12 languages, employing BERT and XLM-R models with fine-tuning approaches.",
"practical_applications": [
"Multilingual customer support systems",
"Cross-border content moderation",
"International market sentiment analysis"
],
"limitations": "The study was limited to text classification tasks and did not explore generative capabilities.",
"future_research": "Future work should investigate multilingual generation tasks and the development of more efficient cross-lingual architectures."
}
Instruction Writing Tips
- Be specific about desired output format
- Include examples when helpful
- Specify analysis depth and focus areas
- Use clear, actionable language
- Test instructions iteratively for best results
POST /automate
Execute AI-powered web automation tasks using natural language. This endpoint always streams responses using Server-Sent Events (SSE).
Request Body
Field | Type | Required | Description |
---|---|---|---|
task | string | Yes | Natural language task description |
url | string | No | Starting URL for the task |
data | object | No | JSON data for form filling or context |
guardrails | string | No | Safety constraints for execution |
maxIterations | number | No | Maximum task iterations (1-100, default: 50) |
maxValidationAttempts | number | No | Maximum validation attempts (1-10, default: 3) |
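The documented ranges can be enforced client-side before a request is sent. This is a sketch under the assumption that out-of-range values are rejected by the server; `build_automate_payload` is a hypothetical helper name.

```python
def build_automate_payload(task, url=None, data=None, guardrails=None,
                           max_iterations=None, max_validation_attempts=None):
    """Build a request body for /automate, checking the documented ranges."""
    if not task or not task.strip():
        raise ValueError("task is required")
    if max_iterations is not None and not 1 <= max_iterations <= 100:
        raise ValueError("maxIterations must be between 1 and 100")
    if max_validation_attempts is not None and not 1 <= max_validation_attempts <= 10:
        raise ValueError("maxValidationAttempts must be between 1 and 10")

    optional = {
        "url": url,
        "data": data,
        "guardrails": guardrails,
        "maxIterations": max_iterations,
        "maxValidationAttempts": max_validation_attempts,
    }
    payload = {"task": task}
    # Omit unset fields so the server applies its own defaults (50 and 3).
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```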
Use Cases
- Web scraping and data extraction
- Form filling and interaction
- Navigation and information gathering
- Multi-step web workflows
- Content analysis from web pages
Example Request
curl -X POST "https://api.tabstack.ai/automate" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"task": "find product information and pricing",
"url": "https://example-store.com/product/123",
"guardrails": "extract only product details, don'\''t add to cart"
}'
Streaming Response Format
The /automate endpoint always returns a streaming response using Server-Sent Events (SSE). The response content type is text/event-stream.
Event Format:
event: <event_type>
data: <JSON_data>
Each event starts with an event: line naming the event type, followed by a data: line carrying a JSON payload. Events are separated by empty lines.
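The event format can be parsed with a small generator. The sketch below works on already-decoded text lines and dispatches an event at the blank line that ends it; `parse_sse` is a hypothetical helper, and the lenient handling of a stream that ends without a final blank line is a choice of this sketch.

```python
import json


def parse_sse(lines):
    """Yield (event_type, data) pairs from an iterable of decoded SSE lines.

    Consecutive data: lines are joined with newlines before JSON decoding.
    "message" is the SSE default when no event: line was seen.
    """
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: the current event is complete.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        # Stream ended without a trailing blank line; emit what we have.
        yield event_type, json.loads("\n".join(data_lines))
```

Dispatching on the blank line rather than on each data: line keeps the event type and payload paired even if a payload is split across several data: lines.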
Event Types
Task Events:
- start: Task initialization
- task:setup: Task configuration
- task:started: Task execution begins
- task:completed: Task finished successfully
- task:aborted: Task was terminated
- task:validated: Task completion validation
- task:validation_error: Validation failed
Agent Events:
- agent:processing: Agent thinking/planning
- agent:status: Status updates and plans
- agent:step: Processing step iterations
- agent:action: Actions being performed
- agent:reasoned: Agent reasoning output
- agent:extracted: Data extraction results
- agent:waiting: Agent waiting for operations
Browser Events:
- browser:navigated: Page navigation events
- browser:action_started: Browser action initiated
- browser:action_completed: Browser action finished
- browser:screenshot_captured: Screenshot taken
System Events:
- system:debug_compression: Debug compression info
- system:debug_message: Debug messages
Stream Control:
- complete: End of stream with results
- done: Stream termination
- error: Error occurred
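A consumer usually cares about a few of these events only. The sketch below takes an iterable of (event type, payload) pairs, returns the payload of `complete`, stops at `done`, and raises on `error`, while counting events per category prefix. `AutomateError` and `summarize_stream` are hypothetical names, and the shape of the `error` payload is not documented here, so it is passed through as-is.

```python
from collections import Counter


class AutomateError(RuntimeError):
    """Hypothetical exception raised when the stream reports an error event."""


def summarize_stream(events):
    """Consume (event_type, data) pairs; return (final_result, category_counts)."""
    counts = Counter()
    result = None
    for event_type, data in events:
        if event_type == "error":
            raise AutomateError(str(data))
        if event_type == "done":
            break  # stream termination
        if event_type == "complete":
            result = data
        # "agent:step" -> "agent"; unprefixed events count under their own name.
        counts[event_type.split(":", 1)[0]] += 1
    return result, counts
```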
Example Streaming Response
event: start
data: {"task": "what is the temperature in Tokyo?", "url": "https://weather.com"}

event: agent:processing
data: {"operation": "Creating task plan", "hasScreenshot": false}

event: agent:status
data: {"message": "Task plan created", "plan": "Navigate to weather site and search for Tokyo temperature"}

event: browser:navigated
data: {"title": "Weather.com", "url": "https://weather.com"}

event: task:started
data: {"task": "what is the temperature in Tokyo?", "url": "https://weather.com"}

event: agent:step
data: {"iterationId": "abc123", "currentIteration": 0}

event: agent:action
data: {"action": "fill_and_enter", "ref": "search", "value": "Tokyo"}

event: browser:action_completed
data: {"success": true, "action": "fill_and_enter"}

event: agent:extracted
data: {"extractedData": "Temperature: 23°C (73°F)"}

event: task:completed
data: {"success": true, "finalAnswer": "Current temperature in Tokyo is 23°C (73°F)"}

event: complete
data: {"success": true, "result": {"finalAnswer": "Current temperature in Tokyo is 23°C (73°F)"}}

event: done
data: {}
Handling Streaming Responses
Python Example:
import json

import requests


def stream_automate(api_key, task, url=None, guardrails=None):
    """Call /automate and print each SSE event as it arrives."""
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }
    payload = {'task': task}
    if url:
        payload['url'] = url
    if guardrails:
        payload['guardrails'] = guardrails

    response = requests.post(
        'https://api.tabstack.ai/automate',
        headers=headers,
        json=payload,
        stream=True
    )
    # Surface 4xx/5xx responses before trying to read the stream
    response.raise_for_status()

    # Set by each "event:" line and used by the "data:" line that follows it
    event_type = None
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('event:'):
                event_type = line[6:].strip()
            elif line.startswith('data:'):
                data = json.loads(line[5:].strip())
                print(f"[{event_type}] {data}")


# Usage
stream_automate(
    api_key='YOUR_API_KEY',
    task='find the weather in Tokyo',
    url='https://weather.com',
    guardrails='browse only, no purchases'
)
JavaScript/Node.js Example:
async function streamAutomate(apiKey, task, options = {}) {
  const response = await fetch('https://api.tabstack.ai/automate', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      task,
      ...options
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let currentEvent = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Keep any partial trailing line in the buffer until the next chunk
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (line.startsWith('event:')) {
        currentEvent = line.substring(6).trim();
      } else if (line.startsWith('data:')) {
        const data = JSON.parse(line.substring(5).trim());
        console.log(`[${currentEvent}]`, data);
      }
    }
  }
}

// Usage
streamAutomate('YOUR_API_KEY', 'find the weather in Tokyo', {
  url: 'https://weather.com',
  guardrails: 'browse only, no purchases'
});
Additional Examples
Form Filling:
{
"task": "submit the contact form with my information",
"url": "https://company.com/contact",
"data": {
"name": "John Doe",
"email": "john@example.com",
"message": "Interested in your services"
}
}
Complex Navigation:
{
"task": "search for flights and compare prices",
"url": "https://kayak.com",
"data": {
"from": "NYC",
"to": "LAX",
"date": "2024-12-25"
},
"guardrails": "search and compare only, don't book anything",
"maxIterations": 75
}
Error Responses
400 Bad Request:
{
"error": "task is required"
}
401 Unauthorized:
{
"error": "Unauthorized - Invalid token"
}
500 Internal Server Error:
{
"error": "failed to call automate server"
}
503 Service Unavailable:
{
"error": "automate service not available"
}
Best Practices
- Be specific with tasks: Clearly describe what you want the automation to accomplish
- Use guardrails: Set safety constraints to prevent unintended actions
- Handle streaming events: Process events in real-time to track progress
- Set appropriate limits: Adjust maxIterations based on task complexity
- Provide context data: Include relevant data for form filling or search queries
- Monitor for errors: Watch for error events and handle them gracefully
Error Handling
Common Error Scenarios
URL Not Accessible (404)
{
"error": "failed to fetch url"
}
Invalid JSON Schema (400)
{
"error": "invalid json schema format"
}
Rate Limit Exceeded (429)
{
"error": "rate limit exceeded"
}
Authentication Failed (401)
{
"error": "unauthorized"
}
Retry Logic Recommendations
- Implement exponential backoff for rate limit errors
- Retry failed requests up to 3 times
- Handle timeout errors gracefully
- Cache successful responses when appropriate
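The first two recommendations can be combined in one wrapper. The sketch below retries up to 3 times with exponential backoff and a little jitter. `call_with_retries` is a hypothetical helper, and the set of retryable status codes, the one-second base delay, and the jitter range are assumptions rather than values published by the API.

```python
import random
import time

# Assumption: rate limits and server-side failures are worth retrying.
RETRYABLE_STATUSES = {429, 500, 503}


def call_with_retries(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    send is any zero-argument callable returning (status_code, body).
    The wait before retry n is base_delay * 2**n plus random jitter.
    After max_retries retries the last response is returned as-is.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            return status, body
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
        sleep(delay)
```

Client errors such as 400 and 401 are returned immediately, since repeating the same request would fail the same way.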
Error Monitoring
Monitor these metrics for production systems:
- Error rate by endpoint
- Response time percentiles
- Rate limit hit frequency
- Failed URL patterns
Performance Optimization
Caching Strategy
- Use caching (nocache=false) for stable content
- Skip cache (nocache=true) for frequently updated pages
- Implement client-side caching for repeated requests
- Monitor cache hit rates for optimization
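Client-side caching and hit-rate monitoring can be sketched with a small in-memory cache. `TTLCache` is a hypothetical helper, and the 300-second default lifetime is an arbitrary assumption to tune per content type.

```python
import time


class TTLCache:
    """Tiny in-memory cache with expiry and hit-rate tracking."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() if missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch()
        self._entries[key] = (now, value)
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A key such as `("markdown", url)` keeps results from different endpoints apart for the same page.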
Request Batching
To process multiple URLs, send the requests concurrently:
import asyncio

import aiohttp


async def batch_process_urls(api_key, urls, endpoint='markdown'):
    """Request every URL concurrently and return results in input order."""
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in urls:
            task = process_single_url(session, api_key, url, endpoint)
            tasks.append(task)
        # return_exceptions=True returns failures as exception objects in the
        # results list instead of raising the first one
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results


async def process_single_url(session, api_key, url, endpoint):
    """GET one URL through the given endpoint and return the decoded JSON body."""
    headers = {"Authorization": f"Bearer {api_key}"}
    params = {"url": url}
    async with session.get(f"https://api.tabstack.ai/{endpoint}",
                           headers=headers, params=params) as response:
        return await response.json()
Monitoring and Alerting
Track these metrics for production use:
- Response Times: Monitor 95th percentile latency
- Error Rates: Track 4xx and 5xx responses
- Rate Limit Usage: Monitor approaching limits
- Cache Hit Rates: Optimize caching strategy
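As a rough sketch of the first two metrics, the function below computes a nearest-rank 95th percentile latency and the share of 4xx/5xx responses from recorded samples. `summarize_samples` and the `(status_code, latency_ms)` sample format are assumptions for illustration; a production system would more likely rely on its metrics platform.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1-100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_samples(samples):
    """samples: list of (status_code, latency_ms) tuples."""
    latencies = [latency for _, latency in samples]
    errors = sum(1 for status, _ in samples if status >= 400)
    return {
        "p95_latency_ms": percentile(latencies, 95),
        "error_rate": errors / len(samples),
    }
```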
For more examples and advanced usage patterns, see our Research Assistant Example.