Research Assistant Agent

Complexity: Beginner

Estimated time: 2-3 hours

Key TABS APIs: /automate, /markdown, /transform

Imagine having a personal research assistant that can instantly gather information from across the web, analyze it, and present you with a comprehensive report—complete with sources and confidence scores. That's exactly what we'll build in this guide using the TABS API.

The Story Behind This Project

You're tasked with researching a complex topic for a presentation, report, or decision. Instead of spending hours browsing through dozens of websites, reading through irrelevant content, and manually synthesizing information, you want an intelligent agent that does the heavy lifting for you. This research assistant will be your digital scholar, capable of understanding your query, finding authoritative sources, and delivering insights you can trust.

What Makes This Assistant Special

Unlike simple search tools, our research assistant will:

  • Stream live progress events as it works, so you can see exactly what's happening
  • Intelligently discover and evaluate sources using real-time web automation
  • Extract clean, readable content from messy web pages
  • Synthesize information across multiple sources to identify patterns and consensus
  • Generate professional research reports with proper citations
  • Provide confidence scores so you know how much to trust the findings
  • Handle complex, multi-faceted research questions with sophisticated analysis
  • Suggest related topics for deeper exploration

How It Works: The Research Pipeline

Think of our research assistant as working through the same process a human researcher would follow:

Ask Question → Find Sources → Read Content → Analyze & Synthesize → Write Report
  1. Discovery Phase: We use AI automation to search the web and find the best sources with live streaming events
  2. Extraction Phase: We carefully read and clean the content from each source showing real-time progress
  3. Analysis Phase: We identify key themes, patterns, and insights across all sources with status updates
  4. Synthesis Phase: We combine everything into a coherent, comprehensive report with final metrics

Each phase streams detailed progress information so you can watch your research assistant work in real-time!

Project Setup

Let's start by setting up a proper Node.js project with a clean structure. This will make our code maintainable and easy to understand.

Prerequisites

  • Node.js 20+ installed
  • A TABS API key (get one here)
  • Basic command-line knowledge

Step 1: Initialize Your Project

# Create a new directory for your project
mkdir research-assistant
cd research-assistant

# Initialize a new Node.js project
npm init -y

# Install required dependencies
npm install node-fetch dotenv

# Install development dependencies (optional but recommended)
npm install --save-dev nodemon jest

Step 2: Create Your Project Structure

Organize your project with this clean folder structure:

research-assistant/
├── src/
│   ├── core/
│   │   ├── ResearchAssistant.js   # Main assistant class
│   │   ├── SourceDiscovery.js     # Source finding logic
│   │   ├── ContentExtractor.js    # Content extraction logic
│   │   └── Synthesizer.js         # Analysis & synthesis logic
│   ├── models/
│   │   ├── Source.js              # Source data model
│   │   └── ResearchReport.js      # Report data model
│   ├── utils/
│   │   ├── credibility.js         # Source credibility scoring
│   │   ├── retry.js               # Retry logic for API calls
│   │   └── cache.js               # Caching utilities
│   └── config/
│       └── constants.js           # Configuration constants
├── examples/
│   └── basicResearch.js           # Example usage
├── tests/
│   └── ResearchAssistant.test.js  # Test suite
├── .env                           # Environment variables
├── .gitignore                     # Git ignore file
├── package.json                   # Project dependencies
└── README.md                      # Project documentation

Step 3: Set Up Environment Variables

Create a .env file in your project root:

# .env
TABS_API_KEY=your_api_key_here
TABS_API_URL=https://api.tabstack.ai
NODE_ENV=development
LOG_LEVEL=info

Create a .gitignore file to keep your secrets safe:

# .gitignore
node_modules/
.env
*.log
.DS_Store
dist/
coverage/
*.md
!README.md

Step 4: Update package.json Scripts

Add helpful scripts to your package.json, and set "type": "module" so Node treats the import/export syntax we'll use throughout as ES modules:

{
  ...
  "type": "module",
  "scripts": {
    "start": "node examples/basicResearch.js",
    "dev": "nodemon examples/basicResearch.js",
    "test": "jest",
    "test:watch": "jest --watch"
  },
  ...
}

Building Your Research Assistant

Step 1: Configuration and Constants

Every good application starts with configuration. This is where we define our API endpoints, trust signals for evaluating sources, and other settings that might change between development and production. Think of this as the control panel for your research assistant.

Create the file src/config/constants.js and add the following code to it.

// src/config/constants.js
import dotenv from 'dotenv';
dotenv.config();

export const CONFIG = {
  API: {
    KEY: process.env.TABS_API_KEY,
    BASE_URL: process.env.TABS_API_URL || 'https://api.tabstack.ai',
    TIMEOUT: 30000,
    RETRY_ATTEMPTS: 3
  },
  RESEARCH: {
    DEFAULT_SOURCE_COUNT: 5,
    MIN_CONTENT_WORDS: 100,
    MAX_CONTENT_LENGTH: 50000,
    CACHE_DURATION: 3600000 // 1 hour
  },
  CREDIBILITY: {
    HIGH_TRUST_DOMAINS: ['.edu', '.gov', '.org', 'scholar.google', 'pubmed', 'nature.com'],
    MEDIUM_TRUST_DOMAINS: ['.com', '.net', 'wikipedia.org'],
    LOW_TRUST_DOMAINS: ['blog', 'medium.com', 'personal']
  },
  LOGGING: {
    LEVEL: process.env.LOG_LEVEL || 'info',
    COLORS: {
      info: '\x1b[36m',
      success: '\x1b[32m',
      warning: '\x1b[33m',
      error: '\x1b[31m',
      reset: '\x1b[0m'
    }
  }
};

export default CONFIG;

Step 2: Data Models - Defining Our Data Structures

Before we start fetching and processing data, we need to define what our data looks like. In the world of research, we work with two main entities: Sources (the web pages we analyze) and Research Reports (the final output). These models act as blueprints that ensure our data stays organized and consistent throughout the entire research process.

Go ahead and create the src/models/Source.js file and add the code below to it.

// src/models/Source.js
export class Source {
  constructor(data) {
    this.url = data.url;
    this.title = data.title;
    this.content = data.content || '';
    this.confidence = data.confidence || 0.5;
    this.metadata = {
      // Spread first so the computed fallbacks below can't be clobbered
      // by keys present (but unset) in the incoming metadata
      ...data.metadata,
      wordCount: data.metadata?.wordCount || this.countWords(this.content),
      extractedAt: data.metadata?.extractedAt || new Date().toISOString()
    };
  }

  countWords(text) {
    return text ? text.split(/\s+/).filter(word => word.length > 0).length : 0;
  }

  isValid() {
    return Boolean(this.url && this.content && this.metadata.wordCount > 50);
  }

  toJSON() {
    return {
      url: this.url,
      title: this.title,
      confidence: this.confidence,
      metadata: this.metadata,
      contentPreview: this.content.substring(0, 200) + '...'
    };
  }
}

export default Source;

This model represents a single source of information - a web page that we've discovered and analyzed. It knows how to validate itself (is there enough content?) and calculate metrics (like word count).
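To see those rules in action, here is a small standalone sketch that inlines the same word-count and validation logic (rather than importing the class), run on made-up content:

```javascript
// Minimal stand-in mirroring Source's word counting and validation rules.
class MiniSource {
  constructor({ url, content = '' }) {
    this.url = url;
    this.content = content;
    this.wordCount = this.countWords(content);
  }
  countWords(text) {
    return text ? text.split(/\s+/).filter(w => w.length > 0).length : 0;
  }
  isValid() {
    // Same rule as Source: needs a URL, content, and more than 50 words
    return Boolean(this.url && this.content && this.wordCount > 50);
  }
}

const thin = new MiniSource({ url: 'https://example.com', content: 'too short' });
const full = new MiniSource({
  url: 'https://example.com',
  content: Array(60).fill('word').join(' ')
});

console.log(thin.isValid()); // false — only 2 words
console.log(full.isValid()); // true — 60 words
```

The 50-word floor is what lets the extractor silently drop pages that returned only a cookie banner or an error message.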

Now let's create the src/models/ResearchReport.js file and add the code below to it.

// src/models/ResearchReport.js
export class ResearchReport {
  constructor(data) {
    this.query = data.query;
    this.executiveSummary = data.executiveSummary;
    this.detailedAnalysis = data.detailedAnalysis;
    this.sources = data.sources || [];
    this.confidenceScore = data.confidenceScore || 0;
    this.keyFindings = data.keyFindings || [];
    this.consensusPoints = data.consensusPoints || [];
    this.conflictingViews = data.conflictingViews || [];
    this.knowledgeGaps = data.knowledgeGaps || [];
    this.relatedTopics = data.relatedTopics || [];
    this.metadata = {
      generatedAt: new Date().toISOString(),
      processingTime: data.processingTime,
      sourceCount: this.sources.length,
      version: '1.0.0'
    };
  }

  getConfidenceLevel() {
    if (this.confidenceScore > 0.8) return 'HIGH';
    if (this.confidenceScore > 0.6) return 'MEDIUM';
    return 'LOW';
  }

  toMarkdown() {
    const confidenceEmoji = this.confidenceScore > 0.8 ? '🔵' :
      this.confidenceScore > 0.6 ? '🟡' : '🔴';

    return `# Research Report: ${this.query}

*Generated: ${this.metadata.generatedAt}*
*Confidence: ${confidenceEmoji} ${(this.confidenceScore * 100).toFixed(0)}%*

## Executive Summary

${this.executiveSummary}

## Key Findings

${this.keyFindings.map(f => `- ${f}`).join('\n')}

## Detailed Analysis

${this.detailedAnalysis}

## Sources (${this.sources.length})

${this.sources.map((s, i) => `${i + 1}. [${s.title}](${s.url}) - Confidence: ${(s.confidence * 100).toFixed(0)}%`).join('\n')}
`;
  }

  toJSON() {
    return {
      query: this.query,
      confidenceScore: this.confidenceScore,
      confidenceLevel: this.getConfidenceLevel(),
      keyFindings: this.keyFindings,
      sourcesUsed: this.sources.length,
      metadata: this.metadata
    };
  }
}

export default ResearchReport;

This is our final product - the comprehensive report that brings together all our findings. It includes the executive summary, detailed analysis, confidence scores, and all the sources we used. The toMarkdown() method is particularly useful as it converts our structured data into a beautiful, readable document.
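One detail worth noting: the confidence thresholds are strict inequalities. A standalone sketch of the same mapping:

```javascript
// Same thresholds as ResearchReport.getConfidenceLevel()
function confidenceLevel(score) {
  if (score > 0.8) return 'HIGH';
  if (score > 0.6) return 'MEDIUM';
  return 'LOW';
}

console.log(confidenceLevel(0.9)); // HIGH
console.log(confidenceLevel(0.8)); // MEDIUM — the boundary is strictly greater-than
console.log(confidenceLevel(0.5)); // LOW
```

A score of exactly 0.8 therefore reports MEDIUM, not HIGH.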

Step 3: Utility Functions - The Helper Tools

Before diving into the main logic, let's build some utility functions that we'll use throughout our application. These are like the Swiss Army knife of our research assistant - small, focused tools that do one thing well.

The first utility we'll create is our "trust calculator" - it evaluates how credible a source is based on its domain. Just like a human researcher would trust a .edu site from a university more than a random blog, our assistant uses these signals to prioritize high-quality sources.

Go ahead and create the src/utils/credibility.js file and add the following code to it.

// src/utils/credibility.js
import { CONFIG } from '../config/constants.js';

export function assessSourceCredibility(url) {
  let confidence = 0.5; // Start with neutral confidence

  // Check for high-trust signals
  for (const signal of CONFIG.CREDIBILITY.HIGH_TRUST_DOMAINS) {
    if (url.includes(signal)) {
      confidence = 0.9;
      break;
    }
  }

  // Check for medium-trust signals
  if (confidence === 0.5) {
    for (const signal of CONFIG.CREDIBILITY.MEDIUM_TRUST_DOMAINS) {
      if (url.includes(signal)) {
        confidence = 0.7;
        break;
      }
    }
  }

  // Check for low-trust signals
  for (const signal of CONFIG.CREDIBILITY.LOW_TRUST_DOMAINS) {
    if (url.includes(signal)) {
      confidence = Math.min(confidence, 0.4);
    }
  }

  // Additional factors
  if (url.includes('https://')) confidence += 0.05;
  if (url.includes('2024') || url.includes('2023')) confidence += 0.05;

  return Math.min(confidence, 1.0);
}

export function calculateOverallConfidence(sources, analysis) {
  if (!sources || sources.length === 0) return 0;

  // Factor 1: Source quality (40% weight)
  const avgSourceConfidence = sources.reduce((sum, s) => sum + s.confidence, 0) / sources.length;
  let score = avgSourceConfidence * 0.4;

  // Factor 2: Source agreement (30% weight)
  if (analysis) {
    // Default missing arrays to 0 so an absent field can't produce NaN
    const consensusCount = analysis.consensusPoints?.length || 0;
    const conflictCount = analysis.conflictingViews?.length || 0;
    const consensusRatio = consensusCount / ((consensusCount + conflictCount) || 1);
    score += consensusRatio * 0.3;
  }

  // Factor 3: Number of sources (20% weight)
  const sourceCountScore = Math.min(sources.length / 5, 1);
  score += sourceCountScore * 0.2;

  // Factor 4: Content completeness (10% weight)
  const avgWordCount = sources.reduce((sum, s) => sum + (s.metadata?.wordCount || 0), 0) / sources.length;
  const contentScore = Math.min(avgWordCount / 1000, 1);
  score += contentScore * 0.1;

  return Math.round(score * 100) / 100;
}
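To get a feel for the scoring, here is a standalone sketch that inlines the same domain signals (omitting the recency bonus for brevity) and runs it on two hypothetical URLs:

```javascript
// Standalone sketch of the domain-signal scoring used by assessSourceCredibility.
function scoreUrl(url) {
  const HIGH = ['.edu', '.gov', '.org', 'scholar.google', 'pubmed', 'nature.com'];
  const MEDIUM = ['.com', '.net', 'wikipedia.org'];
  const LOW = ['blog', 'medium.com', 'personal'];

  let confidence = 0.5;
  if (HIGH.some(s => url.includes(s))) confidence = 0.9;
  else if (MEDIUM.some(s => url.includes(s))) confidence = 0.7;
  // Low-trust signals cap the score even if a trust signal matched
  if (LOW.some(s => url.includes(s))) confidence = Math.min(confidence, 0.4);
  if (url.includes('https://')) confidence += 0.05;
  return Math.min(confidence, 1.0);
}

console.log(scoreUrl('https://www.nature.com/articles/s41586').toFixed(2)); // "0.95"
console.log(scoreUrl('http://someone.medium.com/my-post').toFixed(2));      // "0.40"
```

Notice that the low-trust check runs unconditionally, so a medium-trust `.com` match is still capped at 0.4 if the URL also contains `medium.com`.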

Great, now let's build a retry utility.

The internet isn't always reliable. Sometimes APIs are temporarily down, networks have hiccups, or rate limits kick in. This utility implements "exponential backoff" - if something fails, we wait a bit and try again, waiting longer each time. It's like knocking on a door: first gently, then a bit louder, before giving up.

Go ahead and create the src/utils/retry.js file and add the following code to it.

// src/utils/retry.js
import { CONFIG } from '../config/constants.js';

export async function withRetry(fn, options = {}) {
  const maxAttempts = options.maxAttempts || CONFIG.API.RETRY_ATTEMPTS;
  const baseDelay = options.baseDelay || 1000;
  const maxDelay = options.maxDelay || 10000;

  let lastError;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;

      // Don't retry on client errors (4xx)
      if (error.status && error.status >= 400 && error.status < 500) {
        throw error;
      }

      if (attempt < maxAttempts - 1) {
        const delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
        console.log(`⏳ Retry ${attempt + 1}/${maxAttempts} in ${delay}ms...`);
        await sleep(delay);
      }
    }
  }

  throw new Error(`Failed after ${maxAttempts} attempts: ${lastError.message}`);
}

export function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
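The delay schedule this produces is easy to compute directly. A standalone sketch using the same defaults (1-second base, 10-second cap):

```javascript
// Exponential backoff: the delay doubles after each failed attempt, capped at maxDelay.
function backoffSchedule(maxAttempts = 3, baseDelay = 1000, maxDelay = 10000) {
  const delays = [];
  // A delay occurs after every failed attempt except the last one
  for (let attempt = 0; attempt < maxAttempts - 1; attempt++) {
    delays.push(Math.min(baseDelay * Math.pow(2, attempt), maxDelay));
  }
  return delays;
}

console.log(backoffSchedule());              // [ 1000, 2000 ]
console.log(backoffSchedule(5, 1000, 4000)); // [ 1000, 2000, 4000, 4000 ]
```

With the defaults, a request that fails twice waits a total of three seconds before the final attempt.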

Step 4: Core Components - The Brain of Our Assistant

Now we get to the heart of our research assistant. These three components work together like a team of researchers: one finds sources, another reads them, and the third analyzes everything to create insights.

The code below is our "scout" - it goes out into the web and finds relevant sources for our research question. It uses the TABS /automate endpoint to perform intelligent web searches with real-time streaming events. You'll see live updates as the AI navigates Google, visits pages, and extracts source information.

Go ahead and create the src/core/SourceDiscovery.js file and add the following code to it.

// src/core/SourceDiscovery.js
import fetch from 'node-fetch';
import { CONFIG } from '../config/constants.js';
import { assessSourceCredibility } from '../utils/credibility.js';
import { withRetry } from '../utils/retry.js';

export class SourceDiscovery {
constructor(apiKey, eventCallback = null) {
this.apiKey = apiKey;
this.baseUrl = CONFIG.API.BASE_URL;
this.headers = {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
};
this.emitEvent = eventCallback || (() => {});
}

async discoverSources(query, numSources = CONFIG.RESEARCH.DEFAULT_SOURCE_COUNT) {
this.emitEvent('discovery:started', {
message: 'Starting AI-powered source discovery',
query,
numSources
});

console.log('🌐 Using AI automation to find relevant sources...');
console.log(`📋 Query: "${query}"`);
console.log(`🔢 Requesting ${numSources} sources`);

const searchInstructions = this.buildSearchInstructions(query, numSources);

this.emitEvent('discovery:instructions_generated', {
message: `Search instructions generated (${searchInstructions.length} chars)`,
instructionsLength: searchInstructions.length
});

const requestPayload = {
task: searchInstructions,
json_schema: this.getSourceSchema(),
timeout: 60
};

this.emitEvent('discovery:api_request', {
message: 'Sending request to TabStack /automate endpoint',
timeout: requestPayload.timeout
});

const result = await withRetry(async () => {
const response = await fetch(`${this.baseUrl}/automate`, {
method: 'POST',
headers: this.headers,
body: JSON.stringify(requestPayload)
});

this.emitEvent('discovery:api_response', {
message: `API response received - Status: ${response.status}`,
status: response.status,
contentType: response.headers.get('content-type')
});

if (!response.ok) {
this.emitEvent('discovery:api_error', {
message: `API error: ${response.statusText}`,
status: response.status
});
throw new Error(`Source discovery failed: ${response.statusText}`);
}

return this.processStreamingResponse(response);
});

const processedSources = this.processSources(result, numSources);

this.emitEvent('discovery:completed', {
message: `Source discovery completed - found ${processedSources.length} sources`,
selectedCount: processedSources.length,
sources: processedSources.map(s => ({
title: s.title,
url: s.url,
confidence: s.confidence
}))
});

return processedSources;
}

async processStreamingResponse(response) {
this.emitEvent('discovery:stream_started', {
message: 'Processing streaming response from TabStack API'
});

const reader = response.body;
let buffer = '';
let eventCount = 0;
let finalResult = null;

// Process the SSE stream in real-time
for await (const chunk of reader) {
const chunkText = chunk.toString();
buffer += chunkText;

const lines = buffer.split('\n');
buffer = lines.pop() || '';

for (const line of lines) {
if (line.startsWith('event:')) {
eventCount++;
const eventType = line.substring(6).trim();
this.emitEvent('discovery:stream_event', {
message: `TabStack event: ${eventType}`,
eventType,
eventCount
});

} else if (line.startsWith('data: ')) {
const dataContent = line.substring(6).trim();
if (dataContent && dataContent !== '[DONE]') {
try {
const parsed = JSON.parse(dataContent);

if (parsed.finalAnswer) {
this.emitEvent('discovery:final_answer', {
message: 'Received final answer from TabStack'
});

if (typeof parsed.finalAnswer === 'string') {
try {
finalResult = JSON.parse(parsed.finalAnswer);
this.emitEvent('discovery:json_parsed', {
message: 'Successfully parsed JSON from final answer',
sourcesFound: finalResult.sources?.length || 0
});
} catch {
finalResult = { sources: [] };
}
} else {
finalResult = parsed.finalAnswer;
}
} else if (parsed.message) {
this.emitEvent('discovery:stream_message', {
message: `TabStack: ${parsed.message.substring(0, 100)}...`
});
}
} catch (e) {
// Continue processing
}
}
}
}
}

this.emitEvent('discovery:stream_completed', {
message: `Stream processing completed - ${eventCount} events processed`,
eventCount,
hasResult: !!finalResult
});

if (!finalResult) {
throw new Error('No valid result received from TabStack API');
}

return finalResult;
}

buildSearchInstructions(query, numSources) {
return `
Search the web for ${numSources} high-quality, authoritative sources about "${query}".

Focus on finding sources from:
- Educational institutions (.edu domains)
- Government agencies (.gov domains)
- Reputable organizations (.org domains)
- Established news outlets and publications
- Academic journals and research papers
- Professional industry reports

For each source, gather:
- The complete URL
- The page title
- A brief description of the content
- Why this source is relevant to the query

Prioritize recent, credible sources with substantial content.
Avoid blog posts, social media, and obviously biased sources.
`;
}

getSourceSchema() {
return {
type: "object",
properties: {
sources: {
type: "array",
items: {
type: "object",
properties: {
url: {
type: "string",
description: "The complete URL of the source"
},
title: {
type: "string",
description: "The title of the webpage or article"
},
description: {
type: "string",
description: "Brief description of the content and why it's relevant"
},
source_type: {
type: "string",
description: "Type of source (academic, government, news, organization, etc.)"
}
},
required: ["url", "title", "description", "source_type"],
additionalProperties: false
},
maxItems: 10,
description: "List of relevant sources found"
}
},
required: ["sources"],
additionalProperties: false
};
}

processSources(data, maxSources) {
const sources = [];

for (const source of data.sources || []) {
sources.push({
url: source.url,
title: source.title,
description: source.description,
sourceType: source.source_type,
confidence: assessSourceCredibility(source.url),
discoveredAt: new Date().toISOString()
});
}

// Sort by confidence and return top sources
return sources
.sort((a, b) => b.confidence - a.confidence)
.slice(0, maxSources);
}
}

export default SourceDiscovery;
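The streaming logic above boils down to splitting the SSE body into lines and handling `event:` and `data:` fields. A standalone sketch of that parsing step, run on a made-up sample stream:

```javascript
// Minimal SSE line parser mirroring processStreamingResponse's logic.
function parseSse(streamText) {
  const events = [];
  let finalAnswer = null;

  for (const line of streamText.split('\n')) {
    if (line.startsWith('event:')) {
      events.push(line.substring(6).trim());
    } else if (line.startsWith('data: ')) {
      const data = line.substring(6).trim();
      if (data && data !== '[DONE]') {
        try {
          const parsed = JSON.parse(data);
          if (parsed.finalAnswer) finalAnswer = parsed.finalAnswer;
        } catch { /* ignore lines that aren't valid JSON */ }
      }
    }
  }
  return { events, finalAnswer };
}

const sample = [
  'event: progress',
  'data: {"message":"Searching Google"}',
  'event: done',
  'data: {"finalAnswer":{"sources":[]}}',
  'data: [DONE]'
].join('\n');

const { events, finalAnswer } = parseSse(sample);
console.log(events);      // [ 'progress', 'done' ]
console.log(finalAnswer); // { sources: [] }
```

The real implementation additionally buffers partial lines across chunks, since a network chunk can end mid-line.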

With our list of sources in hand, the next component (ContentExtractor) acts as our "reader": it visits each URL and extracts clean content using the TABS /markdown endpoint.

Go ahead and create the src/core/ContentExtractor.js file and add the following code to it.

// src/core/ContentExtractor.js
import fetch from 'node-fetch';
import { CONFIG } from '../config/constants.js';
import { Source } from '../models/Source.js';
import { withRetry } from '../utils/retry.js';

export class ContentExtractor {
constructor(apiKey, eventCallback = null) {
this.apiKey = apiKey;
this.baseUrl = CONFIG.API.BASE_URL;
this.headers = {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
};
this.emitEvent = eventCallback || (() => {});
}

async extractContent(sources) {
this.emitEvent('extraction:started', {
message: 'Starting content extraction',
totalSources: sources.length
});

console.log('📚 Extracting content from sources...');

const enrichedSources = [];
const batchSize = 3; // Process in batches to avoid overwhelming the API

for (let i = 0; i < sources.length; i += batchSize) {
const batch = sources.slice(i, i + batchSize);
const batchNumber = Math.floor(i / batchSize) + 1;
const totalBatches = Math.ceil(sources.length / batchSize);

this.emitEvent('extraction:batch_started', {
message: `Processing batch ${batchNumber}/${totalBatches}`,
batchNumber,
totalBatches,
batchSources: batch.map(s => s.title)
});

const promises = batch.map((source, index) =>
this.extractSingleSource(source, i + index + 1, sources.length)
);
const results = await Promise.allSettled(promises);

for (const result of results) {
if (result.status === 'fulfilled' && result.value) {
enrichedSources.push(result.value);
} else if (result.status === 'rejected') {
this.emitEvent('extraction:source_failed', {
message: `Failed to extract source: ${result.reason.message}`,
error: result.reason.message
});
}
}

this.emitEvent('extraction:batch_completed', {
message: `Batch ${batchNumber} completed`,
batchNumber,
successCount: results.filter(r => r.status === 'fulfilled' && r.value).length,
totalInBatch: batch.length
});
}

this.emitEvent('extraction:completed', {
message: `Content extraction completed - ${enrichedSources.length}/${sources.length} sources processed`,
successCount: enrichedSources.length,
totalSources: sources.length
});

console.log(`✨ Extracted content from ${enrichedSources.length} sources`);
return enrichedSources;
}

async extractSingleSource(sourceInfo, sourceIndex = 1, totalSources = 1) {
try {
this.emitEvent('extraction:source_started', {
message: `Reading source ${sourceIndex}/${totalSources}: ${sourceInfo.title}`,
sourceIndex,
totalSources,
title: sourceInfo.title,
url: sourceInfo.url
});

console.log(` 📄 Reading: ${sourceInfo.title || sourceInfo.url}`);

const data = await withRetry(async () => {
this.emitEvent('extraction:api_request', {
message: `Making API request for: ${sourceInfo.title}`,
url: sourceInfo.url
});

const response = await fetch(
`${this.baseUrl}/markdown?` + new URLSearchParams({
url: sourceInfo.url,
metadata: 'true'
}),
{ headers: this.headers }
);

if (!response.ok) {
throw new Error(`Failed to extract: ${response.statusText}`);
}

return response.json();
});

this.emitEvent('extraction:content_received', {
message: `Content received for: ${sourceInfo.title}`,
contentLength: data.content?.length || 0,
hasMetadata: !!data.metadata
});

const source = new Source({
url: sourceInfo.url,
title: data.metadata?.title || sourceInfo.title,
content: data.content,
confidence: sourceInfo.confidence,
metadata: {
...data.metadata,
extractedAt: new Date().toISOString()
}
});

// Validate source quality
if (!source.isValid()) {
this.emitEvent('extraction:source_invalid', {
message: `Skipping source due to insufficient content: ${sourceInfo.title}`,
title: sourceInfo.title,
contentLength: source.content?.length || 0
});
console.log(` ⚠️ Skipping (insufficient content)`);
return null;
}

this.emitEvent('extraction:source_completed', {
message: `Successfully extracted: ${source.title}`,
title: source.title,
contentLength: source.content.length,
confidence: source.confidence
});

return source;

} catch (error) {
this.emitEvent('extraction:source_error', {
message: `Error extracting ${sourceInfo.title}: ${error.message}`,
title: sourceInfo.title,
error: error.message
});
console.error(` ❌ Error: ${error.message}`);
return null;
}
}
}

export default ContentExtractor;
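The batching arithmetic in extractContent is simple but worth seeing in isolation, since it controls how much parallel load we put on the API. A standalone sketch with placeholder items:

```javascript
// Same batching rule as extractContent: fixed-size slices of the source list.
function makeBatches(items, batchSize = 3) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const batches = makeBatches(['a', 'b', 'c', 'd', 'e', 'f', 'g']);
console.log(batches.length);             // 3
console.log(batches.map(b => b.length)); // [ 3, 3, 1 ]
```

Each batch is fetched with Promise.allSettled, so one slow or failing page never blocks or sinks the others in its batch.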

This next piece is the Synthesizer. This is where the magic happens: our "analyst" reads through all the sources and produces a coherent synthesis using the TABS /transform endpoint.

Go ahead and create the src/core/Synthesizer.js file and add the following code to it:

// src/core/Synthesizer.js
import fetch from 'node-fetch';
import { CONFIG } from '../config/constants.js';
import { calculateOverallConfidence } from '../utils/credibility.js';
import { withRetry } from '../utils/retry.js';

export class Synthesizer {
constructor(apiKey, eventCallback = null) {
this.apiKey = apiKey;
this.baseUrl = CONFIG.API.BASE_URL;
this.headers = {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
};
this.emitEvent = eventCallback || (() => {});
}

async synthesizeInformation(query, sources) {
this.emitEvent('synthesis:started', {
message: 'Starting information synthesis and analysis',
query,
sourceCount: sources.length
});

console.log('🤔 Analyzing and synthesizing information...');

this.emitEvent('synthesis:preparing', {
message: 'Preparing sources for analysis',
sources: sources.map(s => ({ title: s.title, contentLength: s.content?.length || 0 }))
});

const sourcesSummary = this.prepareSourcesForAnalysis(sources);
const schema = this.getAnalysisSchema();
const instructions = this.getAnalysisInstructions(query, sourcesSummary);

this.emitEvent('synthesis:api_request', {
message: 'Sending analysis request to AI',
instructionsLength: instructions.length,
sourcesSummaryLength: sourcesSummary.length
});

const analysis = await withRetry(async () => {
const response = await fetch(`${this.baseUrl}/transform`, {
method: 'POST',
headers: this.headers,
body: JSON.stringify({
url: sources[0].url,
json_schema: schema,
instructions: instructions
})
});

if (!response.ok) {
this.emitEvent('synthesis:api_error', {
message: `API request failed: ${response.statusText}`,
status: response.status,
statusText: response.statusText
});
throw new Error(`Synthesis failed: ${response.statusText}`);
}

return response.json();
});

this.emitEvent('synthesis:analyzing', {
message: 'Processing analysis results',
hasExecutiveSummary: !!analysis.executiveSummary,
keyFindingsCount: analysis.keyFindings?.length || 0,
consensusPointsCount: analysis.consensusPoints?.length || 0
});

// Add calculated confidence score
analysis.calculatedConfidence = calculateOverallConfidence(sources, analysis);

this.emitEvent('synthesis:completed', {
message: 'Information synthesis completed',
confidenceScore: analysis.calculatedConfidence,
keyFindingsCount: analysis.keyFindings?.length || 0,
consensusPointsCount: analysis.consensusPoints?.length || 0,
conflictingViewsCount: analysis.conflictingViews?.length || 0
});

console.log('✅ Synthesis complete');
return analysis;
}

prepareSourcesForAnalysis(sources) {
return sources.map((source, index) => {
return `
Source ${index + 1}: ${source.title}
URL: ${source.url}
Confidence: ${(source.confidence * 100).toFixed(0)}%
Content:
${source.content}
`;
}).join('\n---\n');
}

getAnalysisSchema() {
return {
type: "object",
properties: {
executiveSummary: {
type: "string",
description: "A clear, concise 2-3 paragraph summary"
},
detailedAnalysis: {
type: "string",
description: "In-depth analysis with examples and data"
},
keyFindings: {
type: "array",
items: { type: "string" },
maxItems: 10,
description: "Most important discoveries"
},
consensusPoints: {
type: "array",
items: { type: "string" },
maxItems: 5,
description: "Points where sources agree"
},
conflictingViews: {
type: "array",
items: { type: "string" },
maxItems: 5,
description: "Areas of disagreement"
},
knowledgeGaps: {
type: "array",
items: { type: "string" },
maxItems: 5,
description: "Unanswered questions"
},
relatedTopics: {
type: "array",
items: { type: "string" },
maxItems: 5,
description: "Topics for further research"
}
},
required: [
"executiveSummary", "detailedAnalysis", "keyFindings",
"consensusPoints", "conflictingViews", "knowledgeGaps", "relatedTopics"
],
additionalProperties: false
};
}

getAnalysisInstructions(query, sourcesSummary) {
return `
You are an expert research analyst. Analyze these sources about "${query}".

Your analysis should:
1. Identify patterns across all sources
2. Highlight consensus and contradictions
3. Assess information quality and gaps
4. Suggest related research topics
5. Use specific examples and data

Sources to analyze:
${sourcesSummary}
`;
}
}

export default Synthesizer;
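The confidence score attached in synthesizeInformation comes from calculateOverallConfidence's weighted sum (40% source quality, 30% agreement, 20% source count, 10% content depth). A standalone worked example with two hypothetical sources, inlining the same formula:

```javascript
// Weighted confidence: 40% quality, 30% consensus, 20% count, 10% depth.
function overallConfidence(sources, analysis) {
  const avgConf = sources.reduce((s, x) => s + x.confidence, 0) / sources.length;
  const consensus = analysis.consensusPoints.length;
  const conflicts = analysis.conflictingViews.length;
  const consensusRatio = consensus / ((consensus + conflicts) || 1);
  const countScore = Math.min(sources.length / 5, 1);     // saturates at 5 sources
  const avgWords = sources.reduce((s, x) => s + x.wordCount, 0) / sources.length;
  const contentScore = Math.min(avgWords / 1000, 1);      // saturates at 1000 words

  const score = avgConf * 0.4 + consensusRatio * 0.3 + countScore * 0.2 + contentScore * 0.1;
  return Math.round(score * 100) / 100;
}

const sources = [
  { confidence: 0.9, wordCount: 1200 },
  { confidence: 0.7, wordCount: 800 }
];
const analysis = { consensusPoints: ['a', 'b', 'c', 'd'], conflictingViews: ['x'] };

console.log(overallConfidence(sources, analysis)); // 0.74
```

Breaking it down: quality 0.8 × 0.4 = 0.32, agreement 4/5 × 0.3 = 0.24, count 2/5 × 0.2 = 0.08, depth 1.0 × 0.1 = 0.1, summing to 0.74 (a MEDIUM report).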

Step 5: The Main Research Assistant - Bringing It All Together

This is the conductor of our orchestra - it coordinates all the components we've built to deliver a complete research experience. With streaming events enabled, you'll see live progress as it orchestrates the entire pipeline: finding sources, extracting content, synthesizing information, and generating the final report.

Create src/core/ResearchAssistant.js:

// src/core/ResearchAssistant.js
import { SourceDiscovery } from './SourceDiscovery.js';
import { ContentExtractor } from './ContentExtractor.js';
import { Synthesizer } from './Synthesizer.js';
import { ResearchReport } from '../models/ResearchReport.js';
import { CONFIG } from '../config/constants.js';

export class ResearchAssistant {
constructor(apiKey, options = {}) {
if (!apiKey) {
throw new Error('API key is required');
}

this.apiKey = apiKey;
this.enableStreaming = options.enableStreaming !== false;
this.eventCallback = this.enableStreaming ? this.handleEvent.bind(this) : null;

this.sourceDiscovery = new SourceDiscovery(apiKey, this.eventCallback);
this.contentExtractor = new ContentExtractor(apiKey, this.eventCallback);
this.synthesizer = new Synthesizer(apiKey, this.eventCallback);
this.startTime = null;
}

handleEvent(eventType, data) {
if (!this.enableStreaming) return;

const timestamp = new Date().toLocaleTimeString();
const eventIcon = this.getEventIcon(eventType);

console.log(`[${timestamp}] ${eventIcon} ${data.message || eventType}`);

// Show additional details for key events
if (eventType === 'discovery:completed' && data.sources) {
data.sources.forEach((source, i) => {
console.log(` ${i + 1}. ${source.title} (${(source.confidence * 100).toFixed(0)}% confidence)`);
});
} else if (eventType === 'synthesis:completed') {
console.log(` • Confidence: ${(data.confidenceScore * 100).toFixed(0)}%`);
console.log(` • Key findings: ${data.keyFindingsCount}`);
console.log(` • Consensus points: ${data.consensusPointsCount}`);
}
}

getEventIcon(eventType) {
const icons = {
'discovery:started': '🔍',
'discovery:api_request': '🌐',
'discovery:stream_event': '📡',
'discovery:final_answer': '🎯',
'discovery:completed': '✅',
'extraction:started': '📄',
'extraction:batch_started': '📦',
'extraction:source_started': '📖',
'extraction:api_request': '🌐',
'extraction:source_completed': '✅',
'extraction:completed': '✅',
'synthesis:started': '🤔',
'synthesis:preparing': '📋',
'synthesis:api_request': '🧠',
'synthesis:completed': '✅'
};
return icons[eventType] || '📡';
}

async research(query, options = {}) {
this.startTime = Date.now();

const config = {
numSources: options.numSources || CONFIG.RESEARCH.DEFAULT_SOURCE_COUNT,
includeConflicts: options.includeConflicts !== false,
...options
};

console.log(`🔍 Starting research on: "${query}"`);
console.log(`📚 Looking for ${config.numSources} sources...`);

try {
// Phase 1: Discover relevant sources
const sources = await this.sourceDiscovery.discoverSources(query, config.numSources);

if (sources.length === 0) {
throw new Error('No sources found for the given query');
}

// Phase 2: Extract content from each source
const enrichedSources = await this.contentExtractor.extractContent(sources);

if (enrichedSources.length === 0) {
throw new Error('Failed to extract content from any sources');
}

// Phase 3: Analyze and synthesize findings
const analysis = await this.synthesizer.synthesizeInformation(query, enrichedSources);

// Phase 4: Generate the final report
const report = this.generateReport(query, analysis, enrichedSources);

console.log(`✅ Research complete! Analyzed ${enrichedSources.length} sources`);
return report;

} catch (error) {
console.error('❌ Research failed:', error.message);
throw error;
}
}

generateReport(query, analysis, sources) {
const processingTime = `${((Date.now() - this.startTime) / 1000).toFixed(1)}s`;

return new ResearchReport({
query,
executiveSummary: analysis.executiveSummary,
detailedAnalysis: analysis.detailedAnalysis,
sources,
confidenceScore: analysis.calculatedConfidence,
keyFindings: analysis.keyFindings,
consensusPoints: analysis.consensusPoints,
conflictingViews: analysis.conflictingViews,
knowledgeGaps: analysis.knowledgeGaps,
relatedTopics: analysis.relatedTopics,
processingTime
});
}

async saveReport(report, filename) {
const fs = await import('fs/promises');
const path = await import('path');

if (!filename) {
const safeQuery = report.query
.replace(/[^a-z0-9]/gi, '_')
.substring(0, 30);
filename = `report_${safeQuery}_${Date.now()}.md`;
}

const markdown = report.toMarkdown();
await fs.writeFile(filename, markdown, 'utf8');

console.log(`💾 Report saved to: ${filename}`);
return filename;
}
}

export default ResearchAssistant;

Step 6: Putting Your Assistant to Work

Now for the exciting part: let's create a real example that shows our research assistant in action, demonstrating both basic and advanced usage patterns.

Create examples/basicResearch.js:

// examples/basicResearch.js
import { ResearchAssistant } from '../src/core/ResearchAssistant.js';
import dotenv from 'dotenv';

dotenv.config();

async function main() {
// Check for API key
if (!process.env.TABS_API_KEY) {
console.error('❌ Please set TABS_API_KEY in your .env file');
process.exit(1);
}

// Initialize the assistant with streaming enabled
const assistant = new ResearchAssistant(process.env.TABS_API_KEY, {
enableStreaming: true
});

try {
console.log('\n🚀 === ADVANCED RESEARCH ASSISTANT ===\n');
console.log('📡 Live streaming events enabled');
console.log('🔬 Using TabStack /automate endpoint for web research\n');
console.log('='.repeat(80));

// Complex, multi-faceted research question
const complexQuery = `
Analyze the economic, environmental, and social impacts of implementing
large-scale carbon capture and storage (CCS) technology in developing nations,
considering the trade-offs between climate mitigation benefits and potential
infrastructure costs, energy security implications, and effects on local communities.
`.trim();

console.log('🧠 Complex Research Question:');
console.log(`"${complexQuery}"`);
console.log('\n📋 This query requires finding sources on:');
console.log(' • Carbon capture technology');
console.log(' • Economic analysis in developing countries');
console.log(' • Environmental policy impacts');
console.log(' • Social and community effects');
console.log(' • Energy infrastructure and security\n');
console.log('⏳ This may take 2-3 minutes due to complex web search requirements...\n');
console.log('='.repeat(80) + '\n');

const startTime = Date.now();

const report = await assistant.research(complexQuery, {
numSources: 4 // Focused research for better quality
});

const totalTime = ((Date.now() - startTime) / 1000).toFixed(1);

console.log('\n' + '='.repeat(80));
console.log('📊 RESEARCH COMPLETED');
console.log('='.repeat(80));

// Display comprehensive results
console.log(`\n✅ Research Complete!`);
console.log(`📋 Query: Carbon Capture & Storage in Developing Nations`);
console.log(`🎯 Confidence: ${(report.confidenceScore * 100).toFixed(0)}% (${report.getConfidenceLevel()})`);
console.log(`📚 Sources Found & Analyzed: ${report.sources.length}`);
console.log(`⏱️ Total Processing Time: ${totalTime}s`);
console.log(`🌐 Research Method: AI-powered web automation`);

// Show source quality
console.log('\n📖 Sources Analyzed:');
report.sources.forEach((source, i) => {
console.log(` ${i + 1}. ${source.title}`);
console.log(` 🔗 ${source.url}`);
console.log(` 📊 Confidence: ${(source.confidence * 100).toFixed(0)}%`);
console.log(` 🏷️ Type: ${source.sourceType}`);
console.log('');
});

console.log('💡 Key Research Findings:');
report.keyFindings.forEach((finding, i) => {
console.log(` ${i + 1}. ${finding}`);
});

console.log('\n✅ Expert Consensus Points:');
report.consensusPoints.forEach(point => {
console.log(`${point}`);
});

if (report.conflictingViews.length > 0) {
console.log('\n⚠️ Areas of Disagreement:');
report.conflictingViews.forEach(view => {
console.log(`${view}`);
});
}

if (report.knowledgeGaps.length > 0) {
console.log('\n❓ Identified Knowledge Gaps:');
report.knowledgeGaps.forEach(gap => {
console.log(`${gap}`);
});
}

// Save the comprehensive report
const filename = await assistant.saveReport(report);
console.log(`\n💾 Comprehensive report saved to: ${filename}`);

console.log('\n🎉 Advanced research analysis complete!');
console.log('📈 This demonstrates the power of AI-driven research automation');

} catch (error) {
console.error('\n❌ Research Failed:', error.message);

if (error.message.includes('automation failed')) {
console.log('\n🤖 This indicates the TabStack API encountered issues with:');
console.log(' • reCAPTCHA challenges on search engines');
console.log(' • Timeout during complex web navigation');
console.log(' • Rate limiting or anti-bot detection');
console.log('\n💡 The streaming events above show exactly where the process failed');
}

process.exit(1);
}
}

// Run the advanced research example
main().catch(console.error);

Step 7: Testing Your Assistant

Good software needs good tests. These tests ensure that our credibility scoring works correctly, that we handle errors gracefully, and that our reports are properly formatted. While these are basic tests, they provide a foundation for more comprehensive testing as your assistant evolves.

Create tests/ResearchAssistant.test.js:

// tests/ResearchAssistant.test.js
import { ResearchAssistant } from '../src/core/ResearchAssistant.js';
import { assessSourceCredibility } from '../src/utils/credibility.js';

describe('ResearchAssistant', () => {
let assistant;

beforeAll(() => {
const apiKey = process.env.TABS_API_KEY || 'test-key';
assistant = new ResearchAssistant(apiKey);
});

describe('Source Credibility', () => {
test('should rate .edu domains highly', () => {
const score = assessSourceCredibility('https://stanford.edu/research');
expect(score).toBeGreaterThanOrEqual(0.9);
});

test('should rate blog domains lower', () => {
const score = assessSourceCredibility('https://random-blog.com/post');
expect(score).toBeLessThan(0.5);
});

test('should give medium confidence to .com domains', () => {
const score = assessSourceCredibility('https://example.com/article');
expect(score).toBeGreaterThanOrEqual(0.5);
expect(score).toBeLessThanOrEqual(0.8);
});
});

describe('Research Process', () => {
test('should handle empty queries gracefully', async () => {
await expect(assistant.research('')).rejects.toThrow();
});

test('should respect source count limits', async () => {
// This would need mocking in a real test
const sources = await assistant.sourceDiscovery.discoverSources(
'test query',
3
);
expect(sources.length).toBeLessThanOrEqual(3);
});
});

describe('Report Generation', () => {
test('should generate valid markdown', () => {
const mockReport = {
query: 'Test Query',
executiveSummary: 'Summary',
detailedAnalysis: 'Analysis',
sources: [],
confidenceScore: 0.8,
keyFindings: ['Finding 1', 'Finding 2'],
getConfidenceLevel: () => 'HIGH',
metadata: {
generatedAt: new Date().toISOString(),
sourceCount: 0
},
toMarkdown: function() {
return `# Research Report: ${this.query}`;
}
};

const markdown = mockReport.toMarkdown();
expect(markdown).toContain('Research Report');
expect(markdown).toContain('Test Query');
});
});
});

Running Your Research Assistant

Understanding the API Endpoints

Our research assistant uses three key TABS API endpoints:

  • /automate: Uses AI to perform web searches and find relevant sources
  • /markdown: Converts web pages to clean, readable markdown content
  • /transform: Analyzes content and generates structured insights
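For orientation, here is a minimal sketch of what a raw `/markdown` call might look like with `fetch` (Node 18+). The base URL, auth scheme, and request body shape below are assumptions, not confirmed by this guide; consult the API Reference for the authoritative schema.

```javascript
// Hedged sketch of a raw /markdown call. The host and body shape are
// assumptions -- check the API Reference for the real request schema.
async function fetchMarkdown(apiKey, url) {
  const response = await fetch('https://api.tabstack.ai/markdown', { // assumed host
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`, // assumed auth scheme
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  });

  if (!response.ok) {
    throw new Error(`Markdown extraction failed with status ${response.status}`);
  }
  return response.text(); // clean markdown for the synthesizer to consume
}
```

The `ContentExtractor` built earlier wraps a call like this with batching, retries, and progress events.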

Running the Application

Now that you've set up all the files, here's how to run your research assistant:

# Run the basic example
npm start

# Or use nodemon for development (auto-restart on changes)
npm run dev

# Run tests
npm test

Expected Output

When you run the application, you'll see live streaming output like this:

🚀 === ADVANCED RESEARCH ASSISTANT ===

📡 Live streaming events enabled
🔬 Using TabStack /automate endpoint for web research

🧠 Complex Research Question:
"Analyze the economic, environmental, and social impacts of implementing large-scale carbon capture and storage (CCS) technology in developing nations..."

📋 This query requires finding sources on:
• Carbon capture technology
• Economic analysis in developing countries
• Environmental policy impacts
• Social and community effects
• Energy infrastructure and security

⏳ This may take 2-3 minutes due to complex web search requirements...

================================================================================

🔍 Starting research on: "Analyze the economic, environmental, and social impacts..."
📚 Looking for 4 sources...
[4:15:32 PM] 🔍 Starting AI-powered source discovery
🌐 Using AI automation to find relevant sources...
[4:15:32 PM] 🌐 Sending request to TabStack /automate endpoint
[4:15:32 PM] 📡 TabStack event: start
[4:15:33 PM] 📡 TabStack event: agent:processing
[4:15:38 PM] 📡 TabStack event: browser:navigated
[4:15:38 PM] 📡 TabStack event: agent:action
[4:15:45 PM] 📡 TabStack event: agent:extracted
[4:15:52 PM] 🎯 Received final answer from TabStack
[4:15:52 PM] ✅ Source discovery completed - found 4 sources
1. IEA Carbon Capture Report (95% confidence)
2. World Bank CCS Economics Study (90% confidence)
3. UN Climate Technology Analysis (88% confidence)
4. MIT Energy Initiative Research (85% confidence)
✅ Source discovery completed. Found 4 sources.
[4:15:52 PM] 📄 Starting content extraction
📚 Extracting content from sources...
[4:15:52 PM] 📦 Processing batch 1/2
[4:15:52 PM] 📖 Reading source 1/4: IEA Carbon Capture Report
[4:15:53 PM] 🌐 Making API request for: IEA Carbon Capture Report
[4:15:55 PM] ✅ Successfully extracted: IEA Carbon Capture Report
[4:15:55 PM] 📖 Reading source 2/4: World Bank CCS Economics Study
[4:15:58 PM] ✅ Successfully extracted: World Bank CCS Economics Study
[4:15:58 PM] 📦 Batch 1 completed
[4:15:58 PM] ✅ Content extraction completed - 4/4 sources processed
✨ Extracted content from 4 sources
[4:15:58 PM] 🤔 Starting information synthesis and analysis
🤔 Analyzing and synthesizing information...
[4:15:58 PM] 📋 Preparing sources for analysis
[4:15:58 PM] 🧠 Sending analysis request to AI
[4:16:15 PM] ✅ Information synthesis completed
• Confidence: 82%
• Key findings: 8
• Consensus points: 5
✅ Synthesis complete
✅ Research complete! Analyzed 4 sources

================================================================================
📊 RESEARCH COMPLETED
================================================================================

✅ Research Complete!
📋 Query: Carbon Capture & Storage in Developing Nations
🎯 Confidence: 82% (HIGH)
📚 Sources Found & Analyzed: 4
⏱️ Total Processing Time: 43.2s
🌐 Research Method: AI-powered web automation

📖 Sources Analyzed:
1. IEA Carbon Capture Report
🔗 https://iea.org/reports/carbon-capture-utilisation-and-storage
📊 Confidence: 95%
🏷️ Type: organization

💡 Key Research Findings:
1. CCS implementation costs range from $50-200 per ton of CO2 in developing nations
2. Infrastructure requirements could strain existing energy grids by 15-30%
3. Local communities face displacement risks but also job creation opportunities
4. Environmental co-benefits include reduced local air pollution
5. Technology transfer barriers remain significant for developing economies

✅ Expert Consensus Points:
• CCS technology shows promise for climate mitigation in developing countries
• Economic feasibility depends heavily on carbon pricing mechanisms
• Community engagement is critical for successful implementation

💾 Comprehensive report saved to: report_Carbon_Capture_Storage_1703123456.md

🎉 Advanced research analysis complete!
📈 This demonstrates the power of AI-driven research automation

Understanding the Architecture

Our research assistant follows a clean, modular architecture:

┌─────────────────────────────────────────────┐
│              ResearchAssistant              │
│       (Orchestrates the entire flow)        │
└──────────────────────┬──────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
  ┌─────┴─────┐  ┌─────┴─────┐  ┌─────┴──────┐
  │ Discovery │  │ Extractor │  │Synthesizer │
  │  (Find)   │  │  (Read)   │  │ (Analyze)  │
  └─────┬─────┘  └─────┬─────┘  └─────┬──────┘
        │              │              │
┌───────┴───────┐ ┌────┴──────────┐ ┌─┴──────────────┐
│ TABS /automate│ │ TABS /markdown│ │ TABS /transform│
└───────────────┘ └───────────────┘ └────────────────┘

Each component in our architecture has a specific role:

  • ResearchAssistant: The conductor that orchestrates everything
  • SourceDiscovery: Finds relevant sources using web search
  • ContentExtractor: Reads and cleans content from web pages
  • Synthesizer: Analyzes all sources and generates insights
  • Models: Define the shape of our data
  • Utils: Helper functions for common tasks

This modular design makes the code easy to understand, test, and extend. Each piece does one thing well, and they all work together seamlessly.
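Because each component exposes a single async method, the whole pipeline can be exercised with stand-in objects. The stub classes below are illustrative (they are not part of the tutorial's codebase), but they mirror the method signatures that `research()` calls:

```javascript
// Stub components that mirror the real classes' method signatures
class StubDiscovery {
  async discoverSources(query, numSources) {
    return [{ title: 'Stub source', url: 'https://example.com' }].slice(0, numSources);
  }
}

class StubExtractor {
  async extractContent(sources) {
    return sources.map(s => ({ ...s, content: 'stub content' }));
  }
}

class StubSynthesizer {
  async synthesizeInformation(query, sources) {
    return { executiveSummary: `Summary for "${query}"`, keyFindings: [] };
  }
}

// Same phase ordering as ResearchAssistant.research()
async function runPipeline(query) {
  const sources = await new StubDiscovery().discoverSources(query, 2);
  const enriched = await new StubExtractor().extractContent(sources);
  return new StubSynthesizer().synthesizeInformation(query, enriched);
}

runPipeline('test').then(analysis => console.log(analysis.executiveSummary)); // Summary for "test"
```

Swapping a stub for the real class is all it takes to test one phase in isolation, which is exactly what the modular design buys you.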

Next Steps

Congratulations! You've built a fully functional research assistant that can:

  • Find authoritative sources on any topic
  • Extract and clean content from web pages
  • Synthesize information across multiple sources
  • Generate comprehensive research reports
  • Assess confidence in its findings

Here are some ideas for extending your assistant:

  1. Add a Web Interface: Build a simple Express.js API and React frontend to make your assistant accessible via a browser
  2. Implement Caching: Save research reports to avoid re-processing identical queries
  3. Add Export Options: Generate PDFs or Word documents from your markdown reports
  4. Create Specialized Assistants: Build domain-specific versions for medical research, legal research, or market analysis
  5. Add Real-time Updates: Use WebSockets to show research progress in real-time
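As a starting point for idea #2, here is a minimal in-memory cache sketch. The class and method names are illustrative, not part of the tutorial's codebase; a production version would likely persist entries to disk or a store like Redis:

```javascript
// Minimal in-memory report cache (illustrative sketch). Queries are
// normalized so trivially different phrasings hit the same entry.
class ReportCache {
  constructor(ttlMs = 3600000) { // entries expire after 1 hour by default
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  key(query) {
    // Lowercase and collapse whitespace for a stable cache key
    return query.trim().toLowerCase().replace(/\s+/g, ' ');
  }

  get(query) {
    const k = this.key(query);
    const entry = this.entries.get(k);
    if (!entry) return null;
    if (Date.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(k); // evict stale entries lazily
      return null;
    }
    return entry.report;
  }

  set(query, report) {
    this.entries.set(this.key(query), { report, storedAt: Date.now() });
  }
}

const cache = new ReportCache();
cache.set('What is CCS?', { summary: 'demo' });
const hit = cache.get('  what is CCS? '); // normalized key -> cache hit
```

Checking the cache before calling `assistant.research()` avoids paying for discovery, extraction, and synthesis twice on identical queries.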

Troubleshooting Guide

Common Issues and Solutions

🔴 TabStack /automate Blocked by reCAPTCHA

This is the most common issue with web automation:

// Watch for streaming events that indicate blocking:
[4:15:45 PM] 📡 TabStack event: browser:blocked
[4:15:46 PM] ❌ Task failed: Blocked by reCAPTCHA on Google search

// Solution: Implement fallback strategies
export class SourceDiscovery {
async discoverSources(query, numSources) {
try {
return await this.automateSearch(query, numSources);
} catch (error) {
if (error.message.includes('reCAPTCHA') || error.message.includes('blocked')) {
console.log('🔄 Web automation blocked, trying alternative approach...');

// Fallback: Try different search engines or direct site searches
return await this.fallbackSearch(query, numSources);
}
throw error;
}
}

async fallbackSearch(query, numSources) {
// Alternative: Search specific authoritative sites directly
const directSites = [
'site:nature.com',
'site:sciencedirect.com',
'site:scholar.google.com',
'site:who.int',
'site:gov'
];

for (const site of directSites) {
try {
const result = await this.automateSearch(`${query} ${site}`, 2);
if (result.sources?.length > 0) {
return result.sources.slice(0, numSources);
}
} catch (e) {
continue; // Try next site
}
}

throw new Error('All search strategies failed - try again later');
}
}

🟠 Streaming Events Stop/Timeout

// If you see events stop mid-stream:
[4:15:32 PM] 🔍 Starting AI-powered source discovery
[4:15:45 PM] 📡 TabStack event: agent:processing
// ... then nothing for 60+ seconds

// Solution: wrap the stream read in an async helper (so `for await` is
// valid) and race it against overall and inactivity timeouts
async processStreamingResponse(response) {
  const OVERALL_TIMEOUT = 120000;   // 2 minutes for the whole stream
  const INACTIVITY_TIMEOUT = 30000; // 30 seconds between chunks

  let overallTimer;
  let inactivityTimer;
  let rejectWithTimeout;

  const timeoutPromise = new Promise((_, reject) => {
    rejectWithTimeout = reject;
    overallTimer = setTimeout(() => {
      reject(new Error('Streaming response timeout after 2 minutes'));
    }, OVERALL_TIMEOUT);
  });

  const resetInactivityTimer = () => {
    clearTimeout(inactivityTimer);
    inactivityTimer = setTimeout(() => {
      rejectWithTimeout(new Error('No data received for 30 seconds'));
    }, INACTIVITY_TIMEOUT);
  };

  const readStream = (async () => {
    resetInactivityTimer();
    for await (const chunk of response.body) {
      resetInactivityTimer(); // each chunk resets the inactivity timer
      // Process chunk as before...
    }
  })();

  try {
    return await Promise.race([readStream, timeoutPromise]);
  } finally {
    clearTimeout(overallTimer);
    clearTimeout(inactivityTimer);
  }
}

🔴 "No sources found" Error
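This usually means the query was too narrow, or the web automation was blocked before any results loaded. One mitigation is to retry discovery with progressively simplified variants of the query. The helper below is an illustrative sketch, not part of the tutorial's codebase:

```javascript
// Illustrative helper: retry discovery with simpler query variants.
// `discovery` is any object exposing discoverSources(query, numSources).
async function discoverWithRetry(discovery, query, numSources) {
  const variants = [
    query,                                     // original query
    query.split('.')[0],                       // first sentence only
    query.split(/\s+/).slice(0, 8).join(' ')   // first few keywords
  ];

  for (const variant of [...new Set(variants)]) {
    const sources = await discovery.discoverSources(variant, numSources);
    if (sources.length > 0) return sources;
    console.log(`🔄 No sources for "${variant.slice(0, 40)}..." - simplifying query`);
  }

  throw new Error('No sources found after retrying with simplified queries');
}
```

Simpler queries give the automation agent a better chance of finding *something*; you can then re-run the full query once you know the topic is searchable.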

🟪 "Rate limit exceeded" Error

// Solution: Implement request queuing
class QueuedResearchAssistant extends ResearchAssistant {
constructor(apiKey) {
super(apiKey);
this.requestQueue = [];
this.processing = false;
this.requestsPerMinute = 30;
}

async queueRequest(fn) {
return new Promise((resolve, reject) => {
this.requestQueue.push({ fn, resolve, reject });
this.processQueue();
});
}

async processQueue() {
if (this.processing || this.requestQueue.length === 0) return;

this.processing = true;
const delay = 60000 / this.requestsPerMinute;

while (this.requestQueue.length > 0) {
const { fn, resolve, reject } = this.requestQueue.shift();
try {
const result = await fn();
resolve(result);
} catch (error) {
reject(error);
}
await this.sleep(delay);
}

this.processing = false;
}
}

⏳ Slow Performance

// Solution: Optimize with parallel processing and caching
const optimizations = {
// 1. Process in parallel where possible
parallelExtraction: true,
maxConcurrency: 5,

// 2. Cache frequently accessed data
cacheEnabled: true,
cacheDuration: 3600000, // 1 hour

// 3. Skip low-quality sources early
minSourceConfidence: 0.5,

// 4. Limit content size
maxContentLength: 10000 // characters
};

Performance Benchmarks

Typical performance metrics for reference with streaming enabled:

  • Source Discovery (with /automate): 20-60 seconds (depends on search complexity and reCAPTCHA)
  • Content Extraction: 1-3 seconds per source (batched)
  • Synthesis: 5-15 seconds (depends on content volume)
  • Total Research Time: 30-90 seconds for 4 complex sources

Performance varies significantly based on:

  • Query complexity (simple vs multi-faceted)
  • Web automation challenges (reCAPTCHA frequency)
  • Source accessibility (paywalls, loading speed)
  • TabStack API load and response times

Streaming events help you understand exactly where time is spent:

🔍 Discovery: 0-45s        📄 Extraction: 45-55s       🤔 Synthesis: 55-70s
   🌐 Web search              📦 Batch processing          🧠 AI analysis
   📡 Browser events          🌐 API requests              📋 Report generation
   🎯 Result parsing          ✅ Content validation        ✅ Confidence scoring
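If you want the same phase-level visibility in your own runs, a tiny timer helper (illustrative, not part of the tutorial's codebase) can record how long each phase took:

```javascript
// Minimal phase timer: call mark() at the end of each phase, then summary()
function createPhaseTimer() {
  const phases = [];
  let last = Date.now();
  return {
    mark(name) {
      const now = Date.now();
      phases.push({ name, seconds: (now - last) / 1000 });
      last = now;
    },
    summary() {
      return phases.map(p => `${p.name}: ${p.seconds.toFixed(1)}s`).join('  ');
    }
  };
}

const timer = createPhaseTimer();
// ... run discovery ...
timer.mark('discovery');
// ... run extraction ...
timer.mark('extraction');
console.log(timer.summary()); // e.g. "discovery: 42.1s  extraction: 9.8s"
```

Comparing these numbers against the benchmarks above tells you quickly whether a slow run is a discovery problem (reCAPTCHA, complex search) or an extraction/synthesis problem.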

Getting Help

If you're still stuck:

  1. Review the API Reference for endpoint details
  2. Check that you're using the correct TABS API endpoints:
    • /automate for source discovery
    • /markdown for content extraction
    • /transform for analysis
  3. Ensure your API key has access to the /automate endpoint