Implementing Streaming Output with NixAPI: A Complete Guide to the Typewriter Effect

Learn how to use NixAPI's streaming API to display LLM responses token-by-token in real time. Covers Python, Node.js, and browser-side React implementations with full runnable code.

NixAPI Team January 22, 2025 ~3 min read

The “typewriter effect” you see in ChatGPT — where text appears character by character — is called streaming output. It uses the HTTP Server-Sent Events (SSE) protocol under the hood.
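On the wire, the server emits a sequence of data: lines, each carrying one small JSON chunk, and closes the stream with data: [DONE]. A typical exchange looks roughly like this (ids and most fields trimmed for readability):

data: {"choices":[{"delta":{"role":"assistant","content":""},"index":0}]}

data: {"choices":[{"delta":{"content":"Leaves"},"index":0}]}

data: {"choices":[{"delta":{"content":" drift"},"index":0}]}

data: [DONE]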

This guide walks you through the concept and implementation end-to-end.


Why Use Streaming?

A standard API call waits until the model has finished generating all content before returning a response. If the model is writing a 1,000-word article, you could be staring at a blank screen for 10–20 seconds.

Streaming pushes each token to the client as soon as it’s generated, so users see content appearing immediately — a dramatically better experience.

Standard mode:  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ → display all at once
Streaming mode: token → token → token → real-time display

Python Streaming

from openai import OpenAI

client = OpenAI(
    api_key="your-NixAPI-key",
    base_url="https://api.nixapi.com/v1",
)

# Enable streaming with stream=True
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about autumn"}],
    stream=True,   # ← key parameter
)

# Print each chunk as it arrives
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

print()  # newline at the end

Run this and you’ll see the text appear a few characters at a time as each token arrives. flush=True forces each piece to the terminal immediately instead of letting Python buffer the output.
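If you also need the complete text once the stream ends, say to log it or store it, accumulate the deltas as you print. A minimal sketch, replacing the print loop above:

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        parts.append(delta.content)
        print(delta.content, end="", flush=True)

full_text = "".join(parts)  # the complete response, assembled from the chunks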


Node.js / TypeScript Streaming

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-NixAPI-key',
  baseURL: 'https://api.nixapi.com/v1',
});

async function streamChat() {
  const stream = client.chat.completions.stream({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Explain what a vector database is' }],
  });

  // Stream the tokens with an async iterator
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? '';
    process.stdout.write(text);
  }

  // Get the final complete result
  const finalCompletion = await stream.finalChatCompletion();
  console.log('\nTotal tokens used:', finalCompletion.usage?.total_tokens);
}

streamChat().catch(console.error);

Browser / React Frontend

import { useState } from 'react';

export default function StreamingChat() {
  const [output, setOutput] = useState('');
  const [loading, setLoading] = useState(false);

  async function handleAsk() {
    setOutput('');
    setLoading(true);

    const response = await fetch('https://api.nixapi.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${import.meta.env.VITE_NIXAPI_KEY}`,
      },
      body: JSON.stringify({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: 'Explain quantum entanglement simply' }],
        stream: true,
      }),
    });

    if (!response.ok || !response.body) {
      setOutput(`Request failed: ${response.status}`);
      setLoading(false);
      return;
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // A network chunk can end mid-line, so keep the partial tail in a
      // buffer and only parse complete lines.
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') continue; // end marker, not JSON

        try {
          const json = JSON.parse(data);
          const text = json.choices[0]?.delta?.content ?? '';
          setOutput(prev => prev + text);
        } catch {
          // skip malformed or partial lines instead of killing the stream
        }
      }
    }

    setLoading(false);
  }

  return (
    <div>
      <button onClick={handleAsk} disabled={loading}>
        {loading ? 'Generating...' : 'Ask'}
      </button>
      <p style={{ whiteSpace: 'pre-wrap' }}>{output}</p>
    </div>
  );
}

Security note: API keys in frontend code are visible to users. In production, proxy requests through your backend instead.
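For illustration, here is a minimal backend proxy sketch in Python. FastAPI, httpx, and the /chat route are assumptions made for this example, not part of NixAPI's documentation; the point is simply that the browser talks to your server, and only the server holds the key:

import os

import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat")
async def chat(payload: dict):
    async def relay():
        # Open a streaming request upstream and forward the SSE bytes
        # to the browser unchanged.
        async with httpx.AsyncClient(timeout=120) as http:
            async with http.stream(
                "POST",
                "https://api.nixapi.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['NIXAPI_KEY']}"},
                json={**payload, "stream": True},
            ) as upstream:
                async for chunk in upstream.aiter_bytes():
                    yield chunk

    return StreamingResponse(relay(), media_type="text/event-stream")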


Things to Keep in Mind

  1. Timeouts: Streaming requests can run for a while. Set your HTTP timeout to at least 120s.
  2. Error handling: Catch exceptions on network drops and prompt the user to retry.
  3. Token usage: In streaming mode, the usage field arrives only in the final chunk, never alongside the tokens (see the sketch below).
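Here is a minimal Python sketch covering all three points. Note that stream_options is the standard OpenAI parameter for requesting usage in the final chunk; whether NixAPI forwards it is an assumption worth checking against their docs:

from openai import OpenAI, APIConnectionError, APITimeoutError

client = OpenAI(
    api_key="your-NixAPI-key",
    base_url="https://api.nixapi.com/v1",
    timeout=120.0,  # point 1: generous timeout for long generations
)

try:
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a short story"}],
        stream=True,
        # point 3: request a final usage chunk (assumes the relay
        # forwards OpenAI's stream_options parameter)
        stream_options={"include_usage": True},
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
        if chunk.usage:  # final chunk: empty choices, populated usage
            print(f"\nTotal tokens: {chunk.usage.total_tokens}")
except (APIConnectionError, APITimeoutError) as err:
    # point 2: connection dropped mid-stream, prompt the user to retry
    print(f"\nStream interrupted, please retry: {err}")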

Summary

                 Standard Mode                       Streaming Mode
Parameter        default                             stream: true
Response format  Single JSON object                  SSE data stream
User experience  Wait, then display all              Real-time, token-by-token
Best for         Batch processing, background tasks  Chat UI, content generation

👉 Sign up for NixAPI and try streaming for free
