Server-sent events, Laravel’s streaming response helpers, Livewire real-time updates, broadcasting AI output token by token, handling mid-stream errors, and the UX patterns that make users feel like the AI is typing — not loading. The complete implementation guide.
The UX difference between a loading spinner for 8 seconds and text appearing word by word in real time is enormous — even when the total time is identical. The first feels like waiting. The second feels like watching. Streaming turns a latency problem into an engagement feature.
Every major AI interface ships with streaming: ChatGPT, Claude, Gemini, Copilot. The expectation is set. When your Laravel AI feature responds with a spinner and then dumps the full response, it feels dated.
This post is the complete implementation: how to stream AI responses to the browser, how to build it with Server-Sent Events and with Livewire, how to handle errors mid-stream, and the UX patterns that make streaming feel polished rather than janky.
How AI Streaming Works
Before the implementation, it helps to understand the mechanism:
Without streaming:
Client → POST /api/ask → wait 8 seconds → receive full response → display
With streaming:
Client → POST /api/ask → receive chunk 1 (after ~200ms) → display
→ receive chunk 2 (after ~400ms) → append
→ receive chunk 3 (after ~600ms) → append
→ ...
→ stream complete → done
Most LLM providers support streaming via Server-Sent Events (SSE) — a persistent HTTP connection where the server pushes data to the client incrementally. The Laravel AI SDK wraps this in a clean API.
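To make the wire format concrete, here is a minimal sketch of how a single SSE event is framed. The helper name `formatSseEvent` is hypothetical; the `type`/`token` payload shape simply mirrors the one used in the controller later in this post.

```javascript
// Frame one JSON payload as a Server-Sent Event.
// An event is one or more "data:" lines terminated by a blank line.
function formatSseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`
}

// Three events as they would appear on the wire for a short reply.
const wire = [
  { type: 'content', token: 'Hello' },
  { type: 'content', token: ' world' },
  { type: 'done' },
].map(formatSseEvent).join('')

console.log(wire)
```

The blank line after each `data:` line is what tells the client that one event has ended and the next may begin.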
The Backend: Streaming with the Laravel AI SDK
// app/Http/Controllers/AiStreamController.php
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\AI;
use Symfony\Component\HttpFoundation\StreamedResponse;
class AiStreamController extends Controller
{
public function stream(Request $request): StreamedResponse
{
$request->validate([
'prompt' => ['required', 'string', 'max:2000'],
'context' => ['nullable', 'string'],
]);
$prompt = $request->input('prompt');
$context = $request->input('context', '');
return response()->stream(function () use ($prompt, $context) {
// Close every active output buffer so each chunk reaches the client immediately
while (ob_get_level() > 0) {
ob_end_flush();
}
try {
$stream = AI::text(
prompt: $prompt,
system: "You are a helpful assistant. {$context}",
model: 'gpt-4o',
stream: true,
);
foreach ($stream as $chunk) {
if (!empty($chunk->text)) {
// SSE format: each event is "data: {payload}\n\n"
echo 'data: ' . json_encode([
'type' => 'content',
'token' => $chunk->text,
]) . "\n\n";
// Flush the output to the browser immediately
flush();
}
}
// Send a completion event so the client knows we're done
echo 'data: ' . json_encode(['type' => 'done']) . "\n\n";
flush();
} catch (\Throwable $e) {
// Send error through the stream — client handles it
echo 'data: ' . json_encode([
'type' => 'error',
'message' => 'An error occurred while generating the response.',
]) . "\n\n";
flush();
}
}, 200, [
'Content-Type' => 'text/event-stream',
'Cache-Control' => 'no-cache',
'X-Accel-Buffering' => 'no', // disables Nginx buffering
'Connection' => 'keep-alive',
'Access-Control-Allow-Origin' => '*', // adjust for your CORS needs
]);
}
}
The Three Critical Headers
Content-Type: text/event-stream
→ Tells the browser this is an SSE stream, not a regular HTTP response
Cache-Control: no-cache
→ Prevents any proxy from caching the stream
X-Accel-Buffering: no
→ Disables Nginx's response buffering
Without this, Nginx holds the response until it's complete
Everything arrives at once — streaming is defeated
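When streaming silently fails, a quick check of the response headers often narrows it down. The sketch below is one way to do that; `findSseHeaderProblems` is a hypothetical helper that takes a plain object of header names and values. Note that Nginx consumes X-Accel-Buffering and does not forward it to the client by default, so this check is meant to be run against the application directly (for example in local development).

```javascript
// Return the names of SSE-related headers that are missing or wrong.
// `headers` is a plain object of response headers (names in any case).
function findSseHeaderProblems(headers) {
  const h = {}
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value).toLowerCase()
  }
  const problems = []
  // Content-Type may carry a charset suffix, so only the prefix is compared.
  if (!(h['content-type'] || '').startsWith('text/event-stream')) problems.push('Content-Type')
  if (!(h['cache-control'] || '').includes('no-cache')) problems.push('Cache-Control')
  if (h['x-accel-buffering'] !== 'no') problems.push('X-Accel-Buffering')
  return problems
}
```

An empty array means all three headers look right; otherwise the returned names point at what to fix first.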
The Route
// routes/api.php
Route::middleware(['auth:sanctum', 'throttle:ai'])->group(function () {
Route::post('/ai/stream', [AiStreamController::class, 'stream']);
});
Rate limiting for AI streaming endpoints deserves special attention — each stream holds an open connection:
// In AppServiceProvider or RouteServiceProvider
RateLimiter::for('ai', function (Request $request) {
return [
Limit::perMinute(20)->by($request->user()->id), // 20 streams per minute
Limit::perHour(200)->by($request->user()->id), // 200 per hour
];
});
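On the client, a throttled request comes back as HTTP 429, and Laravel's throttle middleware normally includes a Retry-After header. A small sketch of how the frontend might turn that into a wait time follows; `throttleDelayMs` and the 60-second fallback are assumptions for illustration, and the header is assumed to carry whole seconds.

```javascript
// Decide how long to wait after a throttled request.
// Returns null when the response was not throttled.
function throttleDelayMs(status, retryAfterHeader, fallbackMs = 60000) {
  if (status !== 429) return null
  const seconds = Number.parseInt(retryAfterHeader ?? '', 10)
  // Fall back to a fixed delay when the header is missing or unparseable.
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs
}
```

In the composable this could be called right after the `response.ok` check, using `response.status` and `response.headers.get('Retry-After')`, to show the user a countdown instead of a generic failure.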
The Frontend: Consuming SSE with the EventSource API
The browser’s native EventSource API handles SSE connections, but it only issues GET requests and cannot send custom headers such as Authorization. For POST requests, use fetch with a ReadableStream instead.
Option A: Using fetch() with ReadableStream (Recommended)
// resources/js/composables/useAiStream.js
import { ref } from 'vue'
export function useAiStream() {
const content = ref('')
const isStreaming = ref(false)
const error = ref(null)
let controller = null
async function stream(prompt, context = '') {
// Cancel any existing stream
if (controller) controller.abort()
controller = new AbortController()
content.value = ''
isStreaming.value = true
error.value = null
try {
const response = await fetch('/api/ai/stream', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Accept': 'text/event-stream',
'Authorization': `Bearer ${getToken()}`, // getToken() stands for your app's own helper that returns the API token
'X-CSRF-TOKEN': document.querySelector('meta[name="csrf-token"]')?.content,
},
body: JSON.stringify({ prompt, context }),
signal: controller.signal,
})
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
// Read the stream
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
// Parse SSE lines from the buffer
const lines = buffer.split('\n')
buffer = lines.pop() // keep incomplete line in buffer
for (const line of lines) {
if (!line.startsWith('data: ')) continue
const jsonStr = line.slice(6).trim()
if (!jsonStr) continue
try {
const data = JSON.parse(jsonStr)
if (data.type === 'content') {
content.value += data.token
} else if (data.type === 'done') {
isStreaming.value = false
} else if (data.type === 'error') {
error.value = data.message
isStreaming.value = false
}
} catch {
// Malformed JSON in stream — skip
}
}
}
} catch (err) {
if (err.name !== 'AbortError') {
error.value = 'Connection failed. Please try again.'
}
} finally {
isStreaming.value = false
}
}
function cancel() {
if (controller) {
controller.abort()
isStreaming.value = false
}
}
return { content, isStreaming, error, stream, cancel }
}
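The parsing step is the part most worth testing in isolation, because network chunks rarely line up with event boundaries. Below is a minimal sketch of the same idea as a standalone function. `createSseParser` is a hypothetical name, and the sketch assumes the server separates events with "\n\n" and sends JSON payloads, as the controller above does.

```javascript
// Create a stateful parser: feed it decoded text chunks, get parsed events back.
function createSseParser() {
  let buffer = ''
  return function feed(chunk) {
    buffer += chunk
    const frames = buffer.split('\n\n')
    buffer = frames.pop() // the last piece is an incomplete frame (or '')
    const events = []
    for (const frame of frames) {
      // Join every "data:" line of the frame; other SSE fields are ignored here.
      const data = frame
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, ''))
        .join('\n')
      if (!data) continue
      try {
        events.push(JSON.parse(data))
      } catch {
        // Malformed JSON: skip this frame and keep the stream alive.
      }
    }
    return events
  }
}

// A frame split across two chunks is only emitted once it is complete.
const feed = createSseParser()
console.log(feed('data: {"type":"content","to'))
console.log(feed('ken":"Hi"}\n\ndata: {"type":"done"}\n\n'))
```

Splitting on the blank line rather than on every newline keeps a frame together even when it spans several `data:` lines.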
Vue Component Using the Composable
<!-- resources/js/components/AiChat.vue -->
<script setup lang="ts">
import { ref } from 'vue'
import { useAiStream } from '@/composables/useAiStream'
const prompt = ref('')
const { content, isStreaming, error, stream, cancel } = useAiStream()
async function submit() {
if (!prompt.value.trim() || isStreaming.value) return
await stream(prompt.value)
prompt.value = ''
}
</script>
<template>
<div class="ai-chat">
<!-- Response area -->
<div class="ai-response" :class="{ streaming: isStreaming }">
<template v-if="content || isStreaming">
<!-- Render content with preserved whitespace -->
<p v-for="(paragraph, index) in content.split('\n\n')" :key="index">
{{ paragraph }}
</p>
<!-- Typing cursor shown while streaming -->
<span v-if="isStreaming" class="cursor" aria-hidden="true">▌</span>
</template>
<template v-else-if="error">
<p class="error">{{ error }}</p>
</template>
<template v-else>
<p class="placeholder">Ask me anything...</p>
</template>
</div>
<!-- Input -->
<form @submit.prevent="submit" class="ai-input">
<textarea
v-model="prompt"
placeholder="Type your question..."
rows="3"
:disabled="isStreaming"
@keydown.enter.exact.prevent="submit"
@keydown.enter.shift.prevent="prompt += '\n'"
/>
<div class="ai-input-actions">
<button v-if="isStreaming" type="button" @click="cancel" class="btn-cancel">
Stop generating
</button>
<button v-else type="submit" :disabled="!prompt.trim()" class="btn-submit">
Send
</button>
</div>
</form>
</div>
</template>
<style scoped>
.cursor {
display: inline-block;
animation: blink 1s step-end infinite;
color: #3b82f6;
font-weight: bold;
}
@keyframes blink {
0%, 100% { opacity: 1; }
50% { opacity: 0; }
}
.ai-response {
min-height: 200px;
padding: 16px;
border-radius: 8px;
background: #f8fafc;
font-family: inherit;
line-height: 1.7;
white-space: pre-wrap;
}
</style>
Streaming with Livewire
For Livewire-based applications, streaming works differently: a Livewire component does not consume an SSE endpoint on its own. There are two practical approaches:
Approach 1: Livewire + Alpine.js + fetch()
Use Livewire for state management and Alpine.js to handle the SSE stream:
// app/Livewire/AiAssistant.php
<?php
namespace App\Livewire;
use Livewire\Component;
class AiAssistant extends Component
{
public string $prompt = '';
public string $response = '';
public bool $loading = false;
public function render(): \Illuminate\View\View
{
return view('livewire.ai-assistant');
}
public function startStream(): void
{
$this->loading = true;
$this->response = '';
// The actual streaming happens via Alpine.js + fetch()
// Livewire dispatches an event that Alpine intercepts
$this->dispatch('start-ai-stream', [
'prompt' => $this->prompt,
]);
}
public function appendToken(string $token): void
{
$this->response .= $token;
}
public function streamComplete(): void
{
$this->loading = false;
}
public function streamError(string $message): void
{
$this->loading = false;
$this->response = "Error: {$message}";
}
}
{{-- resources/views/livewire/ai-assistant.blade.php --}}
<div
x-data="aiStream()"
x-init="init()"
@start-ai-stream.window="startStream($event.detail[0].prompt)"
>
{{-- Response display --}}
<div class="response-area">
<p>{{ $response }}</p>
<span x-show="$wire.loading" class="cursor">▌</span>
</div>
{{-- Input form --}}
<form wire:submit="startStream">
<textarea wire:model="prompt"></textarea>
<button type="submit" :disabled="$wire.loading">
{{ $loading ? 'Generating...' : 'Ask' }}
</button>
</form>
</div>
<script>
function aiStream() {
return {
controller: null,
init() {
// Listen for the Livewire event
},
async startStream(prompt) {
if (this.controller) this.controller.abort()
this.controller = new AbortController()
try {
const response = await fetch('/api/ai/stream', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-CSRF-TOKEN': document.querySelector('meta[name="csrf-token"]').content,
},
body: JSON.stringify({ prompt }),
signal: this.controller.signal,
})
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop()
for (const line of lines) {
if (!line.startsWith('data: ')) continue
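let data
try {
data = JSON.parse(line.slice(6))
} catch {
continue // skip malformed JSON instead of aborting the stream
}
if (data.type === 'content') {
// Call the Livewire method to append the token.
// Inside an Alpine data object, $wire is reached through `this`.
this.$wire.appendToken(data.token)
} else if (data.type === 'done') {
this.$wire.streamComplete()
} else if (data.type === 'error') {
this.$wire.streamError(data.message)
}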
}
}
} catch (err) {
if (err.name !== 'AbortError') {
this.$wire.streamError('Connection failed.')
}
}
}
}
}
</script>
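One cost of Approach 1 is that every `$wire.appendToken()` call typically triggers its own request to the Livewire component, so a long answer can mean hundreds of round trips. A possible mitigation is to batch tokens before sending them. The sketch below uses hypothetical names (`createTokenBatcher`, `maxChars`) and is not part of Livewire or Alpine.

```javascript
// Collect tokens and hand them to `send` in larger pieces.
// `send` is whatever delivers text to the server,
// e.g. (text) => this.$wire.appendToken(text) inside the Alpine component.
function createTokenBatcher(send, maxChars = 80) {
  let pending = ''

  // Deliver whatever has accumulated; a no-op when nothing is pending.
  function flush() {
    if (pending === '') return
    const text = pending
    pending = ''
    send(text)
  }

  // Add one token and flush once the batch is large enough.
  function push(token) {
    pending += token
    if (pending.length >= maxChars) flush()
  }

  return { push, flush }
}
```

In the stream loop, `push(data.token)` would replace the direct `appendToken` call, with one final `flush()` before `streamComplete()` so the tail of the answer is not lost.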
Approach 2: Streaming Directly to a Livewire Component via Events
For a cleaner Livewire architecture, use $this->stream() — Livewire 3’s native streaming helper:
// app/Livewire/AiAssistant.php (Livewire 3 streaming)
use Livewire\Attributes\Locked;
use Livewire\Component;
class AiAssistant extends Component
{
public string $prompt = '';
public string $response = '';
#[Locked]
public bool $streaming = false;
public function ask(): void
{
$this->validate(['prompt' => 'required|string|max:2000']);
$this->response = '';
$this->streaming = true;
$stream = AI::text(
prompt: $this->prompt,
model: 'gpt-4o',
stream: true,
);
foreach ($stream as $chunk) {
if (!empty($chunk->text)) {
// Livewire 3 native streaming — sends DOM updates to browser
$this->stream(
to: 'response',
content: $chunk->text,
replace: false, // append, not replace
);
}
}
$this->streaming = false;
}
public function render(): \Illuminate\View\View
{
return view('livewire.ai-assistant');
}
}
{{-- resources/views/livewire/ai-assistant.blade.php --}}
<div>
<div class="response">
{{-- wire:stream="response" receives streamed updates --}}
<span wire:stream="response">{{ $response }}</span>
@if($streaming)
<span class="cursor">▌</span>
@endif
</div>
<form wire:submit="ask">
<textarea wire:model="prompt" @disabled($streaming)></textarea>
<button type="submit" wire:loading.attr="disabled">
<span wire:loading wire:target="ask">Generating...</span>
<span wire:loading.remove wire:target="ask">Ask</span>
</button>
</form>
</div>
Handling Errors Mid-Stream
Errors mid-stream are among the most common production issues with AI streaming. The connection can fail, the model can error, or the generation can be cut short. Each case needs its own handling:
Server-Side Error Categorisation
// In the stream controller
try {
$stream = AI::text(prompt: $prompt, stream: true);
foreach ($stream as $chunk) {
echo 'data: ' . json_encode([
'type' => 'content',
'token' => $chunk->text,
]) . "\n\n";
flush();
}
echo 'data: ' . json_encode(['type' => 'done']) . "\n\n";
flush();
} catch (\Laravel\Ai\Exceptions\RateLimitException $e) {
echo 'data: ' . json_encode([
'type' => 'error',
'code' => 'rate_limit',
'message' => 'Too many requests. Please wait a moment.',
'retryAfter' => 60,
]) . "\n\n";
flush();
} catch (\Laravel\Ai\Exceptions\ContextLengthException $e) {
echo 'data: ' . json_encode([
'type' => 'error',
'code' => 'context_too_long',
'message' => 'Your message is too long. Please shorten it.',
]) . "\n\n";
flush();
} catch (\Throwable $e) {
Log::error('AI stream error', [
'error' => $e->getMessage(),
'trace' => $e->getTraceAsString(),
]);
echo 'data: ' . json_encode([
'type' => 'error',
'code' => 'server_error',
'message' => 'Something went wrong. Please try again.',
]) . "\n\n";
flush();
}
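On the receiving side, the `code` field lets the client decide what to do rather than just what to display. Here is a small sketch of that mapping. The function and table names are hypothetical, and it assumes `retryAfter` in the payload above is expressed in seconds.

```javascript
// Which error codes are worth retrying automatically.
const ERROR_POLICIES = {
  rate_limit: { retryable: true },
  context_too_long: { retryable: false },
  server_error: { retryable: true },
}

// Map an error event from the stream to a UI decision.
function interpretStreamError(event) {
  // Unknown codes are treated as non-retryable to stay on the safe side.
  const policy = ERROR_POLICIES[event.code] ?? { retryable: false }
  return {
    message: event.message ?? 'Something went wrong. Please try again.',
    retryable: policy.retryable,
    // Assumption: retryAfter is in seconds; convert to milliseconds.
    retryAfterMs: typeof event.retryAfter === 'number' ? event.retryAfter * 1000 : null,
  }
}
```

A `context_too_long` error, for instance, should never be retried as-is, because resending the same prompt will fail the same way.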
Client-Side Error Handling with Retry
// In the composable — retry logic for transient errors
async function streamWithRetry(prompt, maxRetries = 2) {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
// stream() catches its own failures and reports them through error.value,
// so check that ref rather than relying on a thrown exception
await stream(prompt)
if (!error.value) return // success, or cancelled by the user
if (attempt === maxRetries) {
error.value = 'Failed after multiple attempts. Please try again later.'
return
}
// Wait before retrying: exponential backoff (2s, then 4s)
const delay = Math.pow(2, attempt + 1) * 1000
await new Promise(resolve => setTimeout(resolve, delay))
}
}
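The backoff calculation can also live in its own function, which makes it easy to cap the delay and add jitter so that many clients do not retry at the same instant. This is a sketch under assumed defaults; `backoffDelayMs` and its option names are hypothetical.

```javascript
// Exponential backoff: base * 2^attempt, capped at maxMs before jitter is applied.
// `random` is injectable so the jitter can be made deterministic in tests.
function backoffDelayMs(attempt, { baseMs = 1000, maxMs = 30000, jitter = 0, random = Math.random } = {}) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt)
  // Shift the delay by up to +/- jitter (a fraction, e.g. 0.2 for 20%).
  const offset = exponential * jitter * (random() * 2 - 1)
  return Math.round(exponential + offset)
}
```

With the defaults, attempts 1 and 2 give the same 2-second and 4-second waits as the retry loop above, and later attempts stop growing at 30 seconds.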
Connection Timeout and Reconnection
// Detect connection drops and reconnect
let streamTimeout = null
function resetStreamTimeout() {
clearTimeout(streamTimeout)
streamTimeout = setTimeout(() => {
// No data received for 30 seconds — connection may have dropped
if (isStreaming.value) {
error.value = 'Connection timed out. The response may be incomplete.'
cancel()
}
}, 30_000)
}
// Reset timeout on every chunk received
if (data.type === 'content') {
content.value += data.token
resetStreamTimeout()
}
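The snippet above never clears its timer once the stream ends, so the timeout can still fire after a successful response. One way to tidy that up is a small watchdog object with explicit `reset` and `stop` methods. The names here are hypothetical, and the timer functions are injectable only so the logic can be tested without waiting 30 seconds.

```javascript
// Call onIdle if reset() is not called again within idleMs.
function createIdleWatchdog(
  onIdle,
  idleMs = 30000,
  timers = { set: (fn, ms) => setTimeout(fn, ms), clear: (h) => clearTimeout(h) },
) {
  let handle = null

  // Cancel the pending timer, if any. Call this on done, error and cancel.
  function stop() {
    if (handle !== null) timers.clear(handle)
    handle = null
  }

  // Restart the countdown. Call this on every chunk received.
  function reset() {
    stop()
    handle = timers.set(() => {
      handle = null
      onIdle()
    }, idleMs)
  }

  return { reset, stop }
}
```

In the composable, `reset()` would run for every `content` event and `stop()` in the `finally` block, with `onIdle` setting the error message and calling `cancel()`.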
Multi-Turn Conversation Streaming
Real AI chat requires conversation history — each message builds on previous ones:
// app/Http/Controllers/AiChatController.php
class AiChatController extends Controller
{
public function stream(Request $request): StreamedResponse
{
$request->validate([
'messages' => ['required', 'array', 'max:50'],
'messages.*.role' => ['required', 'in:user,assistant'],
'messages.*.content' => ['required', 'string', 'max:4000'],
]);
$messages = collect($request->input('messages'))
->map(fn($m) => match ($m['role']) {
'user' => new UserMessage($m['content']),
'assistant' => new AssistantMessage($m['content']),
})
->all();
return response()->stream(function () use ($messages, $request) {
while (ob_get_level() > 0) ob_end_flush();
try {
$stream = AI::text(
messages: $messages,
model: 'gpt-4o',
stream: true,
);
// Collect the full response for storing in conversation history
$fullResponse = '';
foreach ($stream as $chunk) {
if (!empty($chunk->text)) {
$fullResponse .= $chunk->text;
echo 'data: ' . json_encode([
'type' => 'content',
'token' => $chunk->text,
]) . "\n\n";
flush();
}
}
// Store the conversation in the database after streaming completes
ConversationMessage::create([
'conversation_id' => $request->input('conversation_id'),
'role' => 'assistant',
'content' => $fullResponse,
]);
echo 'data: ' . json_encode(['type' => 'done']) . "\n\n";
flush();
} catch (\Throwable $e) {
echo 'data: ' . json_encode(['type' => 'error', 'message' => 'Error occurred.']) . "\n\n";
flush();
}
}, 200, [
'Content-Type' => 'text/event-stream',
'Cache-Control' => 'no-cache',
'X-Accel-Buffering' => 'no',
]);
}
}
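The frontend has to send the history in the shape this endpoint validates. Below is a minimal sketch of building that payload; `buildMessages` is a hypothetical helper, and trimming old turns or truncating long content silently is a design choice you may prefer to surface to the user instead.

```javascript
// Limits mirror the validation rules in the controller above.
const MAX_MESSAGES = 50
const MAX_CONTENT_CHARS = 4000

// Append the new user message and keep only the most recent turns.
function buildMessages(history, userText) {
  const next = [...history, { role: 'user', content: userText }]
    // Drop anything that is not a user or assistant turn.
    .filter((m) => m.role === 'user' || m.role === 'assistant')
    // Truncate overly long entries so validation does not reject the request.
    .map((m) => ({ role: m.role, content: m.content.slice(0, MAX_CONTENT_CHARS) }))
  return next.slice(-MAX_MESSAGES)
}
```

The result can be posted as `{ messages: buildMessages(history, prompt) }`, and the assistant's streamed reply appended to `history` once the `done` event arrives.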
The UX Patterns That Make Streaming Feel Polished
Technical streaming is table stakes. The UX details determine whether it feels smooth or broken.
1. The Blinking Cursor
A blinking cursor signals “I’m still typing” vs “something broke”:
.cursor {
display: inline-block;
width: 2px;
height: 1.2em;
background: currentColor;
animation: blink 1.1s step-end infinite;
vertical-align: text-bottom;
margin-left: 1px;
}
@keyframes blink {
0%, 100% { opacity: 1; }
50% { opacity: 0; }
}
Show the cursor only while isStreaming is true. Remove it immediately when the stream completes or errors.
2. Scroll-Follow Behaviour
Auto-scroll to show new content as it streams — but stop auto-scrolling if the user manually scrolls up:
import { ref, watch, onMounted, nextTick } from 'vue'
const responseEl = ref(null)
let userScrolled = false
// Detect manual scroll
onMounted(() => {
responseEl.value?.addEventListener('scroll', () => {
const el = responseEl.value
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight < 50
userScrolled = !atBottom
})
})
// Auto-scroll when new content arrives
watch(content, () => {
if (!userScrolled && responseEl.value) {
nextTick(() => {
responseEl.value.scrollTop = responseEl.value.scrollHeight
})
}
})
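The "is the user at the bottom?" check is the only real logic in this pattern, and pulling it into a pure function makes it testable without a DOM. A sketch, with `isNearBottom` as a hypothetical name and the same 50px threshold as above:

```javascript
// True when the scroll position is within `threshold` pixels of the bottom.
// Accepts any object with the three scroll metrics, e.g. a DOM element.
function isNearBottom({ scrollHeight, scrollTop, clientHeight }, threshold = 50) {
  return scrollHeight - scrollTop - clientHeight < threshold
}
```

The scroll listener then reduces to `userScrolled = !isNearBottom(responseEl.value)`.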
3. Cancel/Stop Button
Always show a “Stop generating” button while streaming. Users change their minds. Long responses should be interruptible:
<button
v-if="isStreaming"
@click="cancel"
class="stop-btn"
aria-label="Stop generating response"
>
<svg><!-- stop icon --></svg>
Stop generating
</button>
4. Partial Response on Cancel
When the user cancels, keep the partial response visible. Don’t clear it:
function cancel() {
if (controller) {
controller.abort()
// Don't clear content.value — the user can read what was generated
isStreaming.value = false
}
}
5. Markdown Rendering During Stream
If the response includes Markdown, render it progressively:
<script setup>
import { marked } from 'marked'
import DOMPurify from 'dompurify'
import { computed } from 'vue'
import { useAiStream } from '@/composables/useAiStream'
const { content, isStreaming } = useAiStream()
// Parse markdown progressively
const renderedContent = computed(() => {
if (!content.value) return ''
// Add a placeholder for the streaming cursor before parsing
const withCursor = isStreaming.value
? content.value + '\u200B' // zero-width space to stabilise rendering
: content.value
// Model output is untrusted input: sanitise the HTML before it reaches v-html
return DOMPurify.sanitize(marked.parse(withCursor))
})
</script>
<template>
<div
class="prose"
v-html="renderedContent"
/>
<span v-if="isStreaming" class="cursor">▌</span>
</template>
6. Show First Chunk Fast — Skeleton While Waiting
Between form submit and the first token arriving, show a skeleton:
<template>
<!-- Submitted but no content yet — show skeleton -->
<div v-if="isStreaming && !content" class="skeleton">
<div class="skeleton-line w-3/4" />
<div class="skeleton-line w-full" />
<div class="skeleton-line w-1/2" />
</div>
<!-- Content arriving — show it -->
<div v-else-if="content" class="response">
{{ content }}
<span v-if="isStreaming" class="cursor">▌</span>
</div>
</template>
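The template conditions above can be summarised as a small state function, which keeps the skeleton, streaming and finished states from overlapping. This is only a sketch; `responseView` and the state names are hypothetical.

```javascript
// Derive which view to show from the three pieces of stream state.
function responseView({ isStreaming, content, error }) {
  if (error && !content) return 'error' // failed before any text arrived
  if (isStreaming && !content) return 'skeleton' // submitted, waiting for first token
  if (content) return isStreaming ? 'streaming' : 'complete'
  return 'idle'
}
```

A computed property wrapping this function lets the template switch on a single value instead of repeating the boolean combinations.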
Nginx Configuration for SSE
Without proper Nginx configuration, SSE won’t work. Nginx buffers responses by default:
# In your server block or location block
location /api/ai/stream {
proxy_pass http://your_upstream;
proxy_buffering off; # Critical — disable buffering for SSE
proxy_cache off;
# X-Accel-Buffering is a response header sent by the application, so it needs no proxy_set_header here
# Keep the connection alive
proxy_read_timeout 300s; # 5 minutes — AI generation can be slow
proxy_send_timeout 300s;
# SSE-specific headers
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding on;
}
Testing Streaming Endpoints
// tests/Feature/AiStreamControllerTest.php
class AiStreamControllerTest extends TestCase
{
use RefreshDatabase;
public function test_stream_returns_sse_content_type(): void
{
$user = User::factory()->create();
// Fake the AI response
AI::fake([
'chunks' => [
new TextChunk('Hello'),
new TextChunk(' world'),
new TextChunk('!'),
]
]);
$response = $this->actingAs($user)
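->postJson('/api/ai/stream', ['prompt' => 'Say hello']);
$response->assertOk();
// Symfony may append "; charset=UTF-8" to text/* types, so compare the prefix only
$this->assertStringStartsWith('text/event-stream', $response->headers->get('Content-Type'));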
}
public function test_stream_returns_done_event(): void
{
AI::fake(['chunks' => [new TextChunk('Test')]]);
$response = $this->actingAs(User::factory()->create())
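->postJson('/api/ai/stream', ['prompt' => 'Test']);
// A streamed response has no buffered body; streamedContent() runs the callback and captures its output
$this->assertStringContainsString('"type":"done"', $response->streamedContent());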
}
public function test_stream_handles_ai_error_gracefully(): void
{
AI::fake()->throws(new \Exception('AI unavailable'));
$response = $this->actingAs(User::factory()->create())
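->postJson('/api/ai/stream', ['prompt' => 'Test']);
// A streamed response has no buffered body; streamedContent() runs the callback and captures its output
$this->assertStringContainsString('"type":"error"', $response->streamedContent());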
}
}
The Complete Checklist
Backend:
✓ response()->stream() with correct SSE headers
✓ ob_end_flush() called at the start of the stream closure
✓ X-Accel-Buffering: no header set
✓ flush() called after every echo
✓ done event sent when stream completes
✓ error event sent when exceptions occur (typed errors per exception class)
✓ Rate limiting configured for the streaming endpoint
✓ Timeout configured (proxy_read_timeout in Nginx)
Frontend:
✓ fetch() with ReadableStream (not EventSource — POST support)
✓ AbortController for cancellation
✓ SSE buffer accumulated and split on \n correctly
✓ JSON parse wrapped in try/catch (malformed chunks possible)
✓ isStreaming set to false in finally block (not just on done/error)
UX:
✓ Blinking cursor shown while streaming
✓ Skeleton shown between submit and first token
✓ Auto-scroll follows new content
✓ Auto-scroll pauses if user scrolls up
✓ "Stop generating" button visible while streaming
✓ Partial response kept on cancel (not cleared)
✓ Markdown rendered progressively if applicable
Infrastructure:
✓ Nginx proxy_buffering off for SSE endpoint
✓ proxy_read_timeout set to at least 120s
✓ chunked_transfer_encoding on in Nginx
Final Thoughts
Streaming AI responses is the difference between an AI feature that feels like a tool and one that feels like a conversation. The technical implementation is not complex — a response()->stream() wrapper, the right HTTP headers, and a fetch-based stream reader. The UX details — the cursor, the scroll-follow, the cancel button, the skeleton — are what separate polished from amateur.
The most common failure in production is Nginx buffering. Everything works in php artisan serve, then breaks on the actual server. The proxy_buffering off and X-Accel-Buffering: no configuration is usually the fix.
Build the streaming first. Then layer the UX details. Users notice streaming before they notice anything else about your AI feature — it’s the first thing that signals whether the product was built thoughtfully or assembled hastily.
