Vue 3 + AI: How to Build a Streaming Chat Interface That Doesn’t Feel Like a Loading Spinner

Server-sent events in Vue, composables for streaming state, token-by-token rendering, the auto-scroll trick that feels like magic, typing indicators, error recovery mid-stream, and abort on demand — everything the “build a chatbot in 5 minutes” tutorials skip.

Every “build an AI chatbot in Vue” tutorial shows the same thing: a text input, a button, an API call, a loading spinner, then the full response drops in. Users stare at a blank box for three seconds. Then text. The experience feels like a form submission, not a conversation. It doesn’t feel like ChatGPT because ChatGPT doesn’t wait for the full response — it streams tokens as they arrive, giving the illusion of the AI thinking in real time. That illusion is the product. This post builds it from scratch, properly.

Why Streaming Changes the Experience

The psychological difference between waiting and watching matters more than raw speed. A response that takes four seconds to stream token-by-token feels faster than a two-second wait followed by a full text drop. The user’s attention is engaged from the first token. They’re reading while the model is generating.

The technical mechanism is Server-Sent Events (SSE), a browser-native protocol for receiving a stream of events over a single HTTP connection. The server sends events as they’re ready; the browser receives them without polling. Reading the response with fetch() and a ReadableStream gives you the same capability without the limitations of EventSource, which only issues GET requests and cannot set custom request headers.
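
To make the wire format concrete: an SSE stream is plain text in which each event is one or more data: lines followed by a blank line. Here is a minimal sketch of splitting a growing buffer into complete events, assuming events are separated by "\n\n". The helper name splitSseEvents and the payload shape are illustrative, not part of any library.

```typescript
// Hypothetical helper: splits buffered SSE text into complete events.
// Assumes events are separated by a blank line ("\n\n").
interface SplitResult {
  events: string[] // complete events, ready to parse
  rest: string     // trailing partial event to keep for the next chunk
}

function splitSseEvents(buffer: string): SplitResult {
  const parts = buffer.split('\n\n')
  // The last part is either '' (buffer ended on a boundary) or an incomplete event
  const rest = parts.pop() ?? ''
  return { events: parts.filter(p => p.length > 0), rest }
}

// A network chunk can end in the middle of an event
const first = splitSseEvents('data: {"content":"Hel"}\n\ndata: {"cont')
// Prepend the leftover to the next chunk before splitting again
const second = splitSseEvents(first.rest + 'ent":"lo"}\n\n')
```

The leftover in rest is what the buffer variable in the composable carries from one read to the next.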

The architecture:

User types → Vue sends POST to Laravel endpoint
            → Laravel calls OpenAI/Anthropic with stream: true
            → Laravel streams the response as SSE
            → Vue reads the stream token-by-token
            → Each token appends to the message in reactive state
            → The DOM updates automatically as tokens arrive

The Laravel streaming endpoint is covered in the Streaming AI in Laravel post. This post covers everything on the Vue side — the composable, the rendering, the scroll behaviour, the abort, and the edge cases.

The Composable That Owns Streaming State

Streaming state doesn’t belong in a component. It belongs in a composable — reusable, testable, and completely decoupled from the template that renders it.

// composables/useChat.ts
import { ref, readonly } from 'vue'

export interface Message {
  id:        string
  role:      'user' | 'assistant'
  content:   string
  status:    'complete' | 'streaming' | 'error'
  createdAt: Date
}

export function useChat(endpoint: string = '/api/chat') {
  const messages    = ref<Message[]>([])
  const isStreaming = ref(false)
  const error       = ref<string | null>(null)

  // The AbortController lives here — sendMessage creates a new one per request
  let abortController: AbortController | null = null

  async function sendMessage(content: string): Promise<void> {
    if (isStreaming.value || !content.trim()) return

    error.value = null

    // Push the user message immediately — don't wait for the server
    const userMessage: Message = {
      id:        crypto.randomUUID(),
      role:      'user',
      content:   content.trim(),
      status:    'complete',
      createdAt: new Date(),
    }
    messages.value.push(userMessage)

    // Push a placeholder for the assistant response
    const assistantMessage: Message = {
      id:        crypto.randomUUID(),
      role:      'assistant',
      content:   '',           // starts empty — tokens append here
      status:    'streaming',  // drives typing indicator
      createdAt: new Date(),
    }
    messages.value.push(assistantMessage)

    isStreaming.value   = true
    abortController     = new AbortController()

    try {
      await streamResponse(assistantMessage.id, content)
    } catch (err) {
      if ((err as Error).name === 'AbortError') {
        // User aborted — mark the message as complete at wherever it stopped
        updateMessage(assistantMessage.id, { status: 'complete' })
      } else {
        // Keep whatever content already arrived; only the status changes
        updateMessage(assistantMessage.id, { status: 'error' })
        error.value = (err as Error).message
      }
    } finally {
      isStreaming.value = false
      abortController   = null
    }
  }

  async function streamResponse(messageId: string, userContent: string): Promise<void> {
    const response = await fetch(endpoint, {
      method:  'POST',
      headers: {
        'Content-Type': 'application/json',
        'Accept':        'text/event-stream',
        'X-CSRF-TOKEN':  getCsrfToken(),
      },
      body:   JSON.stringify({
        messages: buildMessageHistory(userContent),
      }),
      signal: abortController!.signal,
    })

    if (!response.ok) {
      const body = await response.json().catch(() => ({}))
      throw new Error(body.message ?? `Server error: ${response.status}`)
    }

    if (!response.body) {
      throw new Error('Response body is empty')
    }

    await readStream(response.body, messageId)
  }

  async function readStream(body: ReadableStream<Uint8Array>, messageId: string): Promise<void> {
    const reader  = body.getReader()
    const decoder = new TextDecoder()
    let   buffer  = ''

    while (true) {
      const { done, value } = await reader.read()

      if (done) {
        // Flush bytes still held by the decoder, then any remaining buffered content
        buffer += decoder.decode()
        if (buffer.trim()) {
          processChunk(buffer, messageId)
        }
        updateMessage(messageId, { status: 'complete' })
        break
      }

      // Decode the raw bytes to string
      buffer += decoder.decode(value, { stream: true })

      // Process complete SSE events — split on double newline
      const events = buffer.split('\n\n')

      // Keep the last incomplete event in the buffer
      buffer = events.pop() ?? ''

      for (const event of events) {
        processChunk(event, messageId)
      }
    }
  }

  function processChunk(raw: string, messageId: string): void {
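    // SSE format: "data: <content>\n" (the space after the colon is optional)
    for (const line of raw.split('\n')) {
      if (!line.startsWith('data:')) continue

      // Strip at most one leading space, as the SSE format specifies. Trimming
      // more would eat the leading spaces of raw-text tokens such as " world".
      const data = line.slice(5).replace(/^ /, '')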

      if (data === '[DONE]') return    // OpenAI stream terminator

      try {
        const parsed = JSON.parse(data)
        const token  = parsed.choices?.[0]?.delta?.content  // OpenAI format
                    ?? parsed.delta?.text                     // Anthropic format
                    ?? parsed.content                         // generic
                    ?? ''

        if (token) {
          appendToken(messageId, token)
        }
      } catch {
        // Non-JSON data line — append as-is (some providers stream raw text)
        if (data) appendToken(messageId, data)
      }
    }
  }

  function appendToken(messageId: string, token: string): void {
    const index = messages.value.findIndex(m => m.id === messageId)
    if (index === -1) return
    messages.value[index].content += token
  }

  function updateMessage(messageId: string, patch: Partial<Message>): void {
    const index = messages.value.findIndex(m => m.id === messageId)
    if (index === -1) return
    Object.assign(messages.value[index], patch)
  }

  function buildMessageHistory(currentContent: string) {
    // sendMessage has already pushed the current user message and the empty
    // assistant placeholder, so take everything before those two...
    const previous = messages.value
      .slice(0, -2)
      .filter(m => m.content)  // skip failed responses that never produced text
      .map(m => ({ role: m.role, content: m.content }))

    // ...and add the current user turn back explicitly
    return [...previous, { role: 'user' as const, content: currentContent.trim() }]
  }

  function abort(): void {
    abortController?.abort()
  }

  function clearMessages(): void {
    if (isStreaming.value) abort()
    messages.value = []
    error.value    = null
  }

  function getCsrfToken(): string {
    return document
      .querySelector<HTMLMetaElement>('meta[name="csrf-token"]')
      ?.content ?? ''
  }

  return {
    messages:    readonly(messages),
    isStreaming: readonly(isStreaming),
    error:       readonly(error),
    sendMessage,
    abort,
    clearMessages,
  }
}

The composable is the only place that touches the stream. The component gets messages, isStreaming, error, and three functions. It never sees the ReadableStream, the decoder, or the buffer.
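
One detail in readStream is easy to miss: the { stream: true } option on decoder.decode. Network chunks are cut at arbitrary byte offsets, so a multi-byte UTF-8 character can straddle two chunks. The sketch below shows the difference, using hand-picked bytes for "café" as a stand-in for real network chunks.

```typescript
// "é" is two bytes in UTF-8 (0xC3 0xA9). Pretend the network split them.
const chunkA = new Uint8Array([0x63, 0x61, 0x66, 0xc3]) // "caf" + first byte of "é"
const chunkB = new Uint8Array([0xa9])                   // second byte of "é"

// With stream: true the decoder holds the incomplete byte until the next call
const streaming = new TextDecoder()
const streamed =
  streaming.decode(chunkA, { stream: true }) +
  streaming.decode(chunkB, { stream: true })

// Without it each chunk is decoded in isolation and the split character
// turns into U+FFFD replacement characters
const naive = new TextDecoder().decode(chunkA) + new TextDecoder().decode(chunkB)

console.log(streamed)         // café
console.log(naive === 'café') // false
```

Emoji and most non-Latin scripts are multi-byte, so this is not an edge case in a chat interface.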

The Auto-Scroll Trick

Auto-scroll during streaming requires care. The naive implementation — scroll to bottom after every token — creates jitter if the user tries to scroll up to read earlier messages. The correct behaviour: scroll automatically only when the user is already near the bottom. If they’ve scrolled up, respect it.

// composables/useAutoScroll.ts
import { ref, watch, nextTick } from 'vue'
import type { Ref } from 'vue'
import type { Message } from './useChat'

export function useAutoScroll(
  containerRef: Ref<HTMLElement | null>,
  messages: Ref<readonly Message[]>,
) {
  // Track whether the user is near the bottom
  const isNearBottom  = ref(true)
  const THRESHOLD_PX  = 80  // within 80px of bottom = "near bottom"

  function onScroll(): void {
    const el = containerRef.value
    if (!el) return

    const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
    isNearBottom.value = distanceFromBottom <= THRESHOLD_PX
  }

  async function scrollToBottom(behavior: ScrollBehavior = 'smooth'): Promise<void> {
    await nextTick()
    const el = containerRef.value
    if (!el) return
    el.scrollTo({ top: el.scrollHeight, behavior })
  }

  // Watch messages — scroll only if near bottom
  watch(
    messages,
    async () => {
      if (isNearBottom.value) {
        await scrollToBottom('smooth')
      }
    },
    { deep: true },
  )

  // When a new user message is sent, always scroll down
  function scrollToBottomForce(): void {
    isNearBottom.value = true
    scrollToBottom('smooth')
  }

  return {
    onScroll,
    scrollToBottomForce,
    isNearBottom,
  }
}

The deep: true watcher fires when any nested property of any message changes, including content as tokens append. The 80px THRESHOLD_PX is a judgement call rather than a measured constant: it is loose enough that a small accidental scroll keeps auto-scroll on, and tight enough that an intentional scroll up turns it off. Tune it to your line height and message padding.
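
The check inside onScroll is plain arithmetic, which makes it easy to pull out and test without a DOM. Here is a sketch with made-up measurements; isWithinThreshold is a hypothetical helper, and its parameters mirror the element properties of the same name.

```typescript
// Pure version of the near-bottom check, so the arithmetic can be tested without a DOM.
function isWithinThreshold(
  scrollHeight: number, // total height of the scrollable content
  scrollTop: number,    // how far the container is scrolled from the top
  clientHeight: number, // visible height of the container
  thresholdPx = 80,
): boolean {
  const distanceFromBottom = scrollHeight - scrollTop - clientHeight
  return distanceFromBottom <= thresholdPx
}

// 2000px of content in a 600px container
const atBottom   = isWithinThreshold(2000, 1400, 600) // distance 0   -> keep auto-scrolling
const oneNotchUp = isWithinThreshold(2000, 1350, 600) // distance 50  -> keep auto-scrolling
const readingOld = isWithinThreshold(2000, 900, 600)  // distance 500 -> leave the user alone
```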

The Typing Indicator

The status: 'streaming' field on the assistant message drives a typing indicator while the model is generating. Two approaches, one for each phase of the response:

Option 1: CSS cursor blink (simplest, most authentic):

<!-- The streaming cursor that blinks at the end of content -->
<span
  v-if="message.status === 'streaming' && message.content"
  class="typing-cursor"
/>

<style scoped>
.typing-cursor {
  display: inline-block;
  width: 2px;
  height: 1.1em;
  background: currentColor;
  vertical-align: text-bottom;
  animation: blink 1s step-end infinite;
  margin-left: 1px;
}

@keyframes blink {
  0%, 100% { opacity: 1; }
  50%       { opacity: 0; }
}
</style>

Option 2: Animated dots (for the “assistant is thinking” state before first token):

<template>
  <div v-if="message.status === 'streaming' && !message.content" class="typing-dots">
    <span /><span /><span />
  </div>
  <div v-else>{{ message.content }}</div>
</template>

<style scoped>
.typing-dots {
  display: flex;
  gap: 4px;
  align-items: center;
  padding: 4px 0;
}

.typing-dots span {
  width:  8px;
  height: 8px;
  border-radius: 50%;
  background: currentColor;
  opacity: 0.4;
  animation: dot-pulse 1.4s ease-in-out infinite;
}

.typing-dots span:nth-child(2) { animation-delay: 0.2s; }
.typing-dots span:nth-child(3) { animation-delay: 0.4s; }

@keyframes dot-pulse {
  0%, 80%, 100% { opacity: 0.4; transform: scale(1); }
  40%           { opacity: 1;   transform: scale(1.2); }
}
</style>

The two indicators compose: dots while content is empty (model hasn’t started generating yet), blinking cursor while content is arriving.

The Full Chat Component

<!-- ChatInterface.vue -->
<script setup lang="ts">
import { ref }            from 'vue'
import { useChat }        from '@/composables/useChat'
import { useAutoScroll }  from '@/composables/useAutoScroll'
import ChatMessage        from './ChatMessage.vue'

const { messages, isStreaming, error, sendMessage, abort, clearMessages } = useChat()

const inputValue   = ref('')
const containerRef = ref<HTMLElement | null>(null)

const { onScroll, scrollToBottomForce } = useAutoScroll(containerRef, messages)

async function handleSubmit(): Promise<void> {
  const content = inputValue.value.trim()
  if (!content || isStreaming.value) return

  inputValue.value = ''
  scrollToBottomForce()

  await sendMessage(content)
}

function handleKeydown(event: KeyboardEvent): void {
  if (event.key === 'Enter' && !event.shiftKey) {
    event.preventDefault()
    handleSubmit()
  }
  // Shift+Enter inserts a newline — default behaviour, no handling needed
}
</script>

<template>
  <div class="chat-interface">

    <!-- Message list -->
    <div
      ref="containerRef"
      class="message-container"
      @scroll="onScroll"
    >
      <div v-if="messages.length === 0" class="empty-state">
        <p>Start a conversation.</p>
      </div>

      <ChatMessage
        v-for="message in messages"
        :key="message.id"
        :message="message"
      />

      <!-- Error recovery -->
      <div v-if="error" class="error-banner">
        <p>{{ error }}</p>
        <button @click="sendMessage(messages.at(-2)?.content ?? '')">
          Try again
        </button>
      </div>
    </div>

    <!-- Input area -->
    <div class="input-area">
      <textarea
        v-model="inputValue"
        placeholder="Message…"
        rows="1"
        :disabled="isStreaming"
        @keydown="handleKeydown"
      />

      <!-- Send or Abort — swap based on streaming state -->
      <button v-if="!isStreaming" :disabled="!inputValue.trim()" @click="handleSubmit">
        Send
      </button>
      <button v-else class="abort-button" @click="abort">
        Stop
      </button>
    </div>

  </div>
</template>

Notice what’s not in the template: no stream reading, no buffer management, no abort controller wiring. The component handles user interaction and renders state. The composable owns everything else.

The Message Component With Markdown Rendering

AI responses typically contain markdown — code blocks, bold text, lists. Rendering raw content as text is a degraded experience. Use marked for parsing and highlight.js for code blocks:

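npm install marked marked-highlight highlight.js

<!-- ChatMessage.vue -->
<script setup lang="ts">
import { computed }          from 'vue'
import { Marked }            from 'marked'
import { markedHighlight }   from 'marked-highlight'
import hljs                  from 'highlight.js'
import type { Message }      from '@/composables/useChat'

const props = defineProps<{ message: Message }>()

// Recent marked releases removed the `highlight` option; syntax highlighting now
// comes from the marked-highlight extension. marked ships its own types, so
// @types/marked is not needed.
// <script setup> runs once per component instance. Move this instance into a
// shared module if you want it created only once.
const marked = new Marked(
  markedHighlight({
    langPrefix: 'hljs language-',
    highlight: (code: string, lang: string) => {
      if (lang && hljs.getLanguage(lang)) {
        return hljs.highlight(code, { language: lang }).value
      }
      return hljs.highlightAuto(code).value
    },
  }),
)

marked.setOptions({
  breaks:   true,   // newlines become <br>
  gfm:      true,   // GitHub Flavored Markdown
})

const renderedContent = computed((): string => {
  if (!props.message.content) return ''
  // parse() is synchronous unless the async option is enabled
  return marked.parse(props.message.content) as string
})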

const isStreaming = computed(() => props.message.status === 'streaming')
const hasContent  = computed(() => props.message.content.length > 0)
</script>

<template>
  <div :class="['message', `message--${message.role}`, { 'message--streaming': isStreaming }]">

    <!-- User message — plain text -->
    <div v-if="message.role === 'user'" class="message__content">
      {{ message.content }}
    </div>

    <!-- Assistant — streaming or complete -->
    <div v-else class="message__content">

      <!-- Typing dots: streaming, no content yet -->
      <div v-if="isStreaming && !hasContent" class="typing-dots">
        <span /><span /><span />
      </div>

      <!-- Rendered markdown: content has arrived -->
      <!-- v-html renders raw HTML: sanitise renderedContent first, marked does not do it for you -->
      <div
        v-else
        class="prose"
        v-html="renderedContent"
      />

      <!-- Blinking cursor: streaming, content is arriving -->
      <span v-if="isStreaming && hasContent" class="typing-cursor" />

    </div>

    <!-- Error state -->
    <div v-if="message.status === 'error'" class="message__error">
      Failed to generate response.
    </div>

  </div>
</template>

The v-html concern is worth naming: marked does not sanitise its output, and model output is not trustworthy HTML. The model can echo user input or text pulled from documents and web pages, so a crafted prompt can come back as a script-bearing tag in the response. Run the rendered HTML through a sanitiser such as DOMPurify before it reaches v-html, even in a single-user chat interface.

Error Recovery Mid-Stream

The composable handles three error categories:

// Already in useChat.ts — expanded for clarity

async function streamResponse(messageId: string, userContent: string): Promise<void> {
  let response: Response

  try {
    response = await fetch(endpoint, { /* ... */ })
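  } catch (networkError) {
    // Aborting before the response arrives also rejects here; rethrow it
    // untouched so sendMessage can still recognise the AbortError by name
    if ((networkError as Error).name === 'AbortError') throw networkError

    // Otherwise the connection failed before the stream started
    throw new Error('Connection failed. Check your network.')
  }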

  if (response.status === 429) {
    throw new Error('Rate limit reached. Wait a moment and try again.')
  }

  if (response.status === 401) {
    throw new Error('Session expired. Refresh the page.')
  }

  if (!response.ok) {
    const body = await response.json().catch(() => ({}))
    throw new Error(body.message ?? `Server error (${response.status})`)
  }

  try {
    await readStream(response.body!, messageId)
  } catch (streamError) {
    if ((streamError as Error).name === 'AbortError') throw streamError

    // Stream started but broke mid-way
    // The assistant message has partial content — preserve it
    updateMessage(messageId, { status: 'error' })
    throw new Error('Stream interrupted. The response may be incomplete.')
  }
}

The mid-stream failure case is the most important to handle correctly. When the stream breaks after content has started arriving, the partial response is more useful than nothing. The composable marks the message as status: 'error' but preserves whatever content arrived. The template renders the partial content with an error indicator, not a blank message.
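
The invariant behind that paragraph is that a failure changes the message's status and nothing else. Here is a sketch of the two pieces as pure functions. The names describeHttpError and markFailed are hypothetical, and the message wording is copied from streamResponse above.

```typescript
type Status = 'complete' | 'streaming' | 'error'

interface ChatMessage {
  id: string
  role: 'user' | 'assistant'
  content: string
  status: Status
}

// Map an HTTP status to the user-facing message, preferring the server's own text
function describeHttpError(status: number, serverMessage?: string): string {
  if (status === 429) return 'Rate limit reached. Wait a moment and try again.'
  if (status === 401) return 'Session expired. Refresh the page.'
  return serverMessage ?? `Server error (${status})`
}

// Mark a message as failed without touching the text that already arrived
function markFailed(message: ChatMessage): ChatMessage {
  return { ...message, status: 'error' }
}

const partial: ChatMessage = {
  id: 'a1',
  role: 'assistant',
  content: 'The answer is',
  status: 'streaming',
}
const failed = markFailed(partial)
```

Because markFailed never writes to content, the template can keep rendering the partial text next to the error indicator.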

The retry in the template:

<button @click="sendMessage(messages.at(-2)?.content ?? '')">
  Try again
</button>

messages.at(-2) is the last user message, the one immediately before the failed assistant response. Clicking “Try again” replays it: sendMessage pushes the user message again along with a fresh assistant placeholder and starts a new stream, so the failed exchange stays visible above the retry.
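
messages.at(-2) relies on the failed response being the very last entry. A slightly more defensive sketch looks the last user message up by role instead. findLastUserContent is a hypothetical helper, not part of the composable above.

```typescript
interface HistoryEntry {
  role: 'user' | 'assistant'
  content: string
}

// Walk backwards until the most recent user message is found
function findLastUserContent(entries: readonly HistoryEntry[]): string | null {
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].role === 'user') return entries[i].content
  }
  return null
}

const conversation: HistoryEntry[] = [
  { role: 'user', content: 'What is SSE?' },
  { role: 'assistant', content: 'Server-Sent Events are' },
  { role: 'user', content: 'And how do I abort?' },
  { role: 'assistant', content: '' }, // the failed response
]

const retryContent = findLastUserContent(conversation)
```

Returning null for an empty conversation lets the caller hide the retry button instead of sending an empty string.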

Abort on Demand

The Stop button is not just a UX nicety. On mobile networks or slow connections, aborting an unwanted long response is essential. The AbortController in the composable makes this a single line:

function abort(): void {
  abortController?.abort()
}

When abort() is called, the pending fetch (or the pending reader.read(), if the stream has already started) rejects with an AbortError. The catch block in sendMessage detects it by name and marks the message as 'complete' at wherever it stopped, rather than 'error'. The user sees the partial response; the error banner doesn’t appear.
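
The status differentiation comes down to one comparison on the error's name. Here is a sketch as a standalone function. statusAfterFailure is a hypothetical name, and the two error objects are stand-ins for what a real aborted or dropped request would reject with.

```typescript
// Decide the final status of the assistant message from the error that ended the stream.
// An aborted request rejects with an error whose name is 'AbortError'.
function statusAfterFailure(err: unknown): 'complete' | 'error' {
  const name =
    typeof err === 'object' && err !== null && 'name' in err
      ? String((err as { name: unknown }).name)
      : ''
  return name === 'AbortError' ? 'complete' : 'error'
}

// Stand-ins for the two kinds of failure
const aborted = Object.assign(new Error('The operation was aborted'), { name: 'AbortError' })
const dropped = new TypeError('network connection lost')

const afterAbort = statusAfterFailure(aborted) // 'complete'
const afterDrop  = statusAfterFailure(dropped) // 'error'

// The controller itself: abort() flips the signal that fetch is listening to
const controller = new AbortController()
controller.abort()
const signalFired = controller.signal.aborted // true
```

Checking the name rather than using instanceof avoids depending on which error class a given runtime uses for aborts.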

The Stop button’s location matters. It should be exactly where the Send button was — same position, same size. Muscle memory sends the user to that spot; the button should be there.

<!-- Swap in-place — no layout shift -->
<button v-if="!isStreaming" @click="handleSubmit">Send</button>
<button v-else             @click="abort">Stop</button>

v-if / v-else swaps one button for the other in the same position. As long as both buttons are the same element type with the same dimensions, nothing around them moves. If you use different elements or sizes, the input area shifts when streaming starts and stops, which is noticeable and distracting.

The Textarea That Grows

A fixed-height input is the wrong choice for a chat interface where users write multi-sentence messages. The textarea should expand as the user types and shrink when they delete:

<script setup lang="ts">
import { ref, watch, nextTick } from 'vue'

const inputValue = ref('')
const textareaRef = ref<HTMLTextAreaElement | null>(null)

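watch(inputValue, async () => {
  await nextTick()
  const el = textareaRef.value
  if (!el) return

  const MAX_HEIGHT_PX = 200             // keep in sync with max-height below

  el.style.height = 'auto'              // reset so scrollHeight reflects the content
  const contentHeight = el.scrollHeight
  el.style.height = `${Math.min(contentHeight, MAX_HEIGHT_PX)}px` // grow to fit, up to the cap
  // Hide the scrollbar while growing; allow scrolling once the cap is reached
  el.style.overflowY = contentHeight > MAX_HEIGHT_PX ? 'auto' : 'hidden'
})
</script>

<template>
  <textarea
    ref="textareaRef"
    v-model="inputValue"
    rows="1"
    style="resize: none; overflow-y: hidden; min-height: 40px; max-height: 200px;"
    @keydown="handleKeydown"
  />
</template>

overflow-y: hidden keeps the scrollbar from flickering in while the textarea grows. max-height: 200px caps expansion; once the content is taller than that, the watcher switches overflow-y to auto so the textarea scrolls internally. resize: none removes the resize handle, which serves no purpose in a chat input that auto-resizes.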
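
The sizing rule is a clamp: never smaller than the minimum, never larger than the cap, and scrollable only past the cap. Here is a sketch of that arithmetic as a pure function. fitTextarea is a hypothetical helper, and the 40px and 200px defaults come from the inline style above.

```typescript
// Clamp the measured content height between a minimum and a maximum,
// and report whether the textarea needs its own scrollbar.
function fitTextarea(scrollHeight: number, minPx = 40, maxPx = 200) {
  const height = Math.min(Math.max(scrollHeight, minPx), maxPx)
  const overflowY: 'hidden' | 'auto' = scrollHeight > maxPx ? 'auto' : 'hidden'
  return { height, overflowY }
}

const oneLine  = fitTextarea(24)  // { height: 40,  overflowY: 'hidden' }
const fiveLine = fitTextarea(120) // { height: 120, overflowY: 'hidden' }
const essay    = fitTextarea(480) // { height: 200, overflowY: 'auto' }
```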

Composing It Into a Page

The full setup is three files: useChat.ts, useAutoScroll.ts, and the component tree. A page component that wires them together:

<!-- pages/ChatPage.vue -->
<script setup lang="ts">
import ChatInterface from '@/components/ChatInterface.vue'
</script>

<template>
  <div class="chat-page">
    <header class="chat-header">
      <h1>Assistant</h1>
    </header>

    <main class="chat-main">
      <ChatInterface />
    </main>
  </div>
</template>

<style scoped>
.chat-page {
  display:        flex;
  flex-direction: column;
  height:         100dvh;  /* dvh — accounts for mobile browser chrome */
}

.chat-main {
  flex:     1;
  overflow: hidden;  /* contain the scrollable message list */
}
</style>

100dvh instead of 100vh is the mobile detail that trips most implementations. On mobile browsers, 100vh is measured as if the address bar and toolbar were collapsed, so while they are showing the page is taller than the visible area and the chat input ends up hidden behind the browser chrome. 100dvh (dynamic viewport height) tracks the viewport that is actually visible, which is what you want.

What the Tutorials Skip

The gap between a streaming chat that works and one that feels right is mostly in the details: the scroll threshold that respects intentional scrolling, the abort that preserves partial content instead of clearing it, the Stop button that doesn’t shift the layout, the textarea that grows instead of scrolling, 100dvh instead of 100vh.

None of these are hard to implement. They just require knowing they’re problems before you encounter users hitting them. The composable pattern pays off here — when you need to fix any of these, the fix lives in one place and the component doesn’t change.

The full code handles everything: stream reading with a buffer, SSE parsing, token appending, auto-scroll with user intent detection, markdown rendering, typing indicators in two states, error recovery with partial content preservation, and abort with status differentiation. That’s a shipping chat interface, not a demo.