branch:
README.md
1805 bytesRaw
# Voice Input
Voice-to-text dictation example using the `useVoiceInput` hook from `@cloudflare/voice`.
Captures microphone audio, streams it to a PartyServer-based Durable Object for real-time speech-to-text using Workers AI, and displays the transcript in a text area.
## Run it
```bash
npm install && npm start
```
No API keys needed — uses Workers AI (bound via `wrangler.jsonc`).
## How it works
### Server (`src/server.ts`)
Uses `withVoiceInput` — a lightweight mixin that only does STT. No TTS provider, no `onTurn` handler needed:
```typescript
import { Server } from "partyserver";
import { withVoiceInput, WorkersAIFluxSTT } from "@cloudflare/voice";
const InputServer = withVoiceInput(Server);
export class VoiceInputAgent extends InputServer<Env> {
streamingStt = new WorkersAIFluxSTT(this.env.AI);
onTranscript(text, connection) {
console.log("User said:", text);
}
}
```
### Client (`src/client.tsx`)
Uses `useVoiceInput` — a lightweight React hook that accumulates transcripts into a single string:
```tsx
import { useVoiceInput } from "@cloudflare/voice/react";
const { transcript, interimTranscript, isListening, start, stop, clear } =
useVoiceInput({ agent: "VoiceInputAgent" });
```
Returns:
- **`transcript`** — accumulated final text from all utterances
- **`interimTranscript`** — real-time partial transcript (updates as you speak)
- **`isListening`** — whether the mic is active
- **`audioLevel`** — current audio level for visual feedback
- **`start()` / `stop()`** — control listening
- **`toggleMute()`** — mute without stopping
- **`clear()`** — reset the transcript
## Related
- [`examples/playground`](../playground) — full voice agent with conversation
- [`@cloudflare/voice`](../../packages/voice) — the voice package