AssemblyAI Universal-3 Pro
AssemblyAI's Universal-3 Pro speech recognition model for high-accuracy transcription.
Quick start
# Inspect the price — a plain request returns the 402 challenge:
curl -i https://api.glianalabs.com/v1/infer \
-H "content-type: application/json" \
-d '{
"model": "assemblyai/universal-3-pro",
"prompt": "<string>"
}'
# Pay + run in one step with the mppx CLI (create a wallet: npx mppx account create):
npx mppx https://api.glianalabs.com/v1/infer \
-J '{"model": "assemblyai/universal-3-pro", "prompt": "<string>"}'
Parameters
Timestamp (in milliseconds) to end transcription at.
Timestamp (in milliseconds) to start transcription from.
The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). For data URIs, the audio will be uploaded to AssemblyAI automatically. Required for pre-recorded transcription (when stream is false or not set).
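As a rough sketch of the data URI form described above, the snippet below base64-encodes a local file and places the result in audio_url. The file contents, the audio/wav MIME type, and the variable names are placeholders chosen for illustration, not values taken from this page.

```shell
# Placeholder input: a temp file stands in for a real recording (assumption).
audio_file=$(mktemp)
printf 'placeholder audio bytes' > "$audio_file"

# Base64-encode the file; tr removes line wrapping that some base64 builds add.
encoded=$(base64 < "$audio_file" | tr -d '\n')

# audio/wav is an assumed MIME type; use the one that matches your file.
data_uri="data:audio/wav;base64,${encoded}"

# Request body for /v1/infer with the data URI as audio_url.
payload=$(printf '{"model": "assemblyai/universal-3-pro", "audio_url": "%s"}' "$data_uri")
printf '%s\n' "$payload"

rm -f "$audio_file"
```

The resulting payload should be usable as the request body in the quick-start commands, for example with curl's -d "$payload".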
Enable automatic chapter detection.
Enable automatic extraction of key phrases and highlights.
How much to boost the words in word_boost.
Enable content safety detection for sensitive content.
Custom spelling rules to replace specific words or phrases in the transcription output.
Include filler words like "um", "uh", etc. in the transcript.
Domain-specific transcription mode. "medical-v1" enables medical terminology optimization.
Process audio as dual-channel (stereo) for better accuracy.
Enable detection of entities like names, organizations, and locations.
Filter profanity from the transcription.
Enable IAB (Interactive Advertising Bureau) content taxonomy classification.
An array of up to 1,000 words or phrases (max 6 words per phrase) to improve transcription accuracy. Cannot be used with the prompt parameter.
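The limits above can be checked locally before a request is sent. This is a minimal sketch: check_keyterm and check_keyterm_list are hypothetical helper names, and the only rules they encode are the ones stated in the description (at most 1,000 entries, at most 6 words per phrase).

```shell
# check_keyterm PHRASE: succeeds if PHRASE has 1 to 6 whitespace-separated words.
check_keyterm() {
  # $(( ... )) normalizes wc output, which may carry leading spaces.
  words=$(( $(printf '%s\n' "$1" | wc -w) ))
  [ "$words" -ge 1 ] && [ "$words" -le 6 ]
}

# check_keyterm_list TERM...: succeeds if there are at most 1,000 terms
# and every term passes check_keyterm.
check_keyterm_list() {
  [ "$#" -le 1000 ] || return 1
  for term in "$@"; do
    check_keyterm "$term" || return 1
  done
  return 0
}

if check_keyterm_list "Universal-3 Pro" "speaker diarization" "AssemblyAI"; then
  echo "keyterms ok"
else
  echo "keyterms rejected"
fi
```

The check does not cover the rule that this list cannot be combined with the prompt parameter; that has to be enforced where the request body is assembled.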
The language code for the audio file (e.g., "en", "es", "fr"). Defaults to automatic language detection.
Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language.
Process each audio channel separately for multi-channel audio files.
A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words.
Redact personally identifiable information.
Generate a redacted audio file with PII removed.
Specific PII policies to apply for redaction.
Strategy for substituting redacted PII.
Enable sentiment analysis for each sentence.
Enable speaker diarization to identify different speakers in the audio.
Expected number of speakers for speaker diarization.
Confidence threshold for speech detection.
Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0.
URL to receive webhook notifications when transcription is complete.
Enable real-time WebSocket streaming for live audio transcription. When true, a WebSocket connection is established instead of submitting a pre-recorded transcription job. Cannot be used with audio_url.
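Read together with the audio_url description, this implies two mutually exclusive request shapes. A minimal sketch of that rule follows; check_request_mode is a hypothetical helper, and the rule is inferred from the two descriptions rather than taken from a documented validation routine.

```shell
# check_request_mode STREAM AUDIO_URL
# STREAM is "true", "false", or empty (not set); AUDIO_URL may be empty.
# Prints the request mode, or returns non-zero for an invalid combination.
check_request_mode() {
  stream=$1
  audio_url=$2
  if [ "$stream" = "true" ]; then
    # Streaming: audio_url must not be supplied.
    [ -z "$audio_url" ] || return 1
    echo "streaming"
  else
    # Pre-recorded: audio_url is required.
    [ -n "$audio_url" ] || return 1
    echo "pre-recorded"
  fi
}

check_request_mode "" "https://example.com/audio.mp3"   # prints: pre-recorded
check_request_mode "true" ""                            # prints: streaming
```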
An array of words to boost recognition accuracy (legacy; use keyterms_prompt instead).