AssemblyAI Universal-3 Pro

Name: AssemblyAI Universal-3 Pro
Brand: AssemblyAI
Price: 0.006300 USD
Availability: InStock

assemblyai/universal-3-pro

AssemblyAI, Speech-to-Text, Transcription, Multilingual

AssemblyAI's Universal-3 Pro speech recognition model for high-accuracy transcription.

Quick start

# Inspect the price — a plain request returns the 402 challenge:
curl -i https://api.glianalabs.com/v1/infer \
  -H "content-type: application/json" \
  -d '{
    "model": "assemblyai/universal-3-pro",
    "audio_url": "https://example.com/input.mp3"
  }'

# Pay + run in one step with the mppx CLI (create a wallet: npx mppx account create):
npx mppx https://api.glianalabs.com/v1/infer \
  -J '{"model": "assemblyai/universal-3-pro", "audio_url": "https://example.com/input.mp3"}'

Parameters

Input

audio_url string required

The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). Audio supplied as a data URI is uploaded to AssemblyAI automatically. Required for pre-recorded transcription (when websocket is false or not set).
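
As a rough sketch of the data URI form, the helper below base64-encodes a local file and prefixes it with a MIME type. The function name audio_data_uri and the audio/mpeg default are illustrative assumptions, not part of the API.

```shell
# Encode a local audio file as a data URI suitable for audio_url.
# Assumption: the MIME type defaults to audio/mpeg (MP3); pass another as $2.
audio_data_uri() {
  file="$1"
  mime="${2:-audio/mpeg}"
  # Strip newlines, since some base64 implementations wrap their output.
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```

The result can be placed in the audio_url field of the request body, for example by calling audio_data_uri input.mp3. Large files may exceed the shell's argument-length limit when passed inline on the command line, so a hosted URL is likely the better choice for long recordings.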

audio_end_at integer optional

Timestamp (in milliseconds) to end transcription at.

audio_start_from integer optional

Timestamp (in milliseconds) to start transcription from.
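
Because audio_start_from and audio_end_at are both given in milliseconds, it can help to convert from a minutes-and-seconds offset. The sketch below does that with a hypothetical to_ms helper and builds a request body covering 1:30 through 5:00 of the recording; the offsets are illustrative.

```shell
# Convert a minutes/seconds offset into the milliseconds these parameters expect.
# Pass plain integers without leading zeros (e.g. 8, not 08).
to_ms() {
  echo $(( $1 * 60000 + $2 * 1000 ))
}

# Request body that transcribes only 1:30 through 5:00 of the audio.
PAYLOAD=$(cat <<JSON
{
  "model": "assemblyai/universal-3-pro",
  "audio_url": "https://example.com/input.mp3",
  "audio_start_from": $(to_ms 1 30),
  "audio_end_at": $(to_ms 5 0)
}
JSON
)
echo "$PAYLOAD"
```

The resulting body can be passed to the mppx command from the Quick start with -J "$PAYLOAD".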

auto_chapters boolean optional

Enable automatic chapter detection.

auto_highlights boolean optional

Enable automatic extraction of key phrases and highlights.

boost_param string optional

How much to boost the words in word_boost.

content_safety boolean optional

Enable content safety detection for sensitive content.

custom_spelling array optional

Custom spelling rules to replace specific words or phrases in the transcription output.
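
The entry shape for custom_spelling is not spelled out here. The sketch below assumes the shape used by AssemblyAI's own transcription API (a "from" list of variants and a single "to" replacement) and that this endpoint passes it through unchanged; treat both as assumptions to verify.

```shell
# Request body with one custom spelling rule.
# Assumption: each rule is {"from": [variants...], "to": "replacement"}.
PAYLOAD=$(cat <<'JSON'
{
  "model": "assemblyai/universal-3-pro",
  "audio_url": "https://example.com/input.mp3",
  "custom_spelling": [
    { "from": ["assembly ai", "assembly a.i."], "to": "AssemblyAI" }
  ]
}
JSON
)
echo "$PAYLOAD"
```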

disfluencies boolean optional

Include filler words like "um", "uh", etc. in the transcript.

domain string optional

Domain-specific transcription mode. "medical-v1" enables medical terminology optimization.

dual_channel boolean optional

Process audio as dual-channel (stereo) for better accuracy.

entity_detection boolean optional

Enable detection of entities like names, organizations, and locations.

filter_profanity boolean optional

Filter profanity from the transcription.

iab_categories boolean optional

Enable IAB (Interactive Advertising Bureau) content taxonomy classification.

keyterms_prompt array optional

An array of up to 1,000 words or phrases (max 6 words per phrase) to improve transcription accuracy. Cannot be used with the prompt parameter.
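
The six-words-per-phrase limit is easy to check before sending a request. A minimal sketch follows, assuming words are separated by whitespace; the helper name keyterm_ok and the sample phrases are hypothetical.

```shell
# Succeed only if a phrase has at most 6 whitespace-separated words.
keyterm_ok() {
  # The arithmetic expansion strips any padding that wc adds to its output.
  [ "$(( $(printf '%s' "$1" | wc -w) ))" -le 6 ]
}

# Report which candidate phrases fit within the limit.
for term in "Universal-3 Pro" "speaker diarization" \
            "this phrase has far too many words in it"; do
  if keyterm_ok "$term"; then
    echo "ok: $term"
  else
    echo "too long: $term"
  fi
done
```

A request that passes this check should still omit the prompt parameter, since the two cannot be combined.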

language_code string optional

The language code for the audio file (e.g., "en", "es", "fr"). Defaults to automatic language detection.

language_detection boolean optional

Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language.

multichannel boolean optional

Process each audio channel separately for multi-channel audio files.

prompt string optional

A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words.

redact_pii boolean optional

Redact personally identifiable information.

redact_pii_audio boolean optional

Generate a redacted audio file with PII removed.

redact_pii_policies array optional

Specific PII policies to apply for redaction.

redact_pii_sub string optional

Strategy for substituting redacted PII.

sentiment_analysis boolean optional

Enable sentiment analysis for each sentence.

speaker_labels boolean optional

Enable speaker diarization to identify different speakers in the audio.

speakers_expected integer optional

Expected number of speakers for speaker diarization.
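
For a two-person recording, speaker_labels and speakers_expected might be combined as in the sketch below. The speaker count of 2 is an illustrative value.

```shell
# Request body that enables diarization and hints at two speakers.
PAYLOAD=$(cat <<'JSON'
{
  "model": "assemblyai/universal-3-pro",
  "audio_url": "https://example.com/input.mp3",
  "speaker_labels": true,
  "speakers_expected": 2
}
JSON
)
echo "$PAYLOAD"
```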

speech_threshold number optional

Confidence threshold for speech detection.

temperature number optional

Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0.

webhook_url string optional

URL to receive webhook notifications when transcription is complete.

websocket boolean optional

Enable real-time WebSocket streaming for live audio transcription. When true, a WebSocket connection is established instead of submitting a pre-recorded transcription job. Cannot be used with audio_url.

word_boost array optional

Array of words whose recognition should be boosted (legacy; use keyterms_prompt instead).

Output

confidence: Overall confidence score for the transcription.

language_code: Detected or specified language code.

language_confidence: Confidence score for language detection.

text: The transcribed text.

utterances: Speaker-separated utterances (when speaker_labels is enabled).

words: Word-level timestamps and confidence scores.
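
One way to read these fields from a saved response is with jq, as sketched below. The sample response is hypothetical: it assumes the output fields appear at the top level of the JSON body and that each utterance carries speaker and text keys, neither of which is confirmed here. The helper names are also illustrative, and jq must be installed.

```shell
# Hypothetical response shaped after the output fields listed above.
RESPONSE='{"text":"Hello there. Hi.","confidence":0.97,"language_code":"en","utterances":[{"speaker":"A","text":"Hello there."},{"speaker":"B","text":"Hi."}]}'

# Read a response on stdin and print the full transcript.
transcript_text() {
  jq -r '.text'
}

# Read a response on stdin and print one "speaker: text" line per utterance.
# Assumption: utterances have "speaker" and "text" keys.
speaker_turns() {
  jq -r '.utterances[] | "\(.speaker): \(.text)"'
}
```

Piping the sample through either helper, for example printf '%s' "$RESPONSE" | speaker_turns, shows the extracted values. The utterances array is only expected when speaker_labels was enabled on the request.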