
GPT-4o Transcribe

openai/gpt-4o-transcribe
OpenAI, Speech-to-Text, Transcription, Multilingual

A speech-to-text model that uses GPT-4o to transcribe audio, with a lower word error rate and better language recognition than the original Whisper models.

Quick start

# Inspect the price. A plain, unpaid request returns the HTTP 402 (Payment Required) challenge:
curl -i https://api.glianalabs.com/v1/infer \
  -H "content-type: application/json" \
  -d '{
    "model": "openai/gpt-4o-transcribe",
    "file": "<string>",
    "prompt": "<string>"
  }'

# Pay + run in one step with the mppx CLI (create a wallet: npx mppx account create):
npx mppx https://api.glianalabs.com/v1/infer \
  -J '{"model": "openai/gpt-4o-transcribe", "file": "<string>", "prompt": "<string>"}'

Parameters

Input
file string required

The audio file as a data URI (data:audio/...;base64,...) or HTTPS URL. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
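
For a local recording, the data URI form described above can be assembled in the shell. This is a minimal sketch, not part of the official tooling: the helper name audio_data_uri is hypothetical, the caller supplies the MIME type, and it assumes a base64 utility that reads from standard input is installed.

```shell
# Hypothetical helper: print a data URI for a local audio file.
#   $1 = path to the audio file
#   $2 = MIME type for that file, e.g. audio/wav or audio/mpeg (supplied by the caller)
audio_data_uri() {
  # Some base64 implementations wrap their output, so strip the newlines
  # to keep the URI on a single line.
  printf 'data:%s;base64,%s' "$2" "$(base64 < "$1" | tr -d '\n')"
}
```

The printed string can be used as the value of file in the request body. For long recordings an HTTPS URL is likely the more practical choice, since base64 encoding enlarges the payload by roughly one third.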

language string

The language of the input audio. Supplying the language as an ISO 639-1 code (for example, en) improves accuracy and latency.

prompt string

Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.

temperature number

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 0 if omitted.
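
The input parameters above can be combined into a single request body. The sketch below is illustrative only: transcribe_body is a hypothetical helper, it performs no JSON escaping (so it assumes the string arguments contain no double quotes, backslashes, or control characters), and it rejects temperatures outside the documented 0 to 1 range before printing anything.

```shell
# Hypothetical helper: print a JSON request body for openai/gpt-4o-transcribe.
#   $1 = file (HTTPS URL or data URI)
#   $2 = language as an ISO 639-1 code
#   $3 = prompt
#   $4 = temperature, between 0 and 1
# Assumption: $1 to $3 need no JSON escaping (no quotes, backslashes, or control characters).
transcribe_body() {
  # Reject temperatures outside the documented 0 to 1 range.
  if ! awk -v t="$4" 'BEGIN { exit !(t >= 0 && t <= 1) }'; then
    echo "temperature must be between 0 and 1" >&2
    return 1
  fi
  printf '{"model": "openai/gpt-4o-transcribe", "file": "%s", "language": "%s", "prompt": "%s", "temperature": %s}\n' \
    "$1" "$2" "$3" "$4"
}

# Example use (the audio URL is a placeholder), piping the body to curl on stdin:
#   transcribe_body "https://example.com/meeting.mp3" en "Quarterly planning call." 0.2 |
#     curl -i https://api.glianalabs.com/v1/infer -H "content-type: application/json" -d @-
```

As in the quick start, an unpaid request sent this way should return the 402 challenge rather than a transcript.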

Output
text: The transcribed text.