Inworld TTS 1.5 Max

inworld/tts-1.5-max
Inworld TTS, Speech Synthesis, Low Latency, Multilingual, Emotion Control

Highest-quality text-to-speech with latency under 200 ms, emotion control, and support for 15 languages.

Quick start

# Inspect the price (a plain request without payment returns the HTTP 402 challenge):
curl -i https://api.glianalabs.com/v1/infer \
  -H "content-type: application/json" \
  -d '{
    "model": "inworld/tts-1.5-max",
    "text": "<string>"
  }'

# Pay and run in one step with the mppx CLI (create a wallet with: npx mppx account create):
npx mppx https://api.glianalabs.com/v1/infer \
  -J '{"model": "inworld/tts-1.5-max", "text": "<string>"}'

Examples
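
As a sketch of a fuller request, the helpers below build a body that sets several optional parameters alongside model and text, then pass it to the same mppx invocation used in the quick start. The field names come from the Parameters list. The function names json_escape, build_payload, and synthesize are hypothetical, the values are illustrative, and the escaping is deliberately minimal (single-line text only).

```shell
# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Minimal sketch: assumes single-line text with no control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print a request body that sets several optional parameters (illustrative values).
build_payload() {
  printf '{"model": "inworld/tts-1.5-max", "text": "%s", "voice_id": "Dennis", "output_format": "mp3", "speaking_rate": 1.0, "temperature": 1.0}\n' \
    "$(json_escape "$1")"
}

# Pay and run with the mppx CLI, using the same invocation as the quick start.
synthesize() {
  npx mppx https://api.glianalabs.com/v1/infer -J "$(build_payload "$1")"
}
```

Calling synthesize 'She said "hello" at 9:30.' would send the escaped text with the parameters above. Only build_payload can be checked offline; synthesize needs a funded mppx wallet.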

Parameters

Input
apply_text_normalization boolean

When enabled, text normalization expands numbers, dates, times, and abbreviations before converting to speech. Turning this off may reduce latency.

bit_rate integer

Bit rate of the audio, in bits per second. Applies only to compressed audio formats (mp3, opus). The default is 128,000.

output_format string required

The output format for the audio. Supported formats are mp3, opus, wav, and flac. Defaults to mp3.

sample_rate integer

The synthesis sample rate in hertz. Accepts: 8000, 16000, 22050, 24000, 32000, 44100, 48000. The default is 48,000.
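
The format-dependent rules above (bit_rate applies only to mp3 and opus, and sample_rate must be one of seven values) can be checked before a request is sent. The following is a minimal sketch; audio_fields is a hypothetical helper, not part of the API or the mppx CLI.

```shell
# Print the audio-related JSON fields for a request, or fail on invalid input.
# Usage: audio_fields FORMAT SAMPLE_RATE [BIT_RATE]
audio_fields() {
  format=$1
  sample_rate=$2
  bit_rate=${3:-128000}   # documented default

  # sample_rate must be one of the documented values.
  case "$sample_rate" in
    8000|16000|22050|24000|32000|44100|48000) ;;
    *) echo "unsupported sample_rate: $sample_rate" >&2; return 1 ;;
  esac

  case "$format" in
    # The parameter list names only mp3 and opus as taking a bit_rate.
    mp3|opus)
      printf '"output_format": "%s", "sample_rate": %s, "bit_rate": %s\n' \
        "$format" "$sample_rate" "$bit_rate" ;;
    wav|flac)
      printf '"output_format": "%s", "sample_rate": %s\n' "$format" "$sample_rate" ;;
    *) echo "unsupported output_format: $format" >&2; return 1 ;;
  esac
}
```

The printed fragment is meant to be spliced into a larger request body. Leaving bit_rate out for wav and flac is a client-side choice; the source does not say how the API treats a bit_rate sent with those formats.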

speaking_rate number

Speaking rate/speed, in the range [0.5, 1.5]. The default is 1.0. We recommend using values above 0.8 to ensure high quality.

temperature number required

Controls the degree of randomness when sampling audio tokens. Accepts values between 0 (exclusive) and 2 (inclusive); defaults to 1.0. Higher values produce more expressive speech; lower values produce more deterministic speech.
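
The numeric ranges for speaking_rate ([0.5, 1.5]) and temperature (greater than 0, at most 2) differ in whether the lower bound is included, which is easy to get wrong. A minimal client-side check, assuming plain decimal input and using awk for the floating-point comparison, might look like this; both function names are hypothetical.

```shell
# Succeed (exit 0) if $1 is a plain decimal number in [0.5, 1.5].
valid_speaking_rate() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= 0.5 && v + 0 <= 1.5) }'
}

# Succeed (exit 0) if $1 is a plain decimal number in (0, 2].
valid_temperature() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 > 0 && v + 0 <= 2) }'
}
```

For example, valid_temperature 0 fails because the lower bound is exclusive, while valid_speaking_rate 0.5 succeeds because that bound is inclusive.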

text string required

The text to be synthesized into speech. Maximum input of 2,000 characters.
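
The source does not say how text beyond the 2,000-character limit is handled. One client-side option, sketched below, is to wrap the text at whitespace into pieces of at most 2,000 columns and send one request per piece. split_text is a hypothetical helper built on the POSIX fold utility; it assumes single-line input and that fold's column count is an acceptable stand-in for the API's character count.

```shell
# Print the input text as lines of at most 2000 columns, breaking at spaces.
# Each output line can then be sent as the "text" of a separate request.
split_text() {
  printf '%s\n' "$1" | fold -s -w 2000
}
```

Splitting at whitespace keeps words intact, but each piece is synthesized independently, so prosody may not carry across piece boundaries.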

timestamp_type string required

Controls the timestamp metadata returned with the audio: "word" returns word-level timing and "character" returns character-level timing. Requesting timestamps adds latency. Defaults to none.

voice_id string required

The ID of the voice to use for synthesizing speech. Defaults to Dennis.

Output
audio: URL to the generated audio file
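
To use the result, the audio URL has to be pulled out of the response and downloaded. The sketch below assumes the response body is a JSON object with a top-level "audio" string field, which the Output description suggests but does not spell out. audio_url is a hypothetical helper, and a JSON-aware tool such as jq would be more robust than this sed pattern.

```shell
# Read a JSON response on stdin and print the value of its "audio" field.
# Assumes the URL contains no escaped double quotes.
audio_url() {
  sed -n 's/.*"audio"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

If the response has been saved to a file such as response.json (a hypothetical name), the URL could be fetched with url=$(audio_url < response.json) followed by curl -fsSL -o speech.mp3 "$url".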