Technology
LTX-2.3 Drops on Fal with Native Audio and Desktop-Grade Performance
Lightricks releases its most capable open video model yet, bringing synchronized sound generation and 4K output to both cloud APIs and consumer hardware.

You can now generate a 20-second video with synchronized audio in a single pass for about $1.20. That's the pitch behind LTX-2.3, which went live on fal.ai today after Lightricks quietly pushed the model to Hugging Face yesterday. The release is more than an incremental update: LTX-2.3 is the first open-source video model to generate native audio alongside its visuals, which could shift the economics of AI video production.
The timing matters. While OpenAI's Sora remains in limited preview and Runway's Gen-3 costs $10 per minute of generation, Lightricks has positioned LTX-2.3 as the production-grade alternative that creators can actually access. The model runs on consumer GPUs with 24GB of VRAM or through fal.ai's API at $0.06 per second for standard definition, scaling to $0.10 for their new Retake feature that regenerates scenes while preserving composition.
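At those rates, the per-clip economics are easy to sketch. A minimal calculation in Python, assuming only the prices quoted above ($0.06/second standard, $0.10/second for Retake); treat the figures as illustrative rather than a price sheet:

```python
# Sketch of LTX-2.3 per-clip cost at fal.ai's quoted per-second rates.
# The rates come from this article and may change; they are not a price sheet.

STANDARD_RATE = 0.06  # USD per second of generated video
RETAKE_RATE = 0.10    # USD per second on the Retake endpoint

def clip_cost(seconds: float, rate: float = STANDARD_RATE) -> float:
    """Cost in USD to generate `seconds` of video at `rate` USD/sec."""
    return round(seconds * rate, 2)

# A full 20-second clip with native audio:
print(clip_cost(20))              # 1.2
# Regenerating a single 5-second scene via Retake:
print(clip_cost(5, RETAKE_RATE))  # 0.5
```

That $1.20 full-clip figure is where the comparison with Runway's quoted $10 per minute gets its bite.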
According to The Neuron, the model introduces a redesigned VAE (variational autoencoder) that produces sharper textures and more consistent motion across frames. Discussions in Reddit's StableDiffusion community show users already experimenting with the gated attention text connector, which appears to improve how closely generated videos match their prompts, a persistent weakness of previous generations.
The technical architecture shows careful engineering choices. Built on a DiT (Diffusion Transformer) foundation, LTX-2.3 handles both text-to-video and image-to-video workflows. The model generates up to 20-second clips at 768x512 resolution, with 4K upscaling available through fal.ai's pipeline. Portrait mode at 9:16 ratio suggests Lightricks understands where most video content actually lives: vertical feeds.
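Those specs translate into a fairly small request surface. The sketch below builds a hypothetical request payload for fal.ai's Python client; the endpoint id, parameter names, and client call are assumptions for illustration, not the documented schema, so check fal.ai's model page for the real one:

```python
# Hypothetical LTX-2.3 request payload -- parameter names and the endpoint id
# below are illustrative assumptions, not fal.ai's documented API schema.

payload = {
    "prompt": "drone shot over a coastline at sunset, waves crashing",
    "resolution": "768x512",   # base resolution per the model card
    "duration": 20,            # seconds; the model tops out at 20s clips
    "aspect_ratio": "9:16",    # portrait mode for vertical feeds
    "generate_audio": True,    # native synchronized audio, new in 2.3
}

# With the real client the call would look something like (untested):
#   import fal_client
#   result = fal_client.subscribe("fal-ai/<ltx-endpoint>", arguments=payload)
#   print(result["video"]["url"])

assert payload["duration"] <= 20  # longer clips aren't supported in one pass
```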
The audio generation breaks new ground. Previous open models required separate audio synthesis and manual synchronization. LTX-2.3 generates both streams simultaneously, though early tests suggest the audio remains fairly basic: ambient sounds and simple musical patterns rather than detailed soundscapes or dialogue.
The Apache 2.0 license removes commercial restrictions that hampered adoption of earlier models. Combined with LoRA fine-tuning capabilities, this positions LTX-2.3 as infrastructure rather than just a tool. Developers can train custom versions on specific visual styles or content types without licensing negotiations.
Fal.ai's implementation includes a Fast mode that trades quality for iteration speed, useful for creators testing concepts before committing to full renders. The Retake endpoint, priced slightly higher at $0.10 per second, allows regenerating specific scenes while maintaining the original structure. This addresses a common workflow bottleneck where one problematic segment ruins an otherwise usable generation.
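The workflow math behind Retake is simple: regenerating one bad segment at the higher rate beats re-rendering the whole clip at the standard rate as long as the segment is short enough. A rough sketch using the quoted rates (the break-even falls at a segment 60% of the clip's length):

```python
# When does Retake ($0.10/sec on one segment) beat a full re-render
# ($0.06/sec on the whole clip)? Rates are the article's quoted prices.

def retake_saves(clip_seconds: float, segment_seconds: float,
                 standard_rate: float = 0.06,
                 retake_rate: float = 0.10) -> bool:
    """True if fixing one segment via Retake is cheaper than a full redo."""
    return segment_seconds * retake_rate < clip_seconds * standard_rate

# Fixing a 5s scene in a 20s clip: $0.50 vs $1.20 for a full re-render.
print(retake_saves(20, 5))   # True
# A 15s segment of a 20s clip: $1.50 vs $1.20, so just re-render it all.
print(retake_saves(20, 15))  # False
```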
The model card on Hugging Face states the model was trained on publicly available datasets, though Lightricks declined to specify sources. Such opacity remains standard practice across the industry, even as concerns about training-data provenance intensify.
Performance on consumer hardware appears viable but demanding. Users report successful local runs on RTX 4090s, though generation times stretch to several minutes per clip compared to seconds on fal.ai's infrastructure. The tradeoff between privacy and speed will likely determine adoption patterns.
For production teams, the practical upshot:

- Native audio generation eliminates post-production synchronization for simple projects.
- Apache 2.0 licensing enables commercial use without revenue sharing.
- LoRA capabilities allow custom training for brand-specific visual styles.
- Desktop deployment preserves IP control for sensitive content.
- The Retake feature reduces waste when only part of a generation needs redoing.
The real test comes next quarter when creators start shipping content built entirely on LTX-2.3's pipeline. If the model proves stable enough for production workflows, it could establish the price floor for AI video generation, forcing larger players to justify their premium pricing with substantially better quality.
The question is whether "good enough" at one-tenth the price makes different kinds of video content economically viable to produce.