Guest User 10/27/25

Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video

Foley Control is a lightweight approach to video-guided Foley that keeps pretrained single-modality models frozen and learns only a small cross-attention bridge between them.

Guest User 10/10/25

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently.

Guest User 10/2/25

ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction

We introduce Reservoir SWD (ReSWD), which integrates Weighted Reservoir Sampling into SWD to adaptively retain informative projection directions in optimization steps, resulting in stable gradients while remaining unbiased.

Guest User 10/1/25

Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation

We introduce Stable Cinemetrics, a structured evaluation framework that formalizes filmmaking controls into four disentangled, hierarchical taxonomies: Setup, Event, Lighting, and Camera.

Guest User 9/30/25

Music and Artificial Intelligence: Artistic Trends

We study how musicians use artificial intelligence (AI) across formats like singles, albums, performances, installations, voices, ballets, operas, or soundtracks.

Guest User 9/26/25

SD3.5-Flash: Distribution-Guided Distillation of Generative Flows

We present SD3.5-Flash, an efficient few-step distillation framework that brings high-quality image generation to accessible consumer devices.

Guest User 9/17/25

Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation

We present Stable Part Diffusion 4D (SP4D), a framework for generating paired RGB and kinematic part videos from monocular inputs.

Guest User 6/5/25

MARBLE: Material Recomposition and Blending in CLIP-Space

Editing materials of objects in images based on exemplar images is an active area of research in computer vision and graphics. We propose MARBLE, a method for performing material blending and recomposing fine-grained material properties by finding material embeddings in CLIP-space and using them to control pre-trained text-to-image models.

Guest User 5/13/25

Fast Text-to-Audio Generation with Adversarial Post-Training

We present Adversarial Relativistic-Contrastive (ARC) post-training, the first adversarial acceleration algorithm for diffusion/flow models not based on distillation.

Guest User 4/21/25

FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

We present a novel framework for generating high-quality, animatable 4D avatars from a single image. While recent advances have shown promising results in 4D avatar creation, existing methods either require extensive multiview data or struggle with shape accuracy and identity consistency.

Guest User 3/25/25

SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation

We present Stable Video 4D 2.0 (SV4D 2.0), a multi-view video diffusion model for dynamic 3D asset generation. Compared to its predecessor SV4D, SV4D 2.0 is more robust to occlusions and large motion, generalizes better to real-world videos, and produces higher-quality outputs in terms of detail sharpness and spatio-temporal consistency.

Guest User 3/18/25

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD), aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to their reliance on a fixed pretrained DINOv2 discriminator.

Guest User 3/18/25

Stable Virtual Camera: Multi-View Video Generation with 3D Camera Control

We present Stable Virtual Camera, a generalist diffusion model that creates novel views of a scene, given any number of input views and target cameras.

Guest User 1/8/25

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. In this paper, we present SPAR3D, a novel two-stage approach aiming to take the best of both directions.

Guest User 8/1/24

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

We present SF3D, a novel method for rapid and high-quality textured object mesh reconstruction from a single image in just 0.5 seconds.

Guest User 7/24/24

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation.

Guest User 7/19/24

Stable Audio Open

Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics.

Guest User 4/15/24

Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints

This workshop paper highlights the limitations of generative AI tools in translating digital creations into the physical world and proposes new augmentations to generative AI tools for creating physically viable 3D models.

Guest User 3/18/24

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

We present Stable Video 3D (SV3D), a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object.

Guest User 3/5/24

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales.

