TimeColor: Flexible Reference Colorization via Temporal-Channel Concatenation

🎵 For the best viewing experience, go full screen and turn up your volume 🎵

Abstract

Colorization in animation remains a labor-intensive task. Existing automatic approaches typically support only a single reference, often restricted to a colored first frame. We present TimeColor, a Diffusion-Transformer-based sketch video colorization system that accepts a flexible number and variety of references with a fixed parameter count. Unlike methods tied to a single first-frame exemplar, TimeColor ingests a variable-size bank of colored reference images, including the first frame, arbitrary frames, or per-subject character sheets, and employs temporal-channel concatenation with custom RoPE indexing across modalities to achieve accurate per-subject colorization. During training, we further encourage per-subject correspondence with a latent identity map that provides spatially localized guidance, reducing cross-subject leakage while avoiding channel blow-up. Evaluated on the SAKUGA-42M test set using SSIM, PSNR, LPIPS, FVD, and FID, TimeColor attains higher colorization quality and temporal consistency than prior baselines across diverse reference types.
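To illustrate the core idea of temporal-channel concatenation with custom RoPE indexing, the sketch below shows one plausible way a variable number of reference latents could be prepended to the video latents along the temporal axis, with references assigned temporal position indices outside the clip's own range so attention can distinguish the two modalities. All names, shapes, and the specific index offsets here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def concat_refs_with_time_indices(video_latents, ref_latents, ref_gap=16):
    """Hypothetical sketch of temporal-axis concatenation with per-modality
    RoPE time indices. The parameter count is unchanged regardless of how
    many references are given, since they only extend the token sequence.

    video_latents: array of shape (T, H, W, C) -- latents for the sketch clip
    ref_latents:   list of arrays of shape (H, W, C) -- colored references
    ref_gap:       assumed spacing placing references before the clip in time
    """
    T = video_latents.shape[0]
    if ref_latents:
        refs = np.stack(ref_latents, axis=0)          # (R, H, W, C)
    else:
        refs = np.zeros((0,) + video_latents.shape[1:])
    # Concatenate along the temporal axis: references first, then video frames.
    tokens = np.concatenate([refs, video_latents], axis=0)
    # Video frames use ordinary time indices 0..T-1; references get distinct
    # negative offsets so rotary embeddings never alias them with real frames.
    ref_idx = np.array([-(i + 1) * ref_gap for i in range(len(ref_latents))][::-1],
                       dtype=np.int64)
    vid_idx = np.arange(T, dtype=np.int64)
    time_index = np.concatenate([ref_idx, vid_idx])
    return tokens, time_index
```

For example, a 2-reference bank prepended to an 8-frame clip yields a 10-step token sequence with time indices `[-32, -16, 0, 1, ..., 7]` under the assumed `ref_gap=16`; the offset keeps reference positions well separated from in-clip frames while leaving every model weight untouched.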

Source code, the paper, and other details are coming soon. Please stay tuned!

Disclaimer

All images, videos, and related materials on this page are provided exclusively for academic and research use. TimeColor is a research project and is not intended for commercial applications.