Colorization in animation remains a labor-intensive task. Existing automatic approaches typically support only a single reference, often restricted to a colored first frame. We present TimeColor, a Diffusion-Transformer-based sketch video colorization system that accepts a flexible number and variety of references with a fixed parameter count. Unlike methods tied to a single first-frame exemplar, TimeColor ingests a variable-size bank of colored reference images, including the first frame, arbitrary frames, or per-subject character sheets, and employs temporal-channel concatenation with custom RoPE indexing across modalities to achieve accurate per-subject colorization. During training, we further encourage per-subject correspondence with a latent identity map that provides spatially localized guidance, reducing cross-subject color leakage while avoiding channel blow-up. Evaluated on the SAKUGA-42M test set using SSIM, PSNR, LPIPS, FVD, and FID, TimeColor attains higher colorization quality and temporal consistency than prior baselines across diverse reference types.
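To make the reference-bank idea concrete, below is a minimal sketch of one plausible reading of the temporal-axis half of the concatenation scheme: sketch-frame tokens and a variable number of reference tokens are joined into one sequence, and the references receive RoPE temporal indices from a disjoint range so attention can distinguish the two modalities without adding parameters. This is not the authors' implementation; `rope_1d`, `build_sequence`, and the `ref_offset` indexing scheme are illustrative assumptions.

```python
# Hypothetical sketch of temporal concatenation with custom RoPE indexing;
# names and the offset scheme are assumptions, not from the paper.
import torch

def rope_1d(positions: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard 1-D rotary-embedding angle table of shape (len(positions), dim)."""
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
    angles = positions.float()[:, None] * inv_freq[None, :]   # (T, dim/2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)    # (T, dim)

def build_sequence(sketch_latents, reference_latents, dim=64, ref_offset=1000):
    """
    sketch_latents:    (T, C) per-frame sketch tokens (flattened latents)
    reference_latents: list of (C,) colored-reference tokens, any count
    Returns the concatenated token sequence plus RoPE angles whose temporal
    indices place references in a disjoint range (shifted by `ref_offset`),
    so the model can tell modalities apart with a fixed parameter count.
    """
    T = sketch_latents.shape[0]
    refs = torch.stack(reference_latents) if reference_latents else sketch_latents[:0]
    tokens = torch.cat([sketch_latents, refs], dim=0)          # (T + R, C)

    sketch_pos = torch.arange(T)                               # indices 0..T-1
    ref_pos = torch.arange(len(reference_latents)) + ref_offset
    positions = torch.cat([sketch_pos, ref_pos])
    return tokens, rope_1d(positions, dim)

# Usage: 8 sketch frames plus 3 references of matching channel width.
toks, rope = build_sequence(torch.randn(8, 64),
                            [torch.randn(64) for _ in range(3)])
print(toks.shape, rope.shape)  # torch.Size([11, 64]) torch.Size([11, 64])
```

Because the reference count only changes the sequence length, not any weight shape, the same network handles one first-frame exemplar or a full per-subject character sheet bank.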
All images, videos, and related materials on this page are provided exclusively for academic and research use. TimeColor is a research project and is not intended for commercial applications.