Most colorization models condition on a single reference, typically the first frame of the scene. This approach, however, ignores other sources of conditioning data such as character sheets, background images, or arbitrary colorized frames. We propose TimeColor, a sketch-based video colorization model that supports heterogeneous, variable-count references via explicit per-reference region assignment. TimeColor encodes references as additional latent frames concatenated along the temporal axis, so all references are processed jointly at every diffusion step while the model's parameter count stays fixed. TimeColor further applies spatiotemporal correspondence-masked attention to enforce subject-reference binding, alongside modality-disjoint RoPE indexing. Together, these mechanisms mitigate shortcutting and cross-identity palette leakage. Experiments on SAKUGA-42M under both single- and multi-reference protocols show that TimeColor improves color fidelity, identity consistency, and temporal stability over prior baselines.
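Below is a minimal sketch of the reference-as-latent-frames idea described above: reference images are encoded into latents, concatenated with the video latents along the temporal axis, given temporal RoPE indices from a disjoint range so references are never confused with video frames, and attended to through a correspondence mask that restricts each video-frame region to its assigned reference. All tensor shapes, function names, and the index offset here are illustrative assumptions, not TimeColor's actual implementation.

```python
import torch

def pack_references(video_latents, ref_latents, ref_index_offset=10_000):
    """Concatenate reference latents after the video latents along time.

    video_latents: (B, T, C, H, W) sketch latents for the scene
    ref_latents:   (B, R, C, H, W) encoded reference images (R may vary)
    Returns packed latents (B, T+R, C, H, W) and per-frame temporal RoPE
    indices, with references placed in a disjoint index range.
    """
    B, T = video_latents.shape[:2]
    R = ref_latents.shape[1]
    packed = torch.cat([video_latents, ref_latents], dim=1)

    video_idx = torch.arange(T)                   # 0 .. T-1 for video frames
    ref_idx = ref_index_offset + torch.arange(R)  # disjoint range for references
    rope_t = torch.cat([video_idx, ref_idx]).expand(B, T + R)
    return packed, rope_t


def correspondence_mask(assignment, num_refs, tokens_per_frame):
    """Build an attention mask binding each video token to one reference.

    assignment: (B, T, tokens_per_frame) integer map giving, for every spatial
                token of every video frame, the index of its assigned reference
                (e.g. from a per-reference region assignment).
    Returns a boolean mask of shape (B, T*P, num_refs*P) that is True where
    attention from a video token to a reference token is allowed.
    """
    B, T, P = assignment.shape
    q_assign = assignment.reshape(B, T * P, 1)                # query tokens
    k_ref = torch.arange(num_refs).repeat_interleave(tokens_per_frame)
    k_ref = k_ref.view(1, 1, num_refs * tokens_per_frame)     # key tokens
    return q_assign == k_ref                                  # (B, Tq, Tk)


if __name__ == "__main__":
    video = torch.randn(1, 8, 4, 32, 32)   # 8 sketch frames
    refs = torch.randn(1, 3, 4, 32, 32)    # 3 references (characters / background)
    packed, rope_t = pack_references(video, refs)
    assign = torch.randint(0, 3, (1, 8, 32 * 32))  # toy per-token region assignment
    mask = correspondence_mask(assign, num_refs=3, tokens_per_frame=32 * 32)
    print(packed.shape, rope_t.shape, mask.shape)
```

Because references are appended as extra latent frames rather than fed through a separate conditioning branch, the number of references can vary per scene without changing the model's parameter count; the disjoint RoPE range and the correspondence mask are what keep the model from treating references as ordinary frames or leaking one subject's palette onto another.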
Stack a flexible number of subject and background references across time to colorize the entire scene.
Use a single-frame reference from any moment to guide the colorization
From the first frame, color the whole scene
Demonstrating reference and scene reusability; all clips are displayed at a consistent 720×480 resolution.
Methods: Ours, LVCD, ToonCrafter, AniDoc, LongAnimation, ToonComposer
All images, videos, and related materials on this page are provided exclusively for academic and research use. TimeColor is a research project and is not intended for commercial applications.