Demo of SketchColour:
Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation

Bryan Constantine Sadihin, Michael Hua Wang, Shei Pern Chua, Hang Su

Department of Computer Science
Tsinghua University

Comparative results of our Sketch-to-Colour video diffusion model against prior work (LVCD, ToonCrafter, AniDoc) across scenes from the SAKUGA validation dataset.

| Frames | Method | MSCE ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FVD ↓ |
|---|---|---|---|---|---|---|
| 14 | AniDoc | 2612.61 (±4823.98) | 18.04 (±4.99) | 0.73 (±0.12) | 0.30 (±0.13) | 898.19 (±704.30) |
| 14 | LVCD | 7937.33 (±4727.72) | 10.00 (±2.75) | 0.57 (±0.16) | 0.45 (±0.12) | 2738.45 (±1555.28) |
| 14 | SketchColour (Ours) | 2214.18 (±3867.13) | 20.23 (±5.82) | 0.79 (±0.12) | 0.24 (±0.14) | 829.27 (±723.77) |
| 16 | ToonCrafter | 4619.98 (±4086.97) | 13.06 (±3.52) | 0.56 (±0.13) | 0.47 (±0.13) | 1464.59 (±1030.63) |
| 16 | SketchColour (Ours) | 2403.40 (±4075.10) | 19.75 (±5.83) | 0.78 (±0.12) | 0.25 (±0.14) | 860.78 (±750.60) |
| 17 | SketchColour (Ours) | 2512.78 (±4190.19) | 19.51 (±5.83) | 0.78 (±0.12) | 0.25 (±0.14) | 918.70 (±771.13) |

Quantitative comparison of video colorization methods at different frame lengths, reported as mean (± std). Our results are reported at the same frame counts as the baselines for fair comparison.
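As a rough illustration of how a "mean (± std)" entry such as the PSNR values can be produced, the sketch below computes PSNR per clip and then aggregates across clips. It is not the authors' evaluation code. The function names (`psnr`, `summarize`, `format_cell`), the 8-bit pixel range, and the use of the population standard deviation are all assumptions made for this example.

```python
import math
import statistics


def psnr(reference, prediction, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    Assumes 8-bit pixels (max_value=255); this is an illustrative choice.
    """
    if len(reference) != len(prediction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def summarize(scores):
    """Return (mean, std) over per-clip scores.

    Population std is an assumption; the page does not say which was used.
    """
    return statistics.fmean(scores), statistics.pstdev(scores)


def format_cell(scores):
    """Format per-clip scores in the 'mean (±std)' style used in the table."""
    mean, std = summarize(scores)
    return f"{mean:.2f} (±{std:.2f})"


# Hypothetical per-clip PSNR values, for illustration only.
print(format_cell([18.0, 22.0]))  # prints "20.00 (±2.00)"
```

The large standard deviations in the table are consistent with this kind of per-clip aggregation, where scene difficulty varies widely from clip to clip.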

Cite this work

    @article{sadihin2025sketchcolour,
      author  = {Bryan Constantine Sadihin and Michael Hua Wang and
                Shei Pern Chua and Hang Su},
      title   = {SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour
                Pipeline for 2D Animation},
      journal = {arXiv preprint arXiv:2507.01586},
      year    = {2025},
      url     = {https://arxiv.org/abs/2507.01586}
    }