Demo of SketchColour:
Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation

Bryan Constantine Sadihin, Michael Hua Wang, Shei Pern Chua, Hang Su

Department of Computer Science
Tsinghua University

Comparative results for our Sketch-to-Color Video Diffusion model against prior work (LVCD, ToonCrafter, AniDoc) across scenes from the SAKUGA validation dataset.

ArXiv Code

HOLD ON! This page loads a lot of videos. Scroll slowly to avoid crashing the page.

Each video plays once. Click any video to restart all scenes together.

Method	MSCE ↓	PSNR ↑	SSIM ↑	LPIPS ↓	FVD ↓
14 Frames
AniDoc	2612.61 (±4823.98)	18.04 (±4.99)	0.73 (±0.12)	0.30 (±0.13)	898.19 (±704.30)
LVCD	7937.33 (±4727.72)	10.00 (±2.75)	0.57 (±0.16)	0.45 (±0.12)	2738.45 (±1555.28)
SketchColour (Ours)	2214.18 (±3867.13)	20.23 (±5.82)	0.79 (±0.12)	0.24 (±0.14)	829.27 (±723.77)
16 Frames
ToonCrafter	4619.98 (±4086.97)	13.06 (±3.52)	0.56 (±0.13)	0.47 (±0.13)	1464.59 (±1030.63)
SketchColour (Ours)	2403.40 (±4075.10)	19.75 (±5.83)	0.78 (±0.12)	0.25 (±0.14)	860.78 (±750.60)
17 Frames
SketchColour (Ours)	2512.78 (±4190.19)	19.51 (±5.83)	0.78 (±0.12)	0.25 (±0.14)	918.70 (±771.13)

Quantitative comparison of video colorization methods at different frame lengths, reported as mean (± std). ↓ means lower is better; ↑ means higher is better. Our method is evaluated at the same frame count as each baseline for fair comparison.
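To illustrate how a "mean (± std)" entry of this kind can be produced, here is a minimal sketch that scores each clip separately and then aggregates the per-clip scores. It is not the evaluation code behind the table: the toy data, the helper names (`mse`, `psnr`, `summarize`), the 0-255 pixel range, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def mse(pred, ref):
    """Mean squared error between two equally sized flat lists of pixel values."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of values")
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def psnr(mse_value, max_value=255.0):
    """Peak signal-to-noise ratio in dB (assumes pixel values in [0, max_value])."""
    if mse_value == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse_value)


def summarize(scores):
    """Return (mean, population std) over per-clip scores (std choice is an assumption)."""
    return statistics.fmean(scores), statistics.pstdev(scores)


# Hypothetical toy data: two "clips", each reduced to one flattened 4-pixel frame pair.
clips = [
    ([0, 0, 0, 0], [10, 10, 10, 10]),  # per-clip MSE = 100
    ([0, 0, 0, 0], [20, 20, 20, 20]),  # per-clip MSE = 400
]

mse_scores = [mse(pred, ref) for pred, ref in clips]
psnr_scores = [psnr(m) for m in mse_scores]

mse_mean, mse_std = summarize(mse_scores)
psnr_mean, psnr_std = summarize(psnr_scores)

print(f"MSE  {mse_mean:.2f} (±{mse_std:.2f})")
print(f"PSNR {psnr_mean:.2f} (±{psnr_std:.2f})")
```

If scores are aggregated per clip in this way, the mean PSNR is generally not the PSNR of the mean error, because the logarithm is applied before averaging. A standard deviation that is large relative to the mean then points to wide clip-to-clip variation.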

Cite this work

    @article{sadihin2025sketchcolour,
      author  = {Bryan Constantine Sadihin and Michael Hua Wang and
                Shei Pern Chua and Hang Su},
      title   = {SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour
                Pipeline for 2D Animation},
      journal = {arXiv preprint arXiv:2507.01586},
      year    = {2025},
      url     = {https://arxiv.org/abs/2507.01586}
    }