Department of Computer Science
Tsinghua University
Comparative results for our Sketch-to-Color Video Diffusion model against prior work (LVCD, ToonCrafter, AniDoc) across scenes from the SAKUGA validation dataset.
| Frames | Method | MSCE ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FVD ↓ |
|---|---|---|---|---|---|---|
| 14 | AniDoc | 2612.61 (±4823.98) | 18.04 (±4.99) | 0.73 (±0.12) | 0.30 (±0.13) | 898.19 (±704.30) |
| 14 | LVCD | 7937.33 (±4727.72) | 10.00 (±2.75) | 0.57 (±0.16) | 0.45 (±0.12) | 2738.45 (±1555.28) |
| 14 | SketchColour (Ours) | 2214.18 (±3867.13) | 20.23 (±5.82) | 0.79 (±0.12) | 0.24 (±0.14) | 829.27 (±723.77) |
| 16 | ToonCrafter | 4619.98 (±4086.97) | 13.06 (±3.52) | 0.56 (±0.13) | 0.47 (±0.13) | 1464.59 (±1030.63) |
| 16 | SketchColour (Ours) | 2403.40 (±4075.10) | 19.75 (±5.83) | 0.78 (±0.12) | 0.25 (±0.14) | 860.78 (±750.60) |
| 17 | SketchColour (Ours) | 2512.78 (±4190.19) | 19.51 (±5.83) | 0.78 (±0.12) | 0.25 (±0.14) | 918.70 (±771.13) |
Quantitative comparison of video colorization methods at different frame lengths, reported as mean (± std). For a fair comparison, SketchColour is evaluated at the same frame count as each baseline.
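To make the "mean (± std)" cells concrete, here is a minimal sketch of how a per-clip score such as PSNR could be computed and then aggregated into one table cell. It is an illustration only, not the evaluation code behind the reported numbers. The helper names (`psnr`, `format_cell`), the 8-bit peak value of 255, and the use of the population standard deviation are all assumptions, since the page does not state them.

```python
import math
from statistics import mean, pstdev


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    max_value=255.0 assumes 8-bit images (an assumption, not from the paper).
    """
    if len(reference) != len(generated):
        raise ValueError("inputs must contain the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def format_cell(scores):
    """Format per-clip scores as 'mean (±std)'.

    Uses the population standard deviation; whether the reported numbers use
    the population or the sample estimator is not stated in the source.
    """
    return f"{mean(scores):.2f} (±{pstdev(scores):.2f})"


# Hypothetical per-clip PSNR values, for illustration only.
clip_scores = [18.0, 22.0]
cell = format_cell(clip_scores)
```

A standard deviation that is large relative to the mean, as in the MSCE and FVD columns, indicates that scores vary widely from scene to scene, so the means are best read together with their spread.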
@article{sadihin2025sketchcolour,
  author  = {Bryan Constantine Sadihin and Michael Hua Wang and Shei Pern Chua and Hang Su},
  title   = {SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation},
  journal = {arXiv preprint arXiv:2507.01586},
  year    = {2025},
  url     = {https://arxiv.org/abs/2507.01586}
}