Automated Video Colorization Techniques for Enhanced Visual Realism and Computational Efficiency
Abstract
Automatic video colorization remains a challenging computer vision task, particularly when ensuring semantic accuracy and temporal coherence across dynamic, multi-scene content. Existing methods often rely on a single fixed reference image, which fails to adapt to abrupt scene changes or variations in lighting and texture. This study presents a hybrid deep learning framework that dynamically selects multiple reference images per scene using adaptive thresholds derived from the Structural Similarity Index Measure (SSIM) and deep features extracted via a ResNet50 backbone with Generalized Mean (GeM) pooling. The framework integrates three specialized modules: pre-processing, reference image processing, and attention-based colorization, all operating in the Lab color space before conversion to RGB. Experimental evaluations on the YouTube-8M dataset demonstrate a PSNR of 37.89 dB, an SSIM of 0.998, and an inference speed of 2.6 FPS with a compact 81 MB model (3.2M parameters). Compared to state-of-the-art methods, the proposed approach achieves superior color fidelity and temporal stability while maintaining efficiency, making it suitable for deployment in resource-constrained environments such as embedded vision and IoT systems.
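To make the reference-selection idea concrete, the sketch below illustrates one plausible way to combine an SSIM threshold with ResNet50 + GeM descriptors when picking a per-scene reference. It is a minimal illustration, not the authors' released code: the function names (`select_reference`, `descriptor`), the fixed GeM exponent, and the threshold value are assumptions introduced for clarity.

```python
# Illustrative sketch (assumed, not the paper's implementation): select a
# reference frame by (1) gating candidates with SSIM and (2) ranking the
# survivors by cosine similarity of ResNet50 + GeM descriptors.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50
from skimage.metrics import structural_similarity as ssim


def gem_pool(features: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    """Generalized Mean (GeM) pooling over the spatial dimensions of a CNN feature map."""
    # features: (B, C, H, W) -> L2-normalized descriptor of shape (B, C)
    pooled = features.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)
    return F.normalize(pooled, dim=-1)


# ResNet50 trunk without the final average-pool and classifier layers.
backbone = torch.nn.Sequential(
    *list(resnet50(weights="IMAGENET1K_V2").children())[:-2]
).eval()


@torch.no_grad()
def descriptor(frame: torch.Tensor) -> torch.Tensor:
    """ImageNet-normalized frame of shape (1, 3, H, W) -> GeM descriptor (1, C)."""
    return gem_pool(backbone(frame))


def select_reference(gray_frame, gray_refs, frame_t, ref_ts, ssim_thresh=0.55):
    """Return the index of the best reference, or None if no candidate passes the SSIM gate.

    gray_frame / gray_refs: 2-D grayscale arrays in [0, 1] for SSIM.
    frame_t / ref_ts: corresponding normalized tensors for the deep descriptors.
    ssim_thresh: placeholder for the paper's adaptive threshold (assumed value here).
    """
    target = descriptor(frame_t)
    best_idx, best_sim = None, -1.0
    for i, (g_ref, ref_t) in enumerate(zip(gray_refs, ref_ts)):
        if ssim(gray_frame, g_ref, data_range=1.0) < ssim_thresh:
            continue  # structurally dissimilar reference: skip
        sim = float(target @ descriptor(ref_t).T)  # cosine similarity (both L2-normalized)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

In practice the threshold would be adapted per scene (the paper derives it from SSIM statistics), and the selected reference would then feed the attention-based colorization module operating in Lab space.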
This work is licensed under a Creative Commons Attribution 4.0 International License.