Generation of a dataset with realistic noise to train denoising networks

Kovalenko A.S., Demyanenko Y.M.

Southern Federal University, Russia, 344090, Rostov-on-Don, st. Milchakova, 8a, +78632975111

Alongside image denoising, video denoising is a pressing problem. Unlike still images, however, it is difficult to obtain aligned video frames from which clean samples can be built for training neural networks. In this paper, we propose using noise maps predicted by a neural network to obtain noisy samples from clean video frames.

Since there are currently no aligned datasets for the video denoising problem, researchers have to apply synthetic noise to video frames. For example, the authors of [1] and [2] add Gaussian noise with different sigma values to the input images to train their models. This approach has a significant weakness: when noise is applied to the input image, the physical properties of the CMOS camera sensor are not taken into account. For example, overexposed regions of an image should not contain noise, and regions that are close to overexposure should contain only a low level of noise. The author of “Saturation (Imaging)” [3] discusses these points.
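The difference between the two ways of applying noise can be illustrated with a minimal NumPy sketch. It is only an illustration and not the method proposed in this paper: the function names, the `knee` threshold and the linear fade-out near saturation are assumptions introduced for the example, and frames are assumed to be floating-point arrays scaled to [0, 1].

```python
import numpy as np

def add_gaussian_noise(frame, sigma, rng):
    """Add i.i.d. Gaussian noise with standard deviation `sigma` to a frame in [0, 1]."""
    noisy = frame + rng.normal(0.0, sigma, size=frame.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_saturation_aware_noise(frame, sigma, rng, knee=0.9):
    """Hypothetical variant: fade the noise out as pixels approach saturation (1.0)."""
    # The weight is 1 below `knee` and falls linearly to 0 at full saturation,
    # so overexposed pixels receive no noise and nearly saturated ones only a little.
    weight = np.clip((1.0 - frame) / (1.0 - knee), 0.0, 1.0)
    noisy = frame + weight * rng.normal(0.0, sigma, size=frame.shape)
    return np.clip(noisy, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    overexposed = np.ones((4, 4))
    # Plain Gaussian noise darkens some saturated pixels; the aware variant leaves them intact.
    print(add_gaussian_noise(overexposed, 0.1, rng).min() < 1.0)
    print(np.array_equal(add_saturation_aware_noise(overexposed, 0.1, rng), overexposed))
```

With plain Gaussian noise, a fully overexposed region ends up with visibly perturbed pixels, whereas the saturation-aware variant leaves it untouched, which is the behaviour described above.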

Since more advanced architectures for image denoising have already been developed, we replaced the classic UNet with the Uformer architecture [5], which has shown state-of-the-art results. Moreover, the Uformer model [5] is based on the vision transformer architecture. This allows the model to extract spatial and local features from the input image and to build attention maps. With these attention maps it is possible to predict relationships between image regions, which is an excellent property for predicting noise maps.

We trained the Uformer model from scratch on the SIDD dataset [6]. The training pipeline is based on the DANet approach [4]. The model was trained for 42 epochs and achieved a PSNR of 39.4 dB on the SIDD validation set. The original model published by its authors reports a PSNR of 39.3 dB. Moreover, as the illustration shows, the noise produced by the Uformer model is closer to the real noise distribution than Gaussian noise is.
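For reference, the PSNR metric quoted above can be computed as in the following minimal sketch. The function name `psnr` and the `max_value` parameter are assumptions made for the example (images are assumed to be scaled to [0, 1]); the evaluation code actually used for the reported numbers may differ in details such as value range or per-image averaging.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between a clean reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

if __name__ == "__main__":
    clean = np.zeros((16, 16))
    denoised = np.full((16, 16), 0.1)  # constant error of 0.1, so MSE = 0.01
    print(round(psnr(clean, denoised), 6))  # 20.0
```

Because the scale is logarithmic, a gap of 0.1 dB such as the one between 39.4 and 39.3 corresponds to a difference of roughly 2% in mean squared error.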


1. Tassano Matias, Delon Julie, Veit Thomas. FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). — 2020. — June.

2. Claus Michele, Gemert Jan. ViDeNN: Deep Blind Video Denoising. — 2019. — 06. — P. 1843–1852.

3. Hasinoff Samuel W. Saturation (Imaging) // Computer Vision, A Reference Guide. — 2014.

4. Yue Zongsheng, Zhao Qian, Zhang Lei, and Meng Deyu. Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation // Proceedings of the European Conference on Computer Vision (ECCV). — 2020. — August.

5. Wang Zhendong, Cun Xiaodong, Bao Jianmin, and Liu Jianzhuang. Uformer: A General U-Shaped Transformer for Image Restoration // CoRR. — 2021. — Vol. abs/2106.03106. — arXiv:2106.03106.

6. Timofte Radu, Brown Michael S. NTIRE 2019 Challenge on Real Image Denoising: Methods // The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). — 2019. — June.

