Author: Wu, Yen-Yi
Thesis title: Semantic-Aware Interactive Image Manipulation with Conditional Generative Adversarial Networks
Advisor: Lai, Shang-Hong
Oral defense committee: Liu, Tyng-Luh; Huang, Szu-Hao; Lee, Che-Rung
Keywords: Deep Learning; Generative Adversarial Networks; Conditional Generative Adversarial Networks; Image Manipulation


Image manipulation is challenging because it requires not only an understanding of the semantic content and style of an image but also the skill to keep modifications semantically consistent with the unmodified regions. In this thesis, we propose a conditional GAN model that helps users manipulate complicated images with simple operations such as brushes and erasers.

Our model has an encoder-decoder structure. The encoder, aided by a segmentation branch, generates high-dimensional feature maps that correspond to semantic information and that users can manipulate; the decoder produces realistic images from the modified feature maps. Reconstruction experiments demonstrate that these high-dimensional feature maps represent the style of images containing multiple objects better than latent vectors do.

Finally, we build an interactive image editing application based on our approach. We compare its training process and image editing results with those of previous works to demonstrate that the proposed method gives superior results when manipulating real images.
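The encode, edit, decode flow described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `encode`, `decode`, `class_mean`, and `brush` are hypothetical stand-ins, the "segmentation" is a trivial color rule, and the brush simply writes a class's mean feature vector into the painted pixels. It is not the thesis's network or its actual editing operation, which the abstract does not specify.

```python
# Toy sketch only: every function here is a hypothetical stand-in,
# not the networks or editing operations used in the thesis.

def encode(image):
    """Stand-in encoder: one feature vector and one class label per pixel."""
    features = [[(r / 255.0, g / 255.0, b / 255.0) for (r, g, b) in row]
                for row in image]
    # Fake "segmentation branch": label 1 if the pixel is bluish, else 0.
    labels = [[1 if b > r and b > g else 0 for (r, g, b) in row]
              for row in image]
    return features, labels


def decode(features):
    """Stand-in decoder: map feature vectors back to 8-bit RGB pixels."""
    return [[tuple(round(c * 255) for c in f) for f in row]
            for row in features]


def class_mean(features, labels, cls):
    """Average feature vector over all pixels carrying label `cls`."""
    vecs = [f for frow, lrow in zip(features, labels)
            for f, l in zip(frow, lrow) if l == cls]
    if not vecs:
        raise ValueError(f"no pixels with label {cls}")
    return tuple(sum(v[i] for v in vecs) / len(vecs)
                 for i in range(len(vecs[0])))


def brush(features, labels, mask, cls):
    """Paint class `cls` into masked pixels (assumed brush semantics)."""
    mean = class_mean(features, labels, cls)
    new_features = [[mean if m else f for f, m in zip(frow, mrow)]
                    for frow, mrow in zip(features, mask)]
    new_labels = [[cls if m else l for l, m in zip(lrow, mrow)]
                  for lrow, mrow in zip(labels, mask)]
    return new_features, new_labels


# Usage: a 2x2 image with a blue top row and a green-brown bottom row.
image = [[(0, 0, 255), (0, 0, 255)],
         [(51, 102, 0), (51, 102, 0)]]
features, labels = encode(image)
mask = [[False, False], [True, False]]  # user brushes the bottom-left pixel
edited_features, edited_labels = brush(features, labels, mask, cls=1)
result = decode(edited_features)
```

In this toy setting the edit happens on the feature map rather than on the pixels, so the decoder renders the brushed pixel from the painted class's features while untouched pixels are reconstructed unchanged.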
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
1.4 Thesis Organization
2 Related Work
2.1 Generative Adversarial Networks
2.2 Conditional GANs
2.3 Photorealistic Image Synthesis
2.3.1 CRN
2.3.2 pix2pixHD
2.3.3 SPADE
2.4 Interactive Image Editing
2.4.1 Locating and ablating units
2.4.2 Encoder-Decoder
2.4.3 Image Translation
2.4.4 Image Completion
3 Method
3.1 Network Architecture
3.1.1 Encoder
3.1.2 Color-wise Averaging
3.1.3 Generator and Discriminator
3.2 Objective Functions
3.2.1 Segmentation Loss
3.2.2 Adversarial Loss
3.2.3 Reconstruction Loss
3.2.4 Perceptual Loss
3.2.5 Full Objective
4 Interactive Editing
4.1 Editing Pipeline
4.2 Datasets
4.3 Edit Transfer
4.4 Interactive Application
4.5 Editing Results
5 Experiments
5.1 Implementation Details
5.2 Editing Comparison
5.2.1 Comparison with SPADE
5.2.2 Comparison with SC-FEGAN
5.3 Comparison of Training and Testing Processes
5.4 Ablation Study
6 Conclusions
References
[1] Bau, D., Zhu, J.-Y., Strobelt, H., Zhou, B., Tenenbaum, J. B., Freeman, W. T., and Torralba, A. GAN dissection: Visualizing and understanding generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR) (2019).
[2] Chen, Q., and Koltun, V. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).
[3] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
[4] Denton, E. L., Chintala, S., Fergus, R., et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Proceedings of the Neural Information Processing Systems Conference (NIPS) (2015).
[5] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2672–2680.
[6] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
[7] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
[8] Jo, Y., and Park, J. SC-FEGAN: Face editing generative adversarial network with user’s sketch and color. arXiv preprint arXiv:1902.06838 (2019).
[9] Johnson, J., Alahi, A., and Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) (2016).
[10] Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In Proceedings of the International Conference on Learning Representations (ICLR) (2018).
[11] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[12] Kingma, D. P., and Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR) (2014).
[13] Lee, C.-H., Liu, Z., Wu, L., and Luo, P. MaskGAN: Towards diverse and interactive facial image manipulation. arXiv preprint arXiv:1907.11922 (2019).
[14] Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015).
[15] Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).
[16] Mirza, M., and Osindero, S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
[17] Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
[18] Reinhard, E., Ashikhmin, M., Gooch, B., and Shirley, P. Color transfer between images. IEEE Computer Graphics and Applications 21, 5 (2001), 34–41.
[19] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2015), Springer, pp. 234–241.
[20] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[21] Tyleček, R., and Šára, R. Spatial pattern templates for recognition of objects with regular structure. In Proceedings of the German Conference on Pattern Recognition (GCPR) (2013).
[22] Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., and Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).
[23] Yu, F., Koltun, V., and Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
[24] Zhu, J.-Y., Krähenbühl, P., Shechtman, E., and Efros, A. A. Generative visual manipulation on the natural image manifold. In Proceedings of the European Conference on Computer Vision (ECCV) (2016).
[25] Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).