[1] Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H. (2016, June). Generative adversarial text to image synthesis. In International Conference on Machine Learning (pp. 1060-1069). PMLR.
[2] Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D. N. (2017). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5907-5915).
[3] Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X. (2018). AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1316-1324).
[4] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... Sutskever, I. (2021, July). Zero-shot text-to-image generation. In International Conference on Machine Learning (pp. 8821-8831). PMLR.
[5] Esser, P., Rombach, R., Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12873-12883).
[6] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748-8763). PMLR.
[7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
[8] Pejhan, E., Ghasemzadeh, M. (2021). Multi-sentence hierarchical generative adversarial network GAN (MSH-GAN) for automatic text-to-image generation. Journal of AI and Data Mining, 9(4), 475-485.
[9] Zhu, M., Pan, P., Chen, W., Yang, Y. (2019). DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5802-5810).
[10] Sharma, S., Suhubdy, D., Michalski, V., Kahou, S. E., Bengio, Y. (2018). ChatPainter: Improving text to image generation using dialogue. arXiv preprint arXiv:1802.08216.
[11] Johnson, J., Gupta, A., Fei-Fei, L. (2018). Image generation from scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1219-1228).
[12] Vo, D. M., Sugimoto, A. (2020, August). Visual-relation conscious image generation from structured-text. In European Conference on Computer Vision (pp. 290-306). Springer, Cham.
[13] Brock, A., Donahue, J., Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096.
[14] Ulyanov, D., Vedaldi, A., Lempitsky, V. (2018). Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9446-9454).
[15] Dhariwal, P., Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34.
[16] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
[17] Crowson, K., Biderman, S., Kornis, D., Stander, D., Hallahan, E., Castricato, L., Raff, E. (2022). VQGAN-CLIP: Open domain image generation and editing with natural language guidance. arXiv preprint arXiv:2204.08583.
[18] Van den Oord, A., Vinyals, O. (2017). Neural discrete representation learning. Advances in Neural Information Processing Systems, 30.
[19] He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
[20] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
[21] Ilinykh, N., Zarrieß, S., Schlangen, D. (2019). Tell me more: A dataset of visual scene description sequences. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 152-157). Tokyo, Japan: Association for Computational Linguistics.
[22] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
[23] Karras, T., Laine, S., Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4401-4410).
[24] El-Kassas, W. S., Salama, C. R., Rafea, A. A., Mohamed, H. K. (2021). Automatic text summarization: A comprehensive survey. Expert Systems with Applications, 165, 113679.
[25] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
[26] Barratt, S., Sharma, R. (2018). A note on the inception score. arXiv preprint arXiv:1801.01973.
[27] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30.
[28] Barratt, S., Sharma, R. (2018). A note on the inception score. arXiv preprint arXiv:1801.01973.
[29] Frolov, S., Hinz, T., Raue, F., Hees, J., Dengel, A. (2021). Adversarial text-to-image synthesis: A review. Neural Networks, 144, 187-209.
[30] Mu, N., Kirillov, A., Wagner, D., Xie, S. (2021). SLIP: Self-supervision meets language-image pre-training. arXiv preprint arXiv:2112.12750.