[Open DMQA Seminar] Controllable Diffusion Model

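Text-to-image models built on diffusion models have drawn considerable attention for their ability to generate high-quality images. However, because existing models rely heavily on the text prompt when generating an image, they sometimes fail to reflect the user's intent accurately. This seminar therefore introduces controllable diffusion methods that can be steered flexibly not only by text but also by a variety of other input conditions.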
References:
[1] Nichol, A. Q., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., ... & Chen, M. (2022). GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. In International Conference on Machine Learning (pp. 16784-16804). PMLR.
[2] Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E. L., ... & Norouzi, M. (2022). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479-36494.
[3] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684-10695).
[4] Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3836-3847).
[5] Huang, L., Chen, D., Liu, Y., Shen, Y., Zhao, D., & Zhou, J. (2023). Composer: Creative and controllable image synthesis with composable conditions. arXiv preprint arXiv:2302.09778.
[6] Zhao, S., Chen, D., Chen, Y. C., Bao, J., Hao, S., Yuan, L., & Wong, K. Y. K. (2023). Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models. arXiv preprint arXiv:2305.16322.