[Open DMQA Seminar] RLHF-Preference-based Reinforcement Learning


1,483 views

김성범 [Professor / School of Industrial and Management Engineering]

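RLHF (reinforcement learning from human feedback) refers to the field of training reinforcement learning agents using only human feedback, and it suggests that complex tasks can be carried out without designing a detailed, domain-knowledge-based reward function. Today's seminar introduces Preference-based Reinforcement Learning (PbRL), the branch of RLHF that relies on binary comparisons. It also examines the problems that arise in PbRL and surveys several algorithms proposed to address them.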
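As a rough illustration of what learning from binary comparisons can look like, the sketch below follows the Bradley-Terry style formulation used in [1]: the probability that one trajectory segment is preferred over another is a logistic function of the difference in their summed predicted rewards, and the reward model is fit with a cross-entropy loss against the human label. The function names `preference_prob` and `preference_loss` and the toy reward arrays are assumptions made for this example, not code from the seminar.

```python
import numpy as np


def preference_prob(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Bradley-Terry style model: a logistic function of the difference
    between the summed predicted rewards of the two segments.
    """
    diff = np.sum(rewards_a) - np.sum(rewards_b)
    return 1.0 / (1.0 + np.exp(-diff))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and the human label.

    label = 1.0 if the human preferred A, 0.0 if B,
    and 0.5 if both segments were judged equally preferable.
    """
    p = preference_prob(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps))


# Toy predicted per-step rewards for two segments (hypothetical values).
segment_a = np.array([0.5, 0.5])
segment_b = np.array([0.0, 0.0])

print(preference_prob(segment_a, segment_b))
print(preference_loss(segment_a, segment_b, label=1.0))
```

In a full PbRL pipeline the per-step rewards would come from a learned reward network, and this loss would be minimized over a dataset of labeled segment pairs; the fixed arrays here only show the shape of the computation.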
References:
[1] Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30.
[2] Lee, K., Smith, L., & Abbeel, P. (2021). Pebble: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training. arXiv preprint arXiv:2106.05091.
[3] Park, J., Seo, Y., Shin, J., Lee, H., Abbeel, P., & Lee, K. (2021, October). SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning. In International Conference on Learning Representations.
[4] Liang, X., Shu, K., Lee, K., & Abbeel, P. (2021, October). Reward Uncertainty for Exploration in Preference-based Reinforcement Learning. In International Conference on Learning Representations.
