#2024acmprize — Public Fediverse posts

» #ReinforcementLearning: An Introduction
1998
standard reference... cited over 75,000
...
prominent example of #RL
#AlphaGo victory
over best human #Go players
2016, 2017
...
recently has been the development of the chatbot #ChatGPT
...
large language model #LLM trained in two phases... employs a technique called
reinforcement learning from human feedback #RLHF «

aka cheap labor unnamed in papers

2/2

#acmprize #2024acmprize #acmturingaward #andrewbarto #richardsutton #reinforcementlearning

Teixi @[email protected] · 2025-03-09 · 00:58 UTC
