#humanfeedback — Public Fediverse posts on home.social

Hacker News @[email protected] · 2025-07-06 · 14:30 UTC

Reinforcement Learning from Human Feedback (RLHF) in Notebooks

https://github.com/ash80/RLHF_in_notebooks

#HackerNews #ReinforcementLearning #HumanFeedback #RLHF #Notebooks #AIResearch

Dr. Thompson @[email protected] · 2025-06-10 · 22:46 UTC

🎯 Think AI just "learns"? Think again.
Today's smartest models don't memorize — they listen to YOU.
📊 Discover 3 powerful ways human feedback (RLHF) is transforming AI into something far more intuitive.
👇 Don’t just use AI. Understand how you’re shaping it.

🔗 https://medium.com/@rogt.x1997/3-game-changing-ways-rlhf-is-rewiring-ai-behavior-5f082ce6ec01
#RLHF #AIbehavior #HumanFeedback #MachineLearning

Dr. Thompson @[email protected] · 2025-06-07 · 20:42 UTC

One poorly delivered joke in 2019 became the catalyst for the most human breakthrough in AI: RLHF.
Now, machines aren’t just answering—they’re understanding us.
This isn’t the future. It’s happening now.
⬇️ See how empathy, feedback, and a little comedy changed everything.
#AIAlignment #RLHF #EthicalAI #HumanFeedback
👉 https://medium.com/@rogt.x1997/the-joke-that-taught-ai-empathy-inside-the-rlhf-breakthrough-174a56d91bf7

Erik Jonker @[email protected] · 2023-10-16 · 13:01 UTC

Good point made by @soumith on X:
"Open LLMs need to get organized and co-ordinated about sharing human feedback. It's the weakest link with Open LLMs right now. They don't have 100m+ people giving feedback like in the case of OpenAI/Anthropic/Bard."
#Opensource #AI #LLM #GenerativeAI #humanfeedback

beSpacific @[email protected] · 2023-06-13 · 12:01 UTC

The secret to making #AIChatbots sound #smart and #spew less #toxic nonsense is to use a technique called reinforcement learning from #HumanFeedback, which uses input from people to improve the model’s answers. It relies on a small army of #human #data #annotators who evaluate whether a string of text makes sense and sounds fluent and natural. They decide whether a response should be kept in the AI model’s database or removed. https://www.technologyreview.com/2023/06/13/1074560/we-are-all-ais-free-data-workers

Harald Sack @[email protected] · 2023-05-31 · 13:26 UTC

In the intro to his keynote on Reasoning with Realistically Imperfect Knowledge, Alexander Gray compares GPT-3 with RLHF to Shub-Niggurath, a mythical goddess from the Lovecraftian monster universe
#eswc2023 #lovecraft #reinforcementlearning #humanfeedback #gpt #rlhf
