home.social

#humanfeedback — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #humanfeedback, aggregated by home.social.

  1. 🎯 Think AI just "learns"? Think again.
    Today's smartest models don't memorize — they listen to YOU.
    📊 Discover 3 powerful ways human feedback (RLHF) is transforming AI into something far more intuitive.
    👇 Don’t just use AI. Understand how you’re shaping it.

    🔗 medium.com/@rogt.x1997/3-game-
    #RLHF #AIbehavior #HumanFeedback #MachineLearning

  2. One poorly delivered joke in 2019 became the catalyst for the most human breakthrough in AI: RLHF.
    Now, machines aren’t just answering—they’re understanding us.
    This isn’t the future. It’s happening now.
    ⬇️ See how empathy, feedback, and a little comedy changed everything.
    #AIAlignment #RLHF #EthicalAI #HumanFeedback
    👉 medium.com/@rogt.x1997/the-jok

  3. Good point made by @soumith on X:
    "Open LLMs need to get organized and co-ordinated about sharing human feedback. It's the weakest link with Open LLMs right now. They don't have 100m+ people giving feedback like in the case of OpenAI/Anthropic/Bard."
    #Opensource #AI #LLM #GenerativeAI #humanfeedback

  4. The secret to making #AIChatbots sound #smart and #spew less #toxic nonsense is to use a technique called reinforcement learning from #HumanFeedback, which uses input from people to improve the model’s answers. It relies on a small army of #human #data #annotators who evaluate whether a string of text makes sense and sounds fluent and natural. They decide whether a response should be kept in the AI model’s database or removed. technologyreview.com/2023/06/1

  5. In the intro to his keynote on Reasoning with Realistically Imperfect Knowledge, Alexander Gray is comparing gpt-3 rlhf to Shub-Niggurath, a mythical goddess from the Lovecraftian monster universe
    #eswc2023 #lovecraft #reinforcementlearning #humanfeedback #gpt #rlhf