home.social

#ppo — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #ppo, aggregated by home.social.

  1. Advanced RL Algorithms: Normal Policy, TRPO, PPO

    A long set of notes on advanced RL algorithms: TRPO and PPO. The author went a little overboard with the formulas, but only out of love for algorithmic transparency.

    habr.com/ru/articles/991622/

    #Policy_gradient_methods #ActorCritic #reinforcementlearning #ppo #trpo

  2. RL (RLM): Figuring It Out Together

    Hi everyone! I recently came across the Hugging Face Deep Reinforcement Learning Course and wanted to distill the most interesting parts. This article is a kind of cheat sheet on the basics of Reinforcement Learning (RL) and one of its key algorithms, PPO, which underlies the fine-tuning of modern LLMs (Large Language Models).

    habr.com/ru/articles/958062/

    #Искуственный_интеллект #Машинное_обучение #Алгоритмы #RLHF #LLM #Большие_языковые_модели #RL #Reinforcement_learning #PPO #Proxi

  3. A Vulnerable Sector Check (VSC) pre-employment screening can take over 3 months because of a backlog at the OPP.

    cbc.ca/news/canada/toronto/opp
    - - -
    La vérification des antécédents en vue d’un travail auprès de personnels vulnérables (VATPV) peut prendre plus de 3 mois à cause de retards chez la PPO.

    // Article en anglais //

    #Ontario #OPP #PPO

  4. #MedicalInsurance #Medicare #MedicarePlus

    Just received notice from #BlueShield that #UCSF, my medical provider for the last 15 years, is leaving the #BlueShieldOfCA #PPO medical network as of 7/10/2025. ☹️

    Just started doing some research on which groups are available where I can find a new PCP & all of the "reviews" for all of the medical groups in my area & beyond are dismal. 🤦‍♂️

    That said, I've found that as long as I get a PCP that I get along with & who is responsive to my needs/requests, I'm happy even if the reviews for the group are poor.

    So, I may need to try a couple in various groups before I find the PCP that I like.

    As a member of a PPO, I don't have to worry at all about getting referrals for specialized care but the day-to-day medical care -- labs & prescriptions -- is all I generally need & I just need to find another PCP who is on the same page with me for those things.

    Wish me luck! 😉

  5. Mobile PPO (air defense) groups in the Kherson region hit "Shaheds" overnight, destroying 6 enemy drones. #Kherson #PPO

  6. In Kyiv, PPO@censor_net operates. They are active on social media platforms. #Kyiv #PPO

  7. Using a clever change-of-variables trick, #DPO is a more efficient drop-in replacement for #PPO in #RLHF.

    Using DPO with preference labels from a #chatbot panel of judges for virtually embodied agents would be a great way to achieve an unambiguous #AGI.

    [2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model arxiv.org/abs/2305.18290

  8. Today I’ve started to make a humanoid robot learn to walk by itself. Funny to watch when evaluating the model after a few hours.

    #ai #gym #deeplearning #ppo