home.social

#tidymodels — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #tidymodels, aggregated by home.social.

  1. New post with @joshuamarie: Bayesian Neural Networks in {tidymodels} with {kindling} 🔥

    BNNs learn weight distributions instead of fixed values — giving uncertainty estimates alongside predictions, all within a standard {tidymodels} workflow.

    👉 statsandr.com/blog/bayesian-ne

  2. 🚀 New blog post live!

    Together with @joshuamarie, we explore how to do more with neural networks in R using {kindling}, a higher-level interface to {torch} that makes building, training & tuning deep learning models smoother (and tidymodels-friendly)

    👉 statsandr.com/blog/you-can-do-

  3. rOpenSci will be participating in @latinr_conf!

    Here is the key data for your participation:

    🎙️ Keynotes

    - Heather Turner: Lowering Barriers to Contributing to R.

    - Stephanie Zimmer: Transforming a team to open-source first.

    - TRACE-LAC Team. Lo invisible del código abierto: Lecciones desde el proyecto TRACE-LAC / Epiverso para conectar el desarrollo de software con la salud pública

    🎓 Tutorials by rOpenSci members:

    📌 ¡Miércoles, Git! Manejo de errores en Git y no morir en el intento — @maelle , @yabellini. Registration: eventbrite.cl/e/miercoles-git-

    📌 Introducción a #Tidymodels — Francisco Cardozo & Edgar Ruiz. Registration: eventbrite.cl/e/introducion-a-

    📌 Automatización de workflows en R y Python con #targets y #Snakemake — Diana García. Registration: eventbrite.cl/e/automatizacion

    📌 ¿Qué historia vas a contar hoy? Herramientas para una comunicación eficaz — Alejandra Bellini. Registration: eventbrite.cl/e/que-historia-v

    📌 Coding with #AI in RStudio — Juan Cruz Rodríguez & @LuisDVerde. Registration: eventbrite.cl/e/coding-with-ai

    List with all tutorials here: latinr.org/en/

    #RStats #RStatsES #openScience #RSE #OpenData #FOSS #DataScience #Analytics

    1/2

  4. @jeremy-data.bsky.social biology based on #bioconductor (and I’m a BioJava contributor) - also model building in health informatics (I’d use #tidymodels and #vetiver now) vs Python

  9. With my #tidyAML #RStats #R package you can quickly generate multiple models against a regression problem. This package needs a lot of work and love but I'll get to it when I have time unless someone else wants to step in. It safely fails when libraries don't exist. #tidymodels #recipes

  10. I have recently been enjoying using Gemini 2.5 Pro in Google AI Studio instead of Stack Exchange. The large context window (1M tokens) and the ability to set the temperature to 0, meaning NO CREATIVITY, make this LLM a very good tool for RAG when communicating with your own materials. For example, I recently had a small question about the applicability of a ridge regression model that I trained in R using the #tidymodels framework some years ago.

  11. Feeling lucky to be in a job (for now) that I love so much. Just arrived back from a 5 day workshop in Goettingen, Germany. We invited palaeoecologists, palynologists, historians, ethnographers and archaeologists working on the Atlantic Forest of #brazil. We were also honoured to host a Tupi-guarani village leader, who generously offered his unique perspectives. I delivered a 2 day workshop on #gis #rstats #ecology species distribution modelling using #tidymodels and #tidysdm.

  12. New blog post: Spatial Machine Learning with tidymodels 🌍🧠📦

    This post shows how to apply the tidymodels framework to spatial data workflows in R. Part 3 in a series.

    🔗 geocompx.org/post/2025/sml-bp3/

  13. If you are looking to use the cubist algo for regression and want to get it into shape then use the healthyR.ai package and the hai_cubist_data_prepper() function. Link: www.spsanderson.com/healthyR.ai/... #regression #tidymodels #data #algorithm #cubist

  14. I wrote up a little blog post comparing the runtime and memory allocation of how we used to create dummy variables with the new sparse support I added in tidymodels

    emilhvitfeldt.com/post/sparse-

  15. Happy to share that {recipes} has a new release with many new features and all known bugs exterminated!

    tidyverse.org/blog/2025/04/rec

  16. If you are looking for data processors to get your data in line for the algo in question, then my #R #package { healthyR.ai } has you covered. These are based on using #tidymodels #parsnip from the #tidyverse www.spsanderson.com/healthyR.ai/... #RStats #Data #ModelData

  17. We heard from the community that CatBoost is the way to go. We listened and learned! Here is the first batch of updated hexes for

  18. One of the exciting parts of the new sparse data tidymodels work, is that {textrecipes} can now be used as a reproducible way to generate DTM, tf-idf etc etc

  19. Combining two of my favorite things.

    and oysters. My latest blog post is a project to predict New York Harbor water quality using data from Billionoysterproject.org and
    outsiderdata.netlify.app/posts

  20. ok how did I not know until now that you can add se.fit = TRUE to the predict() function to get standard errors?

    and of course, I now see there is a std_error option and several others in the #tidymodels version

    what do these do for nonparametric models, I wonder?

    No matter how much I think I know, there is always so much more to learn... 🤓
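
A minimal sketch of the two options mentioned in the post above, using the built-in mtcars data (the model formula and the new_cars values are illustrative assumptions, not taken from the post). Base R's predict() for lm objects accepts se.fit = TRUE. The {parsnip} part runs only if that package is installed and, as far as I know, needs type = "conf_int" or "pred_int" for std_error = TRUE to apply.

```r
# Base R: se.fit = TRUE returns standard errors next to the predictions
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))  # illustrative values

pred <- predict(fit, newdata = new_cars, se.fit = TRUE)
pred$fit     # point predictions
pred$se.fit  # standard errors of the estimated mean response

# tidymodels ({parsnip}) version, run only when the package is available
if (requireNamespace("parsnip", quietly = TRUE)) {
  library(parsnip)
  tm_fit <- linear_reg() |>
    set_engine("lm") |>
    fit(mpg ~ wt + hp, data = mtcars)
  # std_error = TRUE adds a standard error column to the interval output
  print(predict(tm_fit, new_data = new_cars, type = "conf_int", std_error = TRUE))
}
```

For lm, se.fit is the standard error of the fitted mean, not the error of predicting a single new observation, so it is narrower than a prediction interval would suggest.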

  21. I'll be running an "Introduction to machine learning with {tidymodels}" workshop at RSS Conference in September!

    Session details:
    📅 Wednesday 4 September, 2024
    ⏰ 11:30am - 12:50pm
    📍 Brighton, UK

    More info: virtual.oxfordabstracts.com/#/

    Register: rss.org.uk/training-events/con

  22. This week's blog post is on deploying with using ! 🚀 Dive in to learn how to streamline your machine learning workflows:

    📖✨


    jumpingrivers.com/blog/vetiver

  23. Preprint from Simon Wood on the new cross-validation smoothness estimation in #mgcv: arxiv.org/abs/2404.16490. It's a neat performant + data-efficient way to estimate GAMs based on complex CV splits (like spatial/temporal/phylo ones).

    See ?NCV in latest {mgcv} for examples (cran.r-universe.dev/mgcv/doc/m)

    I might write a helper to convert {rsample}/{spatialsample} objects into mgcv's funny CV indexing structure.

    #rstats #ml #tidymodels #mgcvchat @MikeMahoney218 @gavinsimpson @ericJpedersen @millerdl

  28. tidymodels has long supported parallelizing model fits across CPU cores. A couple of the modeling engines that #tidymodels supports for gradient boosting, #XGBoost and #LightGBM, have their own tools to parallelize model fits. A new blog post explores whether tidymodels users should use tidymodels' implementation, the engines', or both.

    simonpcouch.com/blog/2024-05-1

  29. 📷 Let's take a moment to relive the moments from our recent in-person gathering through these snapshots!

    We had the pleasure of hosting María Paula Caldas, Data Scientist at OECD, and Julie Aubert, INRAE Research Engineer, who respectively delivered #inspiring talks on the development of #packages and statistical models using {#Tidymodels} in #R.

    You can find the replay here:
    👉 youtu.be/wEVKoPhB25g

    👥@chaimaboughanmi @mouna_belaid @RLadiesGlobal @Posit

    #RStats #RLadies #RLadiesParis

  30. The recording of my "Introduction to machine learning with {tidymodels}" workshop from the R/Pharma Conference is now available! 📹📹

    YouTube: youtu.be/i-Rm2HUWgnc?feature=s

    A reminder that you can also find the workshop materials on GitHub: github.com/nrennie/r-pharma-20

  31. I had a fantastic time running the "Introduction to machine learning with {tidymodels}" workshop as part of the R/Pharma conference today!

    Massive thanks to Phil Bowsher for the invitation, and to @rpodcast and Libby Heeren for helping to answer questions during the session!

    Slides: nrennie.github.io/r-pharma-202

    GitHub: github.com/nrennie/r-pharma-20

    A blog post on any unanswered questions from the chat will be coming soon!

  32. I'm very excited to be running the Introduction to Machine Learning with {tidymodels} workshop at the R/Pharma conference this year!

    The 2 hour workshop will be held online on the 18th October (2pm BST) and is completely free!

    Come and join me by signing up on Eventbrite: eventbrite.com/e/introduction-

  33. The recording of my talk on "Using {tidymodels} to Detect Heart Murmurs" from the R/Medicine Conference is now available on the @RConsortium YouTube channel! 📹📹📹

    youtu.be/xyxbhLb_aEs

  35. 🧑‍💻 New video! Walk through the "whole game" of #MLOps with #rstats:

    👀 Data prep with #tidyverse
    🧠 Model training & eval with #tidymodels
    ✅ Deployment with #vetiver in #Docker 🐳 on @huggingface 🤗
    📌 Monitoring with #pins

    🎥: youtu.be/J32pRt1nuoY

  39. #RStats issues I'm struggling with that seem impossible to Google: Building a {brms} model within the {tidymodels} framework using {bayesian}.

    The formula is inherently too complex (including splines and random effects) for the typical tidymodels workflow that involves recipes &c., so it must be added in at a later step. Three things:

    1. Complex {brms} multivariate formulas seem to not be possible using {tidymodels}. E.g., literally multivariate or including phi after my formula via brms::bf(). It simply errors. :( This may just need some tweaking of {bayesian}'s scripts or waiting for an update since it's still fairly young.

    2. Using {mgcv} random effect syntax like s(cat1, cat2, bs = "re") seems to not pick up as random effects in the model...I think? And I have never figured out if this is creating hierarchical random effects or not -- or if multilevel random effects just aren't possible in this syntax(?).

    3. Using {lme4} random effect like (1 | cat1 / cat2) to ensure the hierarchy is preserved *does* retain random effects I can pull out of the model later using `ranef`, but for some absurd reason I cannot run this model through cross-validation or a myriad of other steps later because it seems to force-create a complex web of interacting factor levels that don't exist. E.g., if my random effects are '(1 | realm / biome)', this eventually fails because it'll look for tundra biome types in Africa for some absurd reason.*

    Noticed this while trying to solve *separate* issues within broom.mixed:::tidy.brmsfit() -- that it seems to delete the names of all the fixed effects and return them as 'NULL' character strings (???), and its reliance on 'ranef' means it doesn't find the random effects using {mgcv} syntax.

    That's my rambling mess of an essay for the day. Not sure how many of these are real issues or me simply not understanding how these packages differ or wot.

    #brms #mgcv #tidymodels

    * Almost wondering if this might even be a separate {tidymodels} issue right now. Every recipe no matter what seems to factor every single character column regardless of how the recipe is built. Hmmmm.
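
As a small illustration of the nesting notation discussed above (not a diagnosis of the reported errors): in R formulas, a / b is shorthand for a + a:b, and {lme4} documents the analogous expansion of (1 | a / b) into (1 | a) + (1 | a:b). The toy realm and biome data below are hypothetical. They show how crossing two factors creates level combinations that never occur in the data unless unused levels are dropped.

```r
# Hypothetical toy data: each biome occurs in only one realm
d <- data.frame(
  realm = factor(c("Africa", "Africa", "Arctic", "Arctic")),
  biome = factor(c("savanna", "desert", "tundra", "tundra"))
)

# The nesting operator expands to a main effect plus an interaction
nested_terms <- attr(terms(~ realm / biome), "term.labels")
nested_terms

# Crossing the two factors yields every combination of levels,
# including ones never observed (for example tundra in Africa)
all_combos <- levels(interaction(d$realm, d$biome))

# drop = TRUE keeps only the combinations present in the data
seen_combos <- levels(interaction(d$realm, d$biome, drop = TRUE))

length(all_combos)
length(seen_combos)
```

Whether a given modeling or resampling step keeps or drops the unobserved combinations depends on the package, so this only shows where such phantom levels can come from.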

  40. With R's Infer library, one can test point hypotheses, such as "the work week has 40 hours".

    I think this is a great improvement on testing point null hypotheses of no difference, which we know a priori to be false.

    infer.netlify.app/

    #rstats #statistics part of #tidymodels and #tidydata
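
A hedged sketch of the kind of point-null test the post above describes. The hours vector is simulated (a hypothetical stand-in for real survey data), and 40 is the hypothesised mean. The base R t-test always runs; the {infer} pipeline runs only if the package is installed.

```r
set.seed(1)
# Simulated weekly working hours (hypothetical data, not from the post)
hours <- rnorm(200, mean = 41.5, sd = 14)

# Classic one-sample test of the point null "mean hours = 40"
tt <- t.test(hours, mu = 40)
tt$p.value

# Simulation-based version with {infer}, when available
if (requireNamespace("infer", quietly = TRUE)) {
  library(infer)
  df <- data.frame(hours = hours)

  # Observed sample mean
  obs <- df |>
    specify(response = hours) |>
    calculate(stat = "mean")

  # Bootstrap distribution of the mean under the point null mu = 40
  null_dist <- df |>
    specify(response = hours) |>
    hypothesize(null = "point", mu = 40) |>
    generate(reps = 1000, type = "bootstrap") |>
    calculate(stat = "mean")

  print(get_p_value(null_dist, obs_stat = obs, direction = "two-sided"))
}
```

The two p-values will generally be close but not identical, since one relies on the t distribution and the other on resampling.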