home.social

#modelevaluation — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #modelevaluation, aggregated by home.social.

  1. On metrics for assessing regression models with continuous outcomes:
    Reasons to avoid R² and use RMSE instead: feat.engineering/03-Review_of_

    From Max Kuhn @topepo, Kjell Johnson (2026), "Feature Engineering and Selection: A Practical Approach for Predictive Models"

    #prediction #dataDev #modelEvaluation #regression #modelling #linearRegression #modeling #probability #probabilities #statistics #stats #gotcha
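
One commonly cited reason for preferring RMSE, sketched here as an illustration and not necessarily the argument the linked chapter makes: R² depends on how spread out the outcome is, so two sets of predictions with identical errors can get very different R² values. The data and the helper names `rmse` and `r_squared` below are made up for the example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the outcome."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the same prediction errors applied to two outcomes
# that differ only in how widely they are spread.
errors = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
narrow_truth = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
wide_truth = [10.0, 30.0, 50.0, 70.0, 90.0, 110.0]

narrow_pred = [t + e for t, e in zip(narrow_truth, errors)]
wide_pred = [t + e for t, e in zip(wide_truth, errors)]

rmse_narrow = rmse(narrow_truth, narrow_pred)
rmse_wide = rmse(wide_truth, wide_pred)
r2_narrow = r_squared(narrow_truth, narrow_pred)
r2_wide = r_squared(wide_truth, wide_pred)

print(f"narrow outcome: RMSE={rmse_narrow:.3f}  R2={r2_narrow:.3f}")
print(f"wide outcome:   RMSE={rmse_wide:.3f}  R2={r2_wide:.3f}")
```

Both cases miss by exactly one unit per prediction, so the RMSE is the same, while R² is much higher for the widely spread outcome simply because there is more variance to "explain".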

  6. OpenAI Tries To Measure Whether AI Reasoning Can Be Trusted

    Monitorability gets a real test as OpenAI rolls out new evaluations for chain of thought oversight.

    olamnews.com/research-report/3

  7. @data @datadon 🧵

    How to assess a statistical model?
    How to choose between variables?

    Pearson's #correlation only measures linear association, so it is a poor choice if you suspect that the relationship is not a straight line.

    If monotonic relationship:
    "#Spearman’s rho is particularly useful for small samples where weak correlations are expected, as it can detect subtle monotonic trends." It is "widespread across disciplines where the measurement precision is not guaranteed".
    "#Kendall’s Tau-b is less affected [than Spearman’s rho] by outliers in the data, making it a robust option for datasets with extreme values."
    Ref: statisticseasily.com/kendall-t

    #normality #normalDistribution #modeling #dataDev #AIDev #ML #modelEvaluation #regression #modelling #dataLearning #featureEngineering #linearRegression #probability #probabilities #statistics #stats #correctionRatio #Pearson #bias #regressionRedress #distributions
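
To make the point about monotonic but non-linear relationships concrete, here is a minimal pure-Python sketch. The function names (`pearson`, `average_ranks`, `spearman`, `kendall_tau_b`) and the toy data are assumptions for illustration; in practice a statistics library would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson's r: strength of the linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant and discordant pairs, with ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_in_x = concordant + discordant + only_y_tied
    untied_in_y = concordant + discordant + only_x_tied
    return (concordant - discordant) / math.sqrt(untied_in_x * untied_in_y)


# Hypothetical data: y grows exponentially with x (monotonic, not linear).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

r = pearson(x, y)
rho = spearman(x, y)
tau = kendall_tau_b(x, y)
print(f"Pearson r={r:.3f}  Spearman rho={rho:.3f}  Kendall tau-b={tau:.3f}")
```

Because y always increases with x, both rank-based measures report a perfect monotonic association, while Pearson's r falls clearly below 1 since the points do not lie on a straight line.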

  12. I'm thinking about how to relate #ModelBuilding and #ModelEvaluation to #OpenScience. I'm not there yet, but anyone who wants to think along, feel free! Model evaluation, thinking about #validity and how #standardization and #generalisation interact, among others...

    Figure on #standardisation vs #generalisation from doi.org/10.1111/j.1601-183X.20