home.social

#datadev — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #datadev, aggregated by home.social.

  1. About metrics for measuring agreement in regression on continuous data:
    Reasons to avoid R² and use RMSE instead: feat.engineering/03-Review_of_

    From Max Kuhn @topepo, Kjell Johnson (2026), "Feature Engineering and Selection: A Practical Approach for Predictive Models"

    #prediction #dataDev #modelEvaluation #regression #modelling #linearRegression #modeling #probability #probabilities #statistics #stats #gotcha
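
A small sketch of one commonly cited reason for preferring RMSE, assumed here rather than taken from the linked chapter: R² depends on the spread of the outcome, so two sets of predictions with the same error size can report very different R² values. The data and helper names are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the outcome."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the same +/-1 errors applied to a narrow and a wide outcome.
errors = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
narrow = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
wide = [10.0, 30.0, 50.0, 70.0, 90.0, 110.0]

narrow_pred = [y + e for y, e in zip(narrow, errors)]
wide_pred = [y + e for y, e in zip(wide, errors)]

for name, y, pred in [("narrow", narrow, narrow_pred), ("wide", wide, wide_pred)]:
    print(name, "RMSE:", rmse(y, pred), "R^2:", round(r_squared(y, pred), 4))
```

Both sets of predictions have an RMSE of 1.0, yet R² is about 0.66 for the narrow outcome and above 0.999 for the wide one, because only the total variance of the outcome changed.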

  6. "A generalized linear model or #GLM consists of three components:
    1. A random component, specifying the conditional distribution of the response variable, Yᵢ (for the ith of n independently sampled observations). […]
    2. A linear predictor—that is a linear function of regressors,
    ηᵢ = α + Σⱼ Xᵢⱼ*βⱼ
    3. A smooth and invertible link function g(·), which transforms the expectation of the response variable, μᵢ ≡ E(Yᵢ), to the linear predictor:
    g(μᵢ) = ηᵢ"

    sagepub.com/sites/default/file

    #models #dataDev #logNormal #regression #normality #normalDistribution #gamma #Γ #modelling #modeling #AIDev #ML #evaluation
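
The three components quoted above can be sketched in a few lines. This is an illustrative sketch, not code from the quoted source: the coefficient values, the `LINKS` table and the function names are assumptions. The random component is only noted in a comment, since choosing a distribution does not change how η and μ are computed.

```python
import math


def linear_predictor(alpha, betas, x_row):
    """Component 2: eta_i = alpha + sum_j X_ij * beta_j."""
    return alpha + sum(x * b for x, b in zip(x_row, betas))


# Component 3: each link g maps the mean mu to eta; its inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def expected_response(alpha, betas, x_row, link="log"):
    """mu_i = g^{-1}(eta_i): the expectation of Y_i under the chosen link."""
    _, inverse_link = LINKS[link]
    return inverse_link(linear_predictor(alpha, betas, x_row))


# Hypothetical coefficients and a single observation with two regressors.
alpha, betas, x_row = 0.5, [1.0, -2.0], [2.0, 1.0]

eta = linear_predictor(alpha, betas, x_row)
for name, (g, _) in LINKS.items():
    mu = expected_response(alpha, betas, x_row, link=name)
    # Component 1 (the random component) would say how Y_i scatters around mu,
    # e.g. Poisson for the log link or Bernoulli for the logit link.
    print(name, "eta =", eta, "mu =", round(mu, 4), "g(mu) =", round(g(mu), 4))
```

For every link, applying g to the computed mean returns the linear predictor, which is the relation g(μᵢ) = ηᵢ from the quote.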

  11. Logistic regression may be used for classification.

    To keep the loss function convex, logistic regression uses a log-loss cost function. For each label (True or False), this cost is zero when the predicted probability matches the label and grows without bound as the prediction approaches the opposite label.

    The gradient of the logistic regression loss has the same form as the gradient of the least-squares error.

    More: baeldung.com/cs/gradient-desce

    #optimization #algebra #linearAlgebra #math #maths #mathematics #mathStodon #ML #dataScience #machineLearning #DeepLearning #neuralNetworks #NLP #modeling #modelling #models #dataDev #AIDev #regression #modelling #dataLearning #probabilities #logisticRegression #logLoss #sigmoid #classification #differentialCalculus #loss
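
A minimal sketch of the shared gradient form, assuming the mean log-loss for logistic regression and half the mean squared error for least squares (the factor 1/2 is a convention chosen so the two gradients line up). The data, weights and function names are made up; both gradients are checked against finite differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_linear(w, x_row):
    return sum(w_j * x_j for w_j, x_j in zip(w, x_row))


def predict_logistic(w, x_row):
    return sigmoid(predict_linear(w, x_row))


def gradient(predict, w, X, y):
    """Shared form: (1/n) * sum_i (prediction_i - y_i) * x_i."""
    n = len(X)
    grad = [0.0] * len(w)
    for x_row, target in zip(X, y):
        residual = predict(w, x_row) - target
        for j, x_j in enumerate(x_row):
            grad[j] += residual * x_j / n
    return grad


def log_loss(w, X, y):
    total = 0.0
    for x_row, t in zip(X, y):
        p = predict_logistic(w, x_row)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(X)


def half_mse(w, X, y):
    return sum((predict_linear(w, r) - t) ** 2 for r, t in zip(X, y)) / (2 * len(X))


def numerical_gradient(loss, w, X, y, h=1e-6):
    """Central finite differences, used only to check the analytic form."""
    grad = []
    for j in range(len(w)):
        up = list(w)
        down = list(w)
        up[j] += h
        down[j] -= h
        grad.append((loss(up, X, y) - loss(down, X, y)) / (2 * h))
    return grad


# Hypothetical data: first column is the intercept term.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
y = [1.0, 0.0, 1.0, 0.0]
w = [0.1, -0.4]

print("logistic:", gradient(predict_logistic, w, X, y), numerical_gradient(log_loss, w, X, y))
print("least squares:", gradient(predict_linear, w, X, y), numerical_gradient(half_mse, w, X, y))
```

The same `gradient` function serves both models; only the prediction function changes (sigmoid of the linear score versus the linear score itself).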

  16. 🎧 What’s trending on Spotify right now?

    Pulled this from the Spotify API at 16:57 PDT
    Next up: more genres, deeper trends, and maybe a map?
    Idk but it's a lot of fun when companies I enjoy let me analyze them.

  17. @data @datadon 🧵

    How to assess a statistical model?
    How to choose between variables?

    Pearson's #correlation is irrelevant if you suspect that the relationship is not a straight line.

    If monotonic relationship:
    "#Spearman’s rho is particularly useful for small samples where weak correlations are expected, as it can detect subtle monotonic trends." It is "widespread across disciplines where the measurement precision is not guaranteed".
    "#Kendall’s Tau-b is less affected [than Spearman’s rho] by outliers in the data, making it a robust option for datasets with extreme values."
    Ref: statisticseasily.com/kendall-t

    #normality #normalDistribution #modeling #dataDev #AIDev #ML #modelEvaluation #regression #modelling #dataLearning #featureEngineering #linearRegression #modeling #probability #probabilities #statistics #stats #correctionRatio #ML #Pearson #bias #regressionRedress #distributions
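
A rough, library-free sketch of the three coefficients on a monotonic but non-linear relationship. The data (y = exp(x)) and the function names are assumptions for illustration; in practice a statistics library would normally be used instead of these hand-written versions.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho is Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from pair counts, with the usual correction for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both denominator terms
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denominator


# Hypothetical monotonic, strongly non-linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(v) for v in x]

print("Pearson:", round(pearson(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
print("Kendall tau-b:", round(kendall_tau_b(x, y), 3))
```

On this data the two rank-based coefficients equal 1 because the relationship is perfectly monotonic, while Pearson's coefficient stays clearly below 1 because the relationship is not a straight line.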

  22. #DataViz Decision-Making Guide

    "How do you decide between #Plotly and #Seaborn?
    * If you need interactive and dynamic visualizations, especially for dashboards or 3D data, Plotly is the way to go.
    * If you’re focused on statistical analysis, creating publication-ready visuals, or conducting exploratory data analysis, Seaborn is likely your best choice."
    by Amit Yadav: medium.com/@amit25173/plotly-v

    #dataDev #retrieval #dataMining

  27. "Technical people are blind to the fact they automatically solve dozens of problems every day in their regular workflow, any single one big enough to block another user for a few hours. Without even thinking about it."

    "There are usually two kinds of coders giving advice. A fresh one that has no idea how complex things really are, yet. Or an experienced one, that forgot it."

    @bitecode bitecode.dev/p/why-not-tell-pe 🧵

    #dev #dataDev #install #anaconda #packages #Python #tech #packaging #complexity

  32. "The #gamma GLM is a relatively assumption-light means of #modeling non-negative data, given gamma's flexibility.
    […]
    "Explaining what is used and what is not used, despite merits and demerits […]: Loosely, the larger the internal literature in any field on modelling techniques, the less inclined people in that field seem to be to try something different."

    Nick Cox, 2013: stats.stackexchange.com/questi

    #normality #normalDistribution #Γ #modelling #dataDev #AIDev #ML #AIEvaluation #logNormal
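
To make the idea of a gamma GLM concrete, here is a hedged sketch of fitting one with a log link and a single regressor by iteratively reweighted least squares (IRLS). The log link, the toy data and the function names are assumptions, not taken from the quoted answer. With a gamma response and a log link the IRLS weights are constant, so each step reduces to an ordinary least-squares fit on a working response.

```python
import math


def ols_line(x, z):
    """Ordinary least squares for z = intercept + slope * x."""
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


def fit_gamma_log_glm(x, y, iterations=100, tol=1e-12):
    """Fit E[y] = exp(intercept + slope * x) for positive y by IRLS."""
    # Starting values: least squares on log(y).
    intercept, slope = ols_line(x, [math.log(v) for v in y])
    for _ in range(iterations):
        eta = [intercept + slope * a for a in x]
        mu = [math.exp(e) for e in eta]
        # Working response for the log link: z = eta + (y - mu) / mu.
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]
        new_intercept, new_slope = ols_line(x, z)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


# Hypothetical positive data: exp(0.5 + 0.3 * x) times fixed multiplicative noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.2, 0.8, 1.1, 0.9, 1.05, 0.95]
y = [math.exp(0.5 + 0.3 * a) * f for a, f in zip(x, noise)]

intercept, slope = fit_gamma_log_glm(x, y)
print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
```

The response must be strictly positive for this sketch, since the starting values take log(y). A statistics library with GLM support would normally replace the hand-written loop.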
