#datadev — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #datadev, aggregated by home.social.
-
On metrics for measuring agreement in regression on continuous data:
Reasons to avoid R² and use RMSE instead: https://feat.engineering/03-Review_of_the_Modeling_Process.html#sec-reg-metrics
From Max Kuhn @topepo, Kjell Johnson (2026), "Feature Engineering and Selection: A Practical Approach for Predictive Models"
#prediction #dataDev #modelEvaluation #regression #modelling #linearRegression #modeling #probability #probabilities #statistics #stats #gotcha
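A minimal sketch of one commonly cited reason, which may or may not be the exact argument made at the link: R² depends on how spread out the outcome is, while RMSE stays in the units of the outcome. The data below are hypothetical and the helper names `rmse` and `r_squared` are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the outcome."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical data: every prediction is off by exactly 1 unit in both sets,
# but the second outcome covers a much wider range.
narrow_true = [10.0, 11.0, 12.0, 13.0]
narrow_pred = [11.0, 10.0, 13.0, 12.0]
wide_true = [10.0, 40.0, 70.0, 100.0]
wide_pred = [11.0, 39.0, 71.0, 99.0]

rmse_narrow = rmse(narrow_true, narrow_pred)
rmse_wide = rmse(wide_true, wide_pred)
r2_narrow = r_squared(narrow_true, narrow_pred)
r2_wide = r_squared(wide_true, wide_pred)

print(f"narrow range: RMSE={rmse_narrow:.3f}  R2={r2_narrow:.3f}")
print(f"wide range:   RMSE={rmse_wide:.3f}  R2={r2_wide:.3f}")
```

Both sets have the same typical error, yet R² is low for the narrow range and close to 1 for the wide one.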
-
"A generalized linear model or #GLM consists of three components:
1. A random component, specifying the conditional distribution of the response variable, Yᵢ (for the ith of n independently sampled observations). […]
2. A linear predictor—that is a linear function of regressors,
ηᵢ = α + Σⱼ Xᵢⱼ*βⱼ
3. A smooth and invertible link function g(·), which transforms the expectation of the response variable, μᵢ ≡ E(Yᵢ), to the linear predictor:
g(μᵢ) = ηᵢ"
https://www.sagepub.com/sites/default/files/upm-binaries/21121_Chapter_15.pdf
#models #dataDev #logNormal #regression #normality #normalDistribution #gamma #Γ #modelling #modeling #AIDev #ML #evaluation
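A minimal sketch of components 2 and 3 from the quote, assuming a log link as the example (the quote does not pick a link). The coefficients and regressor values are hypothetical. The random component is not simulated here; with a log link it would typically be something like a Poisson or gamma distribution for Yᵢ given μᵢ.

```python
import math


def linear_predictor(alpha, betas, x_row):
    """eta_i = alpha + sum_j X_ij * beta_j"""
    return alpha + sum(x * b for x, b in zip(x_row, betas))


def log_link(mu):
    """g(mu) = log(mu): maps the expected response to the linear predictor."""
    return math.log(mu)


def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta): maps the linear predictor back to the mean."""
    return math.exp(eta)


# Hypothetical intercept, coefficients and two observations.
alpha = 0.5
betas = [0.2, -0.1]
X = [[1.0, 2.0], [3.0, 0.0]]

etas = [linear_predictor(alpha, betas, row) for row in X]
mus = [inverse_log_link(eta) for eta in etas]

for eta, mu in zip(etas, mus):
    # The link is invertible, so g(mu_i) recovers eta_i.
    print(f"eta={eta:.3f}  mu={mu:.3f}  g(mu)={log_link(mu):.3f}")
```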
-
Logistic regression can be used for classification.
To keep the loss function convex, logistic regression uses a log-loss cost function. This cost function reaches its extremes at the labels True and False.
The gradient of the logistic regression loss turns out to have the same form of terms as the gradient of the least squares error.
More: https://www.baeldung.com/cs/gradient-descent-logistic-regression
#optimization #algebra #linearAlgebra #math #maths #mathematics #mathStodon #ML #dataScience #machineLearning #DeepLearning #neuralNetworks #NLP #modeling #modelling #models #dataDev #AIDev #regression #modelling #dataLearning #probabilities #logisticRegression #logLoss #sigmoid #classification #differentialCalculus #loss
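A minimal sketch of the point about the gradient, assuming a single feature and plain batch gradient descent. The data, learning rate and iteration count are hypothetical. Each gradient term is (prediction - label) × input, the same form as for least squares, with the sigmoid output as the prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Mean log-loss; eps guards against log(0)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def gradient(w, b, xs, ys):
    """Gradient of the mean log-loss: terms of the form (prediction - label) * input."""
    n = len(xs)
    dw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
    return dw, db


# Hypothetical one-feature data set with binary labels.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
learning_rate = 0.5
initial_loss = log_loss(w, b, xs, ys)

for _ in range(200):
    dw, db = gradient(w, b, xs, ys)
    w -= learning_rate * dw
    b -= learning_rate * db

final_loss = log_loss(w, b, xs, ys)
print(f"loss before: {initial_loss:.4f}  loss after: {final_loss:.4f}")
```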
-
Accuracy! One way to counter regression dilution is to add a constraint to the statistical model.
Regression Redress restrains bias by segregating the residual values.
My article: http://data.yt/kit/regression-redress.html
#bias #modeling #dataDev #AIDev #modelEvaluation #regression #modelling #dataLearning #linearRegression #probability #probabilities #statistics #stats #correctionRatio #ML #distributions #accuracy #RegressionRedress #Python #RStats
-
How to assess a statistical model?
How to choose between variables?
Pearson's #correlation is irrelevant if you suspect that the relationship is not a straight line.
If the relationship is monotonic:
"#Spearman’s rho is particularly useful for small samples where weak correlations are expected, as it can detect subtle monotonic trends." It is "widespread across disciplines where the measurement precision is not guaranteed".
"#Kendall’s Tau-b is less affected [than Spearman’s rho] by outliers in the data, making it a robust option for datasets with extreme values."
Ref: https://statisticseasily.com/kendall-tau-b-vs-spearman/
#normality #normalDistribution #modeling #dataDev #AIDev #ML #modelEvaluation #regression #modelling #dataLearning #featureEngineering #linearRegression #modeling #probability #probabilities #statistics #stats #correctionRatio #ML #Pearson #bias #regressionRedress #distributions
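A minimal sketch of the Pearson versus rank-correlation contrast on a monotonic but non-linear relationship. The data are hypothetical, and the hand-written helpers assume no tied values (with no ties, Kendall's tau-a computed here equals tau-b). In practice a statistics library would normally be used instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Ranks from 1 to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """Kendall's tau from concordant and discordant pairs; assumes no ties."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: y grows exponentially with x (monotonic, not a straight line).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [math.exp(x) for x in xs]

r = pearson(xs, ys)
rho = spearman(xs, ys)
tau = kendall_tau(xs, ys)
print(f"Pearson r={r:.3f}  Spearman rho={rho:.3f}  Kendall tau={tau:.3f}")
```

Both rank-based measures report a perfect monotonic association, while Pearson's r falls short of 1 because the relationship is not linear.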
-
Redressing #Bias: "Correlation Constraints for Regression Models":
Treder et al (2021) https://doi.org/10.3389/fpsyt.2021.615754
#dataDev #linearRegression #modeling #probability #probabilities #statistics #stats #modelling #regression #correctionRatio #skLearn #scikitLearn #python #AIDev
-
#DataViz on two requirements:
* zooming, panning and rescaling
* shareable dashboards
"Plotly vs. Bokeh: Interactive Python Visualisation Pros and Cons", by Dr Paul Iacomi: https://pauliacomi.com/2020/06/07/plotly-v-bokeh.html
#dataDev #retrieval #dataMining #plotly #Dash #Bokeh #python #dataInteraction #data #dataDon #widgets #ipython #jupyter #dashboards #businessIntelligence
-
#Lasso #LinearRegression "is useful in some contexts due to its tendency to prefer solutions with fewer non-zero coefficients, effectively reducing the number of features upon which the given solution is dependent"
https://scikit-learn.org/stable/modules/linear_model.html#lasso 🧵
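A minimal sketch of why Lasso yields coefficients that are exactly zero. It assumes the textbook special case of an orthonormal design with the objective ½‖y − Xβ‖² + λ‖β‖₁, where the Lasso solution is the soft-thresholded least-squares solution. This is not how scikit-learn fits the model, and its `alpha` is scaled differently from the `lam` used here. The coefficient values are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical least-squares coefficients for four features.
ols_coefficients = [3.0, -0.4, 0.1, -2.5]
lam = 0.5  # hypothetical penalty strength

lasso_coefficients = [soft_threshold(c, lam) for c in ols_coefficients]
n_nonzero = sum(1 for c in lasso_coefficients if c != 0.0)

print(lasso_coefficients)
print(f"{n_nonzero} of {len(ols_coefficients)} coefficients remain non-zero")
```

The two small coefficients are removed entirely and the large ones are shrunk, which is the "fewer non-zero coefficients" behaviour described in the quote.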
-
@data "practitioners can leverage #LASSO regression to construct more interpretable and predictive models that excel in scenarios involving high-dimensional data and intricate feature relationships."
https://datasciencedecoded.com/posts/12_LASSO_Regression_Feature_Selection_Predictive_Models
-
Unfilled cells influence models.
"Handling Missing Data in Machine Learning": https://ml-nn.eu/a1/51.html by Calin Sandu @mlnn
#missingData #bias #wealth #dataQuality #complexity #dataDev #machineLearning #dataPrep #EDA #dataWrangling
-
A categorical variable takes on a limited number of values.
The categorical #dataType is useful in the following cases:
- A string variable consisting of only a few distinct values. df[["label"]].astype("category") saves memory.
- The lexical order is not the same as the logical order (“one”, “two”, “three”). Sorting and min/max will use the logical order.
- As a signal to other libraries to treat the column as a category.
More: https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html
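A minimal sketch of the first two cases, assuming pandas is installed. The column name `label` comes from the post; the data and the other variable names are hypothetical.

```python
import pandas as pd

# Hypothetical column with only three distinct string values, repeated many times.
df = pd.DataFrame({"label": ["two", "three", "one", "two", "one"] * 1000})
string_bytes = df["label"].memory_usage(deep=True)

# An ordered categorical type encodes the logical order explicitly.
ordered_type = pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
df["label_cat"] = df["label"].astype(ordered_type)
category_bytes = df["label_cat"].memory_usage(deep=True)

# Plain strings compare lexically; the ordered categorical uses the logical order.
lexical_max = df["label"].max()
logical_min = df["label_cat"].min()
logical_max = df["label_cat"].max()
sorted_unique = df["label_cat"].drop_duplicates().sort_values().tolist()

print(f"bytes as strings: {string_bytes}, as category: {category_bytes}")
print(f"lexical max: {lexical_max}, logical max: {logical_max}")
print(sorted_unique)
```

Note that min/max and logical sorting require `ordered=True`; a plain `astype("category")` gives the memory saving but an unordered categorical.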