#data-quality — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #data-quality, aggregated by home.social.
-
Mistaking Quantity for Quality in Tech and Life - Tech Field Day Podcast
@TechFieldDay @TechFieldDayPod @SFoskett @GuyCurriersFeed @DaveGraham #TFDPodcast #AIFD8 #AI #AgenticAI #AIInfrastructure #AIAgents #AIQuality #DataQuality
-
Now that AI has enabled us to have an unlimited amount of content, generated on demand and instantly, we find ourselves questioning the quality of the output. 🤖 🎙️
🎙️ This episode of the Tech Field Day Podcast, recorded prior to AI Field Day by delegates Barbara Roos, Guy Currier, Dave Graham, and Stephen Foskett, considers this common trade-off.
#TFDPodcast #AIFD8 #AI #AgenticAI #AIInfrastructure #AIAgents #AIQuality #DataQuality
-
Missing [Survey, etc] Data Can Be A Geographic Phenomenon
--
https://doi.org/10.1080/24694452.2026.2640220 <-- shared paper
--
#GIS #mapping #spatial #DataScience #missing #data #spatial #AAG #autocorrelation #geographicallyweightedregression #GWR #imputation #missingdata #survey #surveynonresponse #incomplete #surveyquestions #ethnicity #income #spatialdata #alldataisspatial #UK #FinancialLives #geography #spatialanalysis #geostatistics #location #imputing #statistics #dataset #DataImputation #MissingData #DataCleaning #DataPreprocessing #DataWrangling #DataQuality #DataEngineering #FinancialData #FinancialAnalytics #FinincialModeling #FinDataScience
-
This week we were discussing the main challenges of Machine Learning in the #KDAI2026 lecture. It should be very obvious that "bad data quality leads to bad results" :)
However, we also discussed insufficient amounts of data, non-representative data, irrelevant features, overfitting, and various forms of bias.
@fiz_karlsruhe #AI #machinelearning #unicorn #dataquality #lecture #datascience
-
LintedData, which we recently released, is a linter for RDF and ontologies designed for easy use in CI pipelines. It checks for common violations of best practices in ontology engineering.
GitLab: https://gitlab.com/dlr-dw/linteddata/
Docker: https://hub.docker.com/r/dlrdw/linteddata/
Today I present LintedData at the Helmholtz Metadata Conference 2026 demo session.
Abstract & Poster: https://elib.dlr.de/223803/
#RDF #Ontologies #KnowledgeGraphs #DataQuality #OntologyQuality #OntologyEngineering #HMC2026 @helmholtz_hmc
-
#GESISblog #blog #KODAQS #DataQuality #DBD #DigitalBehavioralData
New on the GESIS Blog: Part 2 of our blog series on the KODAQS Toolbox: Digital Behavioral Data
In the first blog post of the KODAQS Toolbox series, we discussed how data quality issues can affect survey data. Similar challenges arise in digital behavioral data (DBD), though they often manifest differently.
-
Why did people stop responding to federal economic surveys?
https://www.brookings.edu/articles/why-did-people-stop-responding-to-federal-economic-surveys-what-can-be-done/
Declining response rates challenge the precision and bias of economic indicators like unemployment. Surveys remain vital for capturing nuances, such as job-seeking intent, that administrative data cannot track.
Strong data stewardship and reduced respondent burden are necessary to sustain the statistical system.
#surveymethodology #economics #statistics #dataquality #nonresponsebias
-
DQaaC embeds testing into pipelines using known tools to ensure reliable, scalable data systems. https://hackernoon.com/automated-data-quality-as-code #dataquality
-
Stop Publishing Garbage Data, It's Embarrassing
https://successfulsoftware.net/2026/03/29/stop-publishing-garbage-data-its-embarrassing/
#HackerNews #StopPublishingGarbageData #DataQuality #DataIntegrity #DataManagement #EmbarrassingData
-
If you are interested in #dataquality and #ai, then this free webinar tomorrow may be of interest or use: https://www.eventbrite.co.uk/e/author-webinar-series-data-quality-in-an-ai-world-with-julian-and-tim-tickets-1981419239283
-
Executives keep asking "How do we use AI to make better decisions?"
The honest answer: clean your data first. Deduplicate your contacts. Reconcile the three spreadsheets tracking the same metrics with different definitions.
Nobody wants to hear that. So they build dashboards on top of garbage and blame the model when outputs are incoherent.
The intelligence was never the bottleneck.
-
Data observability monitors nulls, drift, and freshness, catching pipeline issues before they corrupt dashboards, models, or business decisions. https://hackernoon.com/building-data-observability-monitoring-nulls-drift-freshness-and-business-impact #dataquality
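A minimal sketch of the three checks named above (nulls, drift, freshness), using only the Python standard library. The function names, the sample rows, the baseline values, and the 24-hour freshness window are all hypothetical and are not taken from the linked article; real observability tooling tracks these metrics over time and alerts on thresholds.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def mean_drift(baseline, current):
    """Relative change of the current mean against the baseline mean.

    Assumes a non-zero baseline mean (illustrative simplification).
    """
    base = mean(baseline)
    return abs(mean(current) - base) / abs(base)


def is_stale(last_loaded_at, now, max_age):
    """True when the newest load is older than the allowed age."""
    return now - last_loaded_at > max_age


# Hypothetical batch of pipeline output.
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 22.0},
    {"order_id": 4, "amount": 18.0},
]
amounts = [r["amount"] for r in rows if r["amount"] is not None]

# Fixed timestamps keep the example deterministic.
now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2026, 1, 1, 6, 0, tzinfo=timezone.utc)

report = {
    "null_rate": null_rate(rows, "amount"),            # 1 of 4 rows is null
    "drift": mean_drift([10.0, 10.0, 10.0], amounts),  # mean moved 10 -> 20
    "stale": is_stale(last_load, now, timedelta(hours=24)),
}
print(report)
```

Comparing means is only one simple way to flag drift; distribution-level tests are a common alternative when the mean alone is not informative.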
-
🧵 Poor data quality rarely announces itself loudly.
Are you safe, or can you spot some warning signs in our guide? 👇
-
🚀 The 8th edition of the "Workshop Retrodigitalisierung" series takes place on 19–20 March at the Haus Unter den Linden (Berlin), under the overarching theme "Digitization for eternity? Data quality in practice".
👉 Registration is open until 11 March: https://pretix.eu/StaatsbibliothekZuBerlin/WS-Retrodigi/
✨ The @stabi_berlin is hosting the course together with @tibhannover, @ZBMED, ZBW – Leibniz-Informationszentrum Wirtschaft, and @nfdi4culture.
-
The 1-10-100 Rule of data:
$1 to fix it at the source.
$10 to clean it in a report.
$100 to fix the damage of a bad decision made because of it.
Most orgs choose the $100 path because their internal processes are red-lining. You don't need more data; you need less friction.
Get your Strategy Triage Score:
https://shaolindataservices.com/#diagnostic
#ROI #DataQuality #Business #Efficiency
-
Databricks just showed that clean, deduped data beats fancy model tweaks for faster LLMs. Their paper reveals a simple data pipeline—language filtering, deduplication, and high‑quality datasets—outperforms architecture tweaks on GPU training. Curious how to boost speed without extra compute? Dive in. #LLMTraining #DataQuality #Databricks #Deduplication
🔗 https://aidailypost.com/news/databricks-paper-finds-data-quality-outweighs-model-architecture-llm
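To make the idea of language filtering followed by deduplication concrete, here is a toy sketch in Python. It is not the pipeline from the paper: the ASCII-share heuristic is a deliberately crude stand-in for a real language-identification model, and the function names and sample documents are hypothetical.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return re.sub(r"\s+", " ", text.strip().lower())


def looks_english(text, threshold=0.9):
    """Toy language filter: share of ASCII characters (an assumption,
    not a real language detector)."""
    if not text:
        return False
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return ascii_chars / len(text) >= threshold


def filter_and_dedupe(docs):
    """Keep the first copy of each document that passes the filter."""
    seen = set()
    kept = []
    for doc in docs:
        if not looks_english(doc):
            continue
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        kept.append(doc)
    return kept


docs = [
    "Clean data beats clever tricks.",
    "clean   data beats clever tricks.",  # duplicate after normalization
    "Данные должны быть чистыми.",        # dropped by the toy filter
    "Deduplicate before you train.",
]
cleaned = filter_and_dedupe(docs)
print(cleaned)
```

This only catches exact duplicates after normalization; near-duplicate detection (for example, with MinHash) is a separate technique and is not shown here.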
-
Nice To Be 'Back In The Data Trenches' Directly
--
https://www.usgs.gov/3d-hydrography-program <-- shared U.S. Geological Survey (USGS) #3DHP home page
--
I don't get to be here that often (and that is OK, #ILikeMyJob) - but it is nice to be back working directly with some individual spatial datasets - be it in #QGIS, #ArcGISPro, #PostgreSQL / #PostGIS or wherever...
#GIS #spatial #mapping #water #dataquality #water #hydrography #3DHP #NHD #opendata #USA #MI #reservoirs #dams #pluvial #fluvial #usecase #SQL
-
The Power Of Using A Story For Better Data Comprehension And Hence Decision Making
--
https://doi.org/10.1080/15228053.2021.2016151 <-- shared book review, “Data Story: Explain Data And Inspire Action Through Story”
--
[I encountered this excellent graphic from @saurabh Rai, and went and explored the ideas put so succinctly here; I found, well, a technical story overview (link above) to ‘match’; however, this should not be considered an endorsement of this book]
#data #storytelling #data #comprehension #presentation #story #frameworks #context #setting #dataquality #communication #usecase #robustness #insights #correctness #decisionmaking #narratives #decisions
-
https://www.infoworld.com/article/4128925/ai-augmented-data-quality-engineering.html
« How deep learning, generative models and trust scoring are transforming modern data systems. »
-
Can researchers detect #AI bots taking paid surveys?
#Prolific tested humans and #LLM agents with various #dataQuality checks.
- The company says they caught 100% of the non-humans.
- My take-away: #reCAPTCHA and #mouseTracking caught 95%
-
Efficient data capture with #QGIS through smart form design (#Formulardesign) ⚡️
🖥️ Good #DataQuality starts at data entry, not at analysis.
💪 At #FOSSGIS2026, Stefan Giese shows how to make data-entry workflows intuitive and reliable.
📑 Workshop contents:
• Clear interfaces using tabs & groups
• Dynamic behavior via QGIS expressions
• Error prevention through constraints & value lists
📅 Wed, 25 March | 4:30 pm
📍 Room WS3
-
@tero #dataquality is key to much of AI. The old 'Garbage In, Garbage Out' maxim has never been more relevant. However, sadly, for many organisations, it has become 'Garbage In, Gospel Out' where the outputs are overly trusted because there is too much belief in the infallibility of the tech
-
Constrainify, a web application developed in the AQinDa project, helps users specify and analyze data quality without requiring advanced technical know-how. Our colleagues have written an introduction to it as part of a workshop: https://doi.org/10.5281/zenodo.18430881
#dataquality #metadata #datascience #digitalhumanities #vzg #gbv #AQinDa