#data-quality — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #data-quality, aggregated by home.social.
-
https://futurism.com/artificial-intelligence/ai-companies-contractors-slop-data-training “If these companies want quality data, then they should offer quality contracts” #AIBubble #PoisonAI #DataQuality #CopyOfACopy
-
I was looking something up in the EU data portal and happened to glance at the metadata quality of one of our datasets. Excellent 😀
Now I just need to find out why the contact information is not coming through correctly.
https://data.europa.eu/data/datasets/c0b506d1-57ba-4088-a257-0d8244256248/quality
-
All data is wrong, but some data is wrong on multiple levels. That's why the Data Validation Report Format (DVRF) supports precise error locations and nested errors. Version 1.0.0 of the specification has just been published at https://doi.org/10.5281/zenodo.20792191 and https://gbv.github.io/data-validation-report-format/ #dataquality
-
🤖 AI agents do not fail because of limited budgets but because of inconsistent data: without quality, even the most advanced automation loses value. #AI #DataQuality
🔗 https://www.tomshw.it/business/agenti-ai-dati-infrastruttura-confluent-2026
-
Your £2M Data Problem Becomes a £20M AI Risk by 2030
Subject: The £2M problem that becomes £20M in 2030
Hi
Why AI amplification will separate survivors from casualties.
By 2030, AI, quantum computing, and IoT will converge into an integrated technological ecosystem. If your data foundation is broken today, the convergence will not save you; it will destroy you.
I've spent 20+ years watching organizations invest billions in transformation while ignoring the one thing that determines success: their data foundation. I've seen the £15M transformation disasters. The £7M costs of fear-driven silence. The £2M annual bleeds from "just how things work."
But 2030 changes everything. Because AI doesn't just use your data. It amplifies it. Let me show you what this looks like in practice.
Most organisations already live with data issues. Inconsistent definitions. Missing fields. Duplicates. Stale records. Quiet reshaping of data as it moves between systems.
Today, many of these problems are contained because humans sit in the loop. An analyst questions a number. A manager challenges a report. Someone spots something that doesn't feel right before it triggers major action.
As we move toward 2030, that changes. AI-enabled workflows increasingly:
• make decisions automatically
• trigger downstream actions automatically: orders, pricing, eligibility, routing, fraud controls
• operate continuously rather than weekly or monthly
• rely on multiple systems and external data feeds
In plain terms: the same error creates more consequences before anyone notices.
Read more in this blog and my book https://lizhendersondata.wordpress.com/your-unseen/
Best wishes
Liz Henderson - Data Queen
https://lizhendersondata.wordpress.com/your-unseen/
https://lizhendersondata.wordpress.com/2026/06/22/your-2m-data-problem/
-
Data quality is a pipeline problem, not a form fix. Learn how developers can enforce quality through profiling, matching, and workflow automation at scale. https://hackernoon.com/building-data-quality-into-the-pipeline-instead-of-cleaning-up-after-it #dataquality
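As a rough illustration of the profiling and matching steps named in this post, here is a minimal Python sketch. It is not taken from the linked article: the function names `profile` and `likely_duplicates`, the sample `records`, and the 0.85 similarity threshold are all assumptions made for the example, and the matching uses the standard library's `difflib` rather than any dedicated matching tool.

```python
from difflib import SequenceMatcher


def profile(rows, columns):
    """Return the null rate and distinct-value count for each column."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        missing = sum(1 for v in values if v in (None, ""))
        distinct = len({v for v in values if v not in (None, "")})
        report[col] = {
            "null_rate": missing / len(values) if values else 0.0,
            "distinct": distinct,
        }
    return report


def likely_duplicates(rows, key, threshold=0.85):
    """Return index pairs whose `key` values are near-identical strings.

    Compares every pair, so this is only suitable for small batches.
    """
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            a = str(rows[i].get(key) or "").strip().lower()
            b = str(rows[j].get(key) or "").strip().lower()
            if a and b and SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical sample records, invented for illustration.
records = [
    {"name": "Acme Corp", "email": "info@acme.example"},
    {"name": "ACME Corp.", "email": ""},
    {"name": "Globex", "email": "hello@globex.example"},
]

summary = profile(records, ["name", "email"])
duplicates = likely_duplicates(records, "name")
```

Running both steps inside the pipeline, rather than in a form validator, means the same checks apply to every source that feeds the dataset. The pairwise comparison is quadratic, so a real pipeline would likely need blocking or indexing before matching at scale.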
-
🤖 Can #AI predict the quality of a survey question?
Our new study shows that a fine-tuned multilingual transformer can match the performance of SQP's traditional prediction approach—directly from question text.
📄 Yang, Schonlau, Repke, Felderer, & Sucholutsky (2026)
https://doi.org/10.1093/jrsssa/qnag058
*SQP = The Survey Quality Predictor is a web-based tool that predicts the measurement quality of survey questions (https://sqp.gesis.org).
-
Strong data catalogs transform the way teams think. They shift work from guesswork to clear action. What’s your take? #DataCatalogs #SingleSourceOfTruth #DataQuality #DataTrust #DataOps #DataStrategy #AIReadyData #DataFuture
https://www.linkedin.com/pulse/building-data-catalogs-quiet-power-behind-single-truth-mohindroo--icosc
-
SkaldMaps now shows a neat little confidence indicator for how accurate the attribute you're viewing is for a particular geography. Find out how we do it on the brand new 🎉blog 🎉: https://skaldmaps.com/blog/2026/06/building-confidence-in-geospatial-data/
#RealEstate #GIS #DataEngineering #DataScience #Blog #DataQuality
-
AI needs clean data 🤖
The quality of AI outputs depends on the quality of the data behind them.
📊 Better data → Better outcomes
⚠️ Poor data → Increased risk
MSPs implementing AI must continuously govern and maintain data quality.
𝐇𝐀𝐋𝐄𝐗𝐎 𝐏𝐎𝐕:
Data readiness is critical for AI success.
▸ Swipe
-
LintedData 3.0.0 is released:
➡️ https://gitlab.com/dlr-dw/linteddata/-/releases/v3.0.0
LintedData is a linter for RDF and Ontologies for easy use in CI pipelines. It checks for common violations of best practices in ontology engineering.
Version 3.0.0 enables checking of multiple RDF files with one execution, reduces the need for configuration, and improves or fixes many checks.
#RDF #Ontologies #KnowledgeGraphs #DataQuality #OntologyQuality #OntologyEngineering
-
Cloud data control and cloud speed can work in one frame. Here’s how leaders can strike that balance with clarity and purpose. #CloudGovernance #DataTrust #CloudStrategy #DataQuality #TechLeadership
https://www.linkedin.com/pulse/cloud-data-governance-real-art-balance-fast-digital-world-mohindroo--c2ozc
-
But is it useful?: The Cloud-Native Geospatial Forum proposes #usefulness as a better measure than openness for #dataQuality, with a 5-dimension, 4-star framework that goes beyond familiar schemes like #FAIR and 5-star Open Data. It’s an interesting read and a good...
https://spatialists.ch/posts/2026/05/31-but-is-it-useful/ #GIS #GISchat #geospatial #SwissGIS
-
Collecting, processing and making sense of the public data can be a cumbersome exercise and there are important #dataquality issues to overcome before the results are meaningful.
For this purpose we are developing functionality within the #opensource #Equinox platform to support this task.
In this post there is a brief description of the data models involved with some sample data.
https://www.openriskmanagement.com/using-equinox-to-record-data-center-environmental-impact-data/
The code is here:
-
How complete and reliable is #OpenStreetMap data in your region?
We are introducing country reports on the Humanitarian Data Exchange platform.
The new reports show #OSM #dataquality across a country & identify where important gaps remain. They provide insights on roads, buildings, currentness, and spatial distribution.
The reports are powered by the ohsome quality API & are also available as datasets in CSV & GPKG formats.
More topics are planned.
-
@RejoinEU I have worked in data quality for many years. Much of what needs to be done is 'grunt work' - technology cannot assess the accuracy of data etc. Plus transactional data happened at a point in time - unless there is an independent record of that transaction, it is extremely difficult to improve the quality of its data #dataquality
-
Mistaking Quantity for Quality in Tech and Life - Tech Field Day Podcast
@TechFieldDay @TechFieldDayPod @SFoskett @GuyCurriersFeed @DaveGraham #TFDPodcast #AIFD8 #AI #AgenticAI #AIInfrastructure #AIAgents #AIQuality #DataQuality
-
Now that AI has enabled us to have an unlimited amount of content, generated on demand and instantly, we find ourselves questioning the quality of the output. 🤖 🎙️
🎙️ This episode of the Tech Field Day Podcast, recorded prior to AI Field Day by delegates Barbara Roos, Guy Currier, Dave Graham, and Stephen Foskett, considers this common trade-off.
#TFDPodcast #AIFD8 #AI #AgenticAI #AIInfrastructure #AIAgents #AIQuality #DataQuality
-
Missing [Survey, etc] Data Can Be A Geographic Phenomenon
--
https://doi.org/10.1080/24694452.2026.2640220 <-- shared paper
--
#GIS #mapping #spatial #DataScience #missing #data #spatial #AAG #autocorrelation #geographicallyweightedregression #GWR #imputation #missingdata #survey #surveynonresponse #incomplete #surveyquestions #ethnicity #income #spatialdata #alldataisspatial #UK #FinancialLives #geography #spatialanalysis #geostatistics #location #imputing #statistics #dataset #DataImputation #MissingData #DataCleaning #DataPreprocessing #DataWrangling #DataQuality #DataEngineering #FinancialData #FinancialAnalytics #FinincialModeling #FinDataScience
-
This week we were discussing the main challenges of Machine Learning in the #KDAI2026 lecture. It should be very obvious that "bad data quality leads to bad results" :)
However, we also talked about insufficient amounts of data, non-representative data, irrelevant features, overfitting, and various forms of bias.
@fiz_karlsruhe #AI #machinelearning #unicorn #dataquality #lecture #datascience
-
We recently released LintedData, a linter for RDF and Ontologies designed for easy use in CI pipelines. It checks for common violations of best practices in ontology engineering.
GitLab: https://gitlab.com/dlr-dw/linteddata/
Docker: https://hub.docker.com/r/dlrdw/linteddata/
Today I present LintedData at the Helmholtz Metadata Conference 2026 demo session.
Abstract & Poster: https://doi.org/10.5281/zenodo.20024644 or https://elib.dlr.de/223803/
#RDF #Ontologies #KnowledgeGraphs #DataQuality #OntologyQuality #OntologyEngineering #HMC2026 @helmholtz_hmc
-
#GESISblog #blog #KODAQS #DataQuality #DBD #DigitalBehavioralData
New on the GESIS Blog: Part 2 of our blog series on the KODAQS Toolbox: Digital Behavioral Data
In the first blog post of the KODAQS Toolbox series, we discussed how data quality issues can affect survey data. Similar challenges arise in digital behavioral data (DBD), though they often manifest differently.
-
Why did people stop responding to federal economic surveys?
https://www.brookings.edu/articles/why-did-people-stop-responding-to-federal-economic-surveys-what-can-be-done/
Declining response rates challenge the precision and bias of economic indicators like unemployment. Surveys remain vital for capturing nuances, such as job-seeking intent, that administrative data cannot track.
Strong data stewardship and reduced respondent burden are necessary to sustain the statistical system.
#surveymethodology #economics #statistics #dataquality #nonresponsebias
-
Accurate CRM Data Through Data Enrichment Company Services
Incomplete data reduces the effectiveness of CRM and analytics. Data cleansing removes errors and standardizes formats. A data enrichment company enhances records with validated business information, helping teams improve segmentation, reporting, and customer engagement strategies.
Know more: https://www.hitechdigital.com/data-cleansing-and-enrichment-services
#DataCleansingServices #DataEnrichment #CRMDataCleansing #DataQuality #B2BData #DataManagement #DataDriven
-
DQaaC embeds testing into pipelines using known tools to ensure reliable, scalable data systems. https://hackernoon.com/automated-data-quality-as-code #dataquality
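To make the idea of data quality checks living in code next to the pipeline more concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the approach from the linked article: the post mentions "known tools" without naming them here, so this uses plain Python instead, and every name in it (`check_not_null`, `check_unique`, `check_between`, `run_checks`, `enforce`, `DataQualityError`, the `orders` sample) is hypothetical.

```python
class DataQualityError(Exception):
    """Raised when at least one data quality check fails."""


def check_not_null(rows, column):
    """Flag rows where `column` is missing or empty."""
    failing = [i for i, row in enumerate(rows) if row.get(column) in (None, "")]
    return {"check": f"not_null({column})", "passed": not failing, "failing_rows": failing}


def check_unique(rows, column):
    """Flag rows that repeat an earlier value of `column`."""
    seen, failing = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            failing.append(i)
        seen.add(value)
    return {"check": f"unique({column})", "passed": not failing, "failing_rows": failing}


def check_between(rows, column, low, high):
    """Flag rows where `column` is not a number within [low, high]."""
    failing = [
        i for i, row in enumerate(rows)
        if not isinstance(row.get(column), (int, float)) or not low <= row[column] <= high
    ]
    return {"check": f"between({column})", "passed": not failing, "failing_rows": failing}


def run_checks(rows, checks):
    """Run each (function, *arguments) entry against the rows."""
    return [check(rows, *args) for check, *args in checks]


def enforce(results):
    """Stop the pipeline step by raising if any check failed."""
    failed = [r["check"] for r in results if not r["passed"]]
    if failed:
        raise DataQualityError("failed checks: " + ", ".join(failed))


# The check list is ordinary code, so it can be versioned and reviewed.
CHECKS = [
    (check_not_null, "order_id"),
    (check_unique, "order_id"),
    (check_between, "amount", 0, 10_000),
]

# Hypothetical batch, invented for illustration.
orders = [
    {"order_id": "A1", "amount": 25.0},
    {"order_id": "A2", "amount": -3.0},
    {"order_id": "A2", "amount": 90.0},
]

results = run_checks(orders, CHECKS)
```

Calling `enforce(results)` on this batch would raise `DataQualityError`, because the duplicate `order_id` and the negative `amount` both fail. In a CI job or scheduled pipeline, that exception is what would keep a bad batch from being published.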