home.social

#data-quality — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #data-quality, aggregated by home.social.

  1. I was checking something on the EU data portal and happened to glance at the metadata quality of one of our datasets. Excellent 😀

    Now I just need to figure out why the contact information isn't coming through correctly.

    data.europa.eu/data/datasets/c

    #OpenData #DataQuality #metadata

  3. All data is wrong, but some data is wrong on multiple levels. That's why the Data Validation Report Format (DVRF) supports precise error locations and nested errors. Version 1.0.0 of the specification has just been published at doi.org/10.5281/zenodo.20792191 and gbv.github.io/data-validation- #dataquality

  5. 🤖 AI agents don't fail because of limited budgets but because of inconsistent data: without quality, even the most advanced automation loses value. #AI #DataQuality

    🔗 tomshw.it/business/agenti-ai-d

  6. Your £2M Data Problem Becomes a £20M AI Risk by 2030

    Subject: The £2M problem that becomes £20M in 2030

    Hi,

    Why AI amplification will separate survivors from casualties.

    By 2030, AI, quantum computing, and IoT will converge into an integrated technological ecosystem. If your data foundation is broken today, the convergence will not save you; it will destroy you.

    I've spent 20+ years watching organisations invest billions in transformation while ignoring the one thing that determines success: their data foundation. I've seen the £15M transformation disasters. The £7M costs of fear-driven silence. The £2M annual bleeds from "just how things work."

    But 2030 changes everything. Because AI doesn't just use your data. It amplifies it.

    Let me show you what this looks like in practice. Most organisations already live with data issues. Inconsistent definitions. Missing fields. Duplicates. Stale records. Quiet reshaping of data as it moves between systems.

    Today, many of these problems are contained because humans sit in the loop. An analyst questions a number. A manager challenges a report. Someone spots something that doesn't feel right before it triggers major action.

    As we move toward 2030, that changes. AI-enabled workflows increasingly:
    - make decisions automatically
    - trigger downstream actions automatically (orders, pricing, eligibility, routing, fraud controls)
    - operate continuously rather than weekly or monthly
    - rely on multiple systems and external data feeds

    In plain terms: the same error creates more consequences before anyone notices.

    Read more in this blog and my book: https://lizhendersondata.wordpress.com/your-unseen/

    Best wishes,
    Liz Henderson - Data Queen
    https://lizhendersondata.wordpress.com/your-unseen/

  8. Data quality is a pipeline problem, not a form fix. Learn how developers can enforce quality through profiling, matching, and workflow automation at scale. hackernoon.com/building-data-q #dataquality
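
The profiling and matching steps mentioned in this post can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (records are plain dicts, and "matching" means exact matching on lowercased, whitespace-collapsed key fields); the function names `profile`, `normalise`, and `find_duplicates` are hypothetical and not taken from the linked article.

```python
from collections import Counter


def profile(records, fields):
    """Per-field completeness (share of non-empty values) and distinct count."""
    total = len(records)
    report = {}
    for field in fields:
        present = [r.get(field) for r in records if r.get(field) not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


def normalise(value):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    if value is None:
        return ""
    return " ".join(str(value).lower().split())


def find_duplicates(records, key_fields):
    """Exact matching on normalised keys; returns {key: count} for repeated keys."""
    keys = [tuple(normalise(r.get(f)) for f in key_fields) for r in records]
    return {key: n for key, n in Counter(keys).items() if n > 1}


records = [
    {"name": "Acme Ltd", "email": "info@acme.example"},
    {"name": "ACME  ltd", "email": "info@acme.example"},
    {"name": "Beta GmbH", "email": ""},
]
print(profile(records, ["name", "email"]))
print(find_duplicates(records, ["name", "email"]))  # {('acme ltd', 'info@acme.example'): 2}
```

Running both functions on every batch, rather than once at data entry, is what turns them into a pipeline step; real-world matching usually needs fuzzier comparison than this exact-key approach.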

  10. 🤖 Can #AI predict the quality of a survey question?

    Our new study shows that a fine-tuned multilingual transformer can match the performance of SQP's traditional prediction approach—directly from question text.

    📄 Yang, Schonlau, Repke, Felderer, & Sucholutsky (2026)
    doi.org/10.1093/jrsssa/qnag058

    *SQP = The Survey Quality Predictor is a web-based tool that predicts the measurement quality of survey questions (sqp.gesis.org).

    @GESIS
    #SQP #DataQuality #SurveyMethodology

  12. SkaldMaps now shows you a very neat little confidence indicator of how accurate the attribute you're viewing is for a particular geography. Find out how we do it on the brand new 🎉blog 🎉: skaldmaps.com/blog/2026/06/bui

    #RealEstate #GIS #DataEngineering #DataScience #Blog #DataQuality

  14. AI needs clean data 🤖

    The quality of AI outputs depends on the quality of the data behind them.

    📊 Better data
    → Better outcomes

    ⚠️ Poor data
    → Increased risk

    MSPs implementing AI must continuously govern and maintain data quality.

    𝐇𝐀𝐋𝐄𝐗𝐎 𝐏𝐎𝐕:

    Data readiness is critical for AI success.

    #DataQuality #AI #MSP #DataOps

  16. LintedData 3.0.0 is released:
    ➡️ gitlab.com/dlr-dw/linteddata/-

    LintedData is a linter for RDF and Ontologies for easy use in CI pipelines. It checks for common violations of best practices in ontology engineering.

    Version 3.0.0 enables checking of multiple RDF files with one execution, reduces the need for configuration, and improves or fixes many checks.

    #RDF #Ontologies #KnowledgeGraphs #DataQuality #OntologyQuality #OntologyEngineering
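
To make "checks for common violations of best practices" concrete, here is a minimal Python sketch of one such rule: every `owl:Class` should carry an `rdfs:label`. The rule is our own illustrative choice, not necessarily one of LintedData's actual checks, and the sketch works on pre-parsed `(subject, predicate, object)` string triples; a real linter would parse RDF files with an RDF library first. The `example.org` namespace and helper names are hypothetical.

```python
# Full IRIs for the vocabulary terms this rule needs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def classes_without_label(triples):
    """Return, sorted, every owl:Class subject that has no rdfs:label triple."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    return sorted(classes - labelled)


def lint(triples):
    """Collect human-readable violations; an empty list means the check passed."""
    return [f"owl:Class {c} has no rdfs:label" for c in classes_without_label(triples)]


def exit_status(triples):
    """CI-style result: 0 when clean, 1 when any violation was found."""
    return 1 if lint(triples) else 0


EX = "http://example.org/onto#"  # hypothetical example namespace
triples = [
    (EX + "Sensor", RDF_TYPE, OWL_CLASS),
    (EX + "Sensor", RDFS_LABEL, "Sensor"),
    (EX + "Platform", RDF_TYPE, OWL_CLASS),
]
for message in lint(triples):
    print(message)  # owl:Class http://example.org/onto#Platform has no rdfs:label
```

In a CI job, calling `sys.exit(exit_status(triples))` would fail the build whenever a violation is found, which is the property that makes a linter useful in pipelines.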

  18. But is it useful?: The Cloud-Native Geospatial Forum proposes #usefulness as a better measure than openness for #dataQuality, with a 5-dimension, 4-star framework that goes beyond familiar schemes like #FAIR and 5-star Open Data. It’s an interesting read and a good...
    spatialists.ch/posts/2026/05/3 #GIS #GISchat #geospatial #SwissGIS

  20. Collecting, processing and making sense of the public data can be a cumbersome exercise and there are important #dataquality issues to overcome before the results are meaningful.

    For this purpose we are developing functionality within the #opensource #Equinox platform to support this task.

    In this post there is a brief description of the data models involved with some sample data.

    openriskmanagement.com/using-e

    The code is here:

    github.com/open-risk/equinox

  22. How complete and reliable is #OpenStreetMap data in your region?

    We are introducing country reports on the Humanitarian Data Exchange platform.

    The new reports show #OSM #dataquality across a country & identify where important gaps remain. They provide insights on roads, buildings, currentness, and spatial distribution.

    The reports are powered by the ohsome quality API & are also available as datasets in CSV & GPKG formats.

    More topics are planned.

    👉 heigit.org/introducing-ohsome-

    #humanitarian

  24. @RejoinEU I have worked in data quality for many years. Much of what needs to be done is 'grunt work' - technology cannot assess the accuracy of data etc. Plus transactional data happened at a point in time - unless there is an independent record of that transaction, it is extremely difficult to improve the quality of its data #dataquality

  26. Mistaking Quantity for Quality in Tech and Life - Tech Field Day Podcast
    @TechFieldDay @TechFieldDayPod @SFoskett @GuyCurriersFeed @DaveGraham #TFDPodcast #AIFD8 #AI #AgenticAI #AIInfrastructure #AIAgents #AIQuality #DataQuality

    youtu.be/9CAVQPJTGzM

  27. Now that AI has enabled us to have an unlimited amount of content, generated on demand and instantly, we find ourselves questioning the quality of the output. 🤖 🎙️

    🎙️ This episode of the Tech Field Day Podcast, recorded prior to AI Field Day by delegates Barbara Roos, Guy Currier, Dave Graham, and Stephen Foskett, considers this common trade-off.

    #TFDPodcast #AIFD8 #AI #AgenticAI #AIInfrastructure #AIAgents #AIQuality #DataQuality

    youtu.be/9CAVQPJTGzM

  30. This week we were discussing the main challenges of machine learning in the #KDAI2026 lecture. It should be very obvious that "bad data quality leads to bad results" :)
    However, we also talked about insufficient amounts of data, non-representative data, irrelevant features, overfitting, and various forms of bias.

    @fiz_karlsruhe #AI #machinelearning #unicorn #dataquality #lecture #datascience

  32. LintedData, which we recently released, is a linter for RDF and ontologies designed for easy use in CI pipelines. It checks for common violations of best practices in ontology engineering.
    GitLab: gitlab.com/dlr-dw/linteddata/
    Docker: hub.docker.com/r/dlrdw/lintedd

    Today I present LintedData at the Helmholtz Metadata Conference 2026 demo session.
    Abstract & Poster: doi.org/10.5281/zenodo.20024644 or elib.dlr.de/223803/

    #RDF #Ontologies #KnowledgeGraphs #DataQuality #OntologyQuality #OntologyEngineering #HMC2026 @helmholtz_hmc

  34. #GESISblog #blog #KODAQS #DataQuality #DBD #DigitalBehavioralData
    New on the GESIS Blog: Part 2 of our blog series on the KODAQS Toolbox: Digital Behavioral Data

    In the first blog post of the KODAQS Toolbox series, we discussed how data quality issues can affect survey data. Similar challenges arise in digital behavioral data (DBD), though they often manifest differently.

  36. Why did people stop responding to federal economic surveys?
    brookings.edu/articles/why-did
    Declining response rates reduce the precision and can increase the bias of economic indicators like unemployment. Surveys remain vital for capturing nuances, such as job-seeking intent, that administrative data cannot track.
    Strong data stewardship and reduced respondent burden are necessary to sustain the statistical system.
    #surveymethodology #economics #statistics #dataquality #nonresponsebias
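
The link between response rates and bias can be made precise with a standard decomposition from survey methodology (our addition, not taken from the linked article). It assumes a deterministic view in which each of the $N$ population units either responds ($R$ units, mean $\bar{Y}_R$) or does not ($M$ units, mean $\bar{Y}_M$), with $N = R + M$:

```latex
\bar{Y} = \frac{R}{N}\,\bar{Y}_R + \frac{M}{N}\,\bar{Y}_M
\quad\Longrightarrow\quad
\bar{Y}_R - \bar{Y} = \frac{M}{N}\left(\bar{Y}_R - \bar{Y}_M\right)
```

The bias of the respondent mean is the nonresponse rate times the gap between respondents and nonrespondents. For example, if 30% of units do not respond and the two groups' unemployment rates differ by 2 percentage points, the respondent estimate is off by $0.3 \times 2 = 0.6$ points. Falling response rates raise $M/N$, so the same respondent/nonrespondent gap produces a larger bias.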

  38. Accurate CRM Data Through Data Enrichment Company Services

    Incomplete data reduces the effectiveness of CRM and analytics. Data cleansing removes errors and standardizes formats. A data enrichment company enhances records with validated business information, helping teams improve segmentation, reporting, and customer engagement strategies.

    Know more: hitechdigital.com/data-cleansi

    #DataCleansingServices #DataEnrichment #CRMDataCleansing #DataQuality #B2BData #DataManagement #DataDriven

  39. DQaaC (data quality as code) embeds testing into pipelines using known tools to ensure reliable, scalable data systems. hackernoon.com/automated-data- #dataquality
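
The core idea of data quality as code (DQaaC), where checks are declared next to the pipeline and run on every batch so that a failure stops the step, can be sketched in plain Python. The `Check` class, the two example rules, and the `gate` helper are hypothetical; the linked article uses existing tools, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    predicate: Callable[[dict], bool]  # returns True when a row passes


def run_checks(rows, checks):
    """Return {check name: number of failing rows}, listing failing checks only."""
    failures = {}
    for check in checks:
        failed = sum(1 for row in rows if not check.predicate(row))
        if failed:
            failures[check.name] = failed
    return failures


CHECKS = [
    Check("order_id not null", lambda r: r.get("order_id") is not None),
    # Missing amounts are treated as 0 here; add a not-null check if they matter.
    Check("amount non-negative", lambda r: (r.get("amount") or 0) >= 0),
]


def gate(rows, checks=CHECKS):
    """Pass rows through unchanged, or raise so the pipeline step fails."""
    failures = run_checks(rows, checks)
    if failures:
        raise ValueError(f"data quality checks failed: {failures}")
    return rows


good = [{"order_id": 1, "amount": 10.0}]
bad = [{"order_id": None, "amount": 5}, {"order_id": 2, "amount": -1}]
print(run_checks(bad, CHECKS))  # {'order_id not null': 1, 'amount non-negative': 1}
```

Keeping the checks as data (a list of named predicates) means they can be versioned, reviewed, and extended like any other code, which is the point of the "as code" approach.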