#data-lineage — Public Fediverse posts on home.social

Germany @[email protected] · 2026-05-25 · 17:30 UTC

SAP acquires Dremio to power enterprise AI data platform

SAP agreed to acquire Dremio, an open data lakehouse platform, to expand its Business Data Cloud’s ability to…
#Germany #DE #Europe #EU #Europa #SAP #DataCloud #datalakehouse #datalineage #datasources #Dremio #sap #SAPHANA
https://www.europesays.com/germany/23414/

#saphana #dremio #datasources #datalineage #datalakehouse #datacloud

Data for Breakfast @[email protected] · 2026-05-08 · 19:57 UTC

Bringing Metadata to Life at Automattic with OpenMetadata

The metadata problem at scale

Automattic’s data ecosystem is large and highly interconnected: thousands of Iceberg tables and views queried via Trino, thousands of Airflow tasks producing and updating them via Spark jobs, and a growing catalogue of Looker and Superset dashboards and charts on top.

Most of the information about these assets exists somewhere, scattered across version control (like Git), schema registries (in a metastore such as Hive Metastore), and the heads of whoever built the pipeline.

Anyone working with our data eventually hits the same wall:

Which table should I query?
Where did this data come from?
Was it updated recently? When?
Who owns this pipeline?
Has this table grown recently, and was the latest run healthy?

These questions come up repeatedly. Answering even one usually requires knowing the right system to check, or the right person to ask. That doesn’t scale. It has become a bottleneck not just for humans, but also for the AI agents we increasingly want to put in front of our data.

We tried to address this with internal documentation pages describing our tables. They helped, but two gaps remained: people still struggled to find the right asset, and there was no lineage information. The most important question, “where does this data actually come from?”, kept going unanswered.

Why OpenMetadata

After surveying metadata (essentially “data about data”) and discovery solutions, including DataHub, Marquez, Amundsen, Apache Atlas, Unity, ODD, and OpenMetadata, we picked OpenMetadata as the platform to consolidate everything: one user interface, one API, one graph that connects tables, dashboards, owners, profiles, quality checks, and lineage.

The platform is now live internally, and we are actively ingesting metadata from Trino, Airflow, Superset, and Looker.

What it gives us

A single source of truth for discovery

A centralized UI where anyone can find tables, dashboards, charts, and metrics; explore schemas and column‑level descriptions; browse ownership; check data quality results; and navigate relationships without needing to know where to look.

End-to-end lineage

Trace how a table was produced, which Airflow job ran, which upstream tables it depends on, and which dashboards consume it. Before changing a schema, you can assess downstream impact and understand the blast radius.
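
The impact check described here amounts to a downstream traversal of the lineage graph. Below is a minimal Python sketch, assuming lineage is available as a plain mapping from each asset to its direct consumers. The asset names and the `downstream_of` helper are hypothetical illustrations, not part of OpenMetadata or of Automattic's setup.

```python
from collections import deque

# Hypothetical lineage edges: asset -> assets that read from it directly.
LINEAGE = {
    "iceberg.raw.events": ["iceberg.core.sessions"],
    "iceberg.core.sessions": ["iceberg.marts.daily_active", "superset.chart.sessions"],
    "iceberg.marts.daily_active": ["looker.dashboard.growth"],
}


def downstream_of(asset, lineage):
    """Return every asset reachable downstream of `asset`, in breadth-first order."""
    seen = {asset}  # guards against cycles and repeated visits
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


# Everything that could break if the schema of this table changes.
blast_radius = downstream_of("iceberg.core.sessions", LINEAGE)
```

Breadth-first order lists direct consumers before indirect ones, which is usually the order in which you would want to notify owners.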

Data quality in the catalog

A database catalog is a centralized, self‑describing repository storing metadata about database objects such as tables, views, columns, users, and constraints. Quality test results live alongside the tables they validate, so users can confirm freshness and correctness without leaving the catalog.

Live profiling

Row counts and last‑updated timestamps are captured after each pipeline run, keeping the catalog in sync with the actual state of the data. This also helps answer questions like “how much has this table grown in the last 15 days?”
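
A growth question like the one above can be answered from the profile history alone. The sketch below assumes each pipeline run leaves behind a timestamped row count. `ProfileSnapshot`, `growth_over`, and the sample numbers are hypothetical and do not reflect the OpenMetadata profiler's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProfileSnapshot:
    """One hypothetical profile record captured after a pipeline run."""
    captured_at: datetime
    row_count: int


def growth_over(snapshots, days, now):
    """Row-count change between the oldest and newest snapshot inside the window.

    Returns None when fewer than two snapshots fall inside the window.
    """
    cutoff = now - timedelta(days=days)
    in_window = sorted(
        (s for s in snapshots if cutoff <= s.captured_at <= now),
        key=lambda s: s.captured_at,
    )
    if len(in_window) < 2:
        return None
    return in_window[-1].row_count - in_window[0].row_count


now = datetime(2026, 5, 8)
history = [
    ProfileSnapshot(datetime(2026, 4, 1), 1_000_000),  # outside the 15-day window
    ProfileSnapshot(datetime(2026, 4, 25), 1_200_000),
    ProfileSnapshot(datetime(2026, 5, 7), 1_260_000),
]
growth = growth_over(history, days=15, now=now)
```

Returning `None` instead of `0` keeps "no data in the window" distinguishable from "the table did not grow".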

Auditing and change tracking

Schema, ownership, and annotation changes are versioned for debugging and compliance. Users can also be notified of schema updates, critical for catching issues that might break dashboards.

Why this matters for AI agents

OpenMetadata is not just for humans.

An AI agent querying Trino without metadata context produces unreliable results: it cannot know which of twelve similarly named tables is the authoritative one, whether a column is still maintained, or what a given metric actually represents. At scale, this becomes more than a discovery problem. It becomes an operational risk.

Increasingly, we want AI agents and automation systems to interact directly with our data platform: generate queries, investigate incidents, validate transformations, understand lineage, and reason about data quality. Without metadata, those systems are effectively blind and more prone to hallucinations.

The same context humans rely on (ownership, lineage, freshness, semantics, and quality signals) is also what allows agents to operate safely and intelligently on top of a modern data platform.

With a rich metadata graph in place (table descriptions, column semantics, lineage, ownership, and quality results), agents can make the same informed decisions as a data engineer.

A few concrete things a metadata‑aware agent can do that a blind agent cannot:

Resolve ambiguous table references by checking descriptions, ownership, and lineage before writing a query.
Warn before suggesting a transformation that would break a downstream dashboard by walking the lineage graph first.
Surface data quality failures as context (e.g., “this table failed a column check 3 days ago”), and, if the agent has access to the repository and understands the pipeline, even fix the issue.
Identify the right owner to contact when a table needs to be dropped or its schema changed.
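
As an illustration of the first item, an agent could rank similarly named tables by the metadata signals the catalog exposes before writing a query. This is a minimal sketch under stated assumptions: the `TableMeta` fields, the ordering of signals, and the sample tables are all hypothetical, and a real agent would fetch these values from the catalog rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical subset of catalog metadata for one table."""
    name: str
    description: str
    owner: Optional[str]
    downstream_count: int
    last_quality_run_passed: bool


def pick_authoritative(candidates):
    """Rank candidates by simple metadata signals and return (best, warnings)."""
    def score(table):
        # Tuples compare element by element, so earlier signals dominate.
        return (
            table.last_quality_run_passed,    # healthy tables first
            table.owner is not None,          # then tables with an owner
            bool(table.description.strip()),  # then documented tables
            table.downstream_count,           # finally, widely consumed tables
        )

    best = max(candidates, key=score)
    warnings = []
    if not best.last_quality_run_passed:
        warnings.append(f"{best.name}: latest quality run failed")
    if best.owner is None:
        warnings.append(f"{best.name}: no owner recorded")
    return best, warnings


candidates = [
    TableMeta("users_v1", "", None, 2, True),
    TableMeta("users", "Canonical user dimension", "data-platform", 40, True),
    TableMeta("users_tmp", "Scratch copy", "data-platform", 0, False),
]
best, warnings = pick_authoritative(candidates)
```

The warnings list matters as much as the choice itself: when even the best candidate is unhealthy or unowned, the agent can surface that context instead of silently querying it.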

We are currently testing an MCP server that exposes OpenMetadata to AI agents, allowing them to search assets, fetch schemas, navigate lineage, and retrieve quality results, with semantic search for more precise answers.

Semantic search in OpenMetadata currently only works with OpenSearch. Since we run Elasticsearch, we are contributing the Elasticsearch implementation upstream so the same capability works on both backends.

How we keep the catalog fresh

Two complementary ingestion paths feed OpenMetadata, keeping the catalog continuously up to date:

Async (DAG‑based)
An Airflow DAG (openmetadata_ingestion) crawls Trino, Airflow, Superset, and Looker on a schedule. Each source runs as an isolated YARN task via a reusable OpenMetadataWorkflowOperator, using the official openmetadata‑ingestion library. This keeps the broader catalog up to date without coupling ingestion to individual pipeline runs.
Sync (per‑run)
An OpenMetadataSync module integrated into the Spark task execution lifecycle posts metadata updates after every job: table schema, column descriptions, ownership, lineage, row counts from the Iceberg snapshot, and the results of any data quality checks.

High-value, frequently updated tables stay continuously fresh without waiting for the next scheduled crawl (which still captures deleted and static tables).
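
To make the per-run path more concrete, here is a rough sketch of the kind of record a job could assemble once it finishes. It is not the OpenMetadataSync module, whose implementation the post does not show: `RunMetadata`, `to_payload`, the field names, and the payload shape are hypothetical and do not mirror the OpenMetadata API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunMetadata:
    """Hypothetical per-run record a Spark task could post after it finishes."""
    table: str
    columns: dict          # column name -> description
    owner: str
    upstream_tables: list  # lineage: tables this run read from
    row_count: int         # e.g. taken from the Iceberg snapshot
    quality_results: dict = field(default_factory=dict)  # check name -> passed?


def to_payload(run):
    """Flatten a RunMetadata into a JSON-serialisable dict with lineage edges."""
    payload = asdict(run)
    payload["lineage_edges"] = [
        {"from": upstream, "to": run.table} for upstream in run.upstream_tables
    ]
    # A run with no checks counts as healthy; any failed check flips the flag.
    payload["healthy"] = all(run.quality_results.values())
    return payload


run = RunMetadata(
    table="iceberg.core.sessions",
    columns={"session_id": "Unique session identifier"},
    owner="data-platform",
    upstream_tables=["iceberg.raw.events"],
    row_count=1_260_000,
    quality_results={"session_id_not_null": True, "row_count_increasing": False},
)
payload = to_payload(run)
body = json.dumps(payload)  # what a sync step would send to the catalog
```

Bundling schema, lineage, row counts, and check results into one record per run is what lets the catalog reflect a table's state right after the job that changed it.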

Where out‑of‑the‑box connectors fall short, particularly for lineage edges they cannot infer, we close the gap with custom integrations using the OpenMetadata API directly. We also noticed that some Looker and Superset lineage and ownership mappings do not work as expected and require additional development.

Where we are, and what’s next

The OpenMetadata infrastructure and async ingestion DAG were recently deployed in our internal system. Even in its early stages, the catalog is already indexing tens of thousands of data assets across our analytics ecosystem, including Looker, Trino, Superset, and Airflow.

33k+ assets already indexed

The remaining work falls into a few areas:

Sync ingestion of per‑run metadata from Spark jobs.
Ownership and lineage enrichment for Superset and Looker.
Column‑level lineage via the OpenLineage Spark connector.
MCP and agent integration to fully connect the catalog with AI workflows.

We have also started contributing fixes and improvements back to the OpenMetadata open‑source project.

If you build on top of Automattic’s data, whether as a human being or an agent, this is the layer that should make every next answer faster, safer, and easier to find.

#ai #DataLineage #DataScience #Metadata #OpenMetadata

#openmetadata #metadata #datascience #datalineage #ai

Sanjay Mohindroo @[email protected] · 2025-04-03 · 06:04 UTC

Discover how data governance shapes business success. Real stories, best practices, and debate on data quality and security. #DataGovernance #DataQuality #DataSecurity #DataStewardship #DataManagementBestPractices #DataCompliance #DataOwnership #DataLineage #DataCulture #DataDrivenDecisions #DataAudit #DataTrust #DataProtection #DataStewardshipTeam #DataDictionary
https://medium.com/@sanjay.mohindroo66/data-governance-best-practices-ensuring-data-quality-and-security-62cc1aae0f1f

#datagovernance #dataquality #datasecurity #datastewardship #datamanagementbestpractices #datacompliance

Digitale Overheid (automated account) @[email protected] · 2025-03-24 · 09:30 UTC

Data lineage increases trust in government data

Governments often use data to make policy, improve services, and tackle societal issues. But how do you know whether that data is reliable? According to a new report from the Research and Documentation Centre (Wetenschappelijk Onderzoek- en Documentatiecentrum, WODC), data lineage can help.

What is data lineage?

Data lineage literally means ‘the ancestry of data’. It involves mapping the full journey that data takes: from the moment it is collected (for example via a form), through processing and editing, to its eventual use in, for example, dashboards or reports. With data lineage you can trace:

where the data comes from;
which operations or transformations have been applied;
which systems or reports the data ultimately ends up in.

Why is this important for government?

Data lineage helps to detect errors early, identify risks, and increase trust in policy information, both inside and outside the organisation. The WODC emphasises that data lineage is not only a technical tool but also a step towards professionalising data management within government.

Read the WODC news item on their website and view the English-language report.

This is an automatically posted message. Questions or comments can be directed to @[email protected]

#BetrouwbareData #DataLineage #nieuwsbrief62025 #WODC

#betrouwbaredata #datalineage #wodc #nieuwsbrief62025

Miguel Afonso Caetano @[email protected] · 2025-01-06 · 11:15 UTC

"AI is all about data. Reams and reams of data are needed to train algorithms to do what we want, and what goes into the AI models determines what comes out. But here’s the problem: AI developers and researchers don’t really know much about the sources of the data they are using. AI’s data collection practices are immature compared with the sophistication of AI model development. Massive data sets often lack clear information about what is in them and where it came from.

The Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They wanted to know, very simply: Where does the data to build AI come from? They audited nearly 4,000 public data sets spanning over 600 languages, 67 countries, and three decades. The data came from 800 unique sources and nearly 700 organizations.

Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI's data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies."

https://www.technologyreview.com/2024/12/18/1108796/this-is-where-the-data-to-build-ai-comes-from/

#AI #GenerativeAI #AITraining #DataLineage

#ai #generativeai #aitraining #datalineage

Coach Sankhavaram ® @[email protected] · 2024-06-02 · 03:51 UTC

#ModelExplainability, #DataLineage, and editing the #TrainingData set are topics that will be in the news next year…assuming we make it.
https://social.lol/@rom/112543674749743641

#modelexplainability #datalineage #trainingdata

The Datanista @[email protected] · 2024-04-19 · 01:59 UTC

Understanding the Spectrum of Data Lineage Analysis

#Datalineage analysis is the backbone of #datagovernance: it traces the journey of data from origin to consumption. It not only ensures #dataintegrity & #compliance but also aids decision-making & strengthens data-driven strategies. Within data lineage analysis, various methodologies & approaches exist, each tailored to specific needs & objectives: https://www.foxconsulting.co/post/understanding-the-spectrum-of-data-lineage-analysis

#dataflow #dataquality

#dataquality #dataflow #compliance #dataintegrity #datagovernance #datalineage

Mr.Trunk @[email protected] · 2023-09-27 · 07:15 UTC

SecurityAffairs: Top 5 Problems Solved by Data Lineage https://securityaffairs.com/151541/security/top-5-problems-solved-by-data-lineage.html #ITInformationSecurity #PierluigiPaganini #SecurityAffairs #BreakingNews #Datalineage #Security #Hacking

#itinformationsecurity #pierluigipaganini #securityaffairs #breakingnews #datalineage #security

Barrett @[email protected] · 2023-09-05 · 21:59 UTC

TFW you realize the dataset you’re pulling from for your analysis project drops data after 5 years AND THEY DON’T BOTHER TO SAY THAT IN THE DOCUMENTATION. 🤬

#DataGovernance #DataLineage

#datagovernance #datalineage

Barrett @[email protected] · 2023-08-14 · 14:10 UTC

"[#DataAnalysts]..should know how the data was born, with all details of measurement... Few things have more devastating consequences ... than someone in the audience pointing out...measurement issues the analyst didn't consider." Békés and Kézdi, 2021: Data Analysis for Business, Economics, and Policy

If you're having trouble helping your org understand the value of #datalineage and #metadata, share this with them and ask if they know how all the data they're using was gathered and measured.

#dataanalysts #datalineage #metadata
