home.social

#datacentricai — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #datacentricai, aggregated by home.social.

  1. Beyond the Dataset

    On the most recent season of Clarkson’s Farm, Jeremy Clarkson goes to great lengths to buy the right pub. As any sensible buyer would, the team does a thorough tear-down followed by a big build-up before the place opens for business. They survey how the place is built, located, and accessed. In their refresh they ensure that each part of the pub is built with purpose, even the tractor on the ceiling. The art is in answering the question: how was this place put together?

    A data scientist should be equally fussy. Until we trace how every number was collected, corrected, and cleaned—who measured it, what tool warped it, what assumptions skewed it—we can’t trust the next step in our business to flourish.

    Old sound (1925) painting in high resolution by Paul Klee. Original from the Kunstmuseum Basel Museum. Digitally enhanced by rawpixel.

    Two load-bearing pillars

    While there are many flavors of data science, I’m concerned here with the analysis done in scientific spheres and startups. In this world, the structure is held up by two pillars:

    1. How we measure — the trip from reality to raw numbers. Feature extraction.
    2. How we compare — the rules that let those numbers answer a question. Statistics and causality.

    Both pillars rest on a deep understanding of the data-generating process, each from a different angle. A crack in either and whatever sits on top crumbles: plots, significance tests, and AI predictions mean nothing.

    How we measure

    A misaligned microscope is the digital equivalent of crooked lumber. No amount of massaging can birth a photon that never hit the sensor. In fluorescence imaging, the point-spread function tells you how a pinpoint of light smears across neighboring pixels; noise reminds you that light arrives at, and is recorded by, the sensor with some inherent randomness. Misjudge either and the cell you call “twice as bright” may be a mirage.
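
    As a rough sketch of that measurement chain (the numbers are made up, and a Gaussian blur plus Poisson shot noise is only a common approximation of real optics):

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    rng = np.random.default_rng(0)

    # Ground truth: a dim cell and a cell exactly twice as bright.
    truth = np.zeros((64, 64))
    truth[20, 20] = 100.0
    truth[40, 40] = 200.0

    # The point-spread function smears each point across its neighbors;
    # approximate it here with a Gaussian blur.
    optics = gaussian_filter(truth, sigma=2.0)

    # Photon arrival is random: the recorded counts follow Poisson statistics.
    recorded = rng.poisson(optics)

    # The peak-pixel brightness ratio is no longer a clean 2:1.
    ratio = recorded[40, 40] / max(recorded[20, 20], 1)
    print(f"apparent brightness ratio at the peaks: {ratio:.2f}")
    ```

    After blurring, each spot’s photons are spread over many pixels, so the per-pixel counts are small and the shot noise dominates: the measured ratio wanders well away from 2 unless you model the PSF and noise explicitly.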

    In this data-generating process, the instrument’s nuances control what you see. Understanding them lets us judge which kinds of post-processing are right and which may destroy or invent data. For simpler analyses, post-processing can stop at cleaner raw data. For developing AI models, the process extends to labeling and analyzing data distributions. Andrew Ng’s data-centric AI approach insists that tightening labels, fixing sensor drift, and writing clear provenance notes often beat fancier models.

    How we compare

    Now suppose Clarkson were to test a new fertilizer, fresh goat pellets, only on sunny plots. Any bumper harvest that follows says more about sunshine than about the pellets. Sound comparisons begin long before the data arrive; a deep understanding of the science behind the experiment is critical before conducting any statistics. Botched randomization, missing controls, and lurking confounders eat away at the foundation of statistics.
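
    A short simulation makes the danger concrete. Assume, purely hypothetically, that sunshine adds five units of yield and the pellets add nothing; applying pellets only to sunny plots makes them look miraculous, while proper randomization reveals the truth:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000

    # Hypothetical world: sunshine drives yield; the fertilizer does nothing.
    sunny = rng.random(n) < 0.5
    yield_ = 10 + 5 * sunny + rng.normal(0, 1, n)

    # Flawed design: pellets applied only to sunny plots.
    treated_bad = sunny
    naive_effect = yield_[treated_bad].mean() - yield_[~treated_bad].mean()

    # Sound design: pellets assigned at random, independent of sunshine.
    treated_rand = rng.random(n) < 0.5
    randomized_effect = yield_[treated_rand].mean() - yield_[~treated_rand].mean()

    print(f"confounded estimate: {naive_effect:+.2f}")   # ≈ +5, pure sunshine
    print(f"randomized estimate: {randomized_effect:+.2f}")  # ≈ 0, the truth
    ```

    The data alone cannot tell the two designs apart; only knowledge of how treatment was assigned does.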

    This information is not in the data. Only understanding how the experiment was designed, and which events preclude others, enables us to build a model of the experiment’s world. Taking this lightly carries large risks for startups with limited budgets and smaller experiments: a false positive leads to wasted resources, while a false negative presents opportunity costs.

    The stakes climb quickly. Early in the COVID-19 pandemic, some regions bragged of lower death rates. Age, testing access, and hospital load varied wildly, yet headlines crowned local policies as miracle cures. When later studies re-leveled the footing, the miracles vanished. 

    Why the pillars get skipped

    Speed, habit, and misplaced trust. Leo Breiman warned in 2001 that many analysts chase algorithmic accuracy and skip the question of how the data were generated, a split he called the “two cultures.” Today’s tooling tempts us even more: auto-charts, one-click models, pretrained everything. They save time—until they cost us the answer.

    The other issue is the lack of a culture that communicates and shares a common language. Only in academic training is it feasible for a single person to understand the science, the instrumentation, and the statistics well enough for their research to be taken seriously, and even then we prefer peer review. There is no such scope in startups: tasks and expertise must be split. It falls to the data scientist to ensure clarity and to collect information horizontally, and it is the job of leadership to enable this or accept needless risks.

    Opening day

    Clarkson’s pub opening was a monumental task, with a thousand details tracked and tackled by an army of experts. Follow the journey from phenomenon to file, guard the twin pillars of measure and compare, and reinforce them with careful curation and an open culture. Do that, and your analysis leaves room for the most important thing: inquiry.

    #AI #causalInference #cleanData #dataCentricAI #dataProvenance #dataQuality #dataScience #evidenceBasedDecisionMaking #experimentDesign #featureExtraction #foundationEngineering #instrumentation #measurementError #science #startupAnalytics #statisticalAnalysis #statistics

  6. We fine-tune custom #LLMs for two main reasons:
    - To conserve precious context tokens, and
    - To introduce the #LLM to some new knowledge or skill that wasn't available for its generalist training set.

    Fine-tuning is not a solution for utilizing personal or confidential data! The fine-tuned models will leak this information.

    So let's assume we aren't working with private data.

    In general, because of transfer learning, it would in principle make more sense to incorporate the new knowledge into the base model’s corpus, since that tends to create better models. Still, even if a generalist model knows your data and your task, if you are going to put it into a component of a larger system where it will always perform the same task, it makes sense to fine-tune it for that task alone rather than feed it the same prompt prefix on every inference round.

    Now, with data-centric #AI, it may even be that the data you want to use doesn’t meet the high quality standards large generalist models require. In these cases it might make sense to let a chatbot rewrite your specialist corpus into a higher-quality form, even if you’re not aiming to fold your data into generalist corpora.

    There is a new use case emerging though, #RecursiveSelfImprovement. I believe we can do this in a synergistic generalist fashion as well, but curiously it's now something even smaller organizations can do for specialized tasks by fine-tuning.

    Much like #alignment, it went from a niche philosophical topic to standard engineering practice almost overnight.

    Recursive self-improvement follows #DataCentricAI principles: a fine-tuned task is trained from examples, but those examples are generated and filtered recursively by the LLM itself. The model is fine-tuned in rounds, using e.g. #DPO. In each round, the model is first fine-tuned on the existing good data. It is then asked to generate new variations of those examples, and then to rank pairs of training examples so the worse ones can be filtered out. The resulting dataset has more task examples, of better quality, than before. That dataset feeds the next fine-tuning round, and the cycle starts again.

    As this isn't human-imitative, the chatbots can exceed human parity.

    It requires a bit of nuance though. There is not only one task this specialist bot is taught but a set:
    1. Generate variations of tasks (including this task itself).
    2. Rank pairs of task performances (including ranking task).
    3. Perform the task proper.
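
    The round structure described above can be sketched as a loop. Everything here is a toy stand-in: `generate_variation`, `prefer`, and `finetune` are hypothetical placeholders for real LLM calls and a real DPO training step, kept trivial so only the control flow is visible.

    ```python
    import random

    random.seed(0)

    def generate_variation(example: str) -> str:
        """Stand-in for asking the model to vary a known-good example."""
        return example + random.choice([" (rephrased)", " (expanded)"])

    def prefer(a: str, b: str) -> str:
        """Stand-in for asking the model to rank a pair; toy rule: longer wins."""
        return a if len(a) >= len(b) else b

    def finetune(dataset: list[str], accepted: list[str]) -> list[str]:
        """Stand-in for a DPO/fine-tuning step: fold the accepted data in."""
        return dataset + accepted

    def self_improvement_round(dataset: list[str]) -> list[str]:
        # 1. Generate variations of the existing good examples.
        candidates = [generate_variation(ex) for ex in dataset]
        # 2. Rank pairs and keep only the preferred member of each pair.
        kept = [prefer(old, new) for old, new in zip(dataset, candidates)]
        # 3. The filtered, larger pool seeds the next fine-tuning round.
        return finetune(dataset, kept)

    data = ["translate: hello -> bonjour"]
    for _ in range(3):
        data = self_improvement_round(data)
    print(len(data))  # the dataset grows each round: 1 -> 2 -> 4 -> 8
    ```

    In a real system each helper would be an inference or training call against the same model, which is why the task list above includes the variation and ranking tasks themselves.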

    #SymLink: The article emphasizes the importance of data-centric AI in achieving higher accuracy and extracting valuable insights, and highlights the need to improve data quality and management while using tools and automation to streamline the process. #DataCentricAI #AI #DataQuality #DataManagement
    highfens.com/2023/06/23/discov

  8. #TheDataExchangePod 🎧 the amazing Jeff Jonas of Senzing explains how #BigData, #AI, & real-time processing redefine #EntityResolution and #MasterDataManagement. Learn valuable insights and leverage lessons in accuracy, scale, and complexity. Expand the scope of your AI applications and boost efficiency like never before!
    #datascience #dataquality #datacentricai #machinelearning #ai
    🔗 thedataexchange.media/using-da

  13. Attended The Future of Data Centric AI 2023 (#datacentricAI) conference organized by
    @snorkelai, here is my trip report, hope you find it useful -- sujitpal.blogspot.com/2023/06/

  14. 🆕 collaboration with Assaf Araki of Intel Capital: We Explore the Evolving Landscape of Hardware for Artificial Intelligence.

    We discuss hardware implications of recent trends in #AI, including: #datacentricAI, increasingly larger models, #GenerativeAI & #FoundationModels, & the emergence of decentralized custom models.
    #AI #MachineLearning #Deeplearning
    gradientflow.com/ai-hardware-2

  20. 🦅 The eagle has landed! Excited for tomorrow's Data Day Texas lineup & events!

    🤯 The caliber of speakers is absolutely phenomenal, if you're in the Austin area definitely consider dropping by & checking it out!

    👏🏻 Congrats to Lynn Bender & team for an incredible line-up!

    To see the latest schedule of sessions, check this link: datadaytexas.com/2023/schedule

    #mlops #dataops #datacentricai #dataengineering #machinelearning #datamodeling #nlp

  21. On #TheDataExchangePod: Watchful CEO, Shayan Mohanty, discusses the problem of data labeling and the modern, interactive solution they developed to put control back in the hands of experts. Learn about the data management system they built with Rust and their plans to open source key components. #datamanagement #opensource #AI #machinelearning #data #datascience #datacentricAI

    thedataexchange.media/building

  22. ❓❓ What is the difference between Model-Centric AI vs Data-Centric AI ❓❓

    By analogy:

    👉🏻 #ModelCentricAI ➡️ The workout matters 🏋🏻‍♀️
    👉🏻 #DataCentricAI ➡️ The diet matters 🥗

    So the difference between Model-Centric AI and Data-Centric AI is like optimizing the workout (types of lifts, cardio, reps & intensity, etc.) versus optimizing the diet (caloric intake, macros, timing, etc.).

    #mlops #dataengineering #productionml #mlsystems