“dataplane” — Fediverse search results on home.social

AI Daily Post @[email protected] · 2026-01-09 · 08:38 UTC

Indian IT giants are turning data platforms into prized IP, even as they lean on global cloud partners. From Bangalore labs to Dell‑backed PoCs, they’re shaping AI‑driven governance and machine‑learning services. How will this reshape the cloud market? Dive into the details and see what it means for developers and enterprises alike. #AI #DataPlatforms #MachineLearning #CloudProviders

🔗 https://aidailypost.com/news/indian-it-firms-view-data-platforms-ip-amid-cloudpartner-reliance

#ai #dataplatforms #machinelearning #cloudproviders

BBF des DIPF @[email protected] · 2026-01-08 · 09:08 UTC

RE: https://eduresearch.social/@bildungsgeschichte/115848671083379974

In the latest #Datapaper on our platform https://bildungsgeschichte.de, H. Heimblöckel, @ubosnabrueck, reports on the project "FaDe:Live 1782–1891", which examines the history (#Geschichte) of the teaching of literature (#Literaturvermittlung) in German classes (#Deutschunterricht) at higher secondary schools, and on the challenges of preparing the sources digitally.

#histed #Deutschdidaktik #FediLZ #DH #histodons #eduresearch #DigitalHistory #Didaktik #Fachunterricht #Schule #Korpora

#korpora #schule #fachunterricht #didaktik #digitalhistory #eduresearch

Brandon H :csharp: :verified: @[email protected] · 2026-01-05 · 17:36 UTC

via #Microsoft : Microsoft announces acquisition of Osmos to accelerate autonomous data engineering in Fabric

https://ift.tt/MpyJ38g
#Microsoft #Osmos #DataEngineering #AI #AutonomousAI #MicrosoftFabric #DataAnalytics #DataWorkflows #DataIntegration #BigData #DataLake #OneLak…

#onelak #datalake #bigdata #dataintegration #dataworkflows #dataanalytics

HitechDigital Solutions @[email protected] · 2025-12-10 · 13:25 UTC

Top 10 Image Annotation Services Transforming Computer Vision in 2026

Explore the leading Image Annotation Services transforming AI in 2026. These top providers offer expert labeling for object detection, segmentation, and classification, helping build robust computer vision models across industries such as healthcare and autonomous driving.

Know More: https://telegra.ph/Top-10-Image-Annotation-Services-Shaping-Computer-Vision-in-2026-12-08

#imageannotation #datalabeling #imagesegmentation #objectdetection #techtrends2026

Graylog @[email protected] · 2025-11-21 · 23:58 UTC

Data lakes are often thought of as just warehouses. But they don't have to be! Our #datalake provides inexpensive storage where logs stay searchable, preview-able & recoverable. Learn more about why this is a truly practical stance on managing data volume. graylog.org/post/how-to-... #CyberSecurity

How to Use Data Lakes to Reduc...

#cybersecurity #datalake

Graylog @[email protected] · 2025-11-21 · 23:54 UTC

Data lakes are typically thought of as simple warehouses. But they don't have to be! 👀 In Graylog 7.0, data lakes function as pressure release valves for #security teams overwhelmed by storage costs, investigation delays, and cloud data sprawl, giving analysts direct access to long-term data, and more.

Our data lake provides inexpensive storage where logs stay searchable, preview-able, and recoverable. Learn more about getting cloud scale without cloud surprises, and why this is a truly practical stance on managing data volume.

https://graylog.org/post/how-to-use-data-lakes-to-reduce-siem-costs-and-strengthen-investigations/ #CyberSecurity #SEIM #DataLake #TDIR

#tdir #datalake #seim #cybersecurity #security

NextLytics AG @[email protected] · 2025-11-19 · 12:08 UTC

RE: https://saptodon.org/@nextlytics/115501853415430874

Our #webinar from last week is available as an on-demand recording for anyone who missed it. How can #SAP Business Data Cloud interact with a wider ecosystem of modern data platforms like #Databricks, #Snowflake, #BigQuery, and (new this week) #Fabric? Where does this trend lead?

Spoiler: maybe truly open players have the advantage in the future interoperable data ecosystem over old-fashioned proprietary-first vendors...

#datascience #dataengineering #datawarehouse #datalakehouse #lakehouse

#lakehouse #datalakehouse #datawarehouse #dataengineering #datascience #fabric

Reddit Tech VN Bot @[email protected] · 2025-11-16 · 20:19 UTC

Verity v1.0.0 released: a server-as-truth data layer that eliminates optimistic updates #Verity #DataLayer #ServerAsTruth #LớpDữLiệu #CôngNghệ #PhátTriểnPhầnMềm #TiếtKiệmThờiGian

https://www.reddit.com/r/programming/comments/1oyv43v/verity_v100_a_data_layer_that_enforces/

#verity #datalayer #serverastruth #lớpdữliệu #congnghệ #phattriểnphầnmềm

Sanjay Mohindroo @[email protected] · 2025-11-12 · 14:05 UTC

Data is not storage—it’s a product. What are you building? #DataProducts #DataLakes #DataMonetisation #AI #Cloud #BigData #DigitalTransformation #CIO #CTO #Leadership #Innovation #DataStrategy #AIethics
https://medium.com/@sanjay.mohindroo66/from-data-lakes-to-value-streams-building-data-products-that-matter-4fbd6b1c1ff2

#dataproducts #datalakes #datamonetisation #ai #cloud #bigdata

Hacker News @[email protected] · 2025-11-04 · 17:55 UTC

Pg_lake: Postgres with Iceberg and data lake access

https://github.com/Snowflake-Labs/pg_lake

#HackerNews #Pg_lake #Postgres #Iceberg #DataLake #Snowflake

#snowflake #datalake #iceberg #postgres #pg_lake #hackernews

Reddit Tech VN Bot @[email protected] · 2025-11-04 · 17:17 UTC

Introducing pg_lake: integrating a data lakehouse with Postgres #Postgres #DataLakehouse #pg_lake #Database #CôngNghệ #Data #Lakehouse #TíchHợp #PhátTriểnPhầnMềm #CôngNgheThongTin

https://www.reddit.com/r/programming/comments/1oobjtg/introducing_pg_lake_integrate_your_data_lakehouse/

#postgres #datalakehouse #pg_lake #database #congnghệ #data

Alberto Morillo @[email protected] · 2025-11-02 · 02:54 UTC

Workspace-Level Private Link in Microsoft Fabric (Generally Available)
#DataEngineering #DataLake #DataScience #datawarehouse #MicrosoftFabric
https://blog.fabric.microsoft.com/en-us/blog/announcing-general-availability-of-workspace-level-private-link-in-microsoft-fabric?ft=All

#dataengineering #datalake #datascience #datawarehouse #microsoftfabric

Carlos Mendible :verified: @[email protected] · 2025-10-12 · 07:12 UTC

📢📢📢 The wait is almost over! Vaulted Backup for Azure Data Lake Storage is now in Public Preview. #backup #datalake #Azure https://azure.microsoft.com/en-us/updates

#backup #datalake #azure

Miguel Afonso Caetano @[email protected] · 2025-10-04 · 14:12 UTC

"AI tools have become ubiquitous, entering many facets of everyday life. More often than not, “artificial intelligence” models are presented as fully automated, having dispensed with the need for human intervention. The human workers who train, test, and maintain AI models and act as the first line of defense against model failures are made visible only occasionally. Media coverage sometimes emerges of hundreds of Indian workers who remotely ensure the checkout process goes smoothly while creating the illusion of automation at Amazon Go stores and African content moderators who make social media platforms safer at great personal cost. But these stories only scratch the surface of the labor that underpins every part of the AI production process.

Despite being touted as the definitive technological breakthrough of this century, the conditions under which AI models and tools are produced by data workers, in a highly opaque and fissured global supply chain, are still underexplored. Studies of data workers in the Global South have begun to fill gaps in knowledge about the low-paid outsourced labor behind AI, but less is known about U.S. data workers’ conditions.

In this report, we begin to address this gap through a study of the working conditions of U.S.-based data workers, conducted by AWU-CWA and TechEquity. These workers are essential to the development of tools and models developed by big tech companies, but are employed by complex webs of contractors in the U.S.-based sections of the global AI supply chain. Combining data from a survey of 160 data workers with insights from 15 in-depth interviews, we’ve found that the poor working conditions seen in the Global South are also widespread in data work in the U.S."

https://cwa-union.org/ghost-workers-ai-machine

#DataLabour #DataLabelling #DataAnnotation #BigTech #AI #GenerativeAI #WageSlavery

#datalabour #datalabelling #dataannotation #bigtech #ai #generativeai

Sanjay Mohindroo @[email protected] · 2025-09-06 · 06:31 UTC

Data is not storage—it’s a product. What are you building? #DataProducts #DataLakes #DataMonetisation #AI #Cloud #BigData #DigitalTransformation #CIO #CTO #Leadership #Innovation #DataStrategy #AIethics
https://medium.com/@sanjay.mohindroo66/from-data-lakes-to-value-streams-building-data-products-that-matter-4fbd6b1c1ff2

#dataproducts #datalakes #datamonetisation #ai #cloud #bigdata

Credence Research Europe LTD @[email protected] · 2025-09-03 · 06:15 UTC

The Data Labeling Market is experiencing explosive growth, expected to reach USD 18,755.85 million by 2032 at a CAGR of 25.60%.

Labeled datasets power AI in healthcare, autonomous driving, and retail innovation.

Key players like Appen, iMerit, and Scale AI are shaping the landscape.

Click here to read the full report: https://www.credenceresearch.com/report/data-labeling-market

#AI #DataLabeling #MachineLearning

#ai #datalabeling #machinelearning

Habr @[email protected] · 2025-08-26 · 19:42 UTC

The small-files problem: assessing S3 slowdown and the problems HDFS and Greenplum have when working with small files

Not long ago, the Arenadata company blog published test results on how various distributed file systems behave when working with small files (~2 MB). The short conclusion: according to the tests, good old HDFS copes best with small files, degrading by a factor of 1.5; S3 backed by minIO cannot keep up, slowing down by a factor of 8; the S3 API over Ozone degrades by a factor of 4; and the most preferable system for working with small files, according to our colleagues, is Greenplum, including for companies in the "exabyte club". Our colleagues also did a huge amount of work searching for "theoretical confirmation of the unexpected figures". The S3 minIO part of the test results struck our team as unconvincing, and we suspected that they might be related to: insufficient practical experience operating SQL compute over S3 and S3 in general; a lack of experience with minIO clusters, in particular in a high-load production environment with 200+ TB of compressed columnar Iceberg/Parquet data, especially in scenarios where the small-files problem quickly becomes relevant; and peculiarities of the distribution builds. We are grateful to our colleagues for the idea and the inspiration to run a similar test. Let's dig in.

https://habr.com/ru/companies/datasapience/articles/941046/

#s3 #minio #hdfs #greenplum #bigdata #lakehouse #datalake #dwh

#dwh #datalake #lakehouse #bigdata #greenplum #hdfs

Habr @[email protected] · 2025-08-16 · 06:22 UTC

The WAP pattern in data engineering

Despite the rapid development of data engineering, the WAP pattern has long been undeservedly overlooked. Some people have heard of it but do not use it. Others use it, but only intuitively. In this article I want to use an example to describe in detail a data-handling pattern that is almost 8 years old, yet in all that time not a single article has been written that explains how it works.

https://habr.com/ru/articles/937738/

#data_engineering #bigdata #big_data #data_warehouse #data_quality #warehouse #datalake #etl

#etl #datalake #warehouse #data_quality #data_warehouse #big_data

Coroot @[email protected] · 2025-08-15 · 18:36 UTC

We’re excited to partner with Greptime to teach you how to set up a fully #FOSS observability stack — complete with a Prometheus Group compatible #datalake and real-time incident insights! https://t.ly/JNmvQ

#kubernetes #databases #devops #sre #freesoftware #sql #observability #ebpf #sysadmin #linux

#foss #datalake #kubernetes #databases #devops #sre

Miguel Afonso Caetano @[email protected] · 2025-08-12 · 18:59 UTC

"The incredible demand for high-quality human-annotated data is fueling soaring revenues of data labeling companies. In tandem, the cost of human labor has been consistently increasing. We estimate that obtaining high-quality human data for LLM post-training is more expensive than the marginal compute itself and will only become even more expensive. In other words, high-quality human data will be the bottleneck for AI progress if these trends continue.

The revenue of major data labeling companies and the marginal compute cost of training frontier models for major AI providers in 2024.

To assess the proportion of data labeling costs within the overall AI training budget, we collected and estimated both data labeling and compute expenses for leading AI providers in 2024:

- Data labeling costs: We collected revenue estimates of major data labeling companies, such as Scale AI, Surge AI, Mercor, and LabelBox.
- Compute costs: We gathered publicly reported marginal costs of compute associated with training top models released in 2024, including Sonnet 3.5, GPT-4o, DeepSeek-V3, Mistral Large, Llama 3.1-405B, and Grok 2.

We then calculate the sum of costs in a category as the estimate of the market total. As shown above, the total cost of data labeling is approximately 3.1 times higher than total marginal compute costs. This finding highlights clear evidence: the cost of acquiring high-quality human-annotated data is rapidly outpacing the compute costs required for training state-of-the-art AI models."

https://ddkang.substack.com/p/human-data-is-probably-more-expensive

#AI #AITraining #GenerativeAI #LLMs #DataLabeling #ComputeCosts

#computecosts #datalabeling #llms #generativeai #aitraining #ai

Python Job Support @[email protected] · 2025-07-17 · 10:21 UTC

Simplified #metadata definition with the Data Catalog Schema Wizard

Data Fabric Cheat Sheet: #DataFabric #DataLake #InforOS.

https://quadexcel.com/wp/simplified-metadata-definition-with-the-data-catalog-schema-wizard/

#inforos #datalake #datafabric #metadata

Miguel Afonso Caetano @[email protected] · 2025-06-30 · 11:14 UTC

"Scale AI is basically a data annotation hub that does essential grunt work for the AI industry. To train an AI model, you need quality data. And for that data to mean anything, an AI model needs to know what it's looking at. Annotators manually go in and add that context.

As is the means du jour in corporate America, Scale AI built its business model on an army of egregiously underpaid gig workers, many of them overseas. The conditions have been described as "digital sweatshops," and many workers have accused Scale AI of wage theft.

It turns out this was not an environment for fostering high-quality work.

According to internal documents obtained by Inc, Scale AI's "Bulba Experts" program to train Google's AI systems was supposed to be staffed with authorities across relevant fields. But instead, during a chaotic 11 months between March 2023 and April 2024, its dubious "contributors" inundated the program with "spam," which was described as "writing gibberish, writing incorrect information, GPT-generated thought processes."

In many cases, the spammers, who were independent contractors who worked through Scale AI-owned platforms like Remotasks and Outlier, still got paid for submitting complete nonsense, according to former Scale contractors, since it became almost impossible to catch them all. And even if they did get caught, some would come back by simply using a VPN.

"People made so much money," a former contributor told Inc. "They just hired everybody who could breathe.""

https://futurism.com/scale-ai-zuckerberg-incompetence

#AI #GenerativeAI #Meta #ScaleAI #DataAnnotation #DataLabeling #GigWork