Search
1000 results for “dataplane”
-
Title: P2: P1: conference of "Selectel" cloud provider report [2024-10-20 Sun]
Grafana.-----------------------------------
Hello ChatGPT. Imagine a scenario where people are running
processes, and the Earth is an operating system. In this #dailyreport #dataplatform #data #datascience #mlops #cloud -
Title: P1: P1: conference of "Selectel" cloud provider report [2024-10-20 Sun]
- Superset with Trino as a core for data analytics
- Argo Workflows as a core for CI/CD (+ Terraform)
- Network/Security: Istio, Kyverno, OPA.
- Monitoring: VictoriaMetrics, Filebeat (ELK), Elastic, #dailyreport #dataplatform #data #datascience #mlops #cloud -
Title: P0: conference of "Selectel" cloud provider report [2024-10-20 Sun]
ClearML + KServe are used at the core of Selectel's AI PaaS. Selectel tries to convert models to ONNX and run inference
on them with TensorRT as the most optimized approach. ATOM (a Russian electric car company) uses:
- Nessie and Iceberg for Data-/Meta-Catalogs.
- Open Metadata for Batching Layer
- Apache Airflow as a core for LEGO approach for ML #dailyreport #dataplatform #data #datascience #mlops #cloud -
Spark SQL Scripting. New capabilities for data engineers
Until recently, implementing complex multi-step logic in the Apache Spark ecosystem forced developers to step outside declarative SQL. Orchestrating sequential calls, computing intermediate variables, and branching logic required external programming languages such as Python (PySpark) or Scala, plus additional tools. Spark SQL Scripting, available starting with version 4, fundamentally changes this approach: it is a procedural extension of classic Spark SQL. Developers can now write full multi-step scripts directly at the level of SQL artifacts, embedding control logic in them. In this post, we, the team at the vendor Data Sapience, walk through the capabilities of Spark scripting in practice.
https://habr.com/ru/companies/datasapience/articles/1021214/
-
The cloud security complexity gap just hit the European Commission, and the data suggests it was predictable. https://www.cloudcomputing-news.net/news/cloud-security-complexity-gap-eu-commission-breach/?utm_source=dlvr.it&utm_medium=mastodon #Cloud #Automation #Data #CTO #DigitalTransformation #ChiefDataOfficer #DataPlatforms #MLOps
-
Making the shift from Azure PaaS to Fabric SaaS? On 2026-04-16, Paul Andrew helps us navigate #MicrosoftFabric governance, from data mesh concepts to aligning industry standards with Fabric's features and access controls. http://swfta.uk/ug-2026-04
-
🔜 ARDoISE webinar on data papers
📍 Tuesday, 7 April, 13:00-14:00
Registration ⤵️
https://openagenda.com/fr/science-ouverte-universite-de-rennes/events/webinaire-ardoise-publier-un-article-sur-ses-donnees-rediger-un-data-paper-9550587
#ARDoISE #DataPaper #PrintempsDonnée 🌸 -
The Data Lakehouse Explained: Why Apache Iceberg Is Quietly Running the Show
https://techlife.blog/posts/data-lakehouse-iceberg
#ApacheIceberg #DataLakehouse #DataWarehouse #DataLake #Snowflake #ApacheSpark #DataEngineering
-
"Artificial intelligence is an extractive technology that relies on the brutal labor of underpaid workers around the world. For years, the work of African data labelers has been more or less “ghost work,” the unseen, hidden labor that lets American tech companies build their products."
'AI Is African Intelligence': The Workers Who Train AI Are Fighting Back
https://www.blackagendareport.com/ai-african-intelligence-workers-who-train-ai-are-fighting-back
By @jasonkoebler
#AI #DataLabelers #extractivism #colonialism @DigitalCoup -
Confused by Data Warehouse vs. Data Lake vs. Data Mesh?
Think of it this way:
- 📦 Warehouse = organized storage room
- 🌊 Lake = throw everything in, sort later
- 🕸️ Mesh = each team owns and serves its own data, but there is still a common hub.
The key insight: Mesh isn't a storage technology. You can run a Data Mesh on top of a Warehouse or Lake. It's about ownership, not infrastructure.
👉 https://www.kdnuggets.com/data-lake-vs-data-warehouse-vs-lakehouse-vs-data-mesh-whats-the-difference
#DataMesh #DataLake #DataWarehouse #DataLiteracy
— bos | 🖼️ ai-generated -
AI agents that run locally quickly run into problems with the **data layer**: 300 million vectors/year from 1s screen captures! A successful test handled 10 million vectors (~40GB) on personal devices (iPhone 16, Mac mini) using CPU only, with 25-30ms latency. The "ephemeral data" assumption no longer holds: **"data gravity"** is becoming a physical reality.
**Tags:** #AI #DataLayer #MachineLearning #CôngNghệAI #Database #TechVinh #AIVietNam
ht
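The storage figures in this post can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes "~40GB" means decimal gigabytes and that the storage cost per vector (including any index overhead) stays constant as the collection grows. Neither assumption comes from the post, and the function names are illustrative.

```python
# Figures quoted in the post.
TESTED_VECTORS = 10_000_000       # vectors in the on-device test
TESTED_BYTES = 40 * 10**9         # "~40GB", read as decimal gigabytes (assumption)
VECTORS_PER_YEAR = 300_000_000    # projected yearly volume


def bytes_per_vector(total_bytes: int, n_vectors: int) -> float:
    """Average storage footprint of one vector, overhead included."""
    return total_bytes / n_vectors


def yearly_bytes(per_vector: float, vectors_per_year: int) -> float:
    """Projected yearly storage, assuming a constant per-vector footprint."""
    return per_vector * vectors_per_year


per_vec = bytes_per_vector(TESTED_BYTES, TESTED_VECTORS)
per_year = yearly_bytes(per_vec, VECTORS_PER_YEAR)

print(f"{per_vec:.0f} bytes per vector")        # 4000 bytes per vector
print(f"{per_year / 10**12:.1f} TB per year")   # 1.2 TB per year
```

Under these assumptions, a full year of capture would need roughly 30 times the storage of the tested 10-million-vector set, which is the scale behind the "data gravity" remark.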
-
Data lakes are typically thought of as simple warehouses. But they don't have to be! 👀 In Graylog 7.0 data lakes function as pressure release valves for #security teams overwhelmed by storage costs, investigation delays, and cloud data sprawl — where analysts can get direct access to long term data, and more.
Our data lake provides inexpensive storage where logs stay searchable, preview-able, and recoverable. Learn more about getting cloud scale without cloud surprises, and why this is a truly practical stance on managing data volume.
https://graylog.org/post/how-to-use-data-lakes-to-reduce-siem-costs-and-strengthen-investigations/ #CyberSecurity #SEIM #DataLake #TDIR
-
The small files problem. Assessing S3 slowdown and the problems of HDFS and Greenplum when working with them
Not long ago, the Arenadata company blog published test results on how various distributed file systems behave when working with small files (~2 MB). The short conclusion: according to the test, good old HDFS handles small files best, degrading by a factor of 1.5; S3 on MinIO cannot cope, slowing down 8x; the S3 API over Ozone degrades 4x; and the preferred system for working with small files, according to our colleagues, is Greenplum, including for companies in the "exabyte club". The colleagues also did a huge amount of work searching for "theoretical confirmations of the unexpected figures". Our team found the test results for S3 on MinIO unconvincing, and we suspected they might be due to:
- insufficient practical experience operating SQL compute over S3, and S3 in general;
- lack of experience with MinIO clusters, in particular in a high-load production environment with 200+ TB of compressed columnar Iceberg/Parquet data, especially in scenarios where the small files problem quickly becomes relevant;
- specifics of the distribution builds.
We are grateful to our colleagues for the idea and the inspiration to run a similar test. Let's dig in.
https://habr.com/ru/companies/datasapience/articles/941046/
#s3 #minio #hdfs #greenplum #bigdata #lakehouse #datalake #dwh
-
Procedural SQL extension in a Lakehouse platform: new capabilities for working with data
Greetings from the Data Sapience team. In today's post we describe the implementation of a procedural extension for working with the MPP engines of the Data Ocean Nova Lakehouse data platform, which is now available to users. The article covers the capabilities, applicability, and usage scenarios of a procedural language in an analytical data platform, with example solutions to typical tasks.
https://habr.com/ru/companies/datasapience/articles/987006/
#lakehouse #impala #starrocks #bigdata #dwh #datalakehouse #datalake #bi
-
Paris: Apache Iceberg Paris Community Meetup #1, Thursday, June 19, 2025, 18:00 to 21:30. https://www.agendadulibre.org/events/32653 #data #dataLakehouse #dataEngineer #dataScience #dataPlatform #dataWarehouse #apacheIceberg
-
Attended an event Brewing Data with Snowflake yesterday in Vilnius :blobcatnerd:
Some of the key insights:
- Medallion Architecture (good or bad) is widespread.
- Snowflake and Databricks are clear competitors, targeting similar landscape.
- Open formats are trending: file format, table format, catalog, etc. - the more of them are open source, the better.
- Time travel feature is important, many users already used it for disaster recovery.
- Clear distinction of Storage from Compute (generic cloud approach).
Full text of one of the slides presented:
Strategic Architecture Outlook
- Agility & Future-Proofing - Open, portable data means you can adopt new technologies or switch platforms with minimal friction. No single vendor can hold your data hostage, so you can evolve your architecture as needed.
- Multi-Cloud and Hybrid - An open data layer can span clouds and on-prem seamlessly. You avoid cloud vendor lock-in and leverage best-of-breed services on different clouds using the same data. This flexibility is key for resilience and optimization.
- Accelerating Innovation - When any team can access data with the tools of their choice, experimentation flourishes. Open data fosters AI/ML and cross-domain analytics since data isn't locked in silos - more innovation and insights from the same data.
- Vendor Leverage - Strategically, using open standards increases your leverage in vendor negotiations. You can opt in or out of services more freely, pushing vendors to provide value (since you're not irreversibly locked to them).
#data #datalake #datalakehouse #medallion #architecture #snowflake #vilnius #lithuania #bigdata #event #meetup