home.social

Search

1000 results for “dataplane”

  1. Title: P2: P1: conference of "Selectel" cloud provider report [2024-10-20 Sun]
    Grafana.

    -----------------------------------
    Hello ChatGPT. Imagine a scenario where people are running
    processes, and the Earth is an operating system. In this #dailyreport #dataplatform #data #datascience #mlops #cloud

  4. Title: P1: P1: conference of "Selectel" cloud provider report [2024-10-20 Sun]
    - Superset with Trino as the core for data analytics
    - Argo Workflows as the core for CI/CD (+Terraform)
    - Network/Security: Istio, Kyverno, OPA.
    - Monitoring: VictoriaMetrics, Filebeat (ELK), Elastic, #dailyreport #dataplatform #data #datascience #mlops #cloud

  7. Title: P0: conference of "Selectel" cloud provider report [2024-10-20 Sun]
    ClearML + KServe are used at the core of Selectel's AI PaaS.

    Selectel tries to convert models to ONNX and run inference
    on them with TensorRT, as the most optimized approach.

    ATOM (a Russian electric car company) uses:
    - Nessie and Iceberg for Data-/Meta-Catalogs.
    - OpenMetadata for Batching Layer
    - Apache Airflow as a core for LEGO approach for ML #dailyreport #dataplatform #data #datascience #mlops #cloud

  10. Spark SQL Scripting: new capabilities for data engineers

    Until recently, implementing complex multi-step logic in the Apache Spark ecosystem forced developers to go beyond declarative SQL. Orchestrating sequential calls, computing intermediate variables, and branching logic required external programming languages such as Python (PySpark) or Scala, plus additional tools. Spark SQL Scripting, available starting with version 4, fundamentally changes this approach: it is a procedural extension of classic Spark SQL. Developers can now write complete multi-step scripts directly at the level of SQL artifacts and embed control logic in them. In this post, we, the team at the vendor Data Sapience, walk through Spark scripting capabilities in practice.

    habr.com/ru/companies/datasapi

    #spark #datalake #datalakehouse #lakehouse #dwh #script

  11. Making the shift from Azure PaaS to Fabric SaaS? On 2026-04-16, Paul Andrew helps us navigate #MicrosoftFabric governance. From data mesh concepts to aligning industry standards with Fabric's features and access controls. swfta.uk/ug-2026-04

    #DataPlatform #Governance

  12. "Artificial intelligence is an extractive technology that relies on the brutal labor of underpaid workers around the world. For years, the work of African data labelers has been more or less “ghost work,” the unseen, hidden labor that lets American tech companies build their products."

    'AI Is African Intelligence': The Workers Who Train AI Are Fighting Back
    blackagendareport.com/ai-afric
    By @jasonkoebler
    #AI #DataLabelers #extractivism #colonialism @DigitalCoup

  16. Confused by Data Warehouse vs. Data Lake vs. Data Mesh?

    Think of it this way:
    - 📦 Warehouse = organized storage room
    - 🌊 Lake = throw everything in, sort later
    - 🕸️ Mesh = each team owns and serves its own data - but there is still a common hub.

    The key insight: Mesh isn't a storage technology. You can run a Data Mesh on top of a Warehouse or Lake. It's about ownership, not infrastructure.

    👉 kdnuggets.com/data-lake-vs-dat

    #DataMesh #DataLake #DataWarehouse #DataLiteracy
    — bos | 🖼️ ai-generated

  18. AI agents running locally quickly run into problems with the **data layer**: 300 million vectors/year from capturing the screen every 1s! A test with 10 million vectors (~40GB) succeeded on personal devices (iPhone 16, Mac mini) using only the CPU, with 25-30ms latency. The assumption that "data is ephemeral" no longer holds; **"data gravity"** is becoming a physical reality. #AI #DataLayer #MachineLearning #CôngNghệAI #Database #TechVinh #AIVietNam

    ht
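
The figures in the post above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in the post (300 million vectors/year, 10 million vectors in ~40GB, one capture per second). It assumes continuous capture all year, decimal gigabytes, and that the 40GB is spread evenly across vectors. The float32 interpretation is a hypothetical reading, not something the post states.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 captures at one per second

vectors_per_year = 300_000_000          # from the post
tested_vectors = 10_000_000             # from the post
tested_size_bytes = 40 * 10**9          # "~40GB"; decimal gigabytes assumed

# How many vectors each one-second capture would have to produce.
vectors_per_capture = vectors_per_year / SECONDS_PER_YEAR

# Average storage per vector, including any index overhead in the 40GB.
bytes_per_vector = tested_size_bytes / tested_vectors

# Hypothetical: if every byte were a float32 component (4 bytes each).
float32_dims = bytes_per_vector / 4

# Storage for one full year at the same per-vector footprint.
yearly_bytes = vectors_per_year * bytes_per_vector

print(f"vectors per capture: {vectors_per_capture:.2f}")
print(f"bytes per vector:    {bytes_per_vector:.0f}")
print(f"float32 dimensions:  {float32_dims:.0f}")
print(f"storage per year:    {yearly_bytes / 10**12:.1f} TB")
```

Under these assumptions a year of capture is roughly 30 times the tested volume, which is consistent with the post's point that the data cannot be treated as ephemeral.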

  19. Data lakes are typically thought of as simple warehouses. But they don't have to be! 👀 In Graylog 7.0 data lakes function as pressure release valves for #security teams overwhelmed by storage costs, investigation delays, and cloud data sprawl — where analysts can get direct access to long term data, and more.

    Our data lake provides inexpensive storage where logs stay searchable, preview-able, and recoverable. Learn more about getting cloud scale without cloud surprises, and why this is a truly practical stance on managing data volume.

    graylog.org/post/how-to-use-da #CyberSecurity #SEIM #DataLake #TDIR

  20. The small-files problem: assessing S3 slowdown and the problems HDFS and Greenplum have with small files

    Not long ago, the Arenadata company blog published test results on how various distributed file systems behave when working with small files (~2 MB). The short conclusion: according to their tests, good old HDFS handles small files best, degrading by a factor of 1.5; MinIO-based S3 cannot cope, slowing down 8x; the S3 API over Ozone degrades 4x; and the most preferable system for working with small files, according to our colleagues, is Greenplum, including for companies in the "exabyte club". Our colleagues also did a huge amount of work searching for "theoretical confirmations of the unexpected figures". The S3 MinIO part of the results seemed unconvincing to our team, and we hypothesized that they could be related to:
    - insufficient practical experience operating SQL compute over S3, and S3 in general;
    - lack of experience with MinIO clusters, in particular in a high-load production environment with 200+ TB of compressed columnar Iceberg/Parquet data, especially in scenarios where the small-files problem quickly becomes relevant;
    - specifics of the distribution builds.
    We are grateful to our colleagues for the idea and the inspiration to run a similar test. Let's dig in.

    habr.com/ru/companies/datasapi

    #s3 #minio #hdfs #greenplum #bigdata #lakehouse #datalake #dwh

  21. A procedural SQL extension in a Lakehouse platform: new capabilities for working with data

    Greetings from the Data Sapience team. In today's post we describe the implementation of a procedural extension for working with the MPP engines of the Data Ocean Nova Lakehouse data platform, which is now available to users. The article covers the capabilities, applicability, and usage scenarios of a procedural language in an analytical data platform, along with examples of solving typical tasks.

    habr.com/ru/companies/datasapi

    #lakehouse #impala #starrocks #bigdata #dwh #datalakehouse #datalake #bi

  22. Attended the "Brewing Data with Snowflake" event yesterday in Vilnius :blobcatnerd:

    Some of the key insights:

    • Medallion Architecture (good or bad) is widespread.
    • Snowflake and Databricks are clear competitors, targeting a similar landscape.
    • Open formats are trending: file format, table format, catalog, etc. - the more of them are open source, the better.
    • Time travel feature is important, many users already used it for disaster recovery.
    • Clear distinction of Storage from Compute (generic cloud approach).

    Full text of one of the slides presented:

    Strategic Architecture Outlook

    • Agility & Future-Proofing - Open, portable data means you can adopt new technologies or switch platforms with minimal friction. No single vendor can hold your data hostage, so you can evolve your architecture as needed.
    • Multi-Cloud and Hybrid - An open data layer can span clouds and on-prem seamlessly. You avoid cloud vendor lock-in and leverage best-of-breed services on different clouds using the same data. This flexibility is key for resilience and optimization.
    • Accelerating Innovation - When any team can access data with the tools of their choice, experimentation flourishes. Open data fosters AI/ML and cross-domain analytics since data isn't locked in silos - more innovation and insights from the same data.
    • Vendor Leverage - Strategically, using open standards increases your leverage in vendor negotiations. You can opt in or out of services more freely, pushing vendors to provide value (since you're not irreversibly locked to them).

    #data #datalake #datalakehouse #medallion #architecture #snowflake #vilnius #lithuania #bigdata #event #meetup