home.social

#observability — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #observability, aggregated by home.social.

  1. 🚀 How to Install and Configure Node Exporter on #Debian #VPS This article provides a step-by-step guide to installing and configuring Node Exporter on a Debian VPS.
    What is Node Exporter?
    Node Exporter is a #Prometheus exporter that collects and exposes hardware and OS-level metrics from Linux and Unix-like systems. It runs as a background service and makes these metrics available ...
    Continued 👉 blog.radwebhosting.com/install #selfhosted #nodeexporter #letsencrypt #opensource #selfhosting #observability
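
    As a companion to the post above (not code from the linked article): Node Exporter serves its metrics at `http://<host>:9100/metrics` in the Prometheus text exposition format. A minimal stdlib-only sketch of what that data looks like and how to read it:

```python
# Sketch (not from the linked article): parse the Prometheus text
# exposition format that Node Exporter emits on /metrics.

def parse_metrics(text: str) -> dict:
    """Map 'name{labels}' -> float value, skipping # HELP / # TYPE comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # the metric value is the last space-separated token
        key, _, value = line.rpartition(" ")
        try:
            metrics[key] = float(value)
        except ValueError:
            pass  # ignore malformed lines
    return metrics

sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_free_bytes{device="/dev/sda1",mountpoint="/"} 1.2e+10
"""

parsed = parse_metrics(sample)
print(parsed["node_load1"])  # 0.42
```

    In practice Prometheus scrapes this endpoint for you; the parser above is only to show the shape of the data the article is about.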

  4. Nine load-growth challenges: from a startup to an app for 25 million users

    This article is not a technical deep dive at all, but an engaging story of how a small yet very promising startup became a top application, and of the challenges that the X5 Tech development, DevOps, and testing teams faced along the way. From the outset we laid down the core principles of a high-load application: microservices as the foundation of everything, full metric coverage, asynchrony, and caching cranked to the max. Some functionality we developed ourselves, in places we used services from other X5 tech teams, and elsewhere we took third-party solutions from the market. All the code was written in Python, using FastAPI and other frameworks and technologies that were popular at the time.

    habr.com/ru/companies/X5Tech/a

    #highload #микросервисы #latency #postgresql #elasticsearch #kubernetes #hpa #балансировка_нагрузки #нагрузочное_тестирование #observability

  8. Why do we need APM platforms when we have Prometheus and Grafana

    Why APM platforms, when there are Prometheus and Grafana? Hi everyone! We build an APM platform and regularly face the question: why pay when there is an open-source stack like Prometheus and Grafana? So let's look at a genuinely interesting topic, where I, as a developer, know the product from the inside and root for it, yet cannot deny the industry's open-source standards. A stable stack of open-source products has long existed around observability: Prometheus, Grafana, Loki, Jaeger/Tempo, and others. For many teams this is the default choice: flexible and under their control. At the same time, when it comes to monitoring complex distributed systems and faster adoption, APM platforms (Application Performance Monitoring and Observability) offer a different approach: a ready-made product with built-in data correlation, automation, and minimal setup. I will compare them on four key criteria: functionality, deployment speed, support, and adaptability to change.

    habr.com/ru/companies/rkt/arti

    #apm #apmмониторинг #observability #monitoring #мониторинг #zabbix #prometheus #grafana #opensource #opentracing

  12. How to collapse thousands of log groups into dozens with Grok patterns

    Picture a typical scene: an application generates thousands of log lines per minute, and in the monitoring UI you see hundreds of groups, even though there is really only one problem. The reason is simple: each message has a unique identifier, product name, or number baked into it, and the system treats every variant as a separate event.

    habr.com/ru/articles/1034152/

    #logs #observability #log_grouping #apm
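
    The core idea in the article (Grok patterns are essentially named regular expressions) can be sketched with plain `re`: mask the variable parts of a message, such as IDs and numbers, so near-identical messages collapse into one group key. This is an illustrative sketch, not the article's code.

```python
# Sketch of log grouping by normalization: replace variable tokens
# (UUIDs first, then bare numbers) with fixed placeholders so that
# messages differing only in those tokens share one group key.
import re

PATTERNS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]

def group_key(message: str) -> str:
    for pattern, placeholder in PATTERNS:
        message = pattern.sub(placeholder, message)
    return message

logs = [
    "order 1042 failed for user 77",
    "order 9001 failed for user 13",
    "order 17 failed for user 8ce1d1a2-0000-4b6e-9c1d-aaaaaaaaaaaa",
]
print(len({group_key(m) for m in logs}))  # 2 distinct groups, not 3
```

    Real Grok libraries ship large catalogs of such patterns (IPs, timestamps, paths); the ordering matters, since a UUID would otherwise be shredded by the number pattern.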

  14. How to Install Uptime Kuma on #Debian #VPS (5 Minute Quick-Start Guide) Here’s how to install Uptime Kuma on Debian VPS—the fastest and cleanest way to deploy it.
    What is Uptime Kuma?
    Uptime Kuma is a self-hosted, open-source #monitoring tool that allows you to track the availability and uptime of websites, services, and servers in real time. It is often ...
    Continued 👉 blog.radwebhosting.com/install #selfhosted #opensource #reverseproxy #letsencrypt #observability #uptimekuma #selfhosting

  16. Part 3 of the StyloBot Release Series is up.

    mostlylucid.net/blog/stylobot-

    This one is less about bots and more about the reality of long-running .NET based systems: everything that learns from traffic eventually accumulates.

    Came from one of my periodic reliability reviews where StyloBot’s vector layer had drifted to 13GB on the .NET Large Object Heap due to the wrong abstraction (in-process HNSW behaving like an unbounded cache).

    The interesting part wasn’t the fix. It was recognising that the architecture itself was wrong for the runtime pattern.

    Covers:

    • how I periodically review long-running services
    • using dotnet-counters, dotMemory and dotTrace to find growth
    • why “just add a cap” is often the wrong answer
    • replacing unbounded ANN structures with bounded hot caches + compacted persistence
    • taking the vector layer from 13GB LOH to <6MB

    The broader point applies to any system that “remembers”: bot detection, fraud scoring, recommendations, anomaly detection, RAG pipelines, adaptive systems.

    Fix the shape, not the symptom.

    #dotnet #aspnetcore #performance #architecture #ai #rag #observability
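
    The post's fix is .NET-specific, but the pattern it names, replacing an unbounded in-memory structure with a bounded hot cache that evicts least-recently-used entries, is language-neutral. A minimal sketch in Python (my own illustration, not StyloBot's code):

```python
# Bounded hot cache: an LRU structure with a hard capacity, so memory
# use stays fixed no matter how much traffic the system "remembers".
from collections import OrderedDict

class BoundedCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = BoundedCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b", not "a"
print(cache.get("b"))  # None
```

    The point the post makes is that the eviction boundary belongs in the architecture, not bolted on as a cap after the structure has already grown.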

  17. Agent demos love static diagrams. Production gives you a different graph.

    This post shows how to expose a live LangChain4j topology from Quarkus with AgentMonitor, HtmlReportGenerator, and an SSE feed for recent runs.

    the-main-thread.com/p/quarkus-

    #Java #Quarkus #LangChain4j #Observability

  18. Reduce developer friction – Configuring tools like Fluent Bit (and Fluentd)

    Something that vendors like Microsoft have been really good at is reducing the friction on getting started – from simplifying installations with MSI files and defaulted options through to very informative error messages in Excel when you’ve got a function slightly wrong. Apple is another good example of this; while no two Android phones are the same, my experience is that setting up an iPhone is just so much easier than setting up an Android phone. It is also the setup/configuration where most friction comes from.

    Open-Source Software (OSS), as a generalisation, tends to be a bit weaker at minimising friction – this comes from several factors:

    • When OSS is part of a business model, vendors can reduce that friction, making their enhanced version more attractive.
    • OSS contributors are typically focused on the core problem space and are usually close enough to the fine details to not need those fancy features to keep the rest of us out of trouble.
    • The expectation is that tools to make configuration easy are embedded in the application, making it heavier, when the aim is to keep things as light as possible.
    • Occasionally, a little bit of intellectual snobbery can creep in

    The common challenge

    The issue that I have observed is that we often go through cycles of working with a technology. For example, you’re building a microservice. Chances are, you’ll start by writing and running it locally, without worrying about containerization. Once you’re pretty happy with things, you’ll Dockerize the service, test it locally, and then be ready to deploy it to a cluster. Now you’ll need your YAML. It may well be weeks since you last looked at Helm charts, so you end up cutting and pasting your last configuration. But now you need another feature of Helm, and can you remember its exact settings? So you’re trawling the net for documentation, and it takes several tries to get it right.

    AI may well step in to help developers in this area, where solutions and products are well-documented. But with the wrong model or insufficient detail in the prompt, it’s easy to make a mistake. Personally, I’d turn to AI when it becomes necessary to trawl code to better understand the configuration and its behaviour, and to set options.

    Experimental Solution

    Solution – well, that depends upon the configuration syntax. We have been experimenting with RJSF (React JSON Schema Form), which provides a React-based UI that can be dynamically driven by a JSON schema and validate data with AJV (an alternative stack considered would have been around JSON Forms).

    {
      "type": "object",
      "title": "Dummy",
      "properties": {
        "name": {
          "type": "string",
          "const": "dummy",
          "title": "Plugin"
        },
        "copies": {
          "type": "integer",
          "description": "Number of messages to generate each time messages are generated.",
          "x-doc-reference": "docs.fluentbit.io/manual/data-",
          "x-doc-required": false,
          "x-config-data-type": "integer",
          "default": 1
        },
        "dummy": {
          "type": "string",
          "description": "Dummy JSON record.",
          "x-doc-reference": "docs.fluentbit.io/manual/data-",
          "x-doc-required": false,
          "x-config-data-type": "string",
          "default": "{\"message\":\"dummy\"}"
        },
        "fixed_timestamp": {
          "type": "boolean",
          "description": "If enabled, use a fixed timestamp.",
          "x-doc-reference": "docs.fluentbit.io/manual/data-",
          "x-doc-required": false,
          "x-config-data-type": "boolean",
          "default": false
        }
      }
    }

    The above fragment shows part of the Schema definition for the Dummy plugin for Fluent Bit.
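
    To illustrate what this schema buys (the actual post uses RJSF with AJV in the browser; this stdlib sketch of mine only mimics a slice of it): type-check the known properties and fill in documented defaults.

```python
# Illustration only: a tiny checker driven by a schema fragment like the
# Dummy one above -- validates property types and applies defaults.
TYPES = {"string": str, "integer": int, "boolean": bool}

def validate(config: dict, schema: dict) -> list[str]:
    errors = []
    for name, prop in schema["properties"].items():
        if name in config:
            expected = TYPES[prop["type"]]
            if not isinstance(config[name], expected):
                errors.append(f"{name}: expected {prop['type']}")
        elif "default" in prop:
            config[name] = prop["default"]  # fill in the documented default
    return errors

dummy_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "const": "dummy"},
        "copies": {"type": "integer", "default": 1},
        "fixed_timestamp": {"type": "boolean", "default": False},
    },
}

cfg = {"name": "dummy", "copies": "three"}
print(validate(cfg, dummy_schema))  # ['copies: expected integer']
```

    A full JSON Schema validator such as AJV also handles const, enums, nesting, anyOf/oneOf, and so on; the sketch is only to show why a schema-driven form beats hand-editing.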

    By then creating a schema that defines the different plugins, attributes, etc., we can easily drive validation and menu items in the UI. Admittedly, the schema file is substantial given all the plugins and configuration options, but that is a fair price to pay for a UI that validates the data. To establish the schema in the first place, we scripted the retrieval and scraping of the Fluent Bit documentation pages, which are pretty consistent in structure.

    We have added some custom elements into the definition, for example, x-doc-reference, which allows us to extend the React components to provide features such as a link back to the original documentation as you select attributes or plugins.

    As a result, we very quickly have a UI (shown as a screenshot in the original post) that is a lot easier to view and tweak, with no need to hunt for valid options. Even if we want more information, we’re just a button click away from the open-source documentation. Perhaps we should provide a version that hyperlinks to the Manning liveBooks on Fluent Bit, etc.

    There are a few other factors to consider. For example, Fluent Bit configuration is YAML, not JSON, which is easily resolved given the close relationship between the two formats. Then there are processors that can embed Lua code or a SQL-like syntax. Since we’ve chosen a Python backend, we’ve addressed this with REST endpoints that extract the Lua or SQL from the JSON and validate it: the Lua with a Python Lua parser, and the SQL with a grammar built on the Lark library, as the syntax is simple enough to define and maintain.

    Outstanding Gaps for Fluent Bit

    We still need to address several features that Fluent Bit has, specifically:

    • Environment variables
    • Includes

    Environment variables should be straightforward to support, and dynamically pulling included elements into the UI view can be done. The real challenge is: if changes need to go into something that has been included, how do we push them back to the included file, particularly if there are multiple layers of inclusion?

    What about Fluentd?

    Fluentd configuration isn’t a JSON-based notation, but it is structured, so to apply the same mechanism we’ll need to define a schema and a mapping layer. The tricky part of the schema is that Fluentd supports nested plugins, because it defines routing pipelines differently. While JSON Schema enables this with constructs such as anyOf, oneOf, object nesting, and bounded object arrays, the structure will be more complex.

    The second challenge will be the transformer/renderer, so we don’t introduce issues from having to escape and unescape characters, since JSON Schema is stricter about character use.

    Then What?

    Well, if we get this going, we’ll probably incorporate the capability into our OpAMP project and maybe create a build that lets the configuration tool run independently. Lastly, perhaps we should look to see if we can make the different layers a little more abstract, so we can plug in editors for other configurations, such as OTel Collectors or the ELK Stack.

    As a bonus, perhaps transform the Schema into a quick reference web document?

    #AI #artificialIntelligence #configuration #development #ELK #FluentBit #Fluentd #LLM #observability #OpAMP #Technology
  19. Reduce developer friction – Configuring tools like Fluent Bit (and Fluentd)

    Something that vendors like Microsoft have been really good at is reducing the friction on getting started – from simplifying installations with MSI files and defaulted options through to very informative error messages in Excel when you’ve got a function slightly wrong. Apple is another good example of this; while no two Android phones are the same, my experience is that setting up an iPhone is just so much easier than setting up an Android phone. It is also the setup/configuration where most friction comes from.

    Open-Source Software (OSS), as a generalisation, tend to be a bit weaker at minimising friction – this comes from several factors:

    • When OSS is part of a business model, vendors can reduce that friction, making their enhanced version more attractive.
    • OSS contributors are typically focused on the core problem space and are usually close enough to the fine details to not need those fancy features to keep the rest of us out of trouble.
    • The expectation is that tools to make configuration easy are embedded in the application, making it heavier, when the aim is to keep things as light as possible.
    • Occasionally, a little bit of intellectual snobbery can creep in

    The common challenge

    The issue that I have observed is that we often go through cycles of working with a technology. For example, you’re building a microservice. Chances are, you’ll start writing and running it locally, without worrying about containerization. Once you’re pretty happy with things, you’ll Dockerize the service, start testing it locally, and then you’ll be ready to deploy it to a cluster. Now you’ll need your YAML. It may well be weeks since you last looked at Helm charts. You end up cutting and pasting your last configuration. But now you need to use another feature of Helm, can you remember the exact settings for the feature. So now you’re trawling the net for documentation, and then it takes several tries to get it right.

    AI may well step in to help developers in this area, where solutions and products are well-documented. But with the wrong model or insufficient detail in the prompt, it’s easy to make a mistake. Personally, I’d turn to AI when it becomes necessary to trawl code to better understand the configuration and its behaviour, and to set options.

    Experimental Solution

    Solution – well, that depends upon the configuration syntax. We have been experimenting with RJSF (React JSON Schema Form), which provides a React-based UI that can be dynamically driven by a JSON schema and validate data with AJV (an alternative stack considered would have been around JSON Forms).

     {    "type": "object",    "title": "Dummy",    "properties": {      "name": {        "type": "string",        "const": "dummy",        "title": "Plugin"      },      "copies": {        "type": "integer",        "description": "Number of messages to generate each time messages are generated.",        "x-doc-reference": "docs.fluentbit.io/manual/data-;,        "x-doc-required": false,        "x-config-data-type": "integer",        "default": 1      },      "dummy": {        "type": "string",        "description": "Dummy JSON record.",        "x-doc-reference": "docs.fluentbit.io/manual/data-;,        "x-doc-required": false,        "x-config-data-type": "string",        "default": "{\"message\":\"dummy\"}"      },      "fixed_timestamp": {        "type": "boolean",        "description": "If enabled, use a fixed timestamp.",        "x-doc-reference": "docs.fluentbit.io/manual/data-;,        "x-doc-required": false,        "x-config-data-type": "boolean",        "default": false      }    }  }  

    The above fragment shows part of the Schema definition for the Dummy plugin for Fluent Bit.

    By then creating a schema that defines the different plugins, attributes, etc., we can drive validation and menu items easily in the UI. Admittedly, the config file is significant given all the plugins and configuration options, but it is a fair price to pay for a UI that validates the data. Establishing the schema to start with, we’ve covered it through scripting the retrieval and scraping of the Fluent Bit pages, which are pretty consistent in structure.

    We have added some custom elements into the definition, for example, x-doc-reference, which allows us to extend the React components to provide features such as a link back to the original documentation as you select attributes or plugins.

    As a result, we very quickly have a UI that can look like this:

    A lot easier to view and tweak, with no need to hunt for valid options. Even if we want more information, we’re just a button click away from the open-source data. Perhaps we should provide a version that hyperlinks to the Manning Live Books on Fluent Bit, etc.

    There are a few other factors to consider; for example, Fluent Bit configuration is YAML, not JSON, which can be easily resolved given the relationship between the two standards. Then there are processors that can embed Lua code or a SQL-like syntax. As we’ve chosen to provide a Python backend, we’ve addressed this by providing REST endpoints which can query out of the JSON the code or SQL and perform validation using the Python Lua Parser, and the SQL syntax can be addressed using the Lark library for processing the SQL, as the syntax is simple enough to define and maintain the syntax.

    Outstanding Gaps for Fluent Bit

    We still need to address several features that Fluent Bit has, specifically:

    • Environment variables
    • Includes

    These issues should be straightforward to overcome, although dynamically including the included elements into the UI view elements can be done. The challenge is: if any changes need to go into something that has been included, how do we push them back to the included file? Particularly if there are multiple layers of inclusion.

    What about Fluentd?

    Fluentd configuration isn’t JSON-based notation, but it is structured. So, to apply the same mechanism, we’ll need to define a schema and a mapping mechanism. The tricky part of the schema is that Fluentd supports nesting plugins, since the way pipelines are defined for routing differs. While JSON schema will enable this with constructs such as anyOf, oneOf, object nesting, and bounded object arrays, the structure will be more complex.

    The second challenge will be the transformer/renderer, so we don’t introduce issues from having to escape and unescape characters, since JSON Schema is stricter about character use.

    Then What?

    Well, if we get this going, we’ll probably incorporate the capability into our OpAMP project and maybe create a build that lets the configuration tool run independently. Lastly, perhaps we should look to see if we can make the different layers a little more abstract, so we can plug in editors for other configurations, such as OTel Collectors or the ELK Stack.

    As a bonus, perhaps transform the Schema into a quick reference web document?

    #AI #artificialIntelligence #configuration #development #ELK #FluentBit #Fluentd #LLM #observability #OpAMP #Technology
  20. Reduce developer friction – Configuring tools like Fluent Bit (and Fluentd)

    Something that vendors like Microsoft have been really good at is reducing the friction on getting started – from simplifying installations with MSI files and defaulted options through to very informative error messages in Excel when you’ve got a function slightly wrong. Apple is another good example of this; while no two Android phones are the same, my experience is that setting up an iPhone is just so much easier than setting up an Android phone. It is also the setup/configuration where most friction comes from.

    Open-Source Software (OSS), as a generalisation, tend to be a bit weaker at minimising friction – this comes from several factors:

    • When OSS is part of a business model, vendors can reduce that friction, making their enhanced version more attractive.
    • OSS contributors are typically focused on the core problem space and are usually close enough to the fine details to not need those fancy features to keep the rest of us out of trouble.
    • The expectation is that tools to make configuration easy are embedded in the application, making it heavier, when the aim is to keep things as light as possible.
    • Occasionally, a little bit of intellectual snobbery can creep in

    The common challenge

    The issue that I have observed is that we often go through cycles of working with a technology. For example, you’re building a microservice. Chances are, you’ll start writing and running it locally, without worrying about containerization. Once you’re pretty happy with things, you’ll Dockerize the service, start testing it locally, and then you’ll be ready to deploy it to a cluster. Now you’ll need your YAML. It may well be weeks since you last looked at Helm charts. You end up cutting and pasting your last configuration. But now you need to use another feature of Helm, can you remember the exact settings for the feature. So now you’re trawling the net for documentation, and then it takes several tries to get it right.

    AI may well step in to help developers in this area, where solutions and products are well-documented. But with the wrong model or insufficient detail in the prompt, it’s easy to make a mistake. Personally, I’d turn to AI when it becomes necessary to trawl code to better understand the configuration and its behaviour, and to set options.

    Experimental Solution

    Solution – well, that depends upon the configuration syntax. We have been experimenting with RJSF (React JSON Schema Form), which provides a React-based UI that can be dynamically driven by a JSON schema and validate data with AJV (an alternative stack considered would have been around JSON Forms).

     {    "type": "object",    "title": "Dummy",    "properties": {      "name": {        "type": "string",        "const": "dummy",        "title": "Plugin"      },      "copies": {        "type": "integer",        "description": "Number of messages to generate each time messages are generated.",        "x-doc-reference": "docs.fluentbit.io/manual/data-;,        "x-doc-required": false,        "x-config-data-type": "integer",        "default": 1      },      "dummy": {        "type": "string",        "description": "Dummy JSON record.",        "x-doc-reference": "docs.fluentbit.io/manual/data-;,        "x-doc-required": false,        "x-config-data-type": "string",        "default": "{\"message\":\"dummy\"}"      },      "fixed_timestamp": {        "type": "boolean",        "description": "If enabled, use a fixed timestamp.",        "x-doc-reference": "docs.fluentbit.io/manual/data-;,        "x-doc-required": false,        "x-config-data-type": "boolean",        "default": false      }    }  }  

    The fragment above shows part of the schema definition for Fluent Bit's Dummy plugin.
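    To illustrate what a schema like this buys us, here is a minimal, stdlib-only sketch of applying defaults and type-checking a plugin config against it. In the actual stack this is AJV's job in the browser; the subset of keywords handled here (type, const, default) is deliberately tiny.

```python
import json

# Map JSON Schema type names to Python types (a deliberately minimal subset).
JSON_TYPES = {"string": str, "integer": int, "boolean": bool}

def validate_plugin_config(schema: dict, config: dict) -> dict:
    """Apply defaults and type-check a plugin config against its schema."""
    result = dict(config)
    for key, prop in schema["properties"].items():
        if key not in result and "default" in prop:
            result[key] = prop["default"]          # fill in the documented default
        if key in result:
            expected = JSON_TYPES[prop["type"]]
            # bool is a subclass of int in Python, so reject True/False for integers
            if not isinstance(result[key], expected) or (
                expected is int and isinstance(result[key], bool)
            ):
                raise TypeError(f"{key}: expected {prop['type']}")
            if "const" in prop and result[key] != prop["const"]:
                raise ValueError(f"{key}: must be {prop['const']!r}")
    return result

# A trimmed-down version of the Dummy plugin schema from above.
schema = json.loads("""{
  "type": "object",
  "properties": {
    "name":   {"type": "string",  "const": "dummy"},
    "copies": {"type": "integer", "default": 1},
    "dummy":  {"type": "string",  "default": "{\\"message\\":\\"dummy\\"}"}
  }
}""")

cfg = validate_plugin_config(schema, {"name": "dummy", "copies": 3})
print(cfg)   # copies kept, dummy filled in from its default
```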

    By then creating a schema that defines the different plugins, attributes, and so on, we can easily drive validation and menu items in the UI. Admittedly, the resulting schema file is large given all the plugins and configuration options, but that is a fair price to pay for a UI that validates the data. To establish the schema in the first place, we scripted the retrieval and scraping of the Fluent Bit documentation pages, which are pretty consistent in structure.

    We have added some custom elements into the definition, for example, x-doc-reference, which allows us to extend the React components to provide features such as a link back to the original documentation as you select attributes or plugins.

    As a result, we very quickly have a UI that can look like this:

    A lot easier to view and tweak, with no need to hunt for valid options. Even if we want more information, we’re just a button click away from the open-source data. Perhaps we should provide a version that hyperlinks to the Manning Live Books on Fluent Bit, etc.

    There are a few other factors to consider. For example, Fluent Bit configuration is YAML, not JSON, which is easily resolved given the close relationship between the two formats. Then there are processors that can embed Lua code or a SQL-like syntax. Since we chose a Python backend, we address this with REST endpoints that extract the embedded code or SQL from the JSON and validate it: the Lua with a Python Lua parser, and the SQL with the Lark library, since the syntax is simple enough to define and maintain a grammar for.
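    The extraction step can be sketched as a walk over the nested config that yields each embedded snippet together with its path, so a validation endpoint knows where to attach any errors. Note the field names below ("code", "query") and the config shape are illustrative, not taken from Fluent Bit's actual processor syntax.

```python
# Walk a nested config structure and yield (path, snippet) pairs for fields
# that hold embedded code. The snippet would then be handed to a REST
# validation endpoint (Lua parser or Lark grammar on the backend).
EMBEDDED_FIELDS = {"code", "query"}   # illustrative key names

def find_embedded_snippets(node, path=""):
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key in EMBEDDED_FIELDS and isinstance(value, str):
                yield child, value
            else:
                yield from find_embedded_snippets(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_embedded_snippets(item, f"{path}/{i}")

config = {
    "pipeline": {
        "inputs": [{"name": "dummy"}],
        "processors": [
            {"name": "lua", "code": "function cb(tag, ts, r) return 1, ts, r end"},
            {"name": "sql", "query": "SELECT message FROM STREAM:dummy;"},
        ],
    }
}

for path, snippet in find_embedded_snippets(config):
    print(path, "->", snippet[:25])
```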

    Outstanding Gaps for Fluent Bit

    We still need to address several features that Fluent Bit has, specifically:

    • Environment variables
    • Includes

    These issues should be straightforward to overcome; dynamically merging the included elements into the UI view can certainly be done. The real challenge is write-back: if a change lands in something that was included, how do we push it back to the included file, particularly when there are multiple layers of inclusion?
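    One plausible approach to the write-back problem is to record provenance while expanding includes, so every key in the merged view remembers which file it came from. The sketch below assumes a hypothetical "@include" key and an in-memory file table purely for illustration; it is not Fluent Bit's actual include syntax.

```python
# Expand includes into a single merged view while recording which source file
# each key came from, so edits in the UI can later be routed back to the
# right file. "@include" and the `files` dict are illustrative assumptions.
def expand(doc: dict, files: dict, source: str, provenance: dict, path=""):
    merged = {}
    for key, value in doc.items():
        if key == "@include":
            fragment = files[value]          # in reality: read and parse the file
            merged.update(expand(fragment, files, value, provenance, path))
        else:
            child = f"{path}/{key}"
            provenance[child] = source       # remember where this key lives
            merged[key] = (
                expand(value, files, source, provenance, child)
                if isinstance(value, dict) else value
            )
    return merged

files = {"common.yaml": {"flush": 1, "log_level": "info"}}
prov = {}
view = expand({"service": {"@include": "common.yaml", "daemon": False}},
              files, "main.yaml", prov)
print(view)                      # merged view shown to the user
print(prov["/service/flush"])    # edits to flush go back to common.yaml
```

    Multiple layers of inclusion fall out naturally, since each recursive call carries the file it is currently expanding.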

    What about Fluentd?

    Fluentd configuration isn't a JSON-based notation, but it is structured, so to apply the same mechanism we'll need to define a schema and a mapping mechanism. The tricky part of the schema is that Fluentd supports nesting plugins, since the way pipelines are defined for routing differs. JSON Schema can express this with constructs such as anyOf, oneOf, nested objects, and bounded object arrays, but the structure will be more complex.
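    To make the nesting concrete, here is an illustrative sketch of how a recursive plugin definition might look in JSON Schema. The field names (match, store) are borrowed loosely from Fluentd's routing directives; this is not a real Fluentd schema, just a shape study using a self-referencing $ref and oneOf.

```json
{
  "$defs": {
    "plugin": {
      "type": "object",
      "properties": {
        "name":  { "type": "string" },
        "match": { "type": "string" },
        "store": {
          "type": "array",
          "items": { "$ref": "#/$defs/plugin" },
          "maxItems": 16
        }
      },
      "oneOf": [
        { "required": ["name"] },
        { "required": ["match", "store"] }
      ]
    }
  },
  "type": "array",
  "items": { "$ref": "#/$defs/plugin" }
}
```

    The self-reference lets plugins nest to any depth, while oneOf forces each node to be either a leaf plugin or a routing node, which is the kind of constraint a form renderer can turn into sensible menu choices.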

    The second challenge will be the transformer/renderer: we must avoid introducing issues when escaping and unescaping characters, since JSON Schema is stricter about character use.

    Then What?

    Well, if we get this going, we’ll probably incorporate the capability into our OpAMP project and maybe create a build that lets the configuration tool run independently. Lastly, perhaps we should look to see if we can make the different layers a little more abstract, so we can plug in editors for other configurations, such as OTel Collectors or the ELK Stack.

    As a bonus, perhaps transform the Schema into a quick reference web document?

    #AI #artificialIntelligence #configuration #development #ELK #FluentBit #Fluentd #LLM #observability #OpAMP #Technology
  23. Stop Ctrl+C/Ctrl+V'ing metrics into an LLM. Query your production environment directly to analyze logs, profiles, alerts, traces, and metrics, then get a diagnosis in seconds: coroot.com/agentic-observabili

    🐧 🐝 Check out our docs for more info: docs.coroot.com/mcp/overview/

    #devops #observability #monitoring #aws #cloud #tech #ai #linux #ebpf #sre #sysadmin #kubernetes #docker #mcp #claudecode #claude #anthropic #openai #codex

  24. #PlatformEngineering succeeds when reliability and ergonomics reinforce each other - not compete.

    Pratik Agarwal explores 3 foundational pillars:
    1️⃣ Automated reliability
    2️⃣ Developer ergonomics
    3️⃣ Operator ergonomics

    The result❓ Together, they create a Virtuous Cycle! 🔄

    Read the #InfoQ article for more insights: bit.ly/4ew5WxK

    #DevOps #Observability #SRE #DevEx

  25. How Monium tamed GC: a look at garbage collectors in an observability platform

    Hi everyone, my name is Anton Rybochkin, and I'm a senior backend developer on the Yandex Monium team. Monium is a platform for collecting, storing, and analyzing telemetry (metrics, logs, and traces). It lets you assess how a service is doing, find the causes of outages, and promptly alert on anomalies. The platform originally evolved as an internal system for monitoring services at the scale of all of Yandex, hence the high reliability requirements: telemetry must stay available even when other services are down. From the backend's perspective, such cases bring their own challenges, and one of them is garbage collection, or GC for short. In this article I'll share our experience with different garbage collectors: which Java GC problems we ran into in different services, how to diagnose them, and how to fix them.

    habr.com/ru/companies/yandex_c

    #java #gc #opentelemetry #yandex_monium #monium #observability #generational_zgc #parallelgc #FullGC #shenandoah


  29. DGX Spark: monitoring unified memory when NVML and dcgm-exporter stay silent

    Freshly installed monitoring on a DGX Spark. I open the NVIDIA dashboard in Grafana and half of the memory panels are empty, flat lines at zero. At first it looks like something I failed to configure. Half an hour later it clicks: nothing is broken on my end, this is just how NVML behaves on the GB10. This is an area where half of the standard observability stack simply doesn't work on the GB10: NVML returns [N/A] for memory.used and memory.total, dcgm-exporter won't install, and nvtop shows nothing in the memory column. By default, the NVIDIA dashboards in Grafana look as if there were no GPU at all, and that isn't obvious, because Grafana doesn't complain about missing data; it silently draws a flat line at zero. The article covers how I worked around this and what I ended up seeing in Grafana. A three-level scheme: a textfile collector for basic metrics, per-container attribution via docker top + nvidia-smi, and a CLI fallback to /proc/meminfo, which turned out to be useful not only on the Spark but also on other Linux systems with unified memory, such as AMD Strix Halo and similar.

    habr.com/ru/articles/1031904/

    #dgx_spark #grafana #monitoring #nodeexporter #gb10 #arm64 #prometheus #observability

  30. kubectl describe pod: how to read output where Kubernetes has already written the cause

    An article about reading kubectl describe pod not as a long dump but as the life story of a Pod: who created it, where the scheduler tried to place it, whether the image was pulled, whether the init containers started, and what happened with probes, volumes, restarts, and Events. I've tried to make the material friendly for junior and mid-level engineers without dumbing it down to "run the command and look at the status". There's plenty of real-world operations here: Pending, CrashLoopBackOff, ImagePullBackOff, OOMKilled, FailedMount, CreateContainerConfigError, Evicted, and everyone's favourite "the Pod is Running, but the service doesn't work". If you don't need all the theory, just a quick cheat sheet for an incident, there is a compact diagram at the end of the article: what to look at in kubectl describe pod for Pending, CrashLoopBackOff, ImagePullBackOff, OOMKilled, FailedMount, and other typical states. You can jump straight to it, save it, and use it as a checklist. And if you want to understand not only where to look but why Kubernetes behaves the way it does, we'll walk through describe together, step by step.

    habr.com/ru/articles/1031454/

    #devops #kubernetes #pod #дебаг #девопс #траблшутинг #кубер #debug #observability #oomkill

  31. Loki "Next Wave": how Grafana Labs rewrote the rules of logging at GrafanaCON 2026

    Hi everyone. In this article I'll cover the Loki news: what was presented at GrafanaCON 2026 in Barcelona, what to expect from Loki's new architecture, how it will work, and what it hides under the hood.

    habr.com/ru/articles/1030716/

    #grafana #loki #logs #observability #kafka #логи #графана #мониторинг


  33. Collecting and analyzing log data becomes challenging in a multi-tiered architecture or a dynamic microservice environment. The LPI DevOps Tools Engineer 2.0 exam covers log management and analysis in objective 704.3.

    Learn more from Fabian Thorns and Uirá Ribeiro: lpi.org/5swa

  34. 🚀 How to Install and Configure Node Exporter on #Debian #VPS This article will provide a guide for how to install and configure Node Exporter on Debian VPS.
    What is Node Exporter?
    Node Exporter is a #Prometheus exporter that collects and exposes hardware and OS-level metrics from Linux and Unix-like systems. It runs as a background service and makes these metrics available ...
    Continued 👉 blog.radwebhosting.com/install #opensource #letsencrypt #selfhosted #nodeexporter #observability #selfhosting

  35. 🎉 Milestone Unlocked: Finished the Data Engineering Zoomcamp!

    In 10 weeks, I moved from scripting to architecting systems. We built real production-grade infrastructure using Spark, Kafka, Airflow, and Kestra—not just hobby projects.

    Capstone: A Storage Hard Drive Dashboard using real failure data from Backblaze
    Stack: Terraform + Docker infra, Airflow orchestration, dbt modeling, Streamlit viz.

    Key Lessons:
    ✅️ "It works on my laptop" isn't a strategy.
    ✅ Need IaC, partitioning, clustering, and strict error handling.
    ✅ dbt ensures reproducible, tested models.
    ✅ Infra is invisible work—if it breaks, your code fails.

    Take the leap! It’s challenging but by week 10, pieces click into place. Seeing my pipeline run autonomously felt like crossing the finish line. 🏁

    Thanks Data Talks Club team! On to the next challenge!

    My project: github.com/ammartin8/hard_driv

    #mastodon #fediverse #data #spark #dataengineering #ai #technology #datatools #datapipelines #fedihire #thursday #sql #observability #etl #python #github

  36. I caught up recently with #groundcover CEO Shahar Azulay to discuss the shifting requirements – and growing role -- for #observability tools in #AI development. From his point of view, #o11y has evolved from a post-production downtime prevention system to "the source of truth for everything from code creation to shipping and testing code, remediation and production."

    In today’s episode, we’ll cover…

    -- Coping with a further influx of observability data from #AIagents

    -- Observability for #costmanagement

    -- Data collection for AI agent workflows using #eBPF

    -- Groundcover's #AIobservability roadmap

    And more!

    Watch on YouTube: youtu.be/wjYj7gskPJA
