home.social

#monitorama24 — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #monitorama24, aggregated by home.social.

  1. Gathering data from lots of different sources, providing it to the model, then running the output through post-processing steps? That sounds like an asynchronous workflow, USE TRACING ~ @cartermp lauding the benefits of tracing for deploying and iterating on AI applications #monitorama #monitorama24 #observability

  2. “Disintegration is a feature, not a bug, of asynchronous systems, until you introduce telemetry and monitoring” ~ Johannes Tax from @grafana #monitorama #monitorama24 #observability

  3. Johannes Tax at @grafana describing the pain of distributed tracing in asynchronous systems in “Disintegrated telemetry: The pains of monitoring asynchronous workflows” #monitorama #monitorama24 #observability

  4. “Name your metrics, alerts and dashboards with the language you would use if you were having a conversation about the system at lunch, not the cryptic defaults or uuid hostnames” ~ @danslimmon #monitorama #monitorama24 #observability

  5. We’ve all heard the definition of observability, but who is “one”? What is involved in “determining”? ~ @danslimmon presenting “No observability without theory”. You need a theory of the system to build valid inferences.
    #monitorama #monitorama24 #observability

  6. Julia Thoreson at Bloomberg sharing “Incident Management: Lessons from Emergency Services” breaking down how the lessons learned in emergency services can apply to incident management in technical systems #monitorama #monitorama24 #incidentmanagement

  7. Pete Fritchman’s Takeaways on managing internal services effectively:
    Internal Services impact customers
    Leverage your observability tools
    Talk to your internal customers
    *APPLY SRE PRINCIPLES*

    #Monitorama #monitorama24 #observability #SRE

  8. “New hires are super valuable in your internal customer interviews, they actually expect things to work and aren’t bitter yet” ~ Pete Fritchman #Monitorama #monitorama24 #observability #sre

  9. “Treat your internal tooling outages like the most critical production outages, because they’ll always hit when you’re trying to recover from a critical production outage” ~ Pete Fritchman #monitorama #monitorama24 #observability

  10. “The shoemaker’s children have no shoes - why SRE teams must help themselves” Pete Fritchman making the case for investing in watching the watchmen, and techniques for accomplishing it. #monitorama #monitorama24 #observability #sre

  11. Hashmaps of counts work great for small sets, but what happens when you need to count sets larger than memory? You need HyperLogLog or the CVM algorithm (Chakraborty, Vinodchandran, Meel) ~ @phredmoyer #monitorama #monitorama24 #observability

  12. “Use counters to count things” @phredmoyer providing some examples where counting things by processing petabytes of log or trace data is prohibitively expensive, which justifies spending a bit more on handling higher-cardinality metrics.
    #monitorama #monitorama24 #observability

  13. Baggage is bad for your relationships, good for your service graphs. @kalyanaj makes the case for an arbitrary key value metadata store (baggage) to propagate through your services to enable controllability and observability use cases. #monitorama #monitorama24 #observability

  14. “Distributed Context Propagation: How you can use it to Improve Observability, Test in Production, and more...” @kalyanaj explaining the importance of context in interpreting observability data #monitorama #monitorama24

  15. “Every team has a different answer for discovering what the dependencies of their services are, some say firewall rules, some look at network flows, tracing gives us a uniform answer to this” ~ Sudeep Kumar #Monitorama #monitorama24 #observability #tracing

  16. “We have so many microservices, people are always looking for an excuse to create more, and no one knows which ones they’re already dependent on” ~ Sudeep Kumar from Salesforce with “Tracing Service Dependencies at Salesforce”

    #Monitorama #monitorama24 #observability #tracing

  17. “Low cardinality in Prometheus and low cardinality in Clickhouse are vastly different things” - @colind in his talk “Experiments in Backing Prometheus with Clickhouse” #Monitorama #Monitorama24 #observability

  18. “All of the observability infrastructure in the world is just noise unless someone spends time developing an understanding of your system” ~ David Caudill. There’s likely a 10:1 investment return when spending that time up front versus in the middle of the night when you’re troubleshooting an outage.

    #monitorama24 #monitorama #constructionism
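
Post 11 above mentions counting distinct items in sets larger than memory. The following is a minimal sketch of the CVM idea (a bounded random sample whose keep-probability halves whenever the sample fills up), not code from the talk. The function name `cvm_estimate` and the `capacity` parameter are hypothetical names chosen for illustration, and the sketch assumes the stream items are hashable.

```python
import random


def cvm_estimate(stream, capacity, rng=None):
    """Estimate the number of distinct items in `stream` while holding
    fewer than `capacity` items in memory (illustrative sketch only)."""
    rng = rng or random.Random()
    p = 1.0          # current probability that a seen item is kept
    sample = set()   # bounded sample of distinct items

    for item in stream:
        # Forget any earlier decision about this item, then re-decide,
        # so only the latest occurrence of each item matters.
        sample.discard(item)
        if rng.random() < p:
            sample.add(item)

        if len(sample) >= capacity:
            # Sample is full: keep each element with probability 1/2
            # and halve the keep-probability for future items.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2.0
            if len(sample) >= capacity:
                # Extremely unlikely; the published algorithm also
                # reports failure in this case.
                raise RuntimeError("sample did not shrink; retry")

    # Each surviving item stands in for roughly 1/p distinct items.
    return len(sample) / p


if __name__ == "__main__":
    # 10,000 distinct values, each seen three times.
    data = [i % 10_000 for i in range(30_000)]
    print(cvm_estimate(data, capacity=1_000, rng=random.Random(42)))
```

When the number of distinct items stays below `capacity`, the keep-probability never drops and the result is exact; beyond that, the result is an estimate whose accuracy improves as `capacity` grows.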