home.social

#monitorama — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #monitorama, aggregated by home.social.

  1. More awesomeness from hashtag #monitorama: @austinlparker combines excellent storytelling with his first-hand knowledge of the creation of #opentelemetry to give us a better understanding of what an open standard could and should be: youtu.be/yJFYNTq3uCs?si=tanX7y

  2. I'm finally getting to catch up on the #monitorama talks from this year. First up is this deeply informative talk from Colin Douch on @Cloudflare's adventures in using #ClickHouse to replace #Prometheus:

    youtube.com/watch?v=-wIjq_dyzD

  3. Please enjoy in full my Monitorama 2024 presentation, "No Observability Without Theory"! You'll get:

    * A bunch of made-up medical jargon
    * Tips on facilitating a shared mental model in your graph dashboards and incident response processes
    * 24% off my Leading Incidents video course (d2e.engineering/leading-incide) with the discount code at the end!

    youtube.com/watch?v=aJkynVk1k6 #observability #o11y #monitorama #devops #ops #softwareengineering #sre

  4. @reyjrar Agreed, this is a key difference and really supports building a “tribe/community” of practitioners. I’ve spoken at #Monitorama at least five times and attended without speaking several times as well (like this year) because I want to be part of it.

  5. In single-track conferences, the speakers see other speakers' talks and reference those talks in their own. This unwittingly creates links between different ideas, making both of those talks more memorable.

    This year, almost every speaker referenced one of their peers. A few even called back to talks from past #monitorama events.

    This makes the event more fun, but also makes the lessons learned significantly more impactful!

  6. Heading home from #Monitorama now. It was great, as always. It's great on purpose. @obfuscurity puts a lot of effort into making the conference be the conference he always wanted. We are all beneficiaries of his discerning tastes.

    See this thread for some of my favorite things about #Monitorama

  7. Gathering data from lots of different sources, providing it to the model then processing the output through post processing steps? That sounds like an asynchronous workflow, USE TRACING ~ @cartermp lauding the benefits of tracing for deploying and iterating AI applications #monitorama #monitorama24 #observability

  8. “Disintegration is a feature, not a bug, of asynchronous systems, until you introduce telemetry and monitoring” ~ Johannes Tax from @grafana #monitorama #monitorama24 #observability

  9. Johannes Tax at @grafana describing the pain of distributed tracing in asynchronous systems “Disintegrated telemetry: The pains of monitoring asynchronous workflows” #monitorama #monitorama24 #observability

  10. Sage words from @danslimmon on how to better develop your/your team's #o11y theory by getting practical practice #monitorama #observability

  11. “Name your metrics, alerts and dashboards with the language you would use if you were having a conversation about the system at lunch, not the cryptic defaults or uuid hostnames” ~ @danslimmon #monitorama #monitorama24 #observability

  12. We’ve all heard the definition for observability, but, who is one? What is involved in determining? ~ @danslimmon presenting “No observability without theory” You need a theory of the system to build valid inferences
    #monitorama #monitorama24 #observability

  13. Julia Thoreson at Bloomberg sharing “Incident Management: Lessons from Emergency Services” breaking down how the lessons learned in emergency services can apply to incident management in technical systems #monitorama #monitorama24 #incidentmanagement

  14. Pete Fritchman’s Takeaways on managing internal services effectively:
    Internal Services impact customers
    Leverage your observability tools
    Talk to your internal customers
    *APPLY SRE PRINCIPLES*

    #Monitorama #monitorama24 #observability #SRE

  15. “New hires are super valuable in your internal customer interviews, they actually expect things to work and aren’t bitter yet” ~ Pete Fritchman #Monitorama #monitorama24 #observability #sre

  16. “Treat your internal tooling outages like the most critical production outages, because they’ll always hit when you’re trying to recover from a critical production outage” ~ Pete Fritchman #monitorama #monitorama24 #observability

  17. “The shoemaker’s children have no shoes - why SRE teams must help themselves” Pete Fritchman making the case for investing in watching the watchmen, and techniques for accomplishing it. #monitorama #monitorama24 #observability #sre

  18. Hashmaps to counts work great for small sets, but what happens when you need to count sets larger than memory? You need HyperLogLog or the CVM algorithm (Chakraborty, Vinodchandran, Meel) ~ @phredmoyer #monitorama #monitorama24 #observability

  19. “Use counters to count things” @phredmoyer providing some examples where counting things by processing petabytes of log or trace data is prohibitively expensive, justifying spending a bit more on dealing with higher cardinality metrics.
    #monitorama #monitorama24 #observability

  20. "Shoutout to Brad, who's still using Perl" #Monitorama

  21. Baggage is bad for your relationships, good for your service graphs. @kalyanaj makes the case for an arbitrary key value metadata store (baggage) to propagate through your services to enable controllability and observability use cases. #monitorama #monitorama24 #observability

  22. “Distributed Context Propagation: How you can use it to Improve Observability, Test in Production, and more...” @kalyanaj explaining the importance of context in interpreting observability data #monitorama #monitorama24

  23. “Every team has a different answer for discovering what the dependencies of their services are, some say firewall rules, some look at network flows, tracing gives us a uniform answer to this” ~ Sudeep Kumar #Monitorama #monitorama24 #observability #tracing

  24. “We have so many microservices, people are always looking for an excuse to create more, and no one knows which ones they’re already dependent on” ~ Sudeep Kumar from Salesforce with “Tracing Service Dependencies at Salesforce”

    #Monitorama #monitorama24 #observability #tracing

  25. “Low cardinality in Prometheus and low cardinality in Clickhouse are vastly different things” - @colind in his talk “Experiments in Backing Prometheus with Clickhouse” #Monitorama #Monitorama24 #observability

  26. drinking our own 🥂 with #elastic observability (it's not dog food): when something is up in your app and you let the automatic correlation do the work for you
    PS: lots of talk about cost, OTel,... at #monitorama — not so much actually getting value out of all the labor...

  27. “If we’re being honest, we all, generally, have a visceral reaction to people trying to get us to adopt new tools” ~ Noa Levi describing how a forced migration and the associated conversations between the engineering and observability teams facilitated adoption without adoption as a stated goal #Monitorama #monitorama24 #observability

  28. Noa Levi presenting “How we tricked engineers into utilizing distributed tracing” on her experience getting tracing adopted at Strava

    #Monitorama #Monitorama24 #Observability

  29. “a little bit of work in reducing latency at the beginning of the data pipeline can provide orders of magnitude lower cost at the query end of the pipeline” ~ @djosephsen
    #Monitorama #monitorama24 #observability

  30. “We need to have a word about this ‘watershed’ metaphor for data” - @djosephsen presenting “Pugs, Poe's and pipelines; An engineering perspective on big-data streams for operational telemetry”
    #Monitorama #Monitorama24 #data

  31. Thai Wood breaking down the observability lessons from an aerospace story, a time-honored tradition here at #Monitorama “The complexity of success and failure: the story of the Gimli Glider”

    #Monitorama24 #Observability

  32. Ferris Ellis “Is Your Kernel Being Honest? Understanding & measuring low level bottlenecks” breaking down fundamental performance diagnosis in the Linux OS

    #Monitorama #Monitorama24 #Observability

  33. Dave McAllister presenting “The subtle art of misleading with Statistics”
    “If you can’t find the bias in your sampling, look harder”

    #Monitorama #Monitorama24 #Statistics

  34. David Gildeh at Netflix with “The Observability Data Lake - 1 year on” incepting an update to his talk from last year

    #Monitorama #Monitorama24 #Observability

  35. @austinlparker introduces the ARS(e) theory, or CAP theorem for mere mortals, you don’t even get one:

    #Monitorama #monitorama24 #otel

  36. “It’s all just structured data, but it’s boring without aggregation” - @austinlparker on the core values of the #OTEL project, and why the wait for consensus on interoperable observability is worth the wait.

    #monitorama #monitorama24 #OpenTelemetry

  37. “The Hater’s Guide to Open Telemetry” with @austinlparker is up next, we made it three talks into #monitorama24 before the @xkcd Competing Standards comic made an appearance.

    #Monitorama #otel

  38. “All of the observability infrastructure in the world is just noise unless someone spends time developing an understanding of your system” ~ David Caudill. There’s likely a 10:1 investment return when spending that time up front versus in the middle of the night when you’re troubleshooting an outage.

    #monitorama24 #monitorama #constructionism

  39. “Many times addressing DEBUG log noise is just a matter of education” - Alex Hidalgo “There are well defined POSIX standards for the various log levels (WARN, ERROR, DEBUG), many junior team members just aren’t aware of them”

    #monitorama24 #monitorama #AlexHidalgo

  40. First talk today is “Logs are good, actually” by Alex Hidalgo, “not all systems look like easily traced, high query per second systems”

    #Monitorama24 #monitorama #AlexHidalgo

  41. If you’re at #Monitorama in #PDX come meet me at the #Chronosphere booth!

    I’m demoing a really neat feature, the Metrics Usage Analyzer, that you can use to get insight into your data usage. This allows you to identify unused data, make decisions, and understand impact.

  42. 🎥 Events and CFPs

    #FOSDEM #ConfigManagementCamp #GitLabContributorDays - see you there!
    Kubernetes Community Days Amsterdam - see you there!
    #KubeConEU - see you there!

    Chaos Carnival 2023, Cloud Native Rejekts, #OSSummitNA, Kubernetes Community Days Zürich, #Kubernetes Community Days Munich, #CloudLand, #monitorama

    opsindev.news/archive/2023-01-