home.social

41 results for “thekiwisre”

  1. This week on #SlightReliability I had the honour of chatting with @honeycombio Field CTO @lizthegrey about the role of developer advocacy in #SRE.

    #DevRel

    🗣️ What is developer relations (DevRel)?
    🎵 Are DevRel and developer advocacy the same thing?
    💰 What value does developer advocacy add to organisations and the community?
    📖 Storytelling and the power of visuals
    🥇 ...and some tips on getting SRE traction in your organisation!

    youtube.com/watch?v=loVZWgVpnF

  2. This week on Slight Reliability @paigerduty is back! This time we dive into sampling of distributed traces. We cover...

    🕸️ What is distributed tracing? What are spans?
    🧪 What is sampling? And why do we need it?
    🤯 What constitutes an interesting trace?
    🦘 No sampling VS head-based VS tail-based
    👩🏾‍🔬 Non-traditional use cases of tracing such as CI/CD
    🧻 The power of napkin math to make informed decisions
    ...and much more.

    #SRE #observability #SlightReliability

    youtube.com/watch?v=GYwjeE9reb

  3. This week on #SlightReliability I chat with Dr. Vlad Ukis (author of the book "Establishing SRE Foundations" and head of R&D at Siemens Healthineers) about implementing #SRE.

    One of my big takeaways from the conversation was the power of selling SRE practices internally, showcasing success, and the "SRE marketing funnel". The social side of SRE is overlooked but very important.

    Also in this episode: SLOs and how to get started with them.

    #SLO #SLI #DevOps

    youtube.com/watch?v=PPiCm_k03H

  4. This week on #SlightReliability Amin Astaneh from Certo Modo is back! This time we discuss his #sre (production engineering) experiences at #meta. We cover:

    🏢 What it's like interviewing for big tech
    🦶 Voting with your feet (as an incentive to prioritise reliability)
    💍 SRE engagement models
    🏅 Socialising SRE wins to grow the practice (the sales part of SRE)
    🇹 Wide VS deep skillsets in different sized orgs
    🚒 The time Amin burned down a data centre...

    (and much more!)

    youtube.com/watch?v=YIptrW0SZa

  5. This week on Slight Reliability I chat to Eric Schabell from Chronosphere about how dashboards fit into modern #observability.

    We discuss the Know > Triage > Understand process, the similarities between good documentation and good dashboards, leveraging dashboards to empower less experienced #SREs to be on call, and much more.

    #sre #monitoring

    youtube.com/watch?v=-annvqpYCA

  6. This week on Slight Reliability I revisit the concept of the single pane of glass (#SPOG) with Jamie Allen from EPAM Systems and Adam Kinniburgh from SquaredUp.

    👁️ What is a SPOG supposed to be?
    🌏 Can it work at massive scale?
    💼 Is it a tool for engineers or executives?
    🤖 What is the future of dashboards in the #AI era?

    (and much more) #SRE #observability #dashboard #monitoring #SlightReliability

    youtube.com/watch?v=H5bsC8CvQh

  7. This week on #SlightReliability I drill into the myths and truths about #AI with Kyle Forster from RunWhen.

    Can we bring single player mode to pair programming using AI? Are IT jobs at risk of being displaced? How (as consumers) do we make informed decisions about purchasing products with AI? (and of course, much more).

    I hope you enjoy my drawing of Vision (from the MCU)... it took me quite some time :)

    youtube.com/watch?v=CvsljSP1Xf

  8. This week on Slight Reliability I had the honour of interviewing Courtney Nash about why mean time to recover (#MTTR) is an unhelpful metric, what she learned by analysing 10+ incident reports, and much more.

    🕵🏽‍♀️ Instead of MTTR, let's focus on learning from incidents, observing patterns and themes, involving leadership, and adding an "accident investigator" lens after the fact to enhance the learning.

    #SRE #DevOps #incidents #SlightReliability

    youtube.com/watch?v=k-tuE9aMg3

  9. This week on Slight Reliability I had the honour of interviewing Courtney Nash about why mean time to recover (MTTR) is an unhelpful metric, what she learned by analysing 10+ incident reports, and much more.

    🕵🏽‍♀️ Instead of MTTR, let's focus on learning from incidents, observing patterns and themes, involving leadership, and adding an "accident investigator" lens after the fact to enhance the learning.

    youtube.com/watch?v=k-tuE9aMg3U

  10. This week on Slight Reliability I had the honour of interviewing Courtney Nash about why mean time to recover (#MTTR) is an unhelpful metric, what she learned by analysing 10+ incident reports, and much more.

    🕵🏽‍♀️ Instead of MTTR, let's focus on learning from incidents, observing patterns and themes, involving leadership, and adding an "accident investigator" lens after the fact to enhance the learning.

    #SRE #DevOps #incidents #SlightReliability

    youtube.com/watch?v=k-tuE9aMg3

  11. This week on #SlightReliability I chat with Martin Thwaites from Honeycomb.io about #observability during #development (#ODD). Some of my takeaways:

    💻 How observability in development frees up developers to spend less time debugging and more time writing code.

    🤖 That manual instrumentation is where the power is.

    💰 Keeping the cost of observability data down through a combination of head- and tail-based sampling. "Keeping every span of trace data is irresponsible".

    youtube.com/watch?v=dsLVtqILbH

  12. This week on #SlightReliability... how do we prevent #observability from only generating value for a small set of engineers? How do executives, product managers, and other stakeholders leverage its power?

    youtube.com/watch?v=rH0U1sKr-T

    (You can also listen to Slight Reliability via most podcast platforms, or check out slightreliability.com/)

  13. With recent EU law changes (along with global sentiment) I see #GreenEngineering becoming an emerging field of work. I can't think of a group with better skills for building more efficient and sustainable software systems than #PerformanceEngineers.

    Has anyone been involved in projects where carbon emissions or sustainability was part of the formal requirements? Or have you been involved in monitoring the environmental impact of the systems you build and operate?

  14. Unfortunately there is no #SlightReliability episode this week... So as is tradition, I have a haiku for you. #sre

  15. Who else is going to be at AWS Summit in London on June 7th? Would be great to meet some of the community in person. #awssummit #aws #slightreliability aws.amazon.com/events/summits/

  16. This week on Slight Reliability Bruce Cullen is back!

    We chat about his experience at #kubeconeu and topics such as:
    - Does #OpenTelemetry live up to the hype?
    - Is #GreenEngineering a future career path for performance engineers?
    - What are eBPF and Cilium?
    ...and much more.

    If you've never been to KubeCon before, Bruce gives a great explanation of what it was like, how it was structured, what it's all about, etc.

    youtube.com/watch?v=xysUezVVok

  17. Struggling to get #SRE traction within engineering teams? Gwen Berry and I share our "reliability benchmarking" approach to start the SRE conversation as part of #SLOconf here: youtube.com/watch?v=pGL69abT7r

  18. Have any #SREs / reliability engineers used value stream mapping to improve operational processes? How did it go? Would you be willing to come on the Slight Reliability podcast to discuss it?

  19. This week on Slight Reliability I'm launching a fresh new sound! In the episode I define #toil, discuss how we can (or can't) monitor it, and how we can reduce it. I also take pot shots at the change advisory board (#CAB). #sre

    Shout outs to Steve McGhee, Dom Finn, and Shea Stewart.

    youtube.com/watch?v=6bKUTPn7Px

  20. I've had some chats with #SREs in the industry recently that reminded me of the importance of #culture. If teams (and orgs) are self-reflecting, actively seeking out areas that aren't going so well, and approaching them with curiosity and a willingness to experiment... you *almost* can't go wrong.

    If communicating things that aren't going well is met with defensiveness, indifference, or it's not safe to say anything unless it's "positive" (being nice, not kind)... how can we improve?

  21. Reducing #toil is an important part of #SRE, but do you measure how much toil work your engineers are doing? If so, *how* do you measure it?

    Confession: I've never measured it or been part of measuring it (yet). I genuinely want to know how teams are measuring it in a non-intrusive and accurate way.

    (PS. I drew this art using the trackpad on my laptop and I'm unjustifiably proud of it...)

  22. This week on Slight Reliability I chat to Ivan Merrill about his experiences implementing #observability in the real world. We discuss making observability part of onboarding, discussing risk to get leadership buy-in, inviting over inflicting practices, and much more.

    #sre #SlightReliability #reliability

    youtube.com/watch?v=6osDq8DSxc

  23. Yesterday #SlightReliability reached 1k subscribers on YouTube! Just wanted to say thank you to everyone who has listened and joined in the discussion about #sre!

  24. It was a pleasure to chat with Chris and Nate about #performanceengineering and SRE. It's got a bit of everything... the way performance engineers are pigeonholed into load testers, getting to the essence of what #SRE is and how to apply the concepts in any organisation.

    ttcglobal.wistia.com/medias/7q

  25. This week on #SlightReliability... what is "insight" in #observability? Are tool vendors lying to us about being able to provide it? Is it science? Art? Or magic? #sre youtube.com/watch?v=i2GFEobj2g

  26. This week on #SlightReliability I reminisce about my #performancetesting days, when I used to analyse complete sets of raw data using scatterplots, and ponder how we could apply this in #observability #sre youtube.com/watch?v=f1GSGWGUEG

  27. Last week on #SlightReliability I chatted to Paige Cruz from Chronosphere about cognitive overload in #SRE. We chatted about how SREs are often used as the Swiss army knives of the IT department, how as humans our RAM is maxed out, why you shouldn’t give your team a name like “The Lobsters”, and a whole lot more.

    This was one of my very favourite interviews I've ever done. youtube.com/watch?v=CDhGgnIGGQ

  28. This week on #SlightReliability I talk about how I think #observability promises more than what we're getting. I argue that it needs to look at more than technology in order to help us negotiate the ocean of chaos in the Digital Era. #sre youtube.com/watch?v=da3o2QSxVe

  29. This week on #SlightReliability... what do we do with all our #telemetry data? Should we put it all in a data lake? Or is there another way we can pull insight together? #sre #observability youtube.com/watch?v=Mv55p1kXz6

  30. My second official #SlightReliability blog, focusing on my #SRE takeaways from #reinvent. I explore serverless, observability data lakes, topologies (technology maps), FinOps, and more. Oh, and lots of #mspaint art! squaredup.com/blog/slight-reli