home.social

#binarydiffing — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #binarydiffing, aggregated by home.social.

  1. The code is also already published (on GitHub), and #Diaphora can now use an already trained model to try to improve binary diffing results (matching). I haven't made a new release yet, as these changes are considered a bit experimental for now.

    The datasets and tools for training and testing are here: github.com/joxeankoret/diaphor
    And Diaphora is here: github.com/joxeankoret/diaphor

    #Diaphora #BinaryDiffing #Bindiffing #ReverseEngineering #MachineLearning

  6. This is not at all my own idea; it is, basically, the only thing that academia researches as of today: almost every single academic paper published in recent years about binary diffing (or, as academia calls it, "Binary Code Similarity Analysis") is based on "machine learning" techniques.

    Some popular academic examples: DeepBinDiff or BindiffNN. Don't worry if you don't know them. Nobody uses them. At all.

    #BinDiff #BinaryDiffing #BinaryCodeSimilarityAnalysis

  11. It's very sad, but it's always a damn waste of time reading academic research about binary diffing or, as it's called in academia, binary code similarity analysis. It's either all fairy tales that cannot be proved or plainly false and/or wrong.

    An example? One paper that I re-read today says that #BinDiff and #Diaphora are mono-architecture and totally discards these tools for the paper. LOL.

    #BinaryDiffing #BinDiffing #BinaryCodeSimilarityAnalysis

  16. Fun Reverse Engineering problem du jour. A compilation unit is a set of functions. Cool. However, a function might belong to one or many compilation units.

    For example, in #Diaphora, I used to think that once I have a compilation unit name for a function, that function belongs to just that one CU. However, if a function from, for example, a header file is inlined inside another function, what compilation unit does that function belong to?

    #ReverseEngineering #BinaryDiffing #BinDiffing
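
One way to picture the problem is as a many-to-many mapping instead of a one-to-one mapping. The following is a minimal sketch, assuming function and CU names are plain strings; `CompilationUnitIndex` and all the names in the example are hypothetical and are not taken from Diaphora.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class CompilationUnitIndex:
    """Many-to-many mapping between functions and compilation units (CUs)."""

    def __init__(self) -> None:
        self.cus_of: DefaultDict[str, Set[str]] = defaultdict(set)
        self.functions_in: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, function: str, cu: str) -> None:
        """Record that `function` has code in `cu` (possibly one of several)."""
        self.cus_of[function].add(cu)
        self.functions_in[cu].add(function)

    def is_shared(self, function: str) -> bool:
        """True when the function was seen in more than one CU."""
        return len(self.cus_of.get(function, ())) > 1


index = CompilationUnitIndex()
index.add("parse_packet", "net.c")
index.add("send_reply", "reply.c")
# Hypothetical header helper that ends up inlined into functions of both CUs.
index.add("checked_add", "net.c")
index.add("checked_add", "reply.c")
```

With this shape, a function such as the inlined helper is simply reported as shared between CUs instead of being forced into a single one.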

  17. Dear everyone in academia using "Machine Learning" for Binary Code Similarity Analysis (i.e., bindiffing): AI is bad for anything that requires exact results. It will generate a huge number of false positives mixed with a varying degree of similar results, and its output is pretty hard to understand.

    #bindiffing #BinaryDiffing #ProgramDiffing #MachineLearning #BCSA #ArtificialIntelligence

  18. @joxean I think a fine-grained callgraph including call site information would break this tie.

    Match the call sites in the two binaries, using instruction-level comparisons. You can then differentiate between the two call edges, based on which call site they are from.
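
The call-site idea from this reply could look roughly like the sketch below. It assumes each call site is already reduced to a callee name plus a short list of surrounding instruction mnemonics, and it pairs call sites greedily by sequence similarity. The data layout, the function name `match_call_sites`, and the greedy strategy are all assumptions made for illustration, not something the post or Diaphora specifies.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# A call site: (callee name, mnemonics of the instructions around the call).
CallSite = Tuple[str, List[str]]


def match_call_sites(sites_a: List[CallSite], sites_b: List[CallSite]) -> Dict[str, str]:
    """Greedily pair callees by how similar their call-site contexts are."""
    matches: Dict[str, str] = {}
    used = set()
    for callee_a, ctx_a in sites_a:
        best_idx, best_ratio = None, -1.0
        for idx, (_, ctx_b) in enumerate(sites_b):
            if idx in used:
                continue
            ratio = SequenceMatcher(None, ctx_a, ctx_b).ratio()
            if ratio > best_ratio:
                best_idx, best_ratio = idx, ratio
        if best_idx is not None:
            used.add(best_idx)
            matches[callee_a] = sites_b[best_idx][0]
    return matches


# Two tied callees in each binary, told apart only by their call sites.
sites_a = [("sub_A1", ["mov", "lea", "call"]), ("sub_A2", ["xor", "push", "call"])]
sites_b = [("sub_B1", ["xor", "push", "call"]), ("sub_B2", ["mov", "lea", "call"])]
result = match_call_sites(sites_a, sites_b)
```

A greedy pass is the simplest choice here; a real implementation would probably want a minimum similarity threshold so that unrelated call sites are not paired just because they are the only ones left.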

  19. The support for finding fixed signedness issues in #Diaphora is working (to highlight potentially fixed vulnerabilities):

    #BinaryDiffing #PatchDiffing
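
As an illustration of what a fixed signedness issue can look like at the assembly level: on x86, a patch often turns a signed conditional jump (`jl`, `jle`, `jg`, `jge`) into its unsigned counterpart (`jb`, `jbe`, `ja`, `jae`). The sketch below flags such changes. It assumes the two functions are already matched and that their mnemonic lists are aligned one to one, which is a strong simplification; it is not a description of how Diaphora implements this feature.

```python
from typing import List, Tuple

# x86 signed conditional jumps and their unsigned counterparts.
SIGNED_TO_UNSIGNED = {"jl": "jb", "jle": "jbe", "jg": "ja", "jge": "jae"}


def signedness_changes(old_insns: List[str], new_insns: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, old, new) for jumps that went from signed to unsigned."""
    changes = []
    for i, (old, new) in enumerate(zip(old_insns, new_insns)):
        if SIGNED_TO_UNSIGNED.get(old) == new:
            changes.append((i, old, new))
    return changes


# Hypothetical mnemonics of a function before and after a patch.
old = ["cmp", "jge", "mov", "ret"]
new = ["cmp", "jae", "mov", "ret"]
found = signedness_changes(old, new)
```

Alias mnemonics (for example `jnl` for `jge`) are ignored to keep the sketch short.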

  20. Me every time I have a "new" idea for doing #BinaryDiffing in #Diaphora with algorithms based on graph theory:

  21. Any cool bug on this Patch Tuesday? Anything cool to diff with #Diaphora and enhance the ability to try to find patched vulnerabilities?

    #PatchTuesday #PatchDiffing #BinaryDiffing #BinDiffing

  22. Did you know that #Diaphora detects patch diffing sessions and tries to help find where vulnerabilities were fixed? Here are some examples for CVE-2020-1350 and CVE-2023-28231.

    #patchdiffing #binarydiffing #bindiffing #vulnerabilityresearch #vulndev

  23. Also, #SymbolicExecution of even small #binaries is very slow and would probably only help for comparing binaries for the same (or a compatible) architecture. And to compare binaries for the same architecture you have a myriad of different, not terribly slow, ways of doing #BinDiffing.

    #BinaryDiffing

  24. @joxean, really enjoying the presentation of a world-class reference in #BinaryDiffing. Absolute respect for the amount of work behind the scenes in terms of side-technical challenges such as multiplatform testing. Even without going deep into the problem addressed, I can still appreciate the huge amount of time spent, simply, making things work as expected. #Respect

  25. #Bindiffing with #Diaphora CVE-2023-28231. As explained in the linked blog from @thezdi, the vulnerability has been fixed by checking that the number of relay forward messages in "ProcessRelayForwardMessage()" is not greater than or equal to 32 (0x20), as shown in the following pseudo-code diffing:

    #BinDiffing #BinaryDiffing

    zerodayinitiative.com/blog/202
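
The check described in this post boils down to a simple upper bound. The sketch below only restates that condition in Python; the function and constant names are hypothetical, and it is not the actual pseudo-code of the patched function.

```python
MAX_RELAY_FORWARDS = 0x20  # 32, the limit named in the post


def accept_relay_forward(count: int) -> bool:
    """Accept the message only while the count stays below the limit."""
    return count < MAX_RELAY_FORWARDS


# 31 is still accepted; 32 (0x20) and above are rejected.
below_limit = accept_relay_forward(31)
at_limit = accept_relay_forward(0x20)
```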

  26. CW: The answer

    Believe it or not: order by address and choose the matches in the order they appear in the binaries, as compilers and linkers usually put functions in the same order. Forget about complex math stuff. Order by address, choose the first corresponding match.

    #Diaphora #BinDiffing #BinaryDiffing
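
The tie-break described in this answer can be sketched in a few lines: when a group of functions in binary A and a group in binary B all score the same against each other, sort both groups by address and pair them positionally. This is a minimal sketch, assuming functions are identified by their start addresses; `break_tie_by_address` and the addresses are made up for the example.

```python
from typing import List, Tuple


def break_tie_by_address(funcs_a: List[int], funcs_b: List[int]) -> List[Tuple[int, int]]:
    """Pair equally scored candidates in ascending address order."""
    return list(zip(sorted(funcs_a), sorted(funcs_b)))


# Two tied functions in each binary, given in arbitrary order.
pairs = break_tie_by_address([0x401200, 0x401000], [0x8050, 0x8000])
print([(hex(a), hex(b)) for a, b in pairs])
# prints [('0x401000', '0x8000'), ('0x401200', '0x8050')]
```

The heuristic relies on the assumption stated in the post, namely that compilers and linkers usually keep functions in the same relative order.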

  27. So, let's say that we have 2 functions in binary A matching 2 functions in binary B *but* both A functions and B functions have the exact same score for the 4 matches (and the same callers and callees). This looks like a complex match to resolve, right?

    So, what do you think is (apparently) the best and simplest method in #BinaryDiffing to determine which match is the appropriate one?

    #Diaphora #BinDiffing #BinaryDiffing

  28. If there is one constant in #BinaryDiffing that I have seen since the very first steps of #Diaphora, and that I find time and time again, it is that heuristics based exclusively on graph theory are always the ones that cause the most false positives.

    Heuristics based on different and usually more simplistic concepts, like 2 functions having the same (big enough) pseudo-code or assembly, or matching the callees using an already existing match, cause far fewer false positives.
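
A simple heuristic of the "same, big enough, pseudo-code" kind might be sketched as follows. The sketch hashes each function's pseudo-code, ignores functions below a size threshold, and only accepts a match when the hash is unique on both sides. The threshold `MIN_LINES`, the function names, and the data layout are assumptions made for illustration, not Diaphora's actual implementation.

```python
import hashlib
from typing import Dict, List, Tuple

MIN_LINES = 5  # hypothetical threshold for "big enough"


def index_by_digest(funcs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map pseudo-code digest -> function names, skipping tiny functions."""
    index: Dict[str, List[str]] = {}
    for name, code in funcs.items():
        if len(code.splitlines()) >= MIN_LINES:
            digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
            index.setdefault(digest, []).append(name)
    return index


def match_by_pseudocode(funcs_a: Dict[str, str], funcs_b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Match functions whose pseudo-code is identical and unique on both sides."""
    index_a, index_b = index_by_digest(funcs_a), index_by_digest(funcs_b)
    return sorted(
        (names_a[0], index_b[digest][0])
        for digest, names_a in index_a.items()
        if len(names_a) == 1 and len(index_b.get(digest, [])) == 1
    )


body = "\n".join(f"v{i} = v{i} + 1;" for i in range(6))
funcs_a = {"sub_401000": body, "sub_401100": "return 0;"}
funcs_b = {"sub_8000": body, "sub_8100": "return 0;"}
matches = match_by_pseudocode(funcs_a, funcs_b)
```

The size threshold is what keeps trivial functions such as `return 0;` from matching each other, which is one plausible reason this kind of heuristic produces few false positives.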

  29. What are acceptable diffing times, in your opinion, for medium to big binaries (i.e., diffing 2 kernels, something like 70k functions in each database)?
    #bindiffing #binarydiffing #diaphora

  30. 👀 Does all binary code look the same to you? Upgrade your #BinaryDiffing skills and automate your diffing tasks with Joxean Koret’s (@joxean) #infosec #training on Advanced Binary Diffing with #Diaphora!

    🎟️ ringzer0.training/trainings/ad

  31. @joxean The order they are called in is a good heuristic, but you probably have enough information to match code structures beyond that.

    You would need to use the basic block structure of the caller to differentiate call sites, since "first" is only trivial in a linear function with no branches.

    You're decompiling, which should allow you to match the call sites in the AST or an intermediate representation (IR), independent of the arch.

  34. 👀 Can’t spot the difference? Upgrade your #BinaryDiffing skills and automate your diffing tasks with Joxean Koret’s training on Advanced Binary Diffing with #Diaphora!

    🎟️ ringzer0.training/trainings/ad

    🧵 6/10
