#diaphora — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #diaphora, aggregated by home.social.
-
The code is also published (on GitHub) already, and #Diaphora can now use an already trained model to try to improve binary diffing results (matching). I haven't made a new release yet, as these changes are considered a bit experimental for now.
The datasets and tools for training and testing are here: https://github.com/joxeankoret/diaphora-ml
And Diaphora is here: https://github.com/joxeankoret/diaphora
#Diaphora #BinaryDiffing #Bindiffing #ReverseEngineering #MachineLearning
-
Here are the slides of my "Simple Machine Learning Techniques for Binary Diffing (with Diaphora)" talk given at the @44CON conference last week:
https://github.com/joxeankoret/diaphora-ml/blob/main/docs/diaphora-ml-techniques-44con-final.pdf
#44con #Diaphora #MachineLearning #ReverseEngineering #BinaryDiffing
-
I have just stumbled upon this post diffing a Windows driver:
https://www.crowdfense.com/windows-wi-fi-driver-rce-vulnerability-cve-2024-30078/
Why use #BinDiff and see this [first picture] when you can use #Diaphora and see this [second picture]?
Of course, feel free to use whatever tool you prefer but, what's the point of doing more work? Diaphora finds out that only 2 functions are interesting for patch diffing and shows exactly, in the pseudo-code, what new chunk of code was added and what new function is being called. Diffing decompilation.
-
I will be speaking and doing a #Diaphora workshop at this year's #44CON conference (@44CON), in London.
https://44con.com/44con-2024-talks-and-workshops/
-
Dear Machine Learning people: when a problem can be solved using both a regressor and a classifier, which method would you choose? Or do you simply try both and then choose whichever worked better? Is there any rule or set of rules to determine which method should work better?
-
Dear Machine Learning + REversing people, one question: I'm working on a model to find better matches in #Diaphora. It's implemented in this way:
- 1st, I find good matches in the 2 binaries being compared with heuristics proven to be working.
- Using these good (and also bad) matches, I build a dataset for the current 2 binaries, and I train a classifier. The model is later used to predict whether future matches are good or bad.
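The two steps above can be sketched roughly as follows. This is only an illustration, not Diaphora's actual code: the two features (a pseudo-code similarity ratio and an assembly similarity ratio), the sample values and the function names are all hypothetical, and a tiny hand-written logistic regression stands in for whatever classifier is really used.

```python
import math

def train_logistic(samples, labels, epochs=2000, lr=0.5):
    """Fit a small logistic-regression classifier with batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def predict_good(model, features):
    """Return True when the model scores a candidate match as good."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5

# Step 1 (assumed done elsewhere): trusted heuristics labelled these matches.
# Hypothetical features: (pseudo-code ratio, assembly ratio).
good = [(0.95, 0.90), (0.90, 0.85), (1.00, 1.00), (0.85, 0.90)]
bad = [(0.20, 0.30), (0.10, 0.20), (0.30, 0.10), (0.40, 0.35)]

# Step 2: train on the dataset built for this pair of binaries only.
model = train_logistic(good + bad, [1] * len(good) + [0] * len(bad))

# Later: ask the model about candidate matches found by weaker heuristics.
likely_good = predict_good(model, (0.92, 0.88))
likely_bad = predict_good(model, (0.15, 0.25))
```

Because the model is trained per pair of binaries, it only has to learn what a good match looks like for those two files, not for binaries in general.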
-
Any cool bug in Microsoft's February 2024 Patch Tuesday??
-
It's very sad, but it's always a damn waste of time reading academic research about binary diffing or, as it's called in academia, binary code similarity analysis. It's either all fairytales that cannot be proved or, plainly, false and/or wrong.
An example? One paper that I have re-read today says that #BinDiff and #Diaphora are mono-architecture and totally discards these tools for the paper. LOL.
-
I’ve finally played with the new #BinDiff and, even though it needs just a few minutes to diff what #Diaphora takes something like 12 hours to diff, the output quality is just bad, imo. I see some things that are clearly false positives, and the lack of simple stuff like pseudo-code or assembly diffing (I know it’s not too precise or even meaningful when there are more or less large changes, but it’s good for visualizing small patches), plus a buggy interface, makes it painful to use. I sure hope I’m using it wrong and someone can correct me.
-
Fun Reverse Engineering problem du jour. A compilation unit is a set of functions. Cool. However, a function might belong to one or many compilation units.
For example, in #Diaphora, I used to think that once I have a compilation unit name for a function, that function belongs to just that one CU. However, if a function from, for example, a header file is in-lined inside a function, what compilation unit does that function belong to?
-
The support for finding fixed signedness issues in #Diaphora is working (to highlight potentially fixed vulnerabilities):
-
Me every time I have a "new" idea for doing #BinaryDiffing in #Diaphora with algorithms based on graph theory:
-
Any cool bug on this Patch Tuesday? Anything cool to diff with #Diaphora and enhance the ability to try to find patched vulnerabilities?
-
In case you didn't know, yes, #Diaphora tries to find compilation units and use them for heuristics, for matching, etc... The current implementation uses both CodeCut's LFA (Local Function Affinity) implementation, for finding the potential boundaries of CUs, and an IDA plugin of mine, #IDAMagicStrings (https://github.com/joxeankoret/idamagicstrings), for finding the names and also for coalescing compilation units (when it makes sense).
-
Did you know that #Diaphora detects patch diffing sessions and tries to help find where vulnerabilities were fixed? Here are some examples for CVE-2020-1350 and CVE-2023-28231.
#patchdiffing #binarydiffing #bindiffing #vulnerabilityresearch #vulndev
-
Later on, I realised this technology could also be used to fuzzily compare texts and determine whether they look similar. I tested it in #Diaphora as a means to determine if two pseudo-codes are 'similar' by comparing the 3 fuzzy hashes #DeepToad calculates, and it turned out to work much better than the other approaches I tested (and better than expected!), so I finally integrated it into the public version of Diaphora.
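To give a feel for the general idea of comparing two pseudo-codes through hashes computed over blocks of a configurable size, here is a toy sketch. It is not DeepToad's algorithm and does not use its API: the function names are hypothetical, it computes a single signature rather than three, and it uses naive fixed-offset blocks.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def toy_fuzzy_hash(text, block_size=16):
    """Map every block of `block_size` bytes to one signature character."""
    data = text.encode("utf-8")
    chars = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        chars.append(ALPHABET[digest[0] % len(ALPHABET)])
    return "".join(chars)

def similarity(hash_a, hash_b):
    """Score two signatures between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, hash_a, hash_b).ratio()

# Hypothetical pseudo-code: the second version only adds a line at the end.
old_code = "n = read(fd, buf, len);\nif (n < 0) return -1;\n" * 4
new_code = old_code + "log_result(n);\n"

old_hash = toy_fuzzy_hash(old_code)
new_hash = toy_fuzzy_hash(new_code)
score = similarity(old_hash, new_hash)
```

With fixed offsets, a change near the start of the text shifts every later block and changes the whole signature, which is one reason real fuzzy hashing schemes are more elaborate than this.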
-
Today I realised that the oldest technology developed by me integrated into #Diaphora dates from 2009.
In case you are curious, it's #DeepToad, a Python library for doing fuzzy hashing. This simplistic library calculates a set of 3 different hashes using a configurable block size (as opposed to, say, ssdeep, which doesn't work for this).
https://github.com/joxeankoret/deeptoad
#FuzzyHashing
#DeepToad
#Diaphora
#BinDiffing
#ProgramDiffing
#BCSA
-
#Bindiffing CVE-2023-28231 with #Diaphora. As explained in the linked blog from @thezdi, the vulnerability has been fixed by checking that the number of relay forward messages in "ProcessRelayForwardMessage()" is not greater than or equal to 32 (0x20), as shown in the following pseudo-code diffing:
-
CW: The answer
Believe it or not: order by address and choose the matches in the order they appear in the binaries, as compilers and linkers usually put functions in the same order. Forget about complex math stuff. Order by address, choose the first corresponding match.
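A minimal sketch of that tie-break, assuming each candidate is a hypothetical (address, name) pair and that all candidates on both sides already have the same score:

```python
def resolve_ties_by_address(funcs_a, funcs_b):
    """Pair equally scored candidates in the order they appear in each binary."""
    ordered_a = sorted(funcs_a, key=lambda func: func[0])
    ordered_b = sorted(funcs_b, key=lambda func: func[0])
    # The first function in A goes with the first in B, and so on.
    return [(a[1], b[1]) for a, b in zip(ordered_a, ordered_b)]

# Two functions in binary A and two in binary B with identical scores.
candidates_a = [(0x401200, "sub_401200"), (0x401000, "sub_401000")]
candidates_b = [(0x402100, "sub_402100"), (0x402300, "sub_402300")]

pairs = resolve_ties_by_address(candidates_a, candidates_b)
```

Here `sub_401000` is paired with `sub_402100` and `sub_401200` with `sub_402300`, simply because that is the order in which they sit in each binary.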
-
So, let's say that we have 2 functions in binary A matching 2 functions in binary B *but* both A functions and B functions have the exact same score for the 4 matches (and the same callers and callees). This looks like a complex match to resolve, right?
So, what do you think is (apparently) the best and simplest method in #BinaryDiffing to determine which match is the appropriate one?
-
If there is a constant in #BinaryDiffing that I have seen since the very first steps of #Diaphora, and that I find time and time again, it is that heuristics based exclusively on graph theory are always the ones that cause the most false positives.
Heuristics based on different and usually more simplistic concepts, like 2 functions having the same (big enough) pseudo-code or assembly, or matching the callees using an already existing match, cause far fewer false positives.
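As a rough illustration of the first of those simpler heuristics (same, big enough, pseudo-code), here is a sketch that is not Diaphora's implementation. The input format (a dict from function name to pseudo-code text), the `min_lines` threshold and the function names are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

def match_by_identical_pseudocode(funcs_a, funcs_b, min_lines=5):
    """Match functions whose pseudo-code is identical and big enough."""
    def build_index(funcs):
        table = defaultdict(list)
        for name, code in funcs.items():
            # Tiny functions (stubs, thunks) collide too easily, so skip them.
            if len(code.splitlines()) >= min_lines:
                digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
                table[digest].append(name)
        return table

    index_a = build_index(funcs_a)
    index_b = build_index(funcs_b)
    matches = []
    for digest, names_a in index_a.items():
        names_b = index_b.get(digest, [])
        # Keep only unambiguous one-to-one matches.
        if len(names_a) == 1 and len(names_b) == 1:
            matches.append((names_a[0], names_b[0]))
    return matches

# Hypothetical pseudo-code for two small databases.
body = "v1 = a1 + 4;\nif (!v1)\n  return 0;\nv2 = *v1;\nfree(v1);\nreturn v2;"
binary_a = {"parse_header": body, "tiny_stub": "return 0;"}
binary_b = {"sub_140001000": body, "other_stub": "return 0;"}

found = match_by_identical_pseudocode(binary_a, binary_b)
```

Only `parse_header` and `sub_140001000` are matched; the one-line stubs are ignored because identical tiny functions say very little.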
-
@themoep
Thank you very much for the explanation! That makes sense. In case you are curious, the idea is to try to build a better match-scoring function for #BinDiffing with #Diaphora. I.e., given 2 functions in 2 binaries, determine how close they are and generate a ratio.
So, with what you say, my guess is that I have to train mostly with bad results, as most of the time such a function to score matches is going to see false matches.
-
@gsuberland I'm looking for a name for this #BinaryDiffing heuristic for #Diaphora
-
What are acceptable diffing times, in your opinion, for medium to big binaries (i.e., diffing 2 kernels, something like 70k functions in each database)?
#bindiffing #binarydiffing #diaphora
-
👀 Does all binary code look the same to you? Upgrade your #BinaryDiffing skills and automate your diffing tasks with Joxean Koret’s (@joxean) #infosec #training on Advanced Binary Diffing with #Diaphora!
🎟️ https://ringzer0.training/trainings/advanced-binary-diffing-with-diaphora.html
-
@joxean I think a fine-grained callgraph including call site information would break this tie.
Match the call sites in the two binaries, using instruction-level comparisons. You can then differentiate between the two call edges, based on which call site they are from.
-
We’ve just published a new Plugin Focus blog post! Joxean Koret (@joxean) from Activision introduces his binary diffing plugin #Diaphora. Read more 🌐 https://hex-rays.com/blog/plugin-focus-diaphora/?utm_source=Social-Media-Post&utm_medium=Mastodon&utm_campaign=Plugin-Focus-diaphora
-
So, continuing my rant about academic research in the #bindiffing area and not releasing required stuff: in one paper they say that 2 malware samples aren't properly diffed by either #Diaphora or #BinDiff, so I have tried to search for the samples to do the diffing myself and see why, if at all, it fails. There is no dataset or sample hashes anywhere, only a set of assembly instructions for a specific basic block... #Fail
-
If I were to decide whether to continue the development of #Diaphora based on what academic research in the #BinDiffing area says, I should stop, because almost every academic paper I read claims its authors have already solved the problem and even improved on previous papers.
-
👀 Can’t spot the difference? Upgrade your #BinaryDiffing skills and automate your diffing tasks with Joxean Koret’s training on Advanced Binary Diffing with #Diaphora!
🎟️ https://ringzer0.training/trainings/advanced-binary-diffing-with-diaphora.html
🧵 6/10