  1. If you’ve never heard of NIST SP 800-108 before, or NIST Special Publications in general, here’s a quick primer:

    Special Publications are a type of publication issued by NIST. Specifically, the SP 800-series reports on the Information Technology Laboratory’s research, guidelines, and outreach efforts in computer security, and its collaborative activities with industry, government, and academic organizations. These documents often support FIPS (Federal Information Processing Standards).

    Via NIST.gov

    One of the NIST 800-series documents concerned with Key Derivation using Pseudorandom Functions is NIST SP 800-108, first published in 2009.

    In October 2021, NIST published a draft update to NIST SP 800-108 and opened a comment period until January 2022. This update mostly included Keccak-based Message Authentication Codes (KMAC) in addition to the incumbent standardized designs (HMAC and CMAC).

    Upon reviewing a proposal for NIST SP 800-108 revision 1 after its comment period opened, Amazon’s cryptographers discovered a novel security issue with the standard.

    I was a co-author of the public comment that disclosed this issue, along with Matthew Campagna, Panos Kampanakis, and Adam Petcher, but take no credit for its discovery.

    Consequently, Section 6.7 was added to the final revision 1 of the standard to address Key Control Security.

    This post examines the attack against the initial SP 800-108 design when AES-CMAC is used as the PRF in KDF Counter mode.

    This meme is the TL;DR of this blog post

    Preliminaries

    (If you’re in a hurry, feel free to skip to the attack.)

    NIST SP 800-108 specifies a “KDF in Counter Mode” that can be used with several PRFs, including AES-CMAC. It’s worth noting that this family of KDFs can be defined to use any arbitrary PRF, but only the PRFs approved by NIST for this use are recommended.

    AES-CMAC is a one-key CBC-MAC construction. Some cryptographers, such as Matt Green, are famously not fond of CBC-MAC.

    KDF Security and PRF Security

    Yes, I will take any excuse to turn cryptography knowledge into wholesome memes.

    KDF stands for “Key Derivation Function”.

    PRF stands for “Pseudo-Random Function”.

    The security notion for KDF Security is stronger than PRF Security.

    PRFs require a uniformly-distributed secret key, while KDFs can tolerate a key that is not uniformly random.

    This matters if you’re, say, trying to derive symmetric encryption keys from a Diffie-Hellman shared secret of some sort, where the output of your DH() function has some algebraic structure.

    Realistically, the difference between the two security notions matters a lot less in scenarios where you’re deriving sub-keys from a primary uniformly random cryptographic secret.

    However, it does make your proofs nicer to achieve KDF security instead of merely PRF security.

    Key Control Security

    Let’s pretend, for simplicity, we have a generic KDF() function that offers KDF Security. We don’t need to know how it works just yet.

    Because KDFs are thought of as PRFs, but stronger, it seems perfectly reasonable that you could use KDF() in a setup where multiple inputs are provided, each from a different party, and the output would always be uniformly random.

    Further, even if all other parties’ inputs are known, it should remain computationally infeasible for one of the parties to influence the output of KDF() to produce a specific value; e.g. a key with all bits zeroed.

    The assumption that this result is computationally infeasible when working with KDF() is referred to as “Key Control Security”.

    Loss of Key Control Security in NIST SP 800-108

    You already know where this is going…

    I’m going to explain the attack by way of example.

    If you want a more formal treatment, I believe Appendix B of NIST SP 800-108 rev 1 has what you’re looking for.

    Imagine that you’re designing an online two-party private messaging app. To ensure forward secrecy, you implement a forward-secure KDF ratchet, loosely inspired by Signal’s design.

    For your KDF, you choose AES-CMAC in Counter Mode, because you’re designing for hardware that has accelerated AES instructions and want to avoid the overhead of hash functions.

    (Aside: I guess this would also imply you’re most likely selecting AES-CCM for your actual message encryption.)

    With each message, the sender commits some random bytes by encrypting them with their message. The recipient, after verifying the authentication tag and decrypting the message, possesses knowledge of the same random bytes.

    Both parties then use the random bytes and the current symmetric key to ratchet forward to a new 128-bit symmetric key.

    The million dollar question is: Is this ratcheting protocol secure?

    In the case of KDF in Counter Mode with AES-CMAC, if you have more than 16 bytes of input material, the answer is simply: No.

    How The Attack Works

    A two-block implementation of this KDF is normally computed as follows:

    1. Compute $I = E_K(M_1)$
    2. Return $T = E_K(I \oplus M_2 \oplus K_1)$

    Don’t get intimidated by the notation. This is just AES encryption and XOR. Here, $E_K$ is AES encryption under the key $K$, $D_K$ is the matching decryption, and $K_1$ is the CMAC subkey derived from $E_K(0^{128})$.

    The messages $M_1$ and $M_2$ are defined in the KDF specification. In the scenario we sketched above, we assume the attacker is a participant who knows $K$ and can choose these arbitrarily.

    To coerce a recipient to use an arbitrary 128-bit value (i.e., $T$), all an attacker needs to do is:

    1. Calculate $I = E_K(M_1)$
    2. Let some value $X = D_K(T)$
      • Here, $T$ is the target value.
    3. Force $I \oplus M_2 \oplus K_1 = X$ by choosing $M_2 = X \oplus I \oplus K_1$

    Notice that $I$ is the result of encrypting $M_1$, and our attacker’s goal in step 3 can be achieved solely by manipulating $M_2$ (which exists independent of $M_1$)?

    That’s the vulnerability.

    The public comments and Appendix B of the NIST document describe the actual steps of computing $M_2$ to force a chosen $T$, which involve manipulating the structure of $M_2$ to achieve this result.

    Feel free to check out both documents if you’re interested in the finer details.

    What Can An Attacker Actually Do With This?

    If an attacker controls both $M_1$ and $M_2$…

    Or if an attacker knows some $M_1$ and can control $M_2$…

    …then they can force the final KDF output to equal whatever 128-bit value they want you to use.
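
    To make that concrete, here is a minimal sketch of the key control attack on a two-block CMAC-style computation. Everything in it is an assumption for illustration: the function names are hypothetical, and a toy 128-bit Feistel cipher built from SHA-256 stands in for AES so the example needs only the standard library. The attack only relies on the block cipher being invertible by someone who knows the key, so the same steps would apply to real AES-CMAC. The subkey derivation follows the usual CMAC doubling rule.

    ```python
    import hashlib
    import os

    BLOCK = 16  # bytes, the same block size as AES


    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))


    def _round(key: bytes, i: int, half: bytes) -> bytes:
        # Toy Feistel round function. This is NOT AES, only an invertible stand-in.
        return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


    def encrypt_block(key: bytes, block: bytes) -> bytes:
        left, right = block[:8], block[8:]
        for i in range(4):
            left, right = right, xor(left, _round(key, i, right))
        return left + right


    def decrypt_block(key: bytes, block: bytes) -> bytes:
        left, right = block[:8], block[8:]
        for i in reversed(range(4)):
            left, right = xor(right, _round(key, i, left)), left
        return left + right


    def cmac_subkey_k1(key: bytes) -> bytes:
        # K1 = E_K(0^128) doubled in GF(2^128), as in the CMAC subkey rule.
        l = int.from_bytes(encrypt_block(key, bytes(BLOCK)), "big")
        k1 = (l << 1) & ((1 << 128) - 1)
        if l >> 127:
            k1 ^= 0x87
        return k1.to_bytes(BLOCK, "big")


    def cmac_two_blocks(key: bytes, m1: bytes, m2: bytes) -> bytes:
        # Two full blocks: T = E_K(E_K(M1) xor M2 xor K1).
        i = encrypt_block(key, m1)
        return encrypt_block(key, xor(xor(i, m2), cmac_subkey_k1(key)))


    def forge_m2(key: bytes, m1: bytes, target: bytes) -> bytes:
        # Attacker knows the key and M1, and picks M2 so the output is `target`.
        i = encrypt_block(key, m1)
        x = decrypt_block(key, target)
        return xor(xor(i, x), cmac_subkey_k1(key))


    key = os.urandom(BLOCK)
    m1 = os.urandom(BLOCK)
    target = bytes(BLOCK)  # e.g. force an all-zero derived key
    m2 = forge_m2(key, m1, target)
    assert cmac_two_blocks(key, m1, m2) == target
    ```

    The final assertion holds for any key, first block, and target, which is exactly what losing key control security means: whoever picks the second block picks the output.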

    The most straightforward application of the loss of key control security is to introduce a backdoor into an application.

    If the Underhanded Crypto Contest were still running this year, NIST SP 800-108 using AES-CMAC in Counter Mode would be an excellent basis for a contestant.

    Does Anyone Actually Use NIST SP 800-108 This Way?

    I’m not aware of any specific products or services that use this KDF in this way. I will update this section if someone finds any.

    Is This A Deliberate Backdoor in a NIST Standard?

    No.

    I understand that, in the wake of Dual_EC_DRBG, there is a lot of distrust for NIST’s work on standardized cryptography.

    However, I have no specific knowledge to indicate this was placed deliberately in the standard.

    It is inaccurate to describe the loss of key control security in this context as a backdoor. Instead, it’s an unexpected property of the algorithms that can be used to create a clever backdoor. These are wildly different propositions.

    At least, that was the case until it was disclosed to NIST in January 2022. 🙂

    (I’m including an answer to this question, preemptively, in case someone overreacts when I publish this blog post. I hope it proves unnecessary, but I figured some caution was warranted.)

    Mitigation Options

    If you care about Key Control Security and use NIST SP 800-108, you should use HMAC or KMAC instead of CMAC. Only CMAC is impacted.
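
    As a rough sketch of what the HMAC option can look like, here is KDF in Counter Mode with HMAC-SHA256 as the PRF, using only Python's standard library. The function name, the 32-bit big-endian encodings of the counter and output length, and the example label and context are assumptions made for this illustration; the standard allows other parameter choices, so check an implementation against the specification before relying on it.

    ```python
    import hashlib
    import hmac


    def kdf_counter_hmac_sha256(key_in: bytes, label: bytes, context: bytes, out_len: int) -> bytes:
        """Counter-mode KDF sketch: K(i) = HMAC(K_IN, [i] || Label || 0x00 || Context || [L])."""
        h_len = hashlib.sha256().digest_size
        blocks = -(-out_len // h_len)  # ceiling division
        # [L] is the requested output length in bits (assumed 32-bit encoding).
        fixed = label + b"\x00" + context + (out_len * 8).to_bytes(4, "big")
        out = b""
        for i in range(1, blocks + 1):
            # [i] is the block counter (assumed 32-bit encoding).
            out += hmac.new(key_in, i.to_bytes(4, "big") + fixed, hashlib.sha256).digest()
        return out[:out_len]


    # Hypothetical ratchet step: derive the next 128-bit key from the current one.
    current_key = bytes(16)
    next_key = kdf_counter_hmac_sha256(current_key, b"ratchet", b"random bytes from the message", 16)
    assert len(next_key) == 16
    ```

    Because HMAC hashes the whole input rather than chaining an invertible block cipher, a party who knows the key cannot simply decrypt a target output and solve for the last input block.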

    Revision 1 of NIST SP 800-108 also outlines another mitigation that involves changing the inputs to include an additional (but reusable) PRF output for every block.

    This tweak does make the KDF behave more like our intuition for PRFs, but in my opinion it’s better to avoid using CMAC entirely for KDFs.

    Why Wasn’t This Widely Publicized?

    As interesting and surprising as the loss of Key Control Security in a NIST standard is to cryptography nerds, it’s not exactly like Heartbleed or Log4Shell.

    That said, regardless of your personal feelings on NIST, if you’re interested in not having findings like this slip through the cracks in the future, it’s generally worthwhile to pay attention to what NIST is up to.

    https://scottarc.blog/2024/06/04/attacking-nist-sp-800-108/

    #cybersecurity #framework #KDF #KDFSecurity #KeyDerivationFunctions #NIST #NISTSP800108 #PRFSecurity #security #standards #symmetricCryptography

  3. Vircolac – Veneration Review

    By Steel Druhm

    Sometimes a promo one-sheet actually does its job and gets you incredibly curious to hear something. That was the case with Ireland’s unusual death metal act Vircolac. I had no knowledge of them, but the one-sheet made it sound as if I had to hear their sophomore release Veneration or risk missing out on something unique and special. Steel hates missing out on something good as much as the next Viking gorilla, so I grabbed it and stashed it in the Jungle Room. The trials and tribulations began soon thereafter. You see, Vircolac are a very tough bird to pigeonhole with a sound ranging from OSDM to crust, doom, and several niche places in between. They’re not so much proggy as they are fucking crazy, and Veneration is all over the damn map in an unpredictable, haphazard way that feels devoid of a plan or blueprint. It’s filthy and ugly, but there are rare moments of unexpected beauty and grandeur too. In a nutshell, it’s a hot, soupy mess.

    Things open with “The Lament (I Am Calling You)” which is 100% pure Celtic folk music with passionate female singing and sawing strings. It’s primal, powerful, and leaves a big impression. As it fades out with increasingly frantic, unsettling strings, you’re launched abruptly into the gaping maw of vicious death that is the title track. It’s scuzzy, punky death in the vein of Autopsy with abrasive riffs and gruesome vocals tearing at your ear flesh. Over the next 5 minutes, Vircolac deliver a series of aural experiences that don’t always seem to be part of the same song. At one point the bruising death lapses into something that sounds a lot like recent Dark Tranquillity, only to stumble into moments that feel like the early Hellhammer demos from the 80s. It’s a wild ride for sure. Is it a good one though? Tough to say. “Repentant” is also chaotic, abrasive crust-death but this gives way to large Black Royal-esque power grooves that shake the rafters. It’s wild and woolly and there’s good stuff going on, but as with the title track, segments feel pasted together with boogers and bubble gum without rhyme or reason.

    Then there are the mammoth tracks like “Our Burden of Stone on Bone” where the band really cuts loose with their Build-a-Bear song construction using extra glue, glitter, and googly eyes. As before, there are interesting pieces to this musical Frankenstein, but the madcap way they stitch things together makes for a tough listening experience. Transitions are like jump cuts in some artsy-fartsy try-hard indie movie and nothing seems to develop logically. They latch onto a cool riff or groove and then leap into something unrelated without warning. Many of these jumps are between blasting death and plodding doom segments. While Incantation do these kinds of transitions seamlessly, Vircolac cannot or will not. This gives the listener musical whiplash and makes it challenging to stay focused on the madness. Nearly 9-minute closer “She is Calling Me (I. War II. Death III. Redemption)” is better, with a somewhat more linear direction, but it too suffers from the band’s ADHD composition style. At a slim 36-plus minutes, Veneration ends up feeling much longer due to the disorganized writing. I struggle mightily to absorb the album in one sitting, usually bailing around the halfway point to go listen to something less chaotic and challenging, like Archspire.

    The players here are talented enough. Brendan McConnell uncorks some blistering, dissonant riffs and also offers some gonzo soloing. Some of his playing is actually quite striking and at times, beautiful. He’s a Renaissance man of sorts and his playing is easily the most interesting thing going on here. Darragh O’Laoghaire comes from the Chris Reifert school of rabid wolfman vocals and he goes all in at all times. He’s a good death vocalist, but his somewhat one-note croaking feels out-of-synch with the wildly shifting music at times. It’s the songsmithing that really derails the journey here, with a completely undisciplined, tumultuous style that tests the listener’s resolve.

    Veneration is a tough album to grasp and an even tougher one to score. There’s so much going on that it becomes difficult to process. The core style is well within my wheelhouse and there’s a lot of potential, but it isn’t fully realized. With some smoothing and a modicum of focus, I could see Vircolac being a deadly force. For now, they’re just a sanity destabilizing one. Mileage may vary for the criminally insane.

    Rating: 2.5/5.0
    DR: 8 | Format Reviewed: 320 kbps mp3
    Label: Dark Descent
    Websites: vircolac.bandcamp.com/album/veneration | facebook.com/vircolacdeathmetal
    Releases Worldwide: February 23rd, 2024

    #25 #2024 #Autopsy #DarkDescentRecords #DeathMetal #Feb24 #Incantation #IrishMetal #Review #Reviews #Veneration #Vircolac

  5. Greenland and Nordic Solidarity

    Me outside the University in Nuuk, just before the lecture

    In March last year I was in Nuuk to lecture on digital resilience in the Arctic, before a room of researchers and decision-makers. Outside the windows the fjord lay still and grey. That same week, US Vice President JD Vance was at Pituffik, the American air base in the far north. He spoke about security interests. I spoke about how small societies can protect their democracy when the infrastructure is owned by someone else. I also described my experience of the Greenlandic people’s resistance to American interests in Aftonbladet’s podcast Höjd Beredskap, shortly after I had lectured there myself. The episode was titled “Räkna med att Trump menar allvar om Grönland” (“Count on Trump being serious about Greenland”). JD and I moved through the same Arctic landscape that week, but in entirely different worlds. I remember thinking that we were all in a kind of in-between space, a moment before something changed.

    It has changed now.

    On 3 January this year, the US carried out its largest military operation in Latin America since the invasion of Panama in 1989. The operation was justified with the Monroe Doctrine, the principle from 1823 that holds that the Western Hemisphere is an American sphere of interest. The same day, Katie Miller, who is married to the White House deputy chief of staff, published an image of Greenland covered by the American flag. Beneath the image was a single word. SOON. Denmark’s ambassador in Washington responded within hours. The next day, Prime Minister Mette Frederiksen issued an unusually direct message aimed squarely at Washington. “I want to strongly urge the US to stop the threats against a historically close ally and against another country and another people who have said very clearly that they are not for sale.” Trump’s reply came the same day. “We need Greenland, absolutely. We need it for our defence.”

    What makes the situation even more tangible is that Trump had already appointed a special envoy to Greenland before the turn of the year. The assignment went to Jeff Landry, Louisiana’s conservative governor, who thanked the president on social media. “It is an honour to serve in this volunteer position to make Greenland a part of the US,” he wrote. Denmark’s Foreign Minister Lars Løkke Rasmussen called the appointment “deeply upsetting” and summoned the US ambassador to demand an explanation. That is unusually blunt language between NATO allies. Greenland’s head of government Jens-Frederik Nielsen responded more calmly but just as clearly. “This changes nothing for us here at home. We decide our own future ourselves.”

    But it does not stop at symbols and appointments. Already last summer, Danmarks Radio revealed that at least three American men with ties to Trump are conducting covert influence operations in Greenland. One of them travelled to Nuuk to compile lists of US-friendly Greenlanders and of people who oppose Trump. They also collected material that can be used to cast Denmark in a bad light in American media. The Danish security service PET confirms that Greenland is a target of influence campaigns aimed at creating division between Greenland and Denmark. “What we are seeing is the use of soft power, influence, and attempts to create internal division,” a source told DR. Those are words that could describe the Kremlin’s methods. Now they come from an ally.

    The Danish navy off Nuuk, March 2025. Photo: Carl Heath

    Not for sale. It is interesting that Frederiksen chose those particular words. The last time Denmark sold a territory to the US was in 1917. That sale concerned the Danish West Indies, the three Caribbean islands now called the US Virgin Islands. The purchase price was 25 million dollars in gold. But the deal had a condition. Denmark demanded that the US formally recognise Danish sovereignty over all of Greenland. On 4 August 1916, Secretary of State Robert Lansing signed a declaration that did exactly that. With a stroke of the pen, the Americans created an exception to their own Monroe Doctrine in order to confirm that Greenland belonged to Denmark. Those were words that meant something. They paved the way for other nations to accept the Danish claim as well.

    A hundred years later, those words seem to have been forgotten.

    To understand what is happening today, one needs to read the annual report of the Danish military intelligence service. In this year’s edition of Udsyn, the US is named, for the first time in history, as part of the threat picture facing Denmark. Jacob Kaarsbo, a former chief analyst, commented on the report on Danmarks Radio. “This is the party we have called our greatest and most important ally ever since the Second World War and the founding of NATO, and now it is suddenly that party that is threatening us.” It was the most sensitive and sensational thing ever to come from the Danish intelligence service, he said.

    It is worth pausing for a moment at the words “most important ally”. Denmark has not merely been a member of NATO. Denmark has shed blood for the alliance. In Afghanistan, 43 Danish soldiers fell between 2002 and 2013. That may not sound like much compared with the US’s 2,461. But per capita, Denmark had among the highest death rates of all NATO countries. Danish troops were sent to Helmand Province, one of the most dangerous zones. Denmark did not send logistics personnel but combat units. A small country of five and a half million inhabitants paid almost the same price as the superpower, measured in proportion to its population.

    I pause at that thought. The order on which Denmark has built its security since 1949 has the US as its guarantor. Now the guarantor itself is the threat. The same nation that once recognised Danish sovereignty in writing is the one now threatening to revoke it.

    Here a paradox arises that matters. NATO's Article 5 is meant to protect members against attack. But it cannot be invoked against the United States. The reason is simple. NATO takes all decisions by consensus. Any member can block. When Turkey invaded Cyprus in 1974, the alliance was paralysed because both Turkey and Greece were members. That situation persists to this day, fifty years later. But the difference from what is happening now is decisive. The United States is not a party to a regional dispute. The United States is the one making the threat. There is no third force with the authority to mediate.

    The EU's role therefore becomes more important than ever. Article 42.7 of the Treaty of Lisbon contains a mutual defence clause that does not require consensus in the same way NATO does. France invoked it after the Paris attacks in 2015. Greenland's status as an overseas territory creates legal uncertainty, that is true. But if the EU wants to be a community that actually stands for something, it will be hard to look away.

    From a Swedish vantage point the situation is unusually clear. Denmark is our neighbour. On 4 January, Prime Minister Ulf Kristersson said that only Denmark and Greenland have the right to decide on matters that concern them, and that Sweden stands fully behind its neighbour. That is good. The question is what follows the words. Magnus Christiansson of the Swedish Defence University (Försvarshögskolan) spelled out the consequences in Dagens Nyheter. “If the US takes Greenland, it is the end of NATO. It would completely shatter the entire foundation of the alliance.”

    That is a big thought to hold in one's head. Standing up for Denmark and for international law may mean standing against the United States. Not against the idea of America, but against the administration that now governs the country and its stated intentions. It is a little dizzying. But if we accept the present situation, in which one allied great power threatens another ally with territorial takeover and faces no consequences, what then remains of the principles we claim to share?

    Moscow and Beijing have of course taken note. Russian commentators were quick to ask the obvious question. If the United States can invoke the Monroe Doctrine in Venezuela, why should Russia not be able to invoke its security interests in Ukraine? It is a narrative that serves the Kremlin's purposes. The conflict between Europe and the United States benefits those who long to see the transatlantic community fall apart. But it would be too easy to blame outside forces. It is the American administration that has chosen to threaten an ally. No one is forcing them. They are doing this to themselves and to the rest of the world. Simon Mølholm Olesen, a historian at Aarhus University, points out that the American military presence in Greenland already rests on a 1951 agreement that gives the United States wide powers to build installations. “That is worth noting when one talks about American security needs.”

    Nivi Olsen, Minister of Education, Greenland. Photo: Carl Heath

    In October I took part in a panel discussion in Iceland together with Greenlandic voices who shared their experiences of suddenly finding themselves at the centre of a global attention they had never asked for. Greenland's Minister of Education, Nivi Olsen, opened with words that have stayed with me ever since. “We are not commodities. We are people with history, with culture and with dignity.”

    There is a line running from that conversation in Iceland, through my week in Nuuk with the still fjord outside the window, to what is unfolding now. It is about what happens when small societies come into the sights of the great powers. About how quickly rhetorical gambits can turn into real threats. And about what treaties and declarations are actually worth when the stronger party decides to forget them.

    In March last year I felt that we were in an in-between space. Now I know what came after. What is happening now is a trial, a test. Not of the United States, but of us. Of whether we mean anything by the principles we claim to share. Whether agreements between nations mean anything when it really matters. Whether a people in the Arctic has the right to say no. Whether the Nordic countries stand up for one another. Robert Lansing's stroke of the pen from 1916 is gathering dust in an archive somewhere. But the answer to those questions is not written there. It is being written now.

    References

    Atlantic Council. (2024). NATO’s decision process has an Achilles’ heel. New Atlanticist.

    DR. (2025, 10 December). USA beskrives på ny måde i trusselsvurdering: ‘Noget af det mest følsomme og opsigtsvækkende’. Danmarks Radio.

    DR Mørklagt. (2025, 27 August). Centrale kilder: Mænd med forbindelser til Trump forsøger at infiltrere Grønland. Danmarks Radio.

    European Parliament. (2015). The EU’s mutual assistance clause: The first ever activation of Article 42(7) TEU. European Parliament Research Service.

    Forsvarets Efterretningstjeneste. (2025). Udsyn 2025.

    Heath, C. (2025, 27 March). Digital resilience in the Arctic. carlheath.se.

    Heath, C. (2025, 18 October). “We are not commodities” – Grönland och kampen om det digitala samtalet. carlheath.se.

    Mølholm Olesen, S. (2026, 5 January). Greenland’s sovereignty. LinkedIn.

    nordics.info. (2019, 21 June). USA’s declaration on Danish sovereignty of Greenland, 1916. Aarhus University.

    NPR. (2026, 4 January). Denmark’s prime minister says ‘stop the threats’ over Greenland.

    Pierini, M. (2020, 15 September). Is NATO paralyzed over the Greece-Turkey conflict? Carnegie Europe.

    Scherer, M. (2026, 4 January). Trump threatens Venezuela’s new leader. The Atlantic.

    SVT. (2026, 4 January). Mette Frederiksen svarar Trump om Grönland – stödet från Kristersson. SVT Nyheter.

    Wikipedia. (2026). Coalition casualties in Afghanistan.

    Wikipedia. (2026). Treaty of the Danish West Indies.

    DN. (2026, 4 January). Expert: Nato spricker om USA tar över Grönland. Dagens Nyheter.

    BBC. (2025, 22 December). Trump says US ‘has to have’ Greenland after naming special envoy. BBC News.

    #Danmark #Demokrati #DigitalResiliens #EU #Grönland #Nato #Svenska #USA
  6. Cyber-drones, radar evasion and autonomous swarms: Rome, the invisible challenge of the 2025 Jubilee

    The 2025 Jubilee in Rome is a challenge not only for managing millions of pilgrims and tourists, but also for protecting urban airspace. Drones, increasingly widespread and accessible, bring significant vulnerabilities and risks with them. Despite the deployment of advanced monitoring and control systems, weaknesses remain around non-compliant or home-built drones that can evade identification and tracking systems.

    Radar evasion, autonomous swarms, encrypted communications and the threat of cyber-drones are shaping a new technological scenario for the capital.

    The UAV control system in Rome


    Ahead of the event, Rome installed two antennas along the Vatican-Aurelia axis to control drone traffic. The system monitors authorised drones in real time and passes the data to the competent authorities, with the aim of keeping the city's airspace safe. Despite this technological progress, however, several vulnerabilities remain that could undermine its effectiveness.

    Tracking system functions

    The system is designed to collect Direct Remote Identification (DRI) signals, aggregate telemetry and send alerts to the competent authorities when anomalies occur.

    Operational limits in an urban environment

    Tall buildings, RF multipath and electromagnetic noise reduce identification capability; drones with a low radar cross-section (RCS) or operating RF‑silent can go unnoticed.

    Evasion techniques and security impact


    This article analyses the main evasion techniques: from low-altitude flight and stealth materials to encrypted communications and autonomous navigation systems based on SLAM (Simultaneous Localization and Mapping) and Deep Reinforcement Learning (DRL). It also discusses the impact of secure communications (AES, FHSS, VPN), which improve protection for legitimate operators but hinder enforcement by the authorities. Hence the need for regulatory updates, more advanced technological tools and an integrated approach to airspace defence.

    Non-compliant drones: a concrete challenge for urban security


    As the peak phase of the 2025 Jubilee approaches, Rome is preparing to manage increasingly crowded and complex events. In this context, one of the greatest risks to urban airspace security comes from drones that do not comply with the European rules on direct remote identification (DRI). Some home-built or modified models may not transmit the mandatory identification signals, evading the monitoring systems that have been installed.

    This is a concrete threat: an “invisible” drone can fly over sensitive areas carrying and releasing unauthorised objects (whether espionage equipment, contraband or hazardous materials) before the authorities have time to intervene.

    Operational definition of a “home-built drone”


    A home-built drone is a UAV assembled by hand from standard or custom-made components. These range from simple quadcopters to hexacopters and octocopters.

    Why they are hard to track


    Their modularity and the use of open-source firmware make it easy to remove the DRI module and replace the communication channels with LTE/5G or proprietary protocols.

    Technical components


    The main components include:

    • Sensors (IMU, gyroscope, barometer, compass) – essential for stabilisation and orientation.
    • Frame – the supporting structure, in carbon fibre, aluminium or reinforced plastic.
    • Brushless motors – provide the thrust needed for flight.
    • ESC (Electronic Speed Controller) – regulate motor speed.
    • Propellers – determine lift and direction.
    • LiPo battery – high-density power source.
    • Flight Controller (FC) – the “brain” that manages stability and navigation.
    • RC receiver and transmitter – the link to the radio controller.
    • GPS and telemetry – used for autonomous navigation and positioning.

    This modular structure, combined with the spread of open-source firmware, makes it very easy to customise drones, remove or disable the DRI module and replace standard communication channels with alternative links such as LTE/5G modems.

    Most common evasion techniques


    Malicious actors can rely on well-established strategies to get around detection systems. The most common techniques include:

    • Low-altitude flight, which reduces the probability of being picked up by the antennas.
    • Stealth materials, which minimise the drone's radar signature.
    • Encrypted communications or autonomous navigation systems, which prevent the control authorities from intercepting signals.
    • “Dark drone” mode, with no RF emissions.
    • Modified firmware.

    In recent years, research has produced highly advanced autonomous flight systems capable of operating in complex urban environments without a GPS signal. These technologies can make drones particularly hard to detect or neutralise, even in the presence of advanced surveillance systems like those deployed for the Jubilee.

    Low-altitude flight and interaction with the urban environment


    Low-altitude flight minimises exposure time to long-range sensors but increases the risk of collision; it is effective against systems designed for targets at higher altitudes.

    Stealth materials and RCS


    Radar-absorbing materials (RAM) and optimised geometries reduce radar reflection; in small UAVs the RCS can fall below thresholds that make detection with X-band or S-band radar difficult.

    Autonomous navigation and complex urban environments


    SLAM (Simultaneous Localization and Mapping) lets drones build three-dimensional maps of the environment and localise themselves within them using only visual sensors and an IMU, with no need for GPS. This is particularly useful in complex urban areas such as Rome's historic centre, where satellite reception can be obstructed by buildings and structures. Recent surveys report significant progress in handling dynamic environments and weakly textured scenes. Combining deep learning with visual SLAM (V‑SLAM) improves reliability and real-time performance on drone-class hardware.

    Operational focus: a malicious drone equipped with SLAM could fly “radio-silent” among alleys, basilicas and crowded squares, automatically avoiding obstacles and staying invisible to tracking systems that rely on RF or GPS signals.

    DRL and trajectory learning
    Deep Reinforcement Learning (DRL) has transformed autonomous drone navigation, making it possible to train agents that learn safe trajectories in unknown environments. Algorithms such as PPO (Proximal Policy Optimization), DDPG and TD3 have been used to develop adaptive flight strategies that avoid obstacles, follow targets and respond in real time to new threats. End-to-end algorithms (e.g. SAC) support BVLOS navigation in highly dynamic scenarios. Modular frameworks such as “VizNav” use TD3 and PER for efficient, reactive 3D flight. Hybrid models that combine visual and LiDAR inputs allow contextual information to be injected for better perception. DRL models do not follow fixed rules: because they learn from the environment, they produce responses that complicate predetermined countermeasures.
    A hostile swarm can split up the mission and reassign tasks, keeping the operational effect going even if individual nodes are intercepted.

    Operational focus: a drone equipped with DRL could automatically recognise the surveillance patterns of police air units and change course to evade them, exploiting blind spots or less-monitored routes. Because this behaviour is learned rather than pre-programmed, its moves are extremely hard to anticipate.

    DAA systems and navigation in crowded urban environments


    Detect and Avoid (DAA) systems integrate radar sensors, cameras and computer vision algorithms to prevent collisions and identify obstacles in flight, even in high-traffic environments. They enable autonomous operations in controlled airspace and increase safety when flying in urban or critical environments. Commercial drones authorised for logistics or surveillance could use DAA to navigate congested environments. However, a hostile drone with DAA capability can also “read” and work around legitimate air traffic flows, blending into the context and making itself hard to identify.

    Sources: Bresson, G., et al. (2017). IEEE Transactions on Intelligent Vehicles. – Tzoumas, V., et al. (2021). IEEE Robotics and Automation Letters. – Hwangbo, J., et al. (2017). IEEE Robotics and Automation Letters. – Yan, J., et al. (2022). Sensors, 22(8). – Babbar, R., & Duggal, R. (2020). Journal of Aerospace Information Systems. – FlytBase. (2023). DAA Technology for BVLOS Drone Operations.

    Cooperative swarms and distributed intelligence


    Drone swarms rely on nature-inspired algorithms that let multiple aircraft act in a coordinated but decentralised way. Each drone communicates with its neighbours to make collective decisions, with no need for central direction. Models based on Particle Swarm Optimization (PSO) make it possible to track hidden targets, even under cover, and speed up complex mapping. Multi-agent evolutionary algorithms provide efficient patrolling even in unknown environments, while swarm intelligence principles favour robustness, scalability and resilience without centralised control. Signals between drone and operator are carried over radio protocols. To avoid interception or jamming (deliberate disruption of communications), drones can use advanced encryption techniques.
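
    To make the PSO idea concrete, here is a minimal textbook sketch in Python. It is not taken from the article or its sources: the function name `pso_locate`, the search area, the coefficients `w`, `c1`, `c2` and the cost function (squared distance to a known point, standing in for whatever sensor reading a real swarm would use) are all assumptions for illustration. Note also that this classic variant shares a single global best, which is a simplification compared with the fully decentralised swarms described above.

```python
import random

def pso_locate(target, n_particles=20, iterations=200, seed=1):
    """Textbook PSO: particles converge on the point that minimises the cost."""
    rng = random.Random(seed)

    def cost(p):
        # Hypothetical stand-in for a sensor reading: squared distance to target.
        return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2

    pos = [[rng.uniform(-100, 100), rng.uniform(-100, 100)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    best = [p[:] for p in pos]        # best position seen by each particle
    gbest = min(best, key=cost)[:]    # best position seen by the whole swarm
    w, c1, c2 = 0.7, 1.5, 1.5         # inertia, personal pull, social pull (assumed)

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(best[i]):
                best[i] = pos[i][:]
                if cost(best[i]) < cost(gbest):
                    gbest = best[i][:]
    return gbest

estimate = pso_locate((30.0, -45.0))
print(estimate)
```

    Each particle is pulled towards its own best position and the swarm's best position, which is why the group homes in on a target faster than independent random searchers would.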

    Mesh communications and jamming resilience
    P2P mesh networks increase resilience: losing one node does not compromise the mission. An effective countermeasure is behavioural analysis at the network level.

    Stealth technologies in drones: radar invisibility and reduced RCS


    Stealth technology in drones rests on two fundamental principles: the use of radar-absorbing materials (RAM) and the design of geometries with a low radar cross-section (RCS). An object's radar cross-section is a measure of how much energy it reflects back towards the radar illuminating it. In drones, especially small ones, angled surfaces, composite materials and absorbent coatings allow a significant reduction in radar visibility. Tests on carbon-fibre UAV models have shown average RCS values below –17 dBsm over a 3–16 GHz frequency range, making them hard to spot with conventional X-band or S-band radar. In addition, some commercial models use treated conductive plastics to deflect or absorb incoming microwaves.
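
    As a rough illustration of what –17 dBsm means in practice, the sketch below converts the figure to square metres and applies the standard radar range equation, in which maximum detection range scales with the fourth root of the RCS. The 1 m² reference target and the function names are assumptions chosen for the example, and real detection range also depends on radar power, antenna gain, clutter and processing, none of which are modelled here.

```python
def dbsm_to_m2(dbsm):
    """Convert a radar cross-section from dBsm (dB relative to 1 m^2) to m^2."""
    return 10 ** (dbsm / 10)

def relative_detection_range(rcs_m2, reference_rcs_m2=1.0):
    """Range ratio versus a reference target, using R proportional to sigma^(1/4)."""
    return (rcs_m2 / reference_rcs_m2) ** 0.25

rcs = dbsm_to_m2(-17.0)
ratio = relative_detection_range(rcs)
print(f"-17 dBsm is about {rcs:.4f} m^2")
print(f"detection range relative to a 1 m^2 target: {ratio:.2f}x")
```

    Under these assumptions, a radar that could detect a 1 m² target at a given distance would need the –17 dBsm drone to come within a little over a third of that distance before seeing it.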

    In an urban setting such as Rome during the 2025 Jubilee, a stealth drone could fly over sensitive areas while keeping an electromagnetic profile indistinguishable from background noise, evading detection antennas and passive radar.

    Sources: Mikhailov, M., et al. (2022). Characterization of RCS of Composite UAVs. MDPI Drones, 7(1), 39. https://www.mdpi.com/2504-446X/7/1/39 – Ali, Z. (2022). Effect of RCS variation on drone detectability. LinkedIn Engineering Note. https://www.linkedin.com/pulse/effect-radar-cross-section-rcs-variation-z2sqc

    Dark drones: operating without RF emissions


    “Dark drones” are designed to avoid identification through radio signals. Unlike traditional drones, which transmit data in real time on known frequencies (e.g. 2.4 GHz or 5.8 GHz), these devices fly in RF-silent mode, with no communications at all during flight. They often operate using pre-programmed waypoints loaded into the flight controller's memory, or use computer vision to orient themselves in their surroundings. This absence of emissions makes them invisible to the many counter-drone systems that rely on RF interception.

    A “dark” drone can also disable the direct remote identification (DRI) system required under European rules, so that from a regulatory standpoint it does not appear at all. In addition, open-source flight controllers (e.g. Pixhawk, ArduPilot) allow modified firmware to be installed to mask the device's electromagnetic behaviour. In areas such as the Vatican or Trastevere, these drones can cross the city centre without being detected by the systems currently in operation.

    How dark drones operate


    Dark drones follow preset routes using locally loaded waypoints, with no need for a continuous connection. Mission planning happens offline, avoiding any detectable data transmission. With optical sensors and SLAM (Simultaneous Localization and Mapping) algorithms they can find their way in complex environments without GPS. The absence of RF signals makes them practically invisible to tracking systems that rely on intercepting radio emissions. This flight mode makes them well suited to clandestine or unauthorised operations.

    Sources: Echodyne (2023). What is a dark drone and how to ID one. https://www.echodyne.com/resources/news-events/what-is-a-dark-drone-and-how-to-id-one
    Dedrone (2023). Counter-UAS: Beyond RF Detection. https://www.dedrone.com/white-papers/counter-uas

    Photonic radar for detecting stealth targets


    Photonic radars, which anyone born in the 1970s will recognise from the vision of the future in Japanese #shōnen mecha, are a new generation of detection devices based on optical technologies that can overcome the limits of conventional radar. Using laser pulses and optically generated millimetre waves, these radars offer high spatial resolution, low interference and sensitivity to very small targets such as stealth micro-UAVs.

    In South Korea, photonic radar has been successfully tested at detecting small drones beyond 3 km, even in adverse weather such as fog or rain. The systems combine imaging capability and artificial intelligence to classify targets by their Doppler signatures or electromagnetic profiles.

    Urban applications of photonic radar – limits and integration


    Installations on rooftops or observation towers can provide coverage that complements radar and RF sensors, especially where there are many reflective surfaces. For now, the operational challenges of this new technology remain cost, integration in populated cities and the handling of false positives.

    Sources: Han, K. et al. (2023). Photonic radar performance in adverse environments. PLOS One, 18(12):e0322693. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0322693
    Aerospace Testing Int. (2023). South Korea tests photonic radar for drone detection. https://www.aerospacetestinginternational.com/news/south-korea-tests-photonic-radar-for-drone-detection.html

    Encryption techniques and secure communications in drones


    Protecting communications between drone and control station is essential for operational security, especially in high-density urban scenarios like those expected during the 2025 Jubilee. Vulnerable radio channels can expose drones to interception, spoofing, man-in-the-middle attacks and intentional interference (jamming). To mitigate these risks, the industry uses a range of cryptographic technologies and secure transmission protocols.

    AES and encryption standards


    AES-256 is the most widely used symmetric encryption standard for protecting drone-operator communications. It uses 256-bit keys to encrypt data, making it virtually unbreakable without the correct key. It is used both for real-time transmission of telemetry and control data and for video streams sent in FPV (first-person view). Its effectiveness, however, depends on secure key management and on the robustness of the application protocol as a whole.
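
    A quick back-of-the-envelope calculation shows why a 256-bit key is treated as unbreakable by exhaustive search. The attacker speed of 10^18 guesses per second is an assumption picked to be deliberately generous, not a figure from the article, and the sketch says nothing about weaknesses in key management or protocol design, which is where real systems tend to fail.

```python
KEY_BITS = 256
GUESSES_PER_SECOND = 10**18          # assumed attacker speed, deliberately generous
SECONDS_PER_YEAR = 365 * 24 * 3600

keyspace = 2 ** KEY_BITS             # number of possible 256-bit keys
expected_guesses = keyspace // 2     # on average, half the keyspace is searched
years = expected_guesses / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"possible keys: {keyspace:.3e}")
print(f"expected brute-force time: {years:.3e} years")
```

    The result is many orders of magnitude beyond any practical timescale, which is why attacks on deployed systems target the keys and the surrounding protocol rather than the cipher itself.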

    Some advanced countermeasures could be implemented, such as mandatory adoption of AES standards in all civilian drones, with rolling-code key schemes and two-factor authentication between drone and base station.

    FHSS and jamming resilience


    Frequency Hopping Spread Spectrum (FHSS) is a technique in which data is transmitted across a wide range of frequencies, hopping from one channel to another according to a pseudo-random sequence known only to the drone and its controller. This makes it extremely difficult for an attacker to block the link, since they would have to jam all frequencies simultaneously or know the hopping pattern.
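
    The shared hopping pattern can be sketched as follows. This is an illustrative construction only: deriving the channel for each time slot from an HMAC over the slot number is an assumption made for the example, not how any particular drone radio works, and the key, the 79-channel count and the function name `hop_sequence` are hypothetical.

```python
import hashlib
import hmac

def hop_sequence(shared_key, n_channels, n_hops):
    """Derive one pseudo-random channel index per time slot from a shared secret."""
    channels = []
    for slot in range(n_hops):
        digest = hmac.new(shared_key, slot.to_bytes(8, "big"), hashlib.sha256).digest()
        # Map the first 4 bytes of the digest onto the available channels.
        channels.append(int.from_bytes(digest[:4], "big") % n_channels)
    return channels

key = b"hypothetical-shared-secret"
drone = hop_sequence(key, n_channels=79, n_hops=10)
controller = hop_sequence(key, n_channels=79, n_hops=10)
outsider = hop_sequence(b"wrong-guess", n_channels=79, n_hops=10)

print("drone and controller agree:", drone == controller)
print("outsider matches:", drone == outsider)
```

    Both ends compute the same sequence because they hold the same key, while an observer without the key has no practical way to predict the next channel and so cannot follow the hops.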

    Countering hostile drones that use FHSS requires wideband RF detection systems that can track rapid frequency changes and correlate suspicious activity with flight behaviour.

    Sources: NIST FIPS 197 – Advanced Encryption Standard (AES). – Sklar, B. (2001). Digital Communications: Fundamentals and Applications.

    Custom protocols and asymmetric keys


    Some drones use custom communication protocols with public/private key cryptography (RSA, ECC) to authenticate the origin of commands and encrypt packets. These systems are more secure than standard protocols but pose new challenges for detection, because the transmissions do not follow patterns known to counter-drone systems. The lack of standardisation hinders interception and decryption by law enforcement, which makes a European regulation defining minimum standards of cryptographic interoperability for civilian drones an urgent need.

    Using RSA, ECC and proprietary protocols increases security but reduces interoperability and the capacity for lawful interception.

    VPNs and encrypted tunnels over LTE/5G


    Drones controlled over the cellular network (LTE/5G) can use VPNs (Virtual Private Networks) and encrypted tunnels (e.g. IPsec, WireGuard) to hide the pilot's location, protect data and resist hijacking attempts. Because the connection is encrypted, the transmitted GPS coordinates cannot be read without decrypting the content. This approach is not free of legal and privacy implications, however: a VPN obscures the operator's origin and identity, making it hard to link a flight to a registered user and raising a series of problems for traceability and the enforcement of EU rules.

    Decentralised peer-to-peer communications


    Drone swarms can communicate over P2P mesh networks in which each node acts as a relay for the others. This removes the need for a central control point and makes it harder to disable the whole network with a single attack. In addition, distributed algorithms (e.g. gossip protocols) provide inherent resilience to interference. The only effective way to counter these networks is through advanced behavioural analysis and artificial intelligence that identifies anomalous patterns of cooperation between UAVs, even when there are no central emissions.
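
    The resilience claim can be illustrated with a toy gossip simulation. It is a simplified push model written for this example, not an implementation of any real swarm stack: the function name `gossip_rounds`, the fanout of 2, the swarm size and the assumption that any live node can reach any other are all hypothetical.

```python
import random

def gossip_rounds(n_nodes, failed, fanout=2, seed=0, max_rounds=1000):
    """Rounds needed for a message to reach every live node in a push-gossip model."""
    rng = random.Random(seed)
    live = [n for n in range(n_nodes) if n not in failed]
    informed = {live[0]}                      # the first live node starts with the message
    for round_no in range(1, max_rounds + 1):
        # Every node that already has the message pushes it to `fanout` random live peers.
        for _ in range(len(informed)):
            informed.update(rng.sample(live, fanout))
        if len(informed) == len(live):
            return round_no
    raise RuntimeError("message did not reach every live node")

print("rounds, full swarm:", gossip_rounds(20, failed=set()))
print("rounds, three nodes lost:", gossip_rounds(20, failed={3, 7, 11}))
```

    Removing nodes does not stop the message from spreading, because no single node is responsible for relaying it. That is the property that makes single-point attacks ineffective against such networks.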

    Sources: Diffie, W., & Hellman, M. (1976). New Directions in Cryptography. IEEE Transactions on Information Theory. – RFC 4301 – Security Architecture for the Internet Protocol. – Brambilla, M., et al. (2013). Swarm robotics: a review. Swarm Intelligence, 7(1).

    Regulatory, privacy and operational aspects


    In the increasingly complex and sensitive context of civilian drone use in urban areas, legal regulation and operational management are of central importance.

    EU and Italian regulatory framework


    The Italian regulatory approach is based on a European framework set out in EU Regulations 2019/947 and 2019/945, which establish the general conditions for operating unmanned aircraft systems. These regulations, implemented in Italy through the Italian Civil Aviation Authority (ENAC, Ente Nazionale per l’Aviazione Civile), define operational categories, technical requirements and electronic identification obligations, in particular through the Direct Remote Identification (DRI) system.

    Practical limits of DRI


    Since 1 January 2024, DRI has been mandatory for all UAS operations in the specific category, including flights under Italian standard scenarios. Drones in classes C1–C6 must be fitted with a module that transmits essential information in real time, such as the operator code, the drone's position, its altitude and, where available, the pilot's position. This transmission must be publicly accessible through platforms such as D‑Flight, while only the authorities can access the operator's full identity. However, the current regulatory architecture does not impose minimum encryption standards for communications, nor does it explicitly prohibit the use of VPNs, peer-to-peer networks or proprietary transmission protocols. This has created a genuine regulatory “grey zone” in which a drone can be formally compliant while operating in a way that is technically opaque and hard for the authorities to track.
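
    The fields listed above can be pictured as a simple record. The sketch below is purely illustrative: the class name, field names, example values and validation rules are assumptions for this example and do not reproduce the actual DRI message format defined in the European standards.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DriBroadcast:
    """Hypothetical record of the information a DRI module is said to transmit."""
    operator_code: str
    drone_lat: float
    drone_lon: float
    altitude_m: float
    pilot_lat: Optional[float] = None   # pilot position is sent only where available
    pilot_lon: Optional[float] = None

def validate(msg):
    """Return the names of mandatory fields that are missing or out of range."""
    problems = []
    if not msg.operator_code:
        problems.append("operator_code")
    if not -90.0 <= msg.drone_lat <= 90.0:
        problems.append("drone_lat")
    if not -180.0 <= msg.drone_lon <= 180.0:
        problems.append("drone_lon")
    return problems

ok = DriBroadcast("HYPOTHETICAL-OPERATOR-001", 41.9029, 12.4534, 60.0)
bad = DriBroadcast("", 95.0, 12.4534, 60.0)
print(validate(ok))
print(validate(bad))
```

    A check like this only confirms that a broadcast is well formed. It cannot tell whether the module has been removed or the firmware altered, which is the gap the paragraph above describes.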

    UAS‑IT regulation and additional measures


    The Italian UAS‑IT regulation, updated by ENAC in 2021, adds further requirements: mission recording, mandatory logbooks and tracking through D‑Flight. Here too, however, there is no active and systematic check on the installed firmware, on the correct configuration of DRI modules or on the compliance of the cryptographic protocols in use. Without periodic technical audits, the risk that modified or bypassed firmware is used increases significantly, undermining the overall effectiveness of the regulatory system.

    Operations and inter-agency coordination


    In operational terms, the growing sophistication of malicious drones makes it increasingly urgent to strengthen the technological capabilities of law enforcement. DRI alone is not sufficient against techniques such as frequency hopping, end-to-end encryption or decentralised mesh architectures. Current radio-frequency monitoring systems are often unable to identify non-standard transmission patterns. The authorities should therefore equip themselves with advanced tools such as wideband RF receivers, deep packet inspection technologies and artificial intelligence algorithms able to detect anomalous behaviour even in the absence of central signals. Added to this is the need to physically neutralise threats, using interceptor drones or selective short-range jammers.

    Operational recommendations

    Audit and certification


    To complete this picture, it would be advisable to establish a system of technical audit and pre-market certification for every drone placed on the market. The system should include mandatory firmware validation, verification of DRI module compliance and entry in a public register that the authorities can consult. It would also be useful to introduce technical inspections after sale and during use, with specific penalties for non-compliant operators.

    Tools for operational forces


    • Wideband RF receivers
    • Deep packet inspection (DPI) technologies to correlate telemetry and data sessions
    • AI algorithms for pattern detection and behavioural anomalies
    • Short-range interceptor drones and selective jammers

    Finally, a shared operational protocol between ENAC, ENAV, local police forces, the armed forces, civil protection and the prefectures is indispensable. Investment in specialist training is needed so that operational units can recognise suspicious signals, activate countermeasures and coordinate complex interventions.

    Oltre il drone: l’invisibilità bio-ispirata


    L’evoluzione tecnologica nel settore degli UAV non mostra alcun segno di rallentamento. Se oggi il dibattito sulla sicurezza urbana si concentra su droni a bassa osservabilità, comunicazioni criptate e sciami coordinati, il futuro prossimo promette scenari ancor più complessi e difficili da gestire. L’orizzonte non è più soltanto costituito da velivoli a pilotaggio remoto o sistemi autonomi di dimensioni ridotte, ma si estende a dispositivi bio-ibridi, miniaturizzati, neurologicamente controllati.

    Cyber-coleotteri e micro-vettori biologici


    Un articolo recentemente pubblicato da RedHotCyber documenta esperimenti avanzati nella creazione di “cyber-coleotteri”, insetti vivi ai quali sono stati applicati microelettrodi e zaini neurali in grado di controllarne i movimenti tramite joystick, senza annullarne le funzioni vitali. In pratica, la natura diventa veicolo. Non si tratta di semplici robot: questi dispositivi sfruttano l’autonomia biologica dell’organismo ospite, al quale viene associata un’interfaccia elettronica minima. Il risultato è un vettore in grado di muoversi senza emissioni radio, virtualmente invisibile a radar e sistemi elettro-ottici, con un peso complessivo inferiore a 5 grammi e un’autonomia che non dipende da batterie o software.

    Ethical and regulatory implications


    From an operational standpoint, these cyber-insects pose an unprecedented challenge for any surveillance infrastructure: no RF signature, no GPS signal to track, and an unmatched capacity for infiltration. On the regulatory level, they raise radical questions: there are currently no EASA or ENAC regulations that can classify or govern the use of living beings augmented with neural interfaces for civil or military purposes. The distinction between drone, machine, and organism blurs, opening ethical and strategic scenarios that the authorities will have to address urgently.

    Conclusions


    Protecting urban airspace requires an integrated approach that can effectively balance privacy and security, regulation and technology, control and innovation. The 2025 Jubilee is a very high-profile challenge: Italy’s response must measure up, in both legal and operational terms.

    The article Cyber-droni, elusione radar e sciami autonomi: Roma, la sfida invisibile del Giubileo 2025 originally appeared on Red Hot Cyber.

  7. Earlier this year, Cendyne wrote a blog post covering the use of HKDF, building partially upon my own blog post about HKDF and the KDF security definition, but more so inspired by a cryptographic issue they identified in another company’s product (dubbed AnonCo).

    At the bottom they teased:

    Database cryptography is hard. The above sketch is not complete and does not address several threats! This article is quite long, so I will not be sharing the fixes.

    Cendyne

    If you read Cendyne’s post, you may have nodded along with that remark and not appreciated the degree to which our naga friend was putting it mildly. So I thought I’d share some of my knowledge about real-world database cryptography in an accessible and fun format in the hopes that it might serve as an introduction to the specialization.

    Note: I’m also not going to fix Cendyne’s sketch of AnonCo’s software here–partly because I don’t want to get in the habit of assigning homework or required reading, but mostly because it’s kind of obvious once you’ve learned the basics.

    I’m including art of my fursona in this post… as is tradition for furry blogs.

    If you don’t like furries, please feel free to leave this blog and read about this topic elsewhere.

    Thanks to CMYKat for the awesome stickers.

    Contents

    • Database Cryptography?
    • Cryptography for Relational Databases
      • The Perils of Built-in Encryption Functions
      • Application-Layer Relational Database Cryptography
        • Confused Deputies
        • Canonicalization Attacks
        • Multi-Tenancy
    • Cryptography for NoSQL Databases
      • NoSQL is Built Different
      • Record Authentication
        • Bonus: A Maximally Schema-Free, Upgradeable Authentication Design
    • Searchable Encryption
      • Order-{Preserving, Revealing} Encryption
      • Deterministic Encryption
      • Homomorphic Encryption
      • Searchable Symmetric Encryption (SSE)
      • You Can Have Little a HMAC, As a Treat
    • Intermission
    • Case Study: MongoDB Client-Side Encryption
      • MongoCrypt: The Good
        • How is Queryable Encryption Implemented?
      • MongoCrypt: The Bad
      • MongoCrypt: The Ugly
    • Wrapping Up

    Database Cryptography?

    The premise of database cryptography is deceptively simple: You have a database, of some sort, and you want to store sensitive data in said database.

    The consequences of this simple premise are anything but simple. Let me explain.

    Art: ScruffKerfluff

    The sensitive data you want to store may need to remain confidential, or you may need to provide some sort of integrity guarantees throughout your entire system, or sometimes both. Sometimes all of your data is sensitive, sometimes only some of it is. Sometimes the confidentiality requirements of your data extend to where within a dataset the record you want actually lives. Sometimes that’s true of some data, but not others, so your cryptography has to be flexible to support multiple types of workloads.

    Other times, you just want your disks encrypted at rest so if they grow legs and walk out of the data center, the data cannot be comprehended by an attacker. And you can’t be bothered to work on this problem any deeper. This is usually what compliance requirements cover. Boxes get checked, executives feel safer about their operation, and the whole time nobody has really analyzed the risks they’re facing.

    But we’re not settling for mere compliance on this blog. Furries have standards, after all.

    So the first thing you need to do before diving into database cryptography is threat modelling. The first step in any good threat model is taking inventory; especially of assumptions, requirements, and desired outcomes. A few good starter questions:

    1. What database software is being used? Is it up to date?
    2. What data is being stored in which database software?
    3. How are databases oriented in the network of the overall system?
      • Is your database properly firewalled from the public Internet?
    4. How does data flow throughout the network, and when do these data flows intersect with the database?
      • Which applications talk to the database? What languages are they written in? Which APIs do they use?
    5. How will cryptography secrets be managed?
      • Is there one key for everyone, one key per tenant, etc.?
      • How are keys rotated?
      • Do you use envelope encryption with an HSM, or vend the raw materials to your end devices?

    The first two questions are paramount for deciding how to write software for database cryptography, before you even get to thinking about the cryptography itself.

    (This is not a comprehensive set of questions to ask, either. A formal threat model is much deeper in the weeds.)

    The kind of cryptography protocol you need for, say, storing encrypted CSV files in an S3 bucket is vastly different from what you need for relational (SQL) databases, which in turn will be significantly different from schema-free (NoSQL) databases.

    Furthermore, when you get to the point that you can start to think about the cryptography, you’ll often need to tackle confidentiality and integrity separately.

    If that’s unclear, think of a scenario like, “I need to encrypt PII, but I also need to digitally sign the lab results so I know they weren’t tampered with at rest.”

    My point is, right off the bat, we’ve got a three-dimensional matrix of complexity to contend with:

    1. On one axis, we have the type of database.
      • Flat-file
      • Relational
      • Schema-free
    2. On another, we have the basic confidentiality requirements of the data.
      • Field encryption
      • Row encryption
      • Column encryption
      • Unstructured record encryption
      • Encrypting entire collections of records
    3. Finally, we have the integrity requirements of the data.
      • Field authentication
      • Row/column authentication
      • Unstructured record authentication
      • Collection authentication (based on e.g. Sparse Merkle Trees)

    And then you have a fourth dimension that often falls out of operational requirements for databases: Searchability.

    Why store data in a database if you have no way to index or search the data for fast retrieval?

    Credit: Harubaki

    If you’re starting to feel overwhelmed, you’re not alone. A lot of developers drastically underestimate the difficulty of the undertaking, until they run head-first into the complexity.

    Some just phone it in with AES_Encrypt() calls in their MySQL queries. (Too bad ECB mode doesn’t provide semantic security!)

    Which brings us to the meat of this blog post: The actual cryptography part.

    Cryptography is the art of transforming information security problems into key management problems.

    Former coworker

    Note: In the interest of time, I’m skipping over flat files and focusing instead on actual database technologies.

    Cryptography for Relational Databases

    Encrypting data in an SQL database seems simple enough, even if you’ve managed to shake off the complexity I teased in the introduction.

    You’ve got data, you’ve got a column on a table. Just encrypt the data and shove it in a cell on that column and call it a day, right?

    But, alas, this is a trap. There are so many gotchas that I can’t weave a coherent, easy-to-follow narrative between them all.

    So let’s start with a simple question: where and how are you performing your encryption?

    The Perils of Built-in Encryption Functions

    MySQL provides functions called AES_Encrypt and AES_Decrypt, which many developers have unfortunately decided to rely on in the past.

    It’s unfortunate because these functions use ECB mode by default (MySQL’s block_encryption_mode setting defaults to aes-128-ecb). To illustrate why ECB mode is bad, I encrypted one of my art commissions with AES in ECB mode:

    Art by Riley, encrypted with AES-ECB

    The problems with ECB mode aren’t exactly “you can see the image through it,” because ECB-encrypting a compressed image won’t have redundancy (and thus can make you feel safer than you are).

    ECB art is a good visual for the actual issue you should care about, however: A lack of semantic security.

    A cryptosystem is considered semantically secure if observing the ciphertext doesn’t reveal information about the plaintext (except, perhaps, the length; which all cryptosystems leak to some extent). More information here.

    ECB art isn’t to be confused with ECB poetry, which looks like this:

    Oh little one, you’re growing up
    You’ll soon be writing C
    You’ll treat your ints as pointers
    You’ll nest the ternary
    You’ll cut and paste from github
    And try cryptography
    But even in your darkest hour
    Do not use ECB

    CBC’s BEASTly when padding’s abused
    And CTR’s fine til a nonce is reused
    Some say it’s a CRIME to compress then encrypt
    Or store keys in the browser (or use javascript)
    Diffie Hellman will collapse if hackers choose your g
    And RSA is full of traps when e is set to 3
    Whiten! Blind! In constant time! Don’t write an RNG!
    But failing all, and listen well: Do not use ECB

    They’ll say “It’s like a one-time-pad!
    The data’s short, it’s not so bad
    the keys are long–they’re iron clad
    I have a PhD!”
    And then you’re front page Hacker News
    Your passwords cracked–Adobe Blues.
    Don’t leave your penguins showing through,
    Do not use ECB

    — Ben Nagy, PoC||GTFO 0x04:13

    Most people reading this probably know better than to use ECB mode already, and don’t need any of these reminders, but there is still a lot of code that inadvertently uses ECB mode to encrypt data in the database.

    Also, SHOW processlist; leaks your encryption keys. Oops.

    Credit: CMYKatt

    Application-layer Relational Database Cryptography

    Whether burned by ECB or just cautious about not giving your secrets to the system that stores all the ciphertext protected by said secret, a common next step for developers is to simply encrypt in their server-side application code.

    And, yes, that’s part of the answer. But how you encrypt is important.

    Credit: Harubaki

    “I’ll encrypt with CBC mode.”
    If you don’t authenticate your ciphertext, you’ll be sorry. Maybe try again?

    “Okay, fine, I’ll use an authenticated mode like GCM.”
    Did you remember to make the table and column name part of your AAD? What about the primary key of the record?

    “What on Earth are you talking about, Soatok?”
    Welcome to the first footgun of database cryptography!

    Confused Deputies

    Encrypting your sensitive data is necessary, but not sufficient. You need to also bind your ciphertexts to the specific context in which they are stored.

    To understand why, let’s take a step back: What specific threat does encrypting your database records protect against?

    We’ve already established that “your disks walk out of the datacenter” is a “full disk encryption” problem, so if you’re using application-layer cryptography to encrypt data in a relational database, your threat model probably involves unauthorized access to the database server.

    What, then, stops an attacker from copying ciphertexts around?

    Credit: CMYKatt

    Let’s say I have a legitimate user account with an ID 12345, and I want to read your street address, but it’s encrypted in the database. But because I’m a clever hacker, I have unfettered access to your relational database server.

    All I would need to do is simply…

    UPDATE table SET addr_encrypted = 'your-ciphertext' WHERE id = 12345

    …and then access the application through my legitimate access. Bam, data leaked. As an attacker, I can probably even copy fields from other columns and it will just decrypt. Even if you’re using an authenticated mode.

    We call this a confused deputy attack, because the deputy (the component of the system that has been delegated some authority or privilege) has become confused by the attacker, and thus undermined an intended security goal.

    The fix is to use the AAD parameter from the authenticated mode to bind the data to a given context. (AAD = Additional Authenticated Data.)

    - $addr = aes_gcm_encrypt($addr, $key);
    + $addr = aes_gcm_encrypt($addr, $key, canonicalize([
    +     $tableName,
    +     $columnName,
    +     $primaryKey
    + ]));

    Now if I start cutting and pasting ciphertexts around, I get a decryption failure instead of silently decrypting plaintext.

    This may sound like a specific vulnerability, but it’s more of a failure to understand an important general lesson with database cryptography:

    Where your data lives is part of its identity, and MUST be authenticated.

    Soatok’s Rule of Database Cryptography

    Canonicalization Attacks

    In the previous section, I introduced a pseudocode function called canonicalize(). This isn’t a pasto from some reference code; it’s an important design detail that I will elaborate on now.

    First, consider you didn’t do anything to canonicalize your data, and you just joined strings together and called it a day…

    function dumbCanonicalize(
        string $tableName,
        string $columnName,
        string|int $primaryKey
    ): string {
        return $tableName . '_' . $columnName . '#' . $primaryKey;
    }

    Consider these two inputs to this function:

    1. dumbCanonicalize('customers', 'last_order_uuid', 123);
    2. dumbCanonicalize('customers_last_order', 'uuid', 123);

    In this case, your AAD would be the same, and therefore, your deputy can still be confused (albeit in a narrower use case).
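    One common way to remove this ambiguity is to prefix every piece with its length before concatenating. The sketch below is written in Python rather than the PHP-style pseudocode used above. The exact encoding (a piece count followed by length-prefixed pieces, all unsigned 64-bit little-endian) is an assumption chosen for illustration, not the format of any particular library.

```python
import struct


def dumb_canonicalize(table: str, column: str, primary_key) -> bytes:
    """Python port of dumbCanonicalize() from the text: ambiguous joining."""
    return f"{table}_{column}#{primary_key}".encode("utf-8")


def canonicalize(pieces) -> bytes:
    """Unambiguous encoding: piece count, then (length, bytes) per piece.

    Assumed format for illustration: every integer is an unsigned 64-bit
    little-endian value. Pieces are stringified first, so the integer 123
    and the string "123" encode identically.
    """
    out = struct.pack("<Q", len(pieces))
    for piece in pieces:
        raw = str(piece).encode("utf-8")
        out += struct.pack("<Q", len(raw)) + raw
    return out


a = ("customers", "last_order_uuid", 123)
b = ("customers_last_order", "uuid", 123)

# The naive version collides on these two different contexts...
collides = dumb_canonicalize(*a) == dumb_canonicalize(*b)
# ...while the length-prefixed version keeps them distinct.
distinct = canonicalize(a) != canonicalize(b)
print(collides, distinct)  # True True
```

    Because each length is committed before its bytes, no choice of separator characters inside a table or column name can shift the boundary between pieces.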

    In Cendyne’s article, AnonCo did something more subtle: The canonicalization bug created a collision on the inputs to HKDF, which resulted in an unintentional key reuse.

    Up until this point, their mistake isn’t relevant to us, because we haven’t even explored key management at all. But the same design flaw can re-emerge in multiple locations, with drastically different consequences.

    Multi-Tenancy

    Once you’ve implemented a mitigation against Confused Deputies, you may think your job is done. And it very well could be.

    Oftentimes, however, software developers are tasked with building support for Bring Your Own Key (BYOK).

    This is often spawned from a specific compliance requirement (such as cryptographic shredding; i.e. if you erase the key, you can no longer recover the plaintext, so it may as well be deleted).

    Other times, this is driven by a need to cut costs: Storing different users’ data in the same database server, but encrypting it such that they can only decrypt their own records.

    Two things can happen when you introduce multi-tenancy into your database cryptography designs:

    1. Invisible Salamanders become a risk, due to multiple keys being possible for any given encrypted record.
    2. Failure to address the risk of Invisible Salamanders can undermine your protection against Confused Deputies, thereby returning you to a state before you properly used the AAD.

    So now you have to revisit your designs and ensure you’re using a key-committing authenticated mode, rather than just a regular authenticated mode.

    Isn’t cryptography fun?

    “What Are Invisible Salamanders?”

    This refers to a fun property of AEAD modes based on Polynomial MACs. Basically, if you:

    1. Encrypt one message under a specific key and nonce.
    2. Encrypt another message under a separate key and nonce.

    …Then you can get the same exact ciphertext and authentication tag. Performing this attack requires you to control the keys for both encryption operations.

    This was first demonstrated in an attack against encrypted messaging applications, where a picture of a salamander was hidden from the abuse reporting feature because another attached file had the same authentication tag and ciphertext, and you could trick the system if you disclosed the second key instead of the first. Thus, the salamander is invisible to attackers.

    Art: CMYKat

    We’re not quite done with relational databases yet, but we should talk about NoSQL databases for a bit. The final topic in scope applies equally to both, after all.

    Cryptography for NoSQL Databases

    Most of the topics from relational databases also apply to NoSQL databases, so I shall refrain from duplicating them here. This article is already sufficiently long to read, after all, and I dislike redundancy.

    NoSQL is Built Different

    The main thing that NoSQL databases offer in the service of making cryptographers lose sleep at night is the schema-free nature of NoSQL designs.

    What this means is that, if you’re using a client-side encryption library for a NoSQL database, the previous concerns about confused deputy attacks are amplified by the malleability of the document structure.

    Additionally, the previously discussed cryptographic attacks against the encryption mode may be less expensive for an attacker to pull off.

    Consider the following record structure, which stores a bunch of data encrypted with AES in CBC mode:

    {
      "encrypted-data-key": "<blob>",
      "name": "<ciphertext>",
      "address": [
        "<ciphertext>",
        "<ciphertext>"
      ],
      "social-security": "<ciphertext>",
      "zip-code": "<ciphertext>"
    }

    If this record is decrypted with code that looks something like this:

    $decrypted = [];
    // ... snip ...
    foreach ($record['address'] as $i => $addrLine) {
        try {
            $decrypted['address'][$i] = $this->decrypt($addrLine);
        } catch (Throwable $ex) {
            // You'd never deliberately do this, but it's for illustration
            $this->doSomethingAnOracleCanObserve($i);

            // This is more believable, of course:
            $this->logDecryptionError($ex, $addrLine);
            $decrypted['address'][$i] = '';
        }
    }

    Then you can keep appending rows to the "address" field to reduce the number of writes needed to exploit a padding oracle attack against any of the <ciphertext> fields.

    Art: Harubaki

    This isn’t to say that NoSQL is less secure than SQL, from the context of client-side encryption. However, the powerful feature sets that NoSQL users are accustomed to may also give attackers a more versatile toolkit to work with.

    Record Authentication

    A pedant may point out that record authentication applies to both SQL and NoSQL. However, I mostly only observe this feature in NoSQL databases and document storage systems in the wild, so I’m shoving it in here.

    Encrypting fields is nice and all, but sometimes what you want to know is that your unencrypted data hasn’t been tampered with as it flows through your system.

    The trivial way this is done is by using a digital signature algorithm over the whole record, and then appending the signature to the end. When you go to verify the record, all of the information you need is right there.

    This works well enough for most use cases, and everyone can pack up and go home. Nothing more to see here.

    Except…

    When you’re working with NoSQL databases, you often want systems to be able to write to additional fields, and since you’re working with schema-free blobs of data rather than a normalized set of relatable tables, the most sensible thing to do is to append this data to the same record.

    Except, oops! You can’t do that if you’re shoving a digital signature over the record. So now you need to specify which fields are to be included in the signature.

    And you need to think about how to model that in a way that neither prohibits schema upgrades nor allows attackers to perform downgrade attacks. (See below.)

    I don’t have any specific real-world examples here that I can point to of this problem being solved well.

    Art: CMYKat

    Furthermore, as with preventing confused deputy and/or canonicalization attacks above, you must also include the fully qualified path of each field in the data that gets signed.

    As I said with encryption before, but also true here:

    Where your data lives is part of its identity, and MUST be authenticated.

    Soatok’s Rule of Database Cryptography

    This requirement holds true whether you’re using symmetric-key authentication (e.g. HMAC) or asymmetric-key digital signatures (e.g. EdDSA).
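    As a rough illustration of authenticating the path along with the value, here is a minimal Python sketch using HMAC-SHA256 from the standard library. The helper names (authenticate_fields, verify_fields), the dotted path strings, and the length-prefixed encoding are all assumptions made for this example. A real design would also need the field-selection and versioning concerns discussed in this section.

```python
import hashlib
import hmac
import struct


def _lp(data: bytes) -> bytes:
    """Length-prefix a byte string (unsigned 64-bit little-endian length)."""
    return struct.pack("<Q", len(data)) + data


def authenticate_fields(key: bytes, fields: dict) -> bytes:
    """HMAC-SHA256 over (path, value) pairs in sorted path order.

    `fields` maps a fully qualified path such as "billing.address" to that
    field's bytes. The path is fed to the MAC right next to the value.
    """
    mac = hmac.new(key, digestmod=hashlib.sha256)
    mac.update(struct.pack("<Q", len(fields)))
    for path in sorted(fields):
        mac.update(_lp(path.encode("utf-8")))
        mac.update(_lp(fields[path]))
    return mac.digest()


def verify_fields(key: bytes, fields: dict, tag: bytes) -> bool:
    """Constant-time comparison of the recomputed tag against `tag`."""
    return hmac.compare_digest(authenticate_fields(key, fields), tag)


key = b"\x01" * 32  # placeholder key for the demo only
record = {
    "billing.address": b"123 Main St",
    "shipping.address": b"456 Side Ave",
}
tag = authenticate_fields(key, record)

# An attacker with database access swaps the two values between paths.
swapped = {
    "billing.address": record["shipping.address"],
    "shipping.address": record["billing.address"],
}
print(verify_fields(key, record, tag), verify_fields(key, swapped, tag))
# True False
```

    The swapped record contains exactly the same values, so a MAC over the values alone would still verify. Including each path is what makes the swap detectable.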

    Bonus: A Maximally Schema-Free, Upgradeable Authentication Design

    Art: Harubaki

    Okay, how do you solve this problem so that you can perform updates and upgrades to your schema but without enabling attackers to downgrade the security? Here’s one possible design.

    Let’s say you have two metadata fields on each record:

    1. A compressed binary string representing which fields should be authenticated. This field is, itself, not authenticated. Let’s call this meta-auth.
    2. A compressed binary string representing which of the authenticated fields should also be encrypted. Unlike meta-auth, this field is itself authenticated. It is at most the same length as the first metadata field. Let’s call this meta-enc.

    Furthermore, you will specify a canonical field ordering for both how data is fed into the signature algorithm as well as the field mappings in meta-auth and meta-enc.

    {
      "example": {
        "credit-card": {
          "number": /* encrypted */,
          "expiration": /* encrypted */,
          "ccv": /* encrypted */
        },
        "superfluous": {
          "rewards-member": null
        }
      },
      "meta-auth": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true   /* meta-enc */
      ]),
      "meta-enc": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false  /* example.superfluous.rewards-member */
      ]),
      "signature": /* -- snip -- */
    }

    When you go to append data to an existing record, you’ll need to update meta-auth to include the mapping of fields based on this canonical ordering to ensure only the intended fields get validated.
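    The record above calls compress_bools() without defining it. One plausible reading is a plain bit-packing of the flags in canonical field order. The Python sketch below is an assumption about what such a helper could look like. The one-byte count prefix and the most-significant-bit-first layout are choices made for this example, not part of the design described here.

```python
def compress_bools(flags) -> bytes:
    """Pack booleans into bytes, first flag in the most significant bit.

    A one-byte flag count is prepended so trailing padding bits are never
    mistaken for flags (an assumed format; it caps a record at 255 flags).
    """
    flags = list(flags)
    if len(flags) > 255:
        raise ValueError("this sketch supports at most 255 flags")
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes([len(flags)]) + bytes(packed)


def decompress_bools(blob: bytes) -> list:
    """Inverse of compress_bools()."""
    count, packed = blob[0], blob[1:]
    if len(packed) != (count + 7) // 8:
        raise ValueError("length does not match flag count")
    return [bool(packed[i // 8] & (0x80 >> (i % 8))) for i in range(count)]


# meta-auth flags from the example record, in canonical field order.
meta_auth = compress_bools([True, True, True, False, True])
print(meta_auth.hex())  # 05e8
```

    Storing the count explicitly means that appending a field changes the encoded value even when the new flag is false, which keeps old and new mappings distinguishable.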

    When you update your code to add an additional field that is intended to be signed, you can roll that out for new records and the record will continue to be self-describing:

    • New records will have the additional field flagged as authenticated in meta-auth (and meta-enc will grow)
    • Old records will not, but your code will still sign them successfully
    • To prevent downgrade attacks, simply include a schema version ID as an additional plaintext field that gets authenticated. An attacker who tries to downgrade will need to be able to produce a valid signature too.

    You might think meta-auth gives an attacker some advantage, but this only includes which fields are included in the security boundary of the signature or MAC, which allows unauthenticated data to be appended for whatever operational purpose without having to update signatures or expose signing keys to a wider part of the network.

    {
      "example": {
        "credit-card": {
          "number": /* encrypted */,
          "expiration": /* encrypted */,
          "ccv": /* encrypted */
        },
        "superfluous": {
          "rewards-member": null
        }
      },
      "meta-auth": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true,  /* meta-enc */
        true   /* meta-version */
      ]),
      "meta-enc": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true   /* meta-version */
      ]),
      "meta-version": 0x01000000,
      "signature": /* -- snip -- */
    }

    If an attacker tries to use the meta-auth field to mess with a record, the best they can hope for is an Invalid Signature exception (assuming the signature algorithm is secure to begin with).

    Even if they keep all of the fields the same, but play around with the structure of the record (e.g. changing the XPath or equivalent), so long as the path is authenticated with each field, breaking this is computationally infeasible.

    Searchable Encryption

    If you’ve managed to make it through the previous sections, congratulations, you now know enough to build a secure but completely useless database.

    Art: CMYKat

    Okay, put away the pitchforks; I will explain.

    Part of the reason why we store data in a database, rather than a flat file, is because we want to do more than just read and write. Sometimes computer scientists want to compute. Almost always, you want to be able to query your database for a subset of records based on your specific business logic needs.

    And so, a database which doesn’t do anything more than store ciphertext and maybe signatures is pretty useless to most people. You’d have better luck selling Monkey JPEGs to furries than convincing most businesses to part with their precious database-driven report generators.

    Art: Sophie

    So whenever one of your users wants to actually use their data, rather than just store it, they’re forced to decide between two mutually exclusive options:

    1. Encrypting the data, to protect it from unauthorized disclosure, but render it useless
    2. Doing anything useful with the data, but leaving it unencrypted in the database

    This is especially annoying for business types that are all in on the Zero Trust buzzword.

    Fortunately, the cryptographers are at it again, and boy howdy do they have a lot of solutions for this problem.

    Order-{Preserving, Revealing} Encryption

    On the fun side of things, you have things like Order-Preserving and Order-Revealing Encryption, which Matthew Green wrote about at length.

    [D]atabase encryption has been a controversial subject in our field. I wish I could say that there’s been an actual debate, but it’s more that different researchers have fallen into different camps, and nobody has really had the data to make their position in a compelling way. There have actually been some very personal arguments made about it.

    Attack of the week: searchable encryption and the ever-expanding leakage function

    The problem with these designs is that they leak enough information that they no longer provide semantic security.

    From Grubbs, et al. (GLMP, 2019.)
    Colors inverted to fit my blog’s theme better.

    To put it in other words: These designs are only marginally better than ECB mode, and probably deserve their own poems too.

    Order revealing
    Reveals much more than order
    Softcore ECB

    Order preserving
    Semantic security?
    Only in your dreams

    Haiku for your consideration

    Deterministic Encryption

    Here’s a simpler, but also terrible, idea for searchable encryption: Simply give up on semantic security entirely.

    If you recall the AES_{De,En}crypt() functions built into MySQL I mentioned at the start of this article, those are the most common form of deterministic encryption I’ve seen in use.

     SELECT * FROM foo WHERE bar = AES_Encrypt('query', 'key');

    However, there are slightly less bad variants. If you use AES-GCM-SIV with a static nonce, your ciphertexts are fully deterministic, and you can encrypt a small number of distinct records safely before you’re no longer secure.

    From Page 14 of the linked paper.

    That’s certainly better than nothing, but you also can’t mitigate confused deputy attacks. But we can do better than this.

    Homomorphic Encryption

    In a safer plane of academia, you’ll find homomorphic encryption, which researchers recently demonstrated by serving Wikipedia pages in a reasonable amount of time.

    Homomorphic encryption allows computations over the ciphertext, which will be reflected in the plaintext, without ever revealing the key to the entity performing the computation.

    If this sounds vaguely similar to the conditions that enable chosen-ciphertext attacks, you probably have a good intuition for how it works: RSA is homomorphic to multiplication, AES-CTR is homomorphic to XOR. Fully homomorphic encryption uses lattices, which enables multiple operations but carries a relatively enormous performance cost.
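    The two partial homomorphisms mentioned above can be seen with a few lines of Python. This is a toy sketch only: the RSA parameters are tiny textbook values with no padding, and the “keystream” is a fixed stand-in rather than real AES-CTR output. Both are assumptions chosen so the example runs with the standard library alone.

```python
# Textbook RSA with tiny, insecure parameters (illustration only).
p, q = 61, 53
n = p * q                          # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)


def rsa_enc(m: int) -> int:
    return pow(m, e, n)


def rsa_dec(c: int) -> int:
    return pow(c, d, n)


m1, m2 = 7, 12
# Multiplying two ciphertexts multiplies the underlying plaintexts (mod n).
product = rsa_dec((rsa_enc(m1) * rsa_enc(m2)) % n)
print(product)  # 84


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# A stream cipher such as CTR mode XORs a keystream into the plaintext.
keystream = bytes(range(16))  # stand-in for a real cipher's keystream
ct = xor(b"pay alice $0100!", keystream)

# XORing a difference into the ciphertext XORs it into the plaintext,
# without the attacker ever learning the keystream.
delta = xor(b"pay alice $0100!", b"pay alice $9100!")
tampered = xor(xor(ct, delta), keystream)
print(tampered)  # b'pay alice $9100!'
```

    The second half is also why unauthenticated CTR ciphertexts are malleable: the same property that enables computing on ciphertexts lets an attacker edit them.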

    Art: Harubaki

    Homomorphic encryption sometimes intersects with machine learning, because the notion of training an encrypted model by feeding it encrypted data, then decrypting it after-the-fact is desirable for certain business verticals. Your data scientists never see your data, and you have some plausible deniability about the final ML model this work produces. This is like a Siren song for Venture Capitalist-backed medical technology companies. Tech journalists love writing about it.

    However, a less-explored use case is the ability to encrypt your programs but still get the correct behavior and outputs. Although this sounds like a DRM technology, it’s actually something that individuals could one day use to prevent their ISPs or cloud providers from knowing what software is being executed on the customer’s leased hardware. The potential for a privacy win here is certainly worth pondering, even if you’re a tried and true Pirate Party member.

    Just say “NO” to the copyright cartels.

    Art: CMYKat

    Searchable Symmetric Encryption (SSE)

    Forget about working at the level of fields and rows or individual records. What if we, instead, worked over collections of documents, where each document is viewed as a set of keywords from a keyword space?

    Art: CMYKat

    That’s the basic premise of SSE: Encrypting collections of documents rather than individual records.

    The actual implementation details differ greatly between designs. They also differ greatly in their leakage profiles and susceptibility to side-channel attacks.

    Some schemes use a so-called trapdoor permutation, such as RSA, as one of their building blocks.

    Some schemes only allow for searching a static set of records, while others can accommodate new data over time (at the cost of either more leakage or worse performance).

    If you’re curious, you can learn more about SSE here, and see some open source SSE implementations online here.

    You’re probably wondering, “If SSE is this well-studied and there are open source implementations available, why isn’t it more widely used?”

    Your guess is as good as mine, but I can think of a few reasons:

    1. The protocols can be a little complicated to implement, and aren’t shipped by default in cryptography libraries (i.e. OpenSSL’s libcrypto or libsodium).
    2. Every known security risk in SSE is the product of a trade-offs, rather than there being a single winner for all use cases that developers can feel comfortable picking.
    3. Insufficient marketing and developer advocacy.
      SSE schemes are mostly of interest to academics, although Seny Kamara (Brown University professor and one of the luminaries of searchable encryption) did try to develop an app called Pixek which used SSE to encrypt photos.

    Maybe there’s room for a cryptography competition on searchable encryption schemes in the future.

    You Can Have Little a HMAC, As a Treat

    Finally, I can’t talk about searchable encryption without discussing a technique that’s older than dirt by Internet standards, that has been independently reinvented by countless software developers tasked with encrypting database records.

    The oldest version I’ve been able to track down dates to 2006 by Raul Garcia at Microsoft, but I’m not confident that it didn’t exist before.

    The idea I’m alluding to goes like this:

    1. Encrypt your data, securely, using symmetric cryptography.
      (Hopefully your encryption addresses the considerations outlined in the relevant sections above.)
    2. Separately, calculate an HMAC over the unencrypted data with a separate key used exclusively for indexing.

    When you need to query your data, you can just recalculate the HMAC of your challenge and fetch the records that match it. Easy, right?

    Even if you rotate your keys for encryption, you keep your indexing keys static across your entire data set. This lets you have durable indexes for encrypted data, which gives you the ability to do literal lookups for the performance hit of a hash function.
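
    Here is a minimal sketch of that lookup flow in Python, using only the standard library. The names (INDEX_KEY, blind_index, find_by_email) and the toy rows list standing in for a table are my own illustrations, not taken from any real product, and the ciphertexts are opaque placeholders.

```python
import hashlib
import hmac
import secrets

# Hypothetical key used only for indexing; it is separate from the encryption key.
INDEX_KEY = secrets.token_bytes(32)


def blind_index(index_key: bytes, plaintext: str) -> str:
    """Return the full HMAC-SHA256 tag of the plaintext, hex-encoded."""
    return hmac.new(index_key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()


# Toy "table": each row stores an opaque ciphertext next to its index tag.
rows = [
    {"id": 1, "email_ct": b"<ciphertext 1>", "email_idx": blind_index(INDEX_KEY, "alice@example.com")},
    {"id": 2, "email_ct": b"<ciphertext 2>", "email_idx": blind_index(INDEX_KEY, "bob@example.com")},
    {"id": 3, "email_ct": b"<ciphertext 3>", "email_idx": blind_index(INDEX_KEY, "alice@example.com")},
]


def find_by_email(index_key: bytes, email: str) -> list:
    """Recalculate the tag for the query and return the ids of matching rows."""
    wanted = blind_index(index_key, email)
    return [row["id"] for row in rows if hmac.compare_digest(row["email_idx"], wanted)]


print(find_by_email(INDEX_KEY, "alice@example.com"))
```

    Rows 1 and 3 share a tag because they share a plaintext, which is the kind of leakage a full HMAC index exposes to anyone who can read the table.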

    Additionally, everyone has HMAC in their toolkit, so you don’t have to move around implementations of complex cryptographic building blocks. You can live off the land. What’s not to love?

    Hooray!

    However, if you stopped here, we regret to inform you that your data is no longer indistinguishable from random, which probably undermines the security proof for your encryption scheme.

    How annoying!

    Of course, you don’t have to stop with the addition of plain HMAC to your database encryption software.

    Take a page from Troy Hunt: Truncate the output to provide k-anonymity rather than a direct literal look-up.

    “K-What Now?”

    Imagine you have a full HMAC-SHA256 of the plaintext next to every ciphertext record with a static key, for searchability.

    Each HMAC output corresponds 1:1 with a unique plaintext.

    Because you’re using HMAC with a secret key, an attacker can’t just build a rainbow table like they would when attempting password cracking, but it still leaks duplicate plaintexts.

    For example, an HMAC-SHA256 output might look like this: 04a74e4c0158e34a566785d1a5e1167c4e3455c42aea173104e48ca810a8b1ae

    Art: CMYKat

    If you were to slice off most of those bytes (e.g. leaving only the last 3, which in the previous example yields a8b1ae), then with sufficient records, multiple plaintexts will now map to the same truncated HMAC tag.

    Which means if you’re only revealing a truncated HMAC tag to the database server (both when storing records and when retrieving them), you can now expect false positives due to collisions in your truncated HMAC tag.
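
    The truncation step itself is tiny. The sketch below keeps only the trailing bits of a tag; truncate_tag is a hypothetical helper name, and the tag is the example value shown earlier.

```python
def truncate_tag(tag: bytes, bits: int) -> int:
    """Keep only the last `bits` bits of an HMAC tag, returned as an integer."""
    if not 0 < bits <= len(tag) * 8:
        raise ValueError("bits must be between 1 and the tag length in bits")
    return int.from_bytes(tag, "big") & ((1 << bits) - 1)


full_tag = bytes.fromhex(
    "04a74e4c0158e34a566785d1a5e1167c4e3455c42aea173104e48ca810a8b1ae"
)

# Keeping the last 3 bytes (24 bits) reproduces the example above.
print(f"{truncate_tag(full_tag, 24):06x}")  # a8b1ae
```

    Working in bits rather than bytes means you can also keep, say, 20 bits, which a byte-oriented slice cannot express.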

    These false positives give your data a discrete set of anonymity (called k-anonymity), which means an attacker with access to your database cannot:

    1. Distinguish between two encrypted records with the same short HMAC tag.
    2. Reverse engineer the short HMAC tag into a single possible plaintext value, even if they can supply candidate queries and study the tags sent to the database.
    Art: CMYKat

    As with SSE above, this short HMAC technique exposes a trade-off to users.

    • Too much k-anonymity (i.e. too many false positives), and you will have to decrypt-then-discard multiple mismatching records. This can make queries slow.
    • Not enough k-anonymity (i.e. insufficient false positives), and you’re no better off than a full HMAC.

    Even more troublesome, the right amount to truncate is expressed in bits (not bytes), and calculating this value depends on the number of unique plaintext values you anticipate in your dataset. (Fortunately, it grows logarithmically, so you’ll rarely if ever have to tune this.)
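
    As a rough back-of-the-envelope aid, here is how that calculation might look. This is my own rule of thumb rather than an established formula: it assumes truncated tags are uniformly distributed, so N unique plaintexts spread over 2^bits tags gives about N / 2^bits candidates per tag. The target_candidates knob is hypothetical.

```python
def expected_candidates(unique_plaintexts: int, bits: int) -> float:
    """Average number of distinct plaintexts sharing one truncated tag."""
    return unique_plaintexts / (2 ** bits)


def suggest_bits(unique_plaintexts: int, target_candidates: int) -> int:
    """Largest tag size (in bits) that still averages at least
    `target_candidates` plaintexts per tag, under the uniformity assumption."""
    if target_candidates < 1 or unique_plaintexts < target_candidates:
        raise ValueError("need at least target_candidates unique plaintexts")
    # floor(log2(N / target)) computed with integer arithmetic
    return (unique_plaintexts // target_candidates).bit_length() - 1


for n in (1_000_000, 2_000_000, 1_000_000_000):
    bits = suggest_bits(n, 32)
    print(n, bits, round(expected_candidates(n, bits), 1))
```

    Doubling the dataset only adds one bit, which is the logarithmic growth mentioned above.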

    If you’d like to play with this idea, here’s a quick and dirty demo script.

    Intermission

    If you started reading this post with any doubts about Cendyne’s statement that “Database cryptography is hard”, by making it to this point, they’ve probably been long since put to rest.

    Art: Harubaki

    Conversely, anyone that specializes in this topic is probably waiting for me to say anything novel or interesting; their patience wearing thin as I continue to rehash a surface-level introduction of their field without really diving deep into anything.

    Thus, if you’ve read this far, I’d like to apply what I’ve covered so far to a real-world case study of a database cryptography product.

    Case Study: MongoDB Client-Side Encryption

    MongoDB is an open source schema-free NoSQL database. Last year, MongoDB made waves when they announced Queryable Encryption in their upcoming client-side encryption release.

    Taken from the press release, but adapted for dark themes.

    A statement at the bottom of their press release indicates that this isn’t clown-shoes:

    Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz, who are pioneers in the field of encrypted search. The Group conducts cutting-edge peer-reviewed research in cryptography and works with MongoDB engineering teams to transfer and deploy the latest innovations in cryptography and privacy to the MongoDB data platform.

    If you recall, I mentioned Seny Kamara in the SSE section of this post. They certainly aren’t wrong about Kamara and Moataz being pioneers in this field.

    So with that in mind, let’s explore the implementation in libmongocrypt and see how it stands up to scrutiny.

    MongoCrypt: The Good

    MongoDB’s encryption library takes key management seriously: They provide a KMS integration for cloud users by default (supporting both AWS and Azure).

    MongoDB uses Encrypt-then-MAC with AES-CBC and HMAC-SHA256, which is congruent to what Signal does for message encryption.

    How Is Queryable Encryption Implemented?

    From the current source code, we can see that MongoCrypt generates several different types of tokens, using HMAC (calculation defined here).

    According to their press release:

    The feature supports equality searches, with additional query types such as range, prefix, suffix, and substring planned for future releases.

    MongoDB Queryable Encryption Announcement

    Which means that most of the juicy details probably aren’t public yet.

    These HMAC-derived tokens are stored wholesale in the data structure, but most are encrypted before storage using AES-CTR.

    There are more layers of encryption (using AEAD), server-side token processing, and more AES-CTR-encrypted edge tokens. All of this is finally serialized (implementation) as one blob for storage.

    Since only the equality operation is currently supported (which is the same feature you’d get from HMAC), it’s difficult to speculate what the full feature set looks like.

    However, since Kamara and Moataz are leading its development, it’s likely that this feature set will be excellent.

    MongoCrypt: The Bad

    Every call to do_encrypt() includes at most the Key ID (but typically NULL) as the AAD. This means that the concerns over Confused Deputies (and NoSQL specifically) are relevant to MongoDB.

    However, even if they did support authenticating the fully qualified path to a field in the AAD for their encryption, their AEAD construction is vulnerable to the kind of canonicalization attack I wrote about previously.

    First, observe this code which assembles the multi-part inputs into HMAC.

    /* Construct the input to the HMAC */
    uint32_t num_intermediates = 0;
    _mongocrypt_buffer_t intermediates[3];
    // -- snip --
    if (!_mongocrypt_buffer_concat (
          &to_hmac, intermediates, num_intermediates)) {
       CLIENT_ERR ("failed to allocate buffer");
       goto done;
    }
    if (hmac == HMAC_SHA_512_256) {
       uint8_t storage[64];
       _mongocrypt_buffer_t tag = {.data = storage, .len = sizeof (storage)};
       if (!_crypto_hmac_sha_512 (crypto, Km, &to_hmac, &tag, status)) {
          goto done;
       }
       // Truncate sha512 to first 256 bits.
       memcpy (out->data, tag.data, MONGOCRYPT_HMAC_LEN);
    } else {
       BSON_ASSERT (hmac == HMAC_SHA_256);
       if (!_mongocrypt_hmac_sha_256 (crypto, Km, &to_hmac, out, status)) {
          goto done;
       }
    }

    The implementation of _mongocrypt_buffer_concat() can be found here.

    If either the implementation of that function, or the code I snipped from my excerpt, had contained code that prefixed every segment of the AAD with the length of the segment (represented as a uint64_t to make overflow infeasible), then their AEAD mode would not be vulnerable to canonicalization issues.

    Using TupleHash would also have prevented this issue.
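
    To illustrate the length-prefixing fix, here is a small Python sketch (not libmongocrypt code; mac_naive and mac_length_prefixed are hypothetical names). It compares a MAC over plainly concatenated segments with one where each segment is preceded by its length as a 64-bit integer.

```python
import hashlib
import hmac
import struct


def mac_naive(key: bytes, parts: list) -> bytes:
    """HMAC over the plain concatenation of all parts (ambiguous)."""
    return hmac.new(key, b"".join(parts), hashlib.sha256).digest()


def mac_length_prefixed(key: bytes, parts: list) -> bytes:
    """HMAC where every part is preceded by its length as a uint64."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        h.update(struct.pack("<Q", len(part)))  # little-endian uint64 length
        h.update(part)
    return h.digest()


key = b"k" * 32
a = [b"AAD", b"ciphertext"]
b = [b"AADcipher", b"text"]  # same bytes, different segment boundaries

print(mac_naive(key, a) == mac_naive(key, b))                      # True
print(mac_length_prefixed(key, a) == mac_length_prefixed(key, b))  # False
```

    With plain concatenation, moving bytes across a segment boundary leaves the tag unchanged; with length prefixes, the boundaries become part of what is authenticated.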

    Silver lining for MongoDB developers: Because the AAD is either a key ID or NULL, this isn’t exploitable in practice.

    The first cryptographic flaw sort of cancels the second out.

    If the libmongocrypt developers ever want to mitigate Confused Deputy attacks, they’ll need to address this canonicalization issue too.

    MongoCrypt: The Ugly

    MongoCrypt supports deterministic encryption.

    If you specify deterministic encryption for a field, your application passes a deterministic initialization vector to AEAD.

    MongoDB documentation

    We already discussed why this is bad above.

    Wrapping Up

    This was not a comprehensive treatment of the field of database cryptography. There are many areas of this field that I did not cover, nor do I feel qualified to discuss.

    However, I hope anyone who takes the time to read this finds themselves more familiar with the subject.

    Additionally, I hope any developers who think “encrypting data in a database is [easy, trivial] (select appropriate)” will find this broad introduction a humbling experience.

    Art: CMYKat

    https://soatok.blog/2023/03/01/database-cryptography-fur-the-rest-of-us/

    #appliedCryptography #blockCipherModes #cryptography #databaseCryptography #databases #encryptedSearch #HMAC #MongoCrypt #MongoDB #QueryableEncryption #realWorldCryptography #security #SecurityGuidance #SQL #SSE #symmetricCryptography #symmetricSearchableEncryption

  8. Earlier this year, Cendyne wrote a blog post covering the use of HKDF, building partially upon my own blog post about HKDF and the KDF security definition, but more so inspired by a cryptographic issue they identified in another company’s product (dubbed AnonCo).

    At the bottom they teased:

    Database cryptography is hard. The above sketch is not complete and does not address several threats! This article is quite long, so I will not be sharing the fixes.

    Cendyne

    If you read Cendyne’s post, you may have nodded along with that remark and not appreciated the degree to which our naga friend was putting it mildly. So I thought I’d share some of my knowledge about real-world database cryptography in an accessible and fun format in the hopes that it might serve as an introduction to the specialization.

    Note: I’m also not going to fix Cendyne’s sketch of AnonCo’s software here–partly because I don’t want to get in the habit of assigning homework or required reading, but mostly because it’s kind of obvious once you’ve learned the basics.

    I’m including art of my fursona in this post… as is tradition for furry blogs.

    If you don’t like furries, please feel free to leave this blog and read about this topic elsewhere.

    Thanks to CMYKat for the awesome stickers.

    Contents

    • Database Cryptography?
    • Cryptography for Relational Databases
      • The Perils of Built-in Encryption Functions
      • Application-Layer Relational Database Cryptography
        • Confused Deputies
        • Canonicalization Attacks
        • Multi-Tenancy
    • Cryptography for NoSQL Databases
      • NoSQL is Built Different
      • Record Authentication
        • Bonus: A Maximally Schema-Free, Upgradeable Authentication Design
    • Searchable Encryption
      • Order-{Preserving, Revealing} Encryption
      • Deterministic Encryption
      • Homomorphic Encryption
      • Searchable Symmetric Encryption (SSE)
      • You Can Have Little a HMAC, As a Treat
    • Intermission
    • Case Study: MongoDB Client-Side Encryption
      • MongoCrypt: The Good
        • How is Queryable Encryption Implemented?
      • MongoCrypt: The Bad
      • MongoCrypt: The Ugly
    • Wrapping Up

    Database Cryptography?

    The premise of database cryptography is deceptively simple: You have a database, of some sort, and you want to store sensitive data in said database.

    The consequences of this simple premise are anything but simple. Let me explain.

    Art: ScruffKerfluff

    The sensitive data you want to store may need to remain confidential, or you may need to provide some sort of integrity guarantees throughout your entire system, or sometimes both. Sometimes all of your data is sensitive, sometimes only some of it is. Sometimes the confidentiality requirements of your data extend to where within a dataset the record you want actually lives. Sometimes that’s true of some data, but not others, so your cryptography has to be flexible to support multiple types of workloads.

    Other times, you just want your disks encrypted at rest so if they grow legs and walk out of the data center, the data cannot be comprehended by an attacker. And you can’t be bothered to work on this problem any deeper. This is usually what compliance requirements cover. Boxes get checked, executives feel safer about their operation, and the whole time nobody has really analyzed the risks they’re facing.

    But we’re not settling for mere compliance on this blog. Furries have standards, after all.

    So the first thing you need to do before diving into database cryptography is threat modelling. The first step in any good threat model is taking inventory; especially of assumptions, requirements, and desired outcomes. A few good starter questions:

    1. What database software is being used? Is it up to date?
    2. What data is being stored in which database software?
    3. How are databases oriented in the network of the overall system?
      • Is your database properly firewalled from the public Internet?
    4. How does data flow throughout the network, and when do these data flows intersect with the database?
      • Which applications talk to the database? What languages are they written in? Which APIs do they use?
    5. How will cryptography secrets be managed?
      • Is there one key for everyone, one key per tenant, etc.?
      • How are keys rotated?
      • Do you use envelope encryption with an HSM, or vend the raw materials to your end devices?

    The first two questions are paramount for deciding how to write software for database cryptography, before you even get to thinking about the cryptography itself.

    (This is not a comprehensive set of questions to ask, either. A formal threat model is much deeper in the weeds.)

    The kind of cryptography protocol you need for, say, storing encrypted CSV files in an S3 bucket is vastly different from relational (SQL) databases, which in turn will be significantly different from schema-free (NoSQL) databases.

    Furthermore, when you get to the point that you can start to think about the cryptography, you’ll often need to tackle confidentiality and integrity separately.

    If that’s unclear, think of a scenario like, “I need to encrypt PII, but I also need to digitally sign the lab results so I know they weren’t tampered with at rest.”

    My point is, right off the bat, we’ve got a three-dimensional matrix of complexity to contend with:

    1. On one axis, we have the type of database.
      • Flat-file
      • Relational
      • Schema-free
    2. On another, we have the basic confidentiality requirements of the data.
      • Field encryption
      • Row encryption
      • Column encryption
      • Unstructured record encryption
      • Encrypting entire collections of records
    3. Finally, we have the integrity requirements of the data.
      • Field authentication
      • Row/column authentication
      • Unstructured record authentication
      • Collection authentication (based on e.g. Sparse Merkle Trees)

    And then you have a fourth dimension that often falls out of operational requirements for databases: Searchability.

    Why store data in a database if you have no way to index or search the data for fast retrieval?

    Credit: Harubaki

    If you’re starting to feel overwhelmed, you’re not alone. A lot of developers drastically underestimate the difficulty of the undertaking, until they run head-first into the complexity.

    Some just phone it in with AES_Encrypt() calls in their MySQL queries. (Too bad ECB mode doesn’t provide semantic security!)

    Which brings us to the meat of this blog post: The actual cryptography part.

    Cryptography is the art of transforming information security problems into key management problems.

    Former coworker

    Note: In the interest of time, I’m skipping over flat files and focusing instead on actual database technologies.

    Cryptography for Relational Databases

    Encrypting data in an SQL database seems simple enough, even if you’ve managed to shake off the complexity I teased in the introduction.

    You’ve got data, you’ve got a column on a table. Just encrypt the data and shove it in a cell on that column and call it a day, right?

    But, alas, this is a trap. There are so many gotchas that I can’t weave a coherent, easy-to-follow narrative between them all.

    So let’s start with a simple question: where and how are you performing your encryption?

    The Perils of Built-in Encryption Functions

    MySQL provides functions called AES_Encrypt and AES_Decrypt, which many developers have unfortunately decided to rely on in the past.

    It’s unfortunate because these functions use ECB mode by default. To illustrate why ECB mode is bad, I encrypted one of my art commissions with AES in ECB mode:

    Art by Riley, encrypted with AES-ECB

    The problems with ECB mode aren’t exactly “you can see the image through it,” because ECB-encrypting a compressed image won’t have redundancy (and thus can make you feel safer than you are).

    ECB art is a good visual for the actual issue you should care about, however: A lack of semantic security.

    A cryptosystem is considered semantically secure if observing the ciphertext doesn’t reveal information about the plaintext (except, perhaps, the length; which all cryptosystems leak to some extent). More information here.

    ECB art isn’t to be confused with ECB poetry, which looks like this:

    Oh little one, you’re growing up
    You’ll soon be writing C
    You’ll treat your ints as pointers
    You’ll nest the ternary
    You’ll cut and paste from github
    And try cryptography
    But even in your darkest hour
    Do not use ECB

    CBC’s BEASTly when padding’s abused
    And CTR’s fine til a nonce is reused
    Some say it’s a CRIME to compress then encrypt
    Or store keys in the browser (or use javascript)
    Diffie Hellman will collapse if hackers choose your g
    And RSA is full of traps when e is set to 3
    Whiten! Blind! In constant time! Don’t write an RNG!
    But failing all, and listen well: Do not use ECB

    They’ll say “It’s like a one-time-pad!
    The data’s short, it’s not so bad
    the keys are long–they’re iron clad
    I have a PhD!”
    And then you’re front page Hacker News
    Your passwords cracked–Adobe Blues.
    Don’t leave your penguins showing through,
    Do not use ECB

    — Ben Nagy, PoC||GTFO 0x04:13

    Most people reading this probably know better than to use ECB mode already, and don’t need any of these reminders, but there is still a lot of code that inadvertently uses ECB mode to encrypt data in the database.

    Also, SHOW processlist; leaks your encryption keys. Oops.

    Credit: CMYKatt

    Application-layer Relational Database Cryptography

    Whether burned by ECB or just cautious about not giving your secrets to the system that stores all the ciphertext protected by said secret, a common next step for developers is to simply encrypt in their server-side application code.

    And, yes, that’s part of the answer. But how you encrypt is important.

    Credit: Harubaki

    “I’ll encrypt with CBC mode.”
    If you don’t authenticate your ciphertext, you’ll be sorry. Maybe try again?

    “Okay, fine, I’ll use an authenticated mode like GCM.”
    Did you remember to make the table and column name part of your AAD? What about the primary key of the record?

    “What on Earth are you talking about, Soatok?”
    Welcome to the first footgun of database cryptography!

    Confused Deputies

    Encrypting your sensitive data is necessary, but not sufficient. You need to also bind your ciphertexts to the specific context in which they are stored.

    To understand why, let’s take a step back: What specific threat does encrypting your database records protect against?

    We’ve already established that “your disks walk out of the datacenter” is a “full disk encryption” problem, so if you’re using application-layer cryptography to encrypt data in a relational database, your threat model probably involves unauthorized access to the database server.

    What, then, stops an attacker from copying ciphertexts around?

    Credit: CMYKatt

    Let’s say I have a legitimate user account with an ID 12345, and I want to read your street address, but it’s encrypted in the database. But because I’m a clever hacker, I have unfettered access to your relational database server.

    All I would need to do is simply…

    UPDATE table SET addr_encrypted = 'your-ciphertext' WHERE id = 12345

    …and then access the application through my legitimate access. Bam, data leaked. As an attacker, I can probably even copy fields from other columns and it will just decrypt. Even if you’re using an authenticated mode.

    We call this a confused deputy attack, because the deputy (the component of the system that has been delegated some authority or privilege) has become confused by the attacker, and thus undermined an intended security goal.

    The fix is to use the AAD parameter from the authenticated mode to bind the data to a given context. (AAD = Additional Authenticated Data.)

    - $addr = aes_gcm_encrypt($addr, $key);
    + $addr = aes_gcm_encrypt($addr, $key, canonicalize([
    +     $tableName,
    +     $columnName,
    +     $primaryKey
    + ]));

    Now if I start cutting and pasting ciphertexts around, I get a decryption failure instead of silently decrypting plaintext.
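
    The effect can be sketched with the Python standard library alone. Python ships no AES-GCM, so this stand-in uses HMAC-SHA256 over the storage context plus an opaque ciphertext; a real system would instead pass the same context bytes as the AAD to its AEAD. The helper names (context_bytes, seal, verify) and the table and column names are hypothetical.

```python
import hashlib
import hmac
import struct


def context_bytes(table: str, column: str, primary_key: int) -> bytes:
    """Encode where the value lives; each part is prefixed with its length."""
    out = b""
    for part in (table, column, str(primary_key)):
        raw = part.encode("utf-8")
        out += struct.pack("<Q", len(raw)) + raw
    return out


def seal(mac_key: bytes, ciphertext: bytes, table: str, column: str, primary_key: int) -> bytes:
    """Compute a tag over the storage context together with the ciphertext."""
    msg = context_bytes(table, column, primary_key) + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def verify(mac_key: bytes, ciphertext: bytes, tag: bytes, table: str, column: str, primary_key: int) -> bool:
    """Check the tag against the location the record was actually read from."""
    expected = seal(mac_key, ciphertext, table, column, primary_key)
    return hmac.compare_digest(expected, tag)


mac_key = b"m" * 32
victim_ct = b"<victim address ciphertext>"
tag = seal(mac_key, victim_ct, "users", "addr_encrypted", 67890)

print(verify(mac_key, victim_ct, tag, "users", "addr_encrypted", 67890))  # True
print(verify(mac_key, victim_ct, tag, "users", "addr_encrypted", 12345))  # False
```

    Copying the ciphertext and its tag into row 12345 now fails verification, because the row it was sealed for is part of what the tag covers.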

    This may sound like a specific vulnerability, but it’s more of a failure to understand an important general lesson with database cryptography:

    Where your data lives is part of its identity, and MUST be authenticated.

    Soatok’s Rule of Database Cryptography

    Canonicalization Attacks

    In the previous section, I introduced a pseudocode function called canonicalize(). This isn’t a pasto from some reference code; it’s an important design detail that I will elaborate on now.

    First, consider you didn’t do anything to canonicalize your data, and you just joined strings together and called it a day…

    function dumbCanonicalize(
        string $tableName,
        string $columnName,
        string|int $primaryKey
    ): string {
        return $tableName . '_' . $columnName . '#' . $primaryKey;
    }

    Consider these two inputs to this function:

    1. dumbCanonicalize('customers', 'last_order_uuid', 123);
    2. dumbCanonicalize('customers_last_order', 'uuid', 123);

    In this case, your AAD would be the same, and therefore, your deputy can still be confused (albeit in a narrower use case).

    In Cendyne’s article, AnonCo did something more subtle: The canonicalization bug created a collision on the inputs to HKDF, which resulted in an unintentional key reuse.

    Up until this point, their mistake isn’t relevant to us, because we haven’t even explored key management at all. But the same design flaw can re-emerge in multiple locations, with drastically different consequences.
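
    To make the collision concrete, here is a Python port of dumbCanonicalize() next to one possible unambiguous encoding. The fixed version is a sketch of a common approach (encode the number of parts, then prefix each part with its length), not necessarily what any particular library does.

```python
import struct
from typing import Sequence


def dumb_canonicalize(table: str, column: str, primary_key: int) -> str:
    """Python port of the naive string-joining approach."""
    return f"{table}_{column}#{primary_key}"


def canonicalize(parts: Sequence[str]) -> bytes:
    """Encode the number of parts, then each part prefixed by its length."""
    out = struct.pack("<Q", len(parts))
    for part in parts:
        raw = part.encode("utf-8")
        out += struct.pack("<Q", len(raw)) + raw
    return out


# The two inputs from the list above collide under the naive version...
print(dumb_canonicalize("customers", "last_order_uuid", 123)
      == dumb_canonicalize("customers_last_order", "uuid", 123))  # True

# ...but not once the boundaries between parts are encoded.
print(canonicalize(["customers", "last_order_uuid", "123"])
      == canonicalize(["customers_last_order", "uuid", "123"]))   # False
```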

    Multi-Tenancy

    Once you’ve implemented a mitigation against Confused Deputies, you may think your job is done. And it very well could be.

    Oftentimes, however, software developers are tasked with building support for Bring Your Own Key (BYOK).

    This is often spawned from a specific compliance requirement (such as cryptographic shredding; i.e. if you erase the key, you can no longer recover the plaintext, so it may as well be deleted).

    Other times, this is driven by a need to cut costs: Storing different users’ data in the same database server, but encrypting it such that they can only decrypt their own records.

    Two things can happen when you introduce multi-tenancy into your database cryptography designs:

    1. Invisible Salamanders becomes a risk, due to multiple keys being possible for any given encrypted record.
    2. Failure to address the risk of Invisible Salamanders can undermine your protection against Confused Deputies, thereby returning you to a state before you properly used the AAD.

    So now you have to revisit your designs and ensure you’re using a key-committing authenticated mode, rather than just a regular authenticated mode.

    Isn’t cryptography fun?

    “What Are Invisible Salamanders?”

    This refers to a fun property of AEAD modes based on Polynomial MACs. Basically, if you:

    1. Encrypt one message under a specific key and nonce.
    2. Encrypt another message under a separate key and nonce.

    …Then you can get the same exact ciphertext and authentication tag. Performing this attack requires you to control the keys for both encryption operations.

    This was first demonstrated in an attack against encrypted messaging applications, where a picture of a salamander was hidden from the abuse reporting feature because another attached file had the same authentication tag and ciphertext, and you could trick the system if you disclosed the second key instead of the first. Thus, the salamander is invisible to the abuse reporting system.
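
    One way to think about what “key-committing” buys you is to store a value alongside each ciphertext that only the intended key reproduces, and to check it before decrypting. The sketch below is an assumption-laden illustration, not a vetted construction: it uses HMAC-SHA256 keyed by the record key, a hypothetical domain-separation label, and assumes every key is exactly 32 bytes. In production you would want a reviewed key-committing AEAD instead.

```python
import hashlib
import hmac

KEY_LEN = 32  # assumption: every tenant key is exactly 32 bytes


def key_commitment(key: bytes, nonce: bytes) -> bytes:
    """Derive a value that is stored with the ciphertext and pins down the key."""
    if len(key) != KEY_LEN:
        raise ValueError("unexpected key length")
    # "key-commitment-v1" is a hypothetical domain-separation label.
    return hmac.new(key, b"key-commitment-v1" + nonce, hashlib.sha256).digest()


def check_commitment(key: bytes, nonce: bytes, stored: bytes) -> bool:
    """Refuse to decrypt unless the offered key matches the stored commitment."""
    return hmac.compare_digest(key_commitment(key, nonce), stored)


tenant_a_key = bytes(32)
tenant_b_key = b"\x01" * 32
nonce = b"n" * 12

stored = key_commitment(tenant_a_key, nonce)
print(check_commitment(tenant_a_key, nonce, stored))  # True
print(check_commitment(tenant_b_key, nonce, stored))  # False
```

    The idea is that a record which decrypts “successfully” under a second key is still rejected, because that key does not reproduce the stored commitment.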

    Art: CMYKat

    We’re not quite done with relational databases yet, but we should talk about NoSQL databases for a bit. The final topic in scope applies equally to both, after all.

    Cryptography for NoSQL Databases

    Most of the topics from relational databases also apply to NoSQL databases, so I shall refrain from duplicating them here. This article is already sufficiently long to read, after all, and I dislike redundancy.

    NoSQL is Built Different

    The main thing that NoSQL databases offer in the service of making cryptographers lose sleep at night is the schema-free nature of NoSQL designs.

    What this means is that, if you’re using a client-side encryption library for a NoSQL database, the previous concerns about confused deputy attacks are amplified by the malleability of the document structure.

    Additionally, the previously discussed cryptographic attacks against the encryption mode may be less expensive for an attacker to pull off.

    Consider the following record structure, which stores a bunch of data encrypted with AES in CBC mode:

    {
      "encrypted-data-key": "<blob>",
      "name": "<ciphertext>",
      "address": [
        "<ciphertext>",
        "<ciphertext>"
      ],
      "social-security": "<ciphertext>",
      "zip-code": "<ciphertext>"
    }

    If this record is decrypted with code that looks something like this:

    $decrypted = [];
    // ... snip ...
    foreach ($record['address'] as $i => $addrLine) {
        try {
            $decrypted['address'][$i] = $this->decrypt($addrLine);
        } catch (Throwable $ex) {
            // You'd never deliberately do this, but it's for illustration
            $this->doSomethingAnOracleCanObserve($i);

            // This is more believable, of course:
            $this->logDecryptionError($ex, $addrLine);
            $decrypted['address'][$i] = '';
        }
    }

    Then you can keep appending rows to the "address" field to reduce the number of writes needed to exploit a padding oracle attack against any of the <ciphertext> fields.

    Art: Harubaki

    This isn’t to say that NoSQL is less secure than SQL, from the context of client-side encryption. However, the powerful feature sets that NoSQL users are accustomed to may also give attackers a more versatile toolkit to work with.

    Record Authentication

    A pedant may point out that record authentication applies to both SQL and NoSQL. However, I mostly only observe this feature in NoSQL databases and document storage systems in the wild, so I’m shoving it in here.

    Encrypting fields is nice and all, but sometimes what you want to know is that your unencrypted data hasn’t been tampered with as it flows through your system.

    The trivial way this is done is by using a digital signature algorithm over the whole record, and then appending the signature to the end. When you go to verify the record, all of the information you need is right there.

    This works well enough for most use cases, and everyone can pack up and go home. Nothing more to see here.

    Except…

    When you’re working with NoSQL databases, you often want systems to be able to write to additional fields, and since you’re working with schema-free blobs of data rather than a normalized set of relatable tables, the most sensible thing to do is to append this data to the same record.

    Except, oops! You can’t do that if you’re shoving a digital signature over the record. So now you need to specify which fields are to be included in the signature.

    And you need to think about how to model that in a way that neither prohibits schema upgrades nor allows attackers to perform downgrade attacks. (See below.)

    I don’t have any specific real-world examples here that I can point to of this problem being solved well.

    Art: CMYKat

    Furthermore, as with preventing confused deputy and/or canonicalization attacks above, you must also include the fully qualified path of each field in the data that gets signed.

    As I said with encryption before, but also true here:

    Where your data lives is part of its identity, and MUST be authenticated.

    Soatok’s Rule of Database Cryptography

    This requirement holds true whether you’re using symmetric-key authentication (i.e. HMAC) or asymmetric-key digital signatures (e.g. EdDSA).
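
    A minimal sketch of what “include the fully qualified path” could look like with HMAC, in Python. The helper names are hypothetical, paths are encoded as a counted list of length-prefixed keys so that key names cannot blur into each other, and values are serialized with json.dumps purely for illustration.

```python
import hashlib
import hmac
import json
import struct


def _prefixed(data: bytes) -> bytes:
    """Prefix a byte string with its length as a little-endian uint64."""
    return struct.pack("<Q", len(data)) + data


def flatten(doc: dict, prefix: tuple = ()):
    """Yield (path, value) pairs; path is the tuple of keys from the root."""
    for key in sorted(doc):
        path = prefix + (key,)
        if isinstance(doc[key], dict):
            yield from flatten(doc[key], path)
        else:
            yield path, doc[key]


def authenticate_record(mac_key: bytes, doc: dict) -> bytes:
    """HMAC every leaf value together with its fully qualified path."""
    h = hmac.new(mac_key, digestmod=hashlib.sha256)
    for path, value in flatten(doc):
        h.update(struct.pack("<Q", len(path)))  # number of path components
        for component in path:
            h.update(_prefixed(component.encode("utf-8")))
        h.update(_prefixed(json.dumps(value, sort_keys=True).encode("utf-8")))
    return h.digest()


mac_key = b"a" * 32
original = {"billing": {"zip": "12345"}, "shipping": {"zip": "99999"}}
swapped = {"billing": {"zip": "99999"}, "shipping": {"zip": "12345"}}

print(authenticate_record(mac_key, original) == authenticate_record(mac_key, swapped))  # False
```

    Swapping two values between fields changes the tag even though the set of values is identical, because each value is bound to the path it lives at.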

    Bonus: A Maximally Schema-Free, Upgradeable Authentication Design

    Art: Harubaki

    Okay, how do you solve this problem so that you can perform updates and upgrades to your schema but without enabling attackers to downgrade the security? Here’s one possible design.

    Let’s say you have two metadata fields on each record:

    1. A compressed binary string representing which fields should be authenticated. This field is, itself, not authenticated. Let’s call this meta-auth.
    2. A compressed binary string representing which of the authenticated fields should also be encrypted. Unlike meta-auth, this field is itself authenticated. This is at most the same length as the first metadata field. Let’s call this meta-enc.

    Furthermore, you will specify a canonical field ordering for both how data is fed into the signature algorithm as well as the field mappings in meta-auth and meta-enc.

    {
      "example": {
        "credit-card": {
          "number": /* encrypted */,
          "expiration": /* encrypted */,
          "ccv": /* encrypted */
        },
        "superfluous": {
          "rewards-member": null
        }
      },
      "meta-auth": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true   /* meta-enc */
      ]),
      "meta-enc": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false  /* example.superfluous.rewards-member */
      ]),
      "signature": /* -- snip -- */
    }

    When you go to append data to an existing record, you’ll need to update meta-auth to include the mapping of fields based on this canonical ordering to ensure only the intended fields get validated.

    When you update your code to add an additional field that is intended to be signed, you can roll that out for new records and the record will continue to be self-describing:

    • New records will have the additional field flagged as authenticated in meta-auth (and meta-enc will grow)
    • Old records will not, but your code will still sign them successfully
    • To prevent downgrade attacks, simply include a schema version ID as an additional plaintext field that gets authenticated. An attacker who tries to downgrade will need to be able to produce a valid signature too.

    You might think meta-auth gives an attacker some advantage, but this only includes which fields are included in the security boundary of the signature or MAC, which allows unauthenticated data to be appended for whatever operational purpose without having to update signatures or expose signing keys to a wider part of the network.

    {
      "example": {
        "credit-card": {
          "number": /* encrypted */,
          "expiration": /* encrypted */,
          "ccv": /* encrypted */
        },
        "superfluous": {
          "rewards-member": null
        }
      },
      "meta-auth": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true,  /* meta-enc */
        true   /* meta-version */
      ]),
      "meta-enc": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        false  /* meta-version (authenticated, but stored as plaintext) */
      ]),
      "meta-version": 0x01000000,
      "signature": /* -- snip -- */
    }

    If an attacker tries to use the meta-auth field to mess with a record, the best they can hope for is an Invalid Signature exception (assuming the signature algorithm is secure to begin with).

    Even if they keep all of the fields the same, but play around with the structure of the record (e.g. changing the XPath or equivalent), so long as the path is authenticated with each field, breaking this is computationally infeasible.
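To make the moving parts concrete, here is a minimal sketch of the idea in Python, using HMAC-SHA256 in place of a signature. The function names, the little-endian length prefix, and the flat list of (path, value) pairs are my own assumptions for illustration, not a reference implementation of the design above.

```python
import hashlib
import hmac
import struct


def length_prefixed(data: bytes) -> bytes:
    """Prefix data with its length as an unsigned 64-bit little-endian integer."""
    return struct.pack("<Q", len(data)) + data


def authenticate_record(key: bytes, fields: list, meta_auth: list) -> bytes:
    """MAC only the fields flagged in meta_auth, each bound to its full path.

    `fields` is a list of (path, value) pairs in the canonical field order.
    meta_auth itself is not fed to the MAC, but it decides what is.
    """
    if len(fields) != len(meta_auth):
        raise ValueError("meta_auth must carry one flag per field")
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for (path, value), is_authenticated in zip(fields, meta_auth):
        if is_authenticated:
            mac.update(length_prefixed(path.encode("utf-8")))
            mac.update(length_prefixed(value))
    return mac.digest()


def verify_record(key: bytes, fields: list, meta_auth: list, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = authenticate_record(key, fields, meta_auth)
    return hmac.compare_digest(expected, tag)


# Hypothetical record, flattened into canonical order.
key = b"\x01" * 32
fields = [
    ("example.credit-card.number", b"<ciphertext>"),
    ("example.superfluous.rewards-member", b"null"),
    ("meta-version", b"\x01\x00\x00\x00"),
]
meta_auth = [True, False, True]
tag = authenticate_record(key, fields, meta_auth)

# Appending or editing unauthenticated data does not invalidate the tag...
edited = [fields[0], ("example.superfluous.rewards-member", b"true"), fields[2]]
assert verify_record(key, edited, meta_auth, tag)

# ...but flipping a meta-auth flag or moving a value to another path does.
assert not verify_record(key, fields, [True, True, True], tag)
moved = [("example.credit-card.ccv", b"<ciphertext>"), fields[1], fields[2]]
assert not verify_record(key, moved, meta_auth, tag)
```

Because each path is length-prefixed and fed into the MAC alongside its value, the "best they can hope for is an Invalid Signature" claim falls out of the construction rather than from any secrecy of meta-auth.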

    Searchable Encryption

    If you’ve managed to make it through the previous sections, congratulations, you now know enough to build a secure but completely useless database.

    Art: CMYKat

    Okay, put away the pitchforks; I will explain.

    Part of the reason why we store data in a database, rather than a flat file, is because we want to do more than just read and write. Sometimes computer scientists want to compute. Almost always, you want to be able to query your database for a subset of records based on your specific business logic needs.

    And so, a database which doesn’t do anything more than store ciphertext and maybe signatures is pretty useless to most people. You’d have better luck selling Monkey JPEGs to furries than convincing most businesses to part with their precious database-driven report generators.

    Art: Sophie

    So whenever one of your users wants to actually use their data, rather than just store it, they’re forced to decide between two mutually exclusive options:

    1. Encrypting the data, to protect it from unauthorized disclosure, but render it useless
    2. Doing anything useful with the data, but leaving it unencrypted in the database

    This is especially annoying for business types that are all in on the Zero Trust buzzword.

    Fortunately, the cryptographers are at it again, and boy howdy do they have a lot of solutions for this problem.

    Order-{Preserving, Revealing} Encryption

    On the fun side of things, you have things like Order-Preserving and Order-Revealing Encryption, which Matthew Green wrote about at length.

    [D]atabase encryption has been a controversial subject in our field. I wish I could say that there’s been an actual debate, but it’s more that different researchers have fallen into different camps, and nobody has really had the data to make their position in a compelling way. There have actually been some very personal arguments made about it.

    Attack of the week: searchable encryption and the ever-expanding leakage function

    The problem with these designs is that they leak enough information that they no longer provide semantic security.

    From Grubbs, et al. (GLMP, 2019.)
    Colors inverted to fit my blog’s theme better.

    To put it in other words: These designs are only marginally better than ECB mode, and probably deserve their own poems too.

    Order revealing
    Reveals much more than order
    Softcore ECB

    Order preserving
    Semantic security?
    Only in your dreams

    Haiku for your consideration

    Deterministic Encryption

    Here’s a simpler, but also terrible, idea for searchable encryption: Simply give up on semantic security entirely.

    If you recall the AES_{De,En}crypt() functions built into MySQL I mentioned at the start of this article, those are the most common form of deterministic encryption I’ve seen in use.

     SELECT * FROM foo WHERE bar = AES_Encrypt('query', 'key');

    However, there are slightly less bad variants. If you use AES-GCM-SIV with a static nonce, your ciphertexts are fully deterministic, and you can encrypt a small number of distinct records safely before you’re no longer secure.

    From Page 14 of the linked paper. Full view.

    That’s certainly better than nothing, but you also can’t mitigate confused deputy attacks. But we can do better than this.

    Homomorphic Encryption

    In a safer plane of academia, you’ll find homomorphic encryption, which researchers recently demonstrated by serving Wikipedia pages in a reasonable amount of time.

    Homomorphic encryption allows computations over the ciphertext, which will be reflected in the plaintext, without ever revealing the key to the entity performing the computation.

    If this sounds vaguely similar to the conditions that enable chosen-ciphertext attacks, you probably have a good intuition for how it works: RSA is homomorphic to multiplication, AES-CTR is homomorphic to XOR. Fully homomorphic encryption uses lattices, which enables multiple operations but carries a relatively enormous performance cost.
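The XOR property is easy to see in a few lines of Python. This is only a sketch: the keystream below is a toy built from SHA-256 so the example needs nothing beyond the standard library, and it stands in for AES-CTR purely to show the algebra. The function names and the sample message are made up.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """TOY counter-mode keystream from SHA-256. A stand-in for AES-CTR, not a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (same operation): XOR the data with the keystream."""
    return xor_bytes(data, toy_keystream(key, nonce, len(data)))


key, nonce = b"k" * 32, b"n" * 12
ciphertext = toy_ctr(key, nonce, b"PAY 0100 USD")

# Someone who never learns the key XORs a chosen difference into the ciphertext.
delta = xor_bytes(b"PAY 0100 USD", b"PAY 9100 USD")
tampered = xor_bytes(ciphertext, delta)

# The same difference shows up in the plaintext after decryption.
assert toy_ctr(key, nonce, tampered) == b"PAY 9100 USD"
```

That is the same property that makes unauthenticated stream ciphers malleable. Homomorphic encryption schemes expose this kind of structure on purpose, for richer operations than XOR.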

    Art: Harubaki

    Homomorphic encryption sometimes intersects with machine learning, because the notion of training an encrypted model by feeding it encrypted data, then decrypting it after-the-fact is desirable for certain business verticals. Your data scientists never see your data, and you have some plausible deniability about the final ML model this work produces. This is like a Siren song for Venture Capitalist-backed medical technology companies. Tech journalists love writing about it.

    However, a less-explored use case is the ability to encrypt your programs but still get the correct behavior and outputs. Although this sounds like a DRM technology, it’s actually something that individuals could one day use to prevent their ISPs or cloud providers from knowing what software is being executed on the customer’s leased hardware. The potential for a privacy win here is certainly worth pondering, even if you’re a tried and true Pirate Party member.

    Just say “NO” to the copyright cartels.

    Art: CMYKat

    Searchable Symmetric Encryption (SSE)

    Forget about working at the level of fields and rows or individual records. What if we, instead, worked over collections of documents, where each document is viewed as a set of keywords from a keyword space?

    Art: CMYKat

    That’s the basic premise of SSE: Encrypting collections of documents rather than individual records.

    The actual implementation details differ greatly between designs. They also differ greatly in their leakage profiles and susceptibility to side-channel attacks.

    Some schemes use a so-called trapdoor permutation, such as RSA, as one of their building blocks.

    Some schemes only allow for searching a static set of records, while others can accommodate new data over time (with the trade-off between more leakage or worse performance).

    If you’re curious, you can learn more about SSE here, and see some open source SSE implementations online here.

    You’re probably wondering, “If SSE is this well-studied and there are open source implementations available, why isn’t it more widely used?”

    Your guess is as good as mine, but I can think of a few reasons:

    1. The protocols can be a little complicated to implement, and aren’t shipped by default in cryptography libraries (e.g. OpenSSL’s libcrypto or libsodium).
    2. Every known security risk in SSE is the product of a trade-off, rather than there being a single winner for all use cases that developers can feel comfortable picking.
    3. Insufficient marketing and developer advocacy.
      SSE schemes are mostly of interest to academics, although Seny Kamara (Brown University professor and one of the luminaries of searchable encryption) did try to develop an app called Pixek which used SSE to encrypt photos.

    Maybe there’s room for a cryptography competition on searchable encryption schemes in the future.

    You Can Have a Little HMAC, As a Treat

    Finally, I can’t talk about searchable encryption without discussing a technique that’s older than dirt by Internet standards, that has been independently reinvented by countless software developers tasked with encrypting database records.

    The oldest version I’ve been able to track down dates to 2006 by Raul Garcia at Microsoft, but I’m not confident that it didn’t exist before.

    The idea I’m alluding to goes like this:

    1. Encrypt your data, securely, using symmetric cryptography.
      (Hopefully your encryption addresses the considerations outlined in the relevant sections above.)
    2. Separately, calculate an HMAC over the unencrypted data with a separate key used exclusively for indexing.

    When you need to query your data, you can just recalculate the HMAC of your challenge and fetch the records that match it. Easy, right?

    Even if you rotate your keys for encryption, you keep your indexing keys static across your entire data set. This lets you have durable indexes for encrypted data, which gives you the ability to do literal lookups for the performance hit of a hash function.

    Additionally, everyone has HMAC in their toolkit, so you don’t have to move around implementations of complex cryptographic building blocks. You can live off the land. What’s not to love?
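A minimal sketch of that lookup in Python, assuming an in-memory list stands in for the database table. The column names, the sample emails, and the opaque ciphertext placeholders are all hypothetical. Only the HMAC index is real here.

```python
import hashlib
import hmac


def blind_index(index_key: bytes, plaintext: str) -> str:
    """Full HMAC-SHA256 of the plaintext, under a key used only for indexing."""
    return hmac.new(index_key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()


index_key = b"\x02" * 32  # separate from the encryption key; stays static across rotations

# Each row stores an (opaque) ciphertext next to the index of its plaintext.
rows = [
    {"id": 1, "email_ciphertext": b"<opaque-1>", "email_idx": blind_index(index_key, "alice@example.com")},
    {"id": 2, "email_ciphertext": b"<opaque-2>", "email_idx": blind_index(index_key, "bob@example.com")},
    {"id": 3, "email_ciphertext": b"<opaque-3>", "email_idx": blind_index(index_key, "alice@example.com")},
]


def find_by_email(email: str) -> list:
    """Recalculate the HMAC of the query and match it against the stored index."""
    needle = blind_index(index_key, email)
    return [row["id"] for row in rows if row["email_idx"] == needle]


assert find_by_email("alice@example.com") == [1, 3]
assert find_by_email("carol@example.com") == []
```

In a real deployment the comparison would be an ordinary indexed equality match in the database, which is where the "performance hit of a hash function" comes from. Note that rows 1 and 3 carry identical index values, which anyone reading the table can see.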

    Hooray!

    However, if you stopped here, we regret to inform you that your data is no longer indistinguishable from random, which probably undermines the security proof for your encryption scheme.

    How annoying!

    Of course, you don’t have to stop with the addition of plain HMAC to your database encryption software.

    Take a page from Troy Hunt: Truncate the output to provide k-anonymity rather than a direct literal look-up.

    “K-What Now?”

    Imagine you have a full HMAC-SHA256 of the plaintext next to every ciphertext record with a static key, for searchability.

    Each HMAC output corresponds 1:1 with a unique plaintext.

    Because you’re using HMAC with a secret key, an attacker can’t just build a rainbow table like they would when attempting password cracking, but it still leaks duplicate plaintexts.

    For example, an HMAC-SHA256 output might look like this: 04a74e4c0158e34a566785d1a5e1167c4e3455c42aea173104e48ca810a8b1ae

    Art: CMYKat

    If you were to slice off most of those bytes (e.g. leaving only the last 3, which in the previous example yields a8b1ae), then with sufficient records, multiple plaintexts will now map to the same truncated HMAC tag.

    Which means if you’re only revealing a truncated HMAC tag to the database server (both when storing records or retrieving them), you can now expect false positives due to collisions in your truncated HMAC tag.

    These false positives give your data a discrete set of anonymity (called k-anonymity), which means an attacker with access to your database cannot:

    1. Distinguish between two encrypted records with the same short HMAC tag.
    2. Reverse engineer the short HMAC tag into a single possible plaintext value, even if they can supply candidate queries and study the tags sent to the database.
    Art: CMYKat

    As with SSE above, this short HMAC technique exposes a trade-off to users.

    • Too much k-anonymity (i.e. too many false positives), and you will have to decrypt-then-discard multiple mismatching records. This can make queries slow.
    • Not enough k-anonymity (i.e. insufficient false positives), and you’re no better off than a full HMAC.

    Even more troublesome, the right amount to truncate is expressed in bits (not bytes), and calculating this value depends on the number of unique plaintext values you anticipate in your dataset. (Fortunately, it grows logarithmically, so you’ll rarely if ever have to tune this.)
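Here is a minimal sketch of the truncated variant in Python (my own illustration, not the demo script linked below). It keeps the top `bits` bits of the HMAC output, though which end you keep is an arbitrary choice. The `decrypt` callback and the row layout are placeholders I invented for the example.

```python
import hashlib
import hmac


def truncated_index(index_key: bytes, plaintext: str, bits: int) -> int:
    """Keep only the top `bits` bits of HMAC-SHA256(index_key, plaintext)."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    digest = hmac.new(index_key, plaintext.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def search(rows: list, index_key: bytes, needle: str, bits: int, decrypt) -> list:
    """Fetch every row sharing the short tag, then decrypt and discard mismatches."""
    tag = truncated_index(index_key, needle, bits)
    candidates = [row for row in rows if row["idx"] == tag]  # all the server learns
    return [row for row in candidates if decrypt(row["ciphertext"]) == needle]


index_key = b"\x06" * 32
bits = 4  # deliberately tiny, to force false positives in a small demo

# Placeholder "encryption": the plaintext bytes stand in for real ciphertext.
emails = [f"user{i}@example.com" for i in range(100)]
rows = [{"idx": truncated_index(index_key, e, bits), "ciphertext": e.encode()} for e in emails]

hits = search(rows, index_key, "user42@example.com", bits, lambda c: c.decode())
assert [row["ciphertext"] for row in hits] == [b"user42@example.com"]

# 100 distinct plaintexts squeezed into at most 16 tags: collisions are guaranteed.
assert len({row["idx"] for row in rows}) <= 16
```

With 4 bits, each query drags back several unrelated rows on average, which is the "too much k-anonymity" end of the trade-off. Raising `bits` shrinks the candidate set and the anonymity together.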

    If you’d like to play with this idea, here’s a quick and dirty demo script.

    Intermission

    If you started reading this post with any doubts about Cendyne’s statement that “Database cryptography is hard”, by making it to this point, they’ve probably been long since put to rest.

    Art: Harubaki

    Conversely, anyone that specializes in this topic is probably waiting for me to say anything novel or interesting; their patience wearing thin as I continue to rehash a surface-level introduction of their field without really diving deep into anything.

    Thus, if you’ve read this far, I’d like to apply what I’ve covered so far to a real-world case study of a database cryptography product.

    Case Study: MongoDB Client-Side Encryption

    MongoDB is an open source schema-free NoSQL database. Last year, MongoDB made waves when they announced Queryable Encryption in their upcoming client-side encryption release.

    Taken from the press release, but adapted for dark themes.

    A statement at the bottom of their press release indicates that this isn’t clown-shoes:

    Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz, who are pioneers in the field of encrypted search. The Group conducts cutting-edge peer-reviewed research in cryptography and works with MongoDB engineering teams to transfer and deploy the latest innovations in cryptography and privacy to the MongoDB data platform.

    If you recall, I mentioned Seny Kamara in the SSE section of this post. They certainly aren’t wrong about Kamara and Moataz being pioneers in this field.

    So with that in mind, let’s explore the implementation in libmongocrypt and see how it stands up to scrutiny.

    MongoCrypt: The Good

    MongoDB’s encryption library takes key management seriously: They provide a KMS integration for cloud users by default (supporting both AWS and Azure).

    MongoDB uses Encrypt-then-MAC with AES-CBC and HMAC-SHA256, which is congruent to what Signal does for message encryption.

    How Is Queryable Encryption Implemented?

    From the current source code, we can see that MongoCrypt generates several different types of tokens, using HMAC (calculation defined here).

    According to their press release:

    The feature supports equality searches, with additional query types such as range, prefix, suffix, and substring planned for future releases.

    MongoDB Queryable Encryption Announcement

    Which means that most of the juicy details probably aren’t public yet.

    These HMAC-derived tokens are stored wholesale in the data structure, but most are encrypted before storage using AES-CTR.

    There are more layers of encryption (using AEAD), server-side token processing, and more AES-CTR-encrypted edge tokens. All of this is finally serialized (implementation) as one blob for storage.

    Since only the equality operation is currently supported (which is the same feature you’d get from HMAC), it’s difficult to speculate what the full feature set looks like.

    However, since Kamara and Moataz are leading its development, it’s likely that this feature set will be excellent.

    MongoCrypt: The Bad

    Every call to do_encrypt() includes at most the Key ID (but typically NULL) as the AAD. This means that the concerns over Confused Deputies (and NoSQL specifically) are relevant to MongoDB.

    However, even if they did support authenticating the fully qualified path to a field in the AAD for their encryption, their AEAD construction is vulnerable to the kind of canonicalization attack I wrote about previously.

    First, observe this code which assembles the multi-part inputs into HMAC.

    /* Construct the input to the HMAC */
    uint32_t num_intermediates = 0;
    _mongocrypt_buffer_t intermediates[3];
    // -- snip --
    if (!_mongocrypt_buffer_concat (
          &to_hmac, intermediates, num_intermediates)) {
       CLIENT_ERR ("failed to allocate buffer");
       goto done;
    }
    if (hmac == HMAC_SHA_512_256) {
       uint8_t storage[64];
       _mongocrypt_buffer_t tag = {.data = storage, .len = sizeof (storage)};
       if (!_crypto_hmac_sha_512 (crypto, Km, &to_hmac, &tag, status)) {
          goto done;
       }
       // Truncate sha512 to first 256 bits.
       memcpy (out->data, tag.data, MONGOCRYPT_HMAC_LEN);
    } else {
       BSON_ASSERT (hmac == HMAC_SHA_256);
       if (!_mongocrypt_hmac_sha_256 (crypto, Km, &to_hmac, out, status)) {
          goto done;
       }
    }

    The implementation of _mongocrypt_buffer_concat() can be found here.

    If either the implementation of that function, or the code I snipped from my excerpt, had contained code that prefixed every segment of the AAD with the length of the segment (represented as a uint64_t to make overflow infeasible), then their AEAD mode would not be vulnerable to canonicalization issues.
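To illustrate the difference, here is a small Python sketch comparing plain concatenation against length-prefixed segments. It is not libmongocrypt’s code. The function names, the little-endian encoding of the length, and the sample segments are assumptions of mine.

```python
import hashlib
import hmac
import struct


def naive_mac(key: bytes, segments: list) -> bytes:
    """Concatenate the segments with no framing, then HMAC the result."""
    return hmac.new(key, b"".join(segments), hashlib.sha256).digest()


def framed_mac(key: bytes, segments: list) -> bytes:
    """Prefix every segment with its length as a uint64 before feeding it to HMAC."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for segment in segments:
        mac.update(struct.pack("<Q", len(segment)))
        mac.update(segment)
    return mac.digest()


key = b"\x03" * 32

# Same bytes overall, but the boundary between the first two segments has moved.
first = [b"aad-", b"iv", b"ciphertext"]
second = [b"aad", b"-iv", b"ciphertext"]

assert naive_mac(key, first) == naive_mac(key, second)    # boundaries are invisible
assert framed_mac(key, first) != framed_mac(key, second)  # boundaries are authenticated
```

If the number of segments can vary, prefixing the segment count as well closes the remaining ambiguity.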

    Using TupleHash would also have prevented this issue.

    Silver lining for MongoDB developers: Because the AAD is either a key ID or NULL, this isn’t exploitable in practice.

    The first cryptographic flaw sort of cancels the second out.

    If the libmongocrypt developers ever want to mitigate Confused Deputy attacks, they’ll need to address this canonicalization issue too.

    MongoCrypt: The Ugly

    MongoCrypt supports deterministic encryption.

    If you specify deterministic encryption for a field, your application passes a deterministic initialization vector to AEAD.

    MongoDB documentation

    We already discussed why this is bad above.

    Wrapping Up

    This was not a comprehensive treatment of the field of database cryptography. There are many areas of this field that I did not cover, nor do I feel qualified to discuss.

    However, I hope anyone who takes the time to read this finds themselves more familiar with the subject.

    Additionally, I hope any developers who think “encrypting data in a database is [easy, trivial] (select appropriate)” will find this broad introduction a humbling experience.

    Art: CMYKat

    https://soatok.blog/2023/03/01/database-cryptography-fur-the-rest-of-us/

    #appliedCryptography #blockCipherModes #cryptography #databaseCryptography #databases #encryptedSearch #HMAC #MongoCrypt #MongoDB #QueryableEncryption #realWorldCryptography #security #SecurityGuidance #SQL #SSE #symmetricCryptography #symmetricSearchableEncryption

  9. Earlier this year, Cendyne wrote a blog post covering the use of HKDF, building partially upon my own blog post about HKDF and the KDF security definition, but moreso inspired by a cryptographic issue they identified in another company’s product (dubbed AnonCo).

    At the bottom they teased:

    Database cryptography is hard. The above sketch is not complete and does not address several threats! This article is quite long, so I will not be sharing the fixes.

    Cendyne

    If you read Cendyne’s post, you may have nodded along with that remark and not appreciated the degree to which our naga friend was putting it mildly. So I thought I’d share some of my knowledge about real-world database cryptography in an accessible and fun format in the hopes that it might serve as an introduction to the specialization.

    Note: I’m also not going to fix Cendyne’s sketch of AnonCo’s software here–partly because I don’t want to get in the habit of assigning homework or required reading, but mostly because it’s kind of obvious once you’ve learned the basics.

    I’m including art of my fursona in this post… as is tradition for furry blogs.

    If you don’t like furries, please feel free to leave this blog and read about this topic elsewhere.

    Thanks to CMYKat for the awesome stickers.

    Contents

    • Database Cryptography?
    • Cryptography for Relational Databases
      • The Perils of Built-in Encryption Functions
      • Application-Layer Relational Database Cryptography
        • Confused Deputies
        • Canonicalization Attacks
        • Multi-Tenancy
    • Cryptography for NoSQL Databases
      • NoSQL is Built Different
      • Record Authentication
        • Bonus: A Maximally Schema-Free, Upgradeable Authentication Design
    • Searchable Encryption
      • Order-{Preserving, Revealing} Encryption
      • Deterministic Encryption
      • Homomorphic Encryption
      • Searchable Symmetric Encryption (SSE)
      • You Can Have a Little HMAC, As a Treat
    • Intermission
    • Case Study: MongoDB Client-Side Encryption
      • MongoCrypt: The Good
        • How is Queryable Encryption Implemented?
      • MongoCrypt: The Bad
      • MongoCrypt: The Ugly
    • Wrapping Up

    Database Cryptography?

    The premise of database cryptography is deceptively simple: You have a database, of some sort, and you want to store sensitive data in said database.

    The consequences of this simple premise are anything but simple. Let me explain.

    Art: ScruffKerfluff

    The sensitive data you want to store may need to remain confidential, or you may need to provide some sort of integrity guarantees throughout your entire system, or sometimes both. Sometimes all of your data is sensitive, sometimes only some of it is. Sometimes the confidentiality requirements of your data extend to where within a dataset the record you want actually lives. Sometimes that’s true of some data, but not others, so your cryptography has to be flexible to support multiple types of workloads.

    Other times, you just want your disks encrypted at rest so if they grow legs and walk out of the data center, the data cannot be comprehended by an attacker. And you can’t be bothered to work on this problem any deeper. This is usually what compliance requirements cover. Boxes get checked, executives feel safer about their operation, and the whole time nobody has really analyzed the risks they’re facing.

    But we’re not settling for mere compliance on this blog. Furries have standards, after all.

    So the first thing you need to do before diving into database cryptography is threat modelling. The first step in any good threat model is taking inventory; especially of assumptions, requirements, and desired outcomes. A few good starter questions:

    1. What database software is being used? Is it up to date?
    2. What data is being stored in which database software?
    3. How are databases oriented in the network of the overall system?
      • Is your database properly firewalled from the public Internet?
    4. How does data flow throughout the network, and when do these data flows intersect with the database?
      • Which applications talk to the database? What languages are they written in? Which APIs do they use?
    5. How will cryptography secrets be managed?
      • Is there one key for everyone, one key per tenant, etc.?
      • How are keys rotated?
      • Do you use envelope encryption with an HSM, or vend the raw materials to your end devices?

    The first two questions are paramount for deciding how to write software for database cryptography, before you even get to thinking about the cryptography itself.

    (This is not a comprehensive set of questions to ask, either. A formal threat model is much deeper in the weeds.)

    The kind of cryptography protocol you need for, say, storing encrypted CSV files in an S3 bucket is vastly different from relational (SQL) databases, which in turn will be significantly different from schema-free (NoSQL) databases.

    Furthermore, when you get to the point that you can start to think about the cryptography, you’ll often need to tackle confidentiality and integrity separately.

    If that’s unclear, think of a scenario like, “I need to encrypt PII, but I also need to digitally sign the lab results so I know they weren’t tampered with at rest.”

    My point is, right off the bat, we’ve got a three-dimensional matrix of complexity to contend with:

    1. On one axis, we have the type of database.
      • Flat-file
      • Relational
      • Schema-free
    2. On another, we have the basic confidentiality requirements of the data.
      • Field encryption
      • Row encryption
      • Column encryption
      • Unstructured record encryption
      • Encrypting entire collections of records
    3. Finally, we have the integrity requirements of the data.
      • Field authentication
      • Row/column authentication
      • Unstructured record authentication
      • Collection authentication (based on e.g. Sparse Merkle Trees)

    And then you have a fourth dimension that often falls out of operational requirements for databases: Searchability.

    Why store data in a database if you have no way to index or search the data for fast retrieval?

    Credit: Harubaki

    If you’re starting to feel overwhelmed, you’re not alone. A lot of developers drastically underestimate the difficulty of the undertaking, until they run head-first into the complexity.

    Some just phone it in with AES_Encrypt() calls in their MySQL queries. (Too bad ECB mode doesn’t provide semantic security!)

    Which brings us to the meat of this blog post: The actual cryptography part.

    Cryptography is the art of transforming information security problems into key management problems.

    Former coworker

    Note: In the interest of time, I’m skipping over flat files and focusing instead on actual database technologies.

    Cryptography for Relational Databases

    Encrypting data in an SQL database seems simple enough, even if you’ve managed to shake off the complexity I teased from the introduction.

    You’ve got data, you’ve got a column on a table. Just encrypt the data and shove it in a cell on that column and call it a day, right?

    But, alas, this is a trap. There are so many gotchas that I can’t weave a coherent, easy-to-follow narrative between them all.

    So let’s start with a simple question: where and how are you performing your encryption?

    The Perils of Built-in Encryption Functions

    MySQL provides functions called AES_Encrypt and AES_Decrypt, which many developers have unfortunately decided to rely on in the past.

    It’s unfortunate because these functions implement ECB mode. To illustrate why ECB mode is bad, I encrypted one of my art commissions with AES in ECB mode:

    Art by Riley, encrypted with AES-ECB

    The problems with ECB mode aren’t exactly “you can see the image through it,” because ECB-encrypting a compressed image won’t have redundancy (and thus can make you feel safer than you are).

    ECB art is a good visual for the actual issue you should care about, however: A lack of semantic security.

    A cryptosystem is considered semantically secure if observing the ciphertext doesn’t reveal information about the plaintext (except, perhaps, the length; which all cryptosystems leak to some extent). More information here.

    ECB art isn’t to be confused with ECB poetry, which looks like this:

    Oh little one, you’re growing up
    You’ll soon be writing C
    You’ll treat your ints as pointers
    You’ll nest the ternary
    You’ll cut and paste from github
    And try cryptography
    But even in your darkest hour
    Do not use ECB

    CBC’s BEASTly when padding’s abused
    And CTR’s fine til a nonce is reused
    Some say it’s a CRIME to compress then encrypt
    Or store keys in the browser (or use javascript)
    Diffie Hellman will collapse if hackers choose your g
    And RSA is full of traps when e is set to 3
    Whiten! Blind! In constant time! Don’t write an RNG!
    But failing all, and listen well: Do not use ECB

    They’ll say “It’s like a one-time-pad!
    The data’s short, it’s not so bad
    the keys are long–they’re iron clad
    I have a PhD!”
    And then you’re front page Hacker News
    Your passwords cracked–Adobe Blues.
    Don’t leave your penguins showing through,
    Do not use ECB

    — Ben Nagy, PoC||GTFO 0x04:13

    Most people reading this probably know better than to use ECB mode already, and don’t need any of these reminders, but there is still a lot of code that inadvertently uses ECB mode to encrypt data in the database.

    Also, SHOW processlist; leaks your encryption keys. Oops.

    Credit: CMYKatt

    Application-layer Relational Database Cryptography

    Whether burned by ECB or just cautious about not giving your secrets to the system that stores all the ciphertext protected by said secret, a common next step for developers is to simply encrypt in their server-side application code.

    And, yes, that’s part of the answer. But how you encrypt is important.

    Credit: Harubaki

    “I’ll encrypt with CBC mode.”
    If you don’t authenticate your ciphertext, you’ll be sorry. Maybe try again?

    “Okay, fine, I’ll use an authenticated mode like GCM.”
    Did you remember to make the table and column name part of your AAD? What about the primary key of the record?

    “What on Earth are you talking about, Soatok?”
    Welcome to the first footgun of database cryptography!

    Confused Deputies

    Encrypting your sensitive data is necessary, but not sufficient. You need to also bind your ciphertexts to the specific context in which they are stored.

    To understand why, let’s take a step back: What specific threat does encrypting your database records protect against?

    We’ve already established that “your disks walk out of the datacenter” is a “full disk encryption” problem, so if you’re using application-layer cryptography to encrypt data in a relational database, your threat model probably involves unauthorized access to the database server.

    What, then, stops an attacker from copying ciphertexts around?

    Credit: CMYKatt

    Let’s say I have a legitimate user account with an ID 12345, and I want to read your street address, but it’s encrypted in the database. But because I’m a clever hacker, I have unfettered access to your relational database server.

    All I would need to do is simply…

    UPDATE table SET addr_encrypted = 'your-ciphertext' WHERE id = 12345

    …and then access the application through my legitimate access. Bam, data leaked. As an attacker, I can probably even copy fields from other columns and it will just decrypt. Even if you’re using an authenticated mode.

    We call this a confused deputy attack, because the deputy (the component of the system that has been delegated some authority or privilege) has become confused by the attacker, and thus undermined an intended security goal.

    The fix is to use the AAD parameter from the authenticated mode to bind the data to a given context. (AAD = Additional Authenticated Data.)

    - $addr = aes_gcm_encrypt($addr, $key);
    + $addr = aes_gcm_encrypt($addr, $key, canonicalize([
    +     $tableName,
    +     $columnName,
    +     $primaryKey
    + ]));

    Now if I start cutting and pasting ciphertexts around, I get a decryption failure instead of silently decrypting plaintext.
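If you want to watch that failure happen, here is a sketch in Python. To stay within the standard library, it swaps AES-GCM for a TOY encrypt-then-MAC construction (an HMAC-based keystream plus an HMAC tag). That construction, the `users` table name, and the victim ID 67890 are all my own stand-ins. Do not ship this cipher. Use a vetted AEAD from a real library and pass the same context as its AAD.

```python
import hashlib
import hmac
import os
import struct


def lp(data: bytes) -> bytes:
    """Prefix data with its length (uint64, little-endian) so parts cannot blur together."""
    return struct.pack("<Q", len(data)) + data


def row_context(table: str, column: str, primary_key) -> bytes:
    """AAD that names where the ciphertext lives."""
    return b"".join(lp(str(part).encode("utf-8")) for part in (table, column, primary_key))


def keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """TOY keystream from HMAC-SHA256 in counter mode. Not a vetted cipher."""
    blocks = (length + 31) // 32
    stream = b"".join(
        hmac.new(enc_key, nonce + struct.pack("<Q", i), hashlib.sha256).digest()
        for i in range(blocks)
    )
    return stream[:length]


def toy_encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    """Encrypt, then MAC the AAD, nonce, and ciphertext together."""
    nonce = os.urandom(16)
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, lp(aad) + lp(nonce) + lp(ct), hashlib.sha256).digest()
    return nonce + ct + tag


def toy_decrypt(enc_key: bytes, mac_key: bytes, blob: bytes, aad: bytes) -> bytes:
    """Verify the tag against the caller's context before decrypting anything."""
    if len(blob) < 48:
        raise ValueError("ciphertext too short")
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, lp(aad) + lp(nonce) + lp(ct), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(ct, keystream(enc_key, nonce, len(ct))))


enc_key, mac_key = os.urandom(32), os.urandom(32)
victim_row = row_context("users", "addr_encrypted", 67890)
attacker_row = row_context("users", "addr_encrypted", 12345)

blob = toy_encrypt(enc_key, mac_key, b"123 Main St", victim_row)
assert toy_decrypt(enc_key, mac_key, blob, victim_row) == b"123 Main St"

try:
    toy_decrypt(enc_key, mac_key, blob, attacker_row)  # ciphertext pasted into row 12345
except ValueError as err:
    print("rejected:", err)
```

The application always derives the context from the row it is actually reading, so a ciphertext that was copied in from somewhere else fails authentication before any plaintext is produced.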

    This may sound like a specific vulnerability, but it’s more of a failure to understand an important general lesson with database cryptography:

    Where your data lives is part of its identity, and MUST be authenticated.

    Soatok’s Rule of Database Cryptography

    Canonicalization Attacks

    In the previous section, I introduced a pseudocode function called canonicalize(). This isn’t a pasto from some reference code; it’s an important design detail that I will elaborate on now.

    First, consider you didn’t do anything to canonicalize your data, and you just joined strings together and called it a day…

    function dumbCanonicalize(
        string $tableName,
        string $columnName,
        string|int $primaryKey
    ): string {
        return $tableName . '_' . $columnName . '#' . $primaryKey;
    }

    Consider these two inputs to this function:

    1. dumbCanonicalize('customers', 'last_order_uuid', 123);
    2. dumbCanonicalize('customers_last_order', 'uuid', 123);

    In this case, your AAD would be the same, and therefore, your deputy can still be confused (albeit in a narrower use case).
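The collision is easy to reproduce, and so is one possible fix. The sketch below ports dumbCanonicalize() to JavaScript and contrasts it with a hypothetical canonicalize() that prefixes the piece count and every piece with its length as a 64-bit little-endian integer. The encoding details (little-endian, counting bytes rather than characters) are my assumptions for illustration, not something the pseudocode above specifies.

```javascript
// Naive joining, mirroring dumbCanonicalize() from the PHP snippet
function dumbCanonicalize(tableName, columnName, primaryKey) {
  return `${tableName}_${columnName}#${primaryKey}`;
}

// Encode a non-negative integer as 8 little-endian bytes
function u64le(n) {
  const out = Buffer.alloc(8);
  out.writeBigUInt64LE(BigInt(n));
  return out;
}

// Hypothetical fix: emit the number of pieces, then each piece preceded by
// its byte length, so piece boundaries can never shift silently
function canonicalize(pieces) {
  const chunks = [u64le(pieces.length)];
  for (const piece of pieces) {
    const bytes = Buffer.from(String(piece), 'utf8');
    chunks.push(u64le(bytes.length), bytes);
  }
  return Buffer.concat(chunks);
}

const dumbA = dumbCanonicalize('customers', 'last_order_uuid', 123);
const dumbB = dumbCanonicalize('customers_last_order', 'uuid', 123);
const dumbCollides = dumbA === dumbB; // both are "customers_last_order_uuid#123"

const safeA = canonicalize(['customers', 'last_order_uuid', 123]);
const safeB = canonicalize(['customers_last_order', 'uuid', 123]);
const safeCollides = safeA.equals(safeB); // the length prefixes differ
```

Picking a "safer" delimiter does not help, because any delimiter that can also appear inside a piece has the same problem. Length prefixes sidestep the question entirely.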

    In Cendyne’s article, AnonCo did something more subtle: The canonicalization bug created a collision on the inputs to HKDF, which resulted in an unintentional key reuse.

    Up until this point, their mistake isn’t relevant to us, because we haven’t even explored key management at all. But the same design flaw can re-emerge in multiple locations, with drastically different consequences.

    Multi-Tenancy

    Once you’ve implemented a mitigation against Confused Deputies, you may think your job is done. And it very well could be.

    Oftentimes, however, software developers are tasked with building support for Bring Your Own Key (BYOK).

    This is often spawned from a specific compliance requirement (such as cryptographic shredding; i.e. if you erase the key, you can no longer recover the plaintext, so it may as well be deleted).

    Other times, this is driven by a need to cut costs: Storing different users’ data in the same database server, but encrypting it such that they can only decrypt their own records.

    Two things can happen when you introduce multi-tenancy into your database cryptography designs:

    1. Invisible Salamanders becomes a risk, due to multiple keys being possible for any given encrypted record.
    2. Failure to address the risk of Invisible Salamanders can undermine your protection against Confused Deputies, thereby returning you to a state before you properly used the AAD.

    So now you have to revisit your designs and ensure you’re using a key-committing authenticated mode, rather than just a regular authenticated mode.

    Isn’t cryptography fun?

    “What Are Invisible Salamanders?”

    This refers to a fun property of AEAD modes based on Polynomial MACs. Basically, if you:

    1. Encrypt one message under a specific key and nonce.
    2. Encrypt another message under a separate key and nonce.

    …Then you can get the same exact ciphertext and authentication tag. Performing this attack requires you to control the keys for both encryption operations.

    This was first demonstrated in an attack against encrypted messaging applications, where a picture of a salamander was hidden from the abuse reporting feature because another attached file had the same authentication tag and ciphertext, and you could trick the system if you disclosed the second key instead of the first. Thus, the salamander is invisible to whoever reviews the abuse report.
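One common way to bolt key commitment onto a mode like AES-GCM is to derive both the encryption key and a separate commitment value from the long-term key, store the commitment next to the ciphertext, and check it before decrypting. The sketch below is my own illustration of that idea, not a vetted construction: seal(), unseal(), deriveKeys(), and the 0x01/0x02 labels are all hypothetical, and AAD is left out to keep it short. It insists on 32-byte keys because HMAC zero-pads shorter keys, which would otherwise create trivially equivalent keys.

```javascript
const {
  createCipheriv, createDecipheriv, createHmac, randomBytes, timingSafeEqual,
} = require('node:crypto');

// Derive a per-message encryption key and a commitment from the long-term key
function deriveKeys(masterKey, salt) {
  if (masterKey.length !== 32) throw new Error('expected a 32-byte key');
  const prf = (label) =>
    createHmac('sha256', masterKey).update(Buffer.concat([Buffer.from([label]), salt])).digest();
  return { encKey: prf(0x01), commitment: prf(0x02) };
}

function seal(plaintext, masterKey) {
  const salt = randomBytes(32);
  const { encKey, commitment } = deriveKeys(masterKey, salt);
  const nonce = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', encKey, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { salt, commitment, nonce, ciphertext, tag: cipher.getAuthTag() };
}

function unseal(box, masterKey) {
  const { encKey, commitment } = deriveKeys(masterKey, box.salt);
  // Refuse to touch the ciphertext unless this exact key was committed to
  if (box.commitment.length !== commitment.length ||
      !timingSafeEqual(box.commitment, commitment)) {
    throw new Error('key commitment mismatch');
  }
  const decipher = createDecipheriv('aes-256-gcm', encKey, box.nonce);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString('utf8');
}

const keyA = randomBytes(32);
const keyB = randomBytes(32);
const box = seal('invisible salamander', keyA);
const roundTrip = unseal(box, keyA);

let wrongKeyError = null;
try {
  unseal(box, keyB);
} catch (err) {
  wrongKeyError = err.message;
}
```

For two different keys to open the same stored blob, they would have to produce the same HMAC-SHA256 output for the same salt, which is a much taller order than colliding a polynomial MAC tag.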

    Art: CMYKat

    We’re not quite done with relational databases yet, but we should talk about NoSQL databases for a bit. The final topic in scope applies equally to both, after all.

    Cryptography for NoSQL Databases

    Most of the topics from relational databases also apply to NoSQL databases, so I shall refrain from duplicating them here. This article is already sufficiently long to read, after all, and I dislike redundancy.

    NoSQL is Built Different

    The main thing that NoSQL databases offer in the service of making cryptographers lose sleep at night is the schema-free nature of NoSQL designs.

    What this means is that, if you’re using a client-side encryption library for a NoSQL database, the previous concerns about confused deputy attacks are amplified by the malleability of the document structure.

    Additionally, the previously discussed cryptographic attacks against the encryption mode may be less expensive for an attacker to pull off.

    Consider the following record structure, which stores a bunch of data encrypted with AES in CBC mode:

    {
      "encrypted-data-key": "<blob>",
      "name": "<ciphertext>",
      "address": [
        "<ciphertext>",
        "<ciphertext>"
      ],
      "social-security": "<ciphertext>",
      "zip-code": "<ciphertext>"
    }

    If this record is decrypted with code that looks something like this:

    $decrypted = [];
    // ... snip ...
    foreach ($record['address'] as $i => $addrLine) {
        try {
            $decrypted['address'][$i] = $this->decrypt($addrLine);
        } catch (Throwable $ex) {
            // You'd never deliberately do this, but it's for illustration
            $this->doSomethingAnOracleCanObserve($i);

            // This is more believable, of course:
            $this->logDecryptionError($ex, $addrLine);
            $decrypted['address'][$i] = '';
        }
    }

    Then you can keep appending rows to the "address" field to reduce the number of writes needed to exploit a padding oracle attack against any of the <ciphertext> fields.

    Art: Harubaki

    This isn’t to say that NoSQL is less secure than SQL, from the context of client-side encryption. However, the powerful feature sets that NoSQL users are accustomed to may also give attackers a more versatile toolkit to work with.

    Record Authentication

    A pedant may point out that record authentication applies to both SQL and NoSQL. However, I mostly only observe this feature in NoSQL databases and document storage systems in the wild, so I’m shoving it in here.

    Encrypting fields is nice and all, but sometimes what you want to know is that your unencrypted data hasn’t been tampered with as it flows through your system.

    The trivial way this is done is by using a digital signature algorithm over the whole record, and then appending the signature to the end. When you go to verify the record, all of the information you need is right there.

    This works well enough for most use cases, and everyone can pack up and go home. Nothing more to see here.

    Except…

    When you’re working with NoSQL databases, you often want systems to be able to write to additional fields, and since you’re working with schema-free blobs of data rather than a normalized set of relatable tables, the most sensible thing to do is to append this data to the same record.

    Except, oops! You can’t do that if you’re shoving a digital signature over the record. So now you need to specify which fields are to be included in the signature.

    And you need to think about how to model that in a way that doesn’t prohibit schema upgrades nor allow attackers to perform downgrade attacks. (See below.)

    I don’t have any specific real-world examples here that I can point to of this problem being solved well.

    Art: CMYKat

    Furthermore, as with preventing confused deputy and/or canonicalization attacks above, you must also include the fully qualified path of each field in the data that gets signed.

    As I said with encryption before, but also true here:

    Where your data lives is part of its identity, and MUST be authenticated.

    Soatok’s Rule of Database Cryptography

    This requirement holds true whether you’re using symmetric-key authentication (i.e. HMAC) or asymmetric-key digital signatures (e.g. EdDSA).
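As a rough sketch of what "include the fully qualified path" can look like with a symmetric MAC: the hypothetical authenticateFields() below feeds each selected field into HMAC-SHA256 as a length-prefixed path followed by a length-prefixed JSON encoding of the value. The dotted-path syntax, the sorting of paths, and the record layout are all assumptions made for the example.

```javascript
const { createHmac, randomBytes } = require('node:crypto');

// Prefix a byte string with its length as a 64-bit little-endian integer
function lengthPrefixed(bytes) {
  const len = Buffer.alloc(8);
  len.writeBigUInt64LE(BigInt(bytes.length));
  return Buffer.concat([len, bytes]);
}

// Look up a dotted path such as "billing.zip" in a nested object
function getPath(record, path) {
  return path.split('.').reduce((node, key) => (node == null ? undefined : node[key]), record);
}

// MAC only the listed fields, binding every value to the path it lives at
function authenticateFields(record, paths, macKey) {
  const mac = createHmac('sha256', macKey);
  for (const path of [...paths].sort()) {
    const value = JSON.stringify(getPath(record, path) ?? null);
    mac.update(lengthPrefixed(Buffer.from(path, 'utf8')));
    mac.update(lengthPrefixed(Buffer.from(value, 'utf8')));
  }
  return mac.digest('hex');
}

const macKey = randomBytes(32);
const signedPaths = ['billing.zip', 'shipping.zip'];
const record = { billing: { zip: '12345' }, shipping: { zip: '99999' }, notes: 'unsigned' };
const tag = authenticateFields(record, signedPaths, macKey);

// Appending or changing fields outside the signed set leaves the tag valid
const appended = { ...record, notes: 'changed', extra: 1 };
const appendOk = authenticateFields(appended, signedPaths, macKey) === tag;

// Swapping two authenticated values between paths does not
const swapped = { billing: { zip: '99999' }, shipping: { zip: '12345' }, notes: 'unsigned' };
const swapDetected = authenticateFields(swapped, signedPaths, macKey) !== tag;
```

The === comparison is only there to keep the demo short. Real verification code should compare tags with a constant-time function such as crypto.timingSafeEqual().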

    Bonus: A Maximally Schema-Free, Upgradeable Authentication Design

    Art: Harubaki

    Okay, how do you solve this problem so that you can perform updates and upgrades to your schema but without enabling attackers to downgrade the security? Here’s one possible design.

    Let’s say you have two metadata fields on each record:

    1. A compressed binary string representing which fields should be authenticated. This field is, itself, not authenticated. Let’s call this meta-auth.
    2. A compressed binary string representing which of the authenticated fields should also be encrypted. Unlike meta-auth, this field is authenticated. This is at most the same length as the first metadata field. Let’s call this meta-enc.

    Furthermore, you will specify a canonical field ordering for both how data is fed into the signature algorithm as well as the field mappings in meta-auth and meta-enc.

    {
      "example": {
        "credit-card": {
          "number": /* encrypted */,
          "expiration": /* encrypted */,
          "ccv": /* encrypted */
        },
        "superfluous": {
          "rewards-member": null
        }
      },
      "meta-auth": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true   /* meta-enc */
      ]),
      "meta-enc": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false  /* example.superfluous.rewards-member */
      ]),
      "signature": /* -- snip -- */
    }

    When you go to append data to an existing record, you’ll need to update meta-auth to include the mapping of fields based on this canonical ordering to ensure only the intended fields get validated.

    When you update your code to add an additional field that is intended to be signed, you can roll that out for new records and the record will continue to be self-describing:

    • New records will have the additional field flagged as authenticated in meta-auth (and meta-enc will grow)
    • Old records will not, but your code will still sign them successfully
    • To prevent downgrade attacks, simply include a schema version ID as an additional plaintext field that gets authenticated. An attacker who tries to downgrade will need to be able to produce a valid signature too.

    You might think meta-auth gives an attacker some advantage, but this only includes which fields are included in the security boundary of the signature or MAC, which allows unauthenticated data to be appended for whatever operational purpose without having to update signatures or expose signing keys to a wider part of the network.

    {
      "example": {
        "credit-card": {
          "number": /* encrypted */,
          "expiration": /* encrypted */,
          "ccv": /* encrypted */
        },
        "superfluous": {
          "rewards-member": null
        }
      },
      "meta-auth": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        true,  /* meta-enc */
        true   /* meta-version */
      ]),
      "meta-enc": compress_bools([
        true,  /* example.credit-card.number */
        true,  /* example.credit-card.expiration */
        true,  /* example.credit-card.ccv */
        false, /* example.superfluous.rewards-member */
        false  /* meta-version */
      ]),
      "meta-version": 0x01000000,
      "signature": /* -- snip -- */
    }

    If an attacker tries to use the meta-auth field to mess with a record, the best they can hope for is an Invalid Signature exception (assuming the signature algorithm is secure to begin with).

    Even if they keep all of the fields the same, but play around with the structure of the record (e.g. changing the XPath or equivalent), so long as the path is authenticated with each field, breaking this is computationally infeasible.
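The compress_bools() in these examples is pseudocode, so here is one way it could work: a plain bitmask, most significant bit first, read back against the canonical field ordering. Both compressBools() and expandBools() are hypothetical names, and a real design would also need to agree on how the number of flags is recorded (for instance, by tying it to the schema version).

```javascript
// Hypothetical compress_bools(): pack booleans into bytes, most significant bit first
function compressBools(flags) {
  const out = Buffer.alloc(Math.ceil(flags.length / 8));
  flags.forEach((flag, i) => {
    if (flag) out[i >> 3] |= 0x80 >> (i & 7);
  });
  return out;
}

// Inverse operation: read `count` flags back out of the packed bytes
function expandBools(bytes, count) {
  return Array.from({ length: count }, (_, i) => (bytes[i >> 3] & (0x80 >> (i & 7))) !== 0);
}

// Canonical field ordering shared by writers and verifiers
const canonicalOrder = [
  'example.credit-card.number',
  'example.credit-card.expiration',
  'example.credit-card.ccv',
  'example.superfluous.rewards-member',
  'meta-enc',
  'meta-version',
];

const metaAuth = compressBools([true, true, true, false, true, true]);

// A verifier recovers which paths belong inside the signature boundary
const flags = expandBools(metaAuth, canonicalOrder.length);
const authenticatedPaths = canonicalOrder.filter((_, i) => flags[i]);
```

Six fields fit in a single byte here, which is the appeal of the approach: the metadata stays tiny even as the schema grows.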

    Searchable Encryption

    If you’ve managed to make it through the previous sections, congratulations, you now know enough to build a secure but completely useless database.

    Art: CMYKat

    Okay, put away the pitchforks; I will explain.

    Part of the reason why we store data in a database, rather than a flat file, is because we want to do more than just read and write. Sometimes computer scientists want to compute. Almost always, you want to be able to query your database for a subset of records based on your specific business logic needs.

    And so, a database which doesn’t do anything more than store ciphertext and maybe signatures is pretty useless to most people. You’d have better luck selling Monkey JPEGs to furries than convincing most businesses to part with their precious database-driven report generators.

    Art: Sophie

    So whenever one of your users wants to actually use their data, rather than just store it, they’re forced to decide between two mutually exclusive options:

    1. Encrypting the data, to protect it from unauthorized disclosure, but render it useless
    2. Doing anything useful with the data, but leaving it unencrypted in the database

    This is especially annoying for business types that are all in on the Zero Trust buzzword.

    Fortunately, the cryptographers are at it again, and boy howdy do they have a lot of solutions for this problem.

    Order-{Preserving, Revealing} Encryption

    On the fun side of things, you have things like Order-Preserving and Order-Revealing Encryption, which Matthew Green wrote about at length.

    [D]atabase encryption has been a controversial subject in our field. I wish I could say that there’s been an actual debate, but it’s more that different researchers have fallen into different camps, and nobody has really had the data to make their position in a compelling way. There have actually been some very personal arguments made about it.

    Attack of the week: searchable encryption and the ever-expanding leakage function

    The problem with these designs is that they leak enough information that they no longer provide semantic security.

    From Grubbs, et al. (GLMP, 2019.)
    Colors inverted to fit my blog’s theme better.

    To put it in other words: These designs are only marginally better than ECB mode, and probably deserve their own poems too.

    Order revealing
    Reveals much more than order
    Softcore ECB

    Order preserving
    Semantic security?
    Only in your dreams

    Haiku for your consideration

    Deterministic Encryption

    Here’s a simpler, but also terrible, idea for searchable encryption: Simply give up on semantic security entirely.

    If you recall the AES_{De,En}crypt() functions built into MySQL I mentioned at the start of this article, those are the most common form of deterministic encryption I’ve seen in use.

     SELECT * FROM foo WHERE bar = AES_Encrypt('query', 'key');
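To see what giving up semantic security costs, here is a small Node sketch. It uses AES-128 in ECB mode, which is MySQL's default block_encryption_mode, but it does not try to replicate how MySQL turns the key string into an AES key. The deterministicEncrypt() helper and the sample email addresses are made up for the example.

```javascript
const { createCipheriv, randomBytes } = require('node:crypto');

function deterministicEncrypt(plaintext, key) {
  // ECB takes no IV, so the same key and plaintext always give the same ciphertext
  const cipher = createCipheriv('aes-128-ecb', key, null);
  return Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]).toString('hex');
}

const key = randomBytes(16);
const rows = ['alice@example.com', 'bob@example.com', 'alice@example.com']
  .map((email) => deterministicEncrypt(email, key));

// Anyone who can read the column learns which rows share a value...
const duplicatesVisible = rows[0] === rows[2];
// ...even though they cannot read the values themselves
const distinctDiffer = rows[0] !== rows[1];
```

That equality leak is exactly what makes the WHERE clause work, and it is also what an attacker with a copy of the table gets for free.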

    However, there are slightly less bad variants. If you use AES-GCM-SIV with a static nonce, your ciphertexts are fully deterministic, and you can encrypt a small number of distinct records safely before you’re no longer secure.

    From Page 14 of the linked paper.

    That’s certainly better than nothing, but you also can’t mitigate confused deputy attacks. But we can do better than this.

    Homomorphic Encryption

    In a safer plane of academia, you’ll find homomorphic encryption, which researchers recently demonstrated by serving Wikipedia pages in a reasonable amount of time.

    Homomorphic encryption allows computations over the ciphertext, which will be reflected in the plaintext, without ever revealing the key to the entity performing the computation.

    If this sounds vaguely similar to the conditions that enable chosen-ciphertext attacks, you probably have a good intuition for how it works: RSA is homomorphic to multiplication, AES-CTR is homomorphic to XOR. Fully homomorphic encryption uses lattices, which enables multiple operations but carries a relatively enormous performance cost.
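Both of those partial homomorphisms fit in a few lines. The sketch below uses toy textbook RSA parameters (p = 61, q = 53, no padding, hopelessly insecure) and Node's built-in AES-256-CTR. The modPow() and aesCtr() helpers and the sample messages are mine, purely for illustration.

```javascript
const { createCipheriv, randomBytes } = require('node:crypto');

// Square-and-multiply modular exponentiation over BigInt
function modPow(base, exponent, modulus) {
  let result = 1n;
  base %= modulus;
  while (exponent > 0n) {
    if (exponent & 1n) result = (result * base) % modulus;
    base = (base * base) % modulus;
    exponent >>= 1n;
  }
  return result;
}

// Toy textbook RSA: n = 61 * 53, public exponent e, private exponent d
const n = 3233n, e = 17n, d = 2753n;
const c1 = modPow(7n, e, n);
const c2 = modPow(3n, e, n);
// Multiply the ciphertexts without ever touching the private key...
const productCiphertext = (c1 * c2) % n;
// ...and the plaintexts were multiplied too
const rsaProduct = modPow(productCiphertext, d, n);

// AES-CTR: XORing the ciphertext XORs the plaintext
const key = randomBytes(32);
const iv = randomBytes(16);
function aesCtr(data) {
  // In CTR mode, encryption and decryption are the same keystream XOR
  const cipher = createCipheriv('aes-256-ctr', key, iv);
  return Buffer.concat([cipher.update(data), cipher.final()]);
}

const ciphertext = aesCtr(Buffer.from('PAY 100', 'utf8'));
// Flip the bits that turn the ASCII digit '1' into '9' at offset 4
const delta = Buffer.from([0, 0, 0, 0, 0x31 ^ 0x39, 0, 0]);
const tampered = Buffer.from(ciphertext.map((byte, i) => byte ^ delta[i]));
const recovered = aesCtr(tampered).toString('utf8');
```

The same property that lets a server compute on data it cannot read also lets an attacker edit data they cannot read, which is why unauthenticated CTR mode and unpadded RSA are considered dangerous.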

    Art: Harubaki

    Homomorphic encryption sometimes intersects with machine learning, because the notion of training an encrypted model by feeding it encrypted data, then decrypting it after-the-fact is desirable for certain business verticals. Your data scientists never see your data, and you have some plausible deniability about the final ML model this work produces. This is like a Siren song for Venture Capitalist-backed medical technology companies. Tech journalists love writing about it.

    However, a less-explored use case is the ability to encrypt your programs but still get the correct behavior and outputs. Although this sounds like a DRM technology, it’s actually something that individuals could one day use to prevent their ISPs or cloud providers from knowing what software is being executed on the customer’s leased hardware. The potential for a privacy win here is certainly worth pondering, even if you’re a tried and true Pirate Party member.

    Just say “NO” to the copyright cartels.

    Art: CMYKat

    Searchable Symmetric Encryption (SSE)

    Forget about working at the level of fields and rows or individual records. What if we, instead, worked over collections of documents, where each document is viewed as a set of keywords from a keyword space?

    Art: CMYKat

    That’s the basic premise of SSE: Encrypting collections of documents rather than individual records.

    The actual implementation details differ greatly between designs. They also differ greatly in their leakage profiles and susceptibility to side-channel attacks.

    Some schemes use a so-called trapdoor permutation, such as RSA, as one of their building blocks.

    Some schemes only allow for searching a static set of records, while others can accommodate new data over time (with the trade-off between more leakage or worse performance).

    If you’re curious, you can learn more about SSE here, and see some open source SSE implementations online here.

    You’re probably wondering, “If SSE is this well-studied and there are open source implementations available, why isn’t it more widely used?”

    Your guess is as good as mine, but I can think of a few reasons:

    1. The protocols can be a little complicated to implement, and aren’t shipped by default in cryptography libraries (e.g. OpenSSL’s libcrypto or libsodium).
    2. Every known security risk in SSE is the product of a trade-off, rather than there being a single winner for all use cases that developers can feel comfortable picking.
    3. Insufficient marketing and developer advocacy.
      SSE schemes are mostly of interest to academics, although Seny Kamara (Brown University professor and one of the luminaries of searchable encryption) did try to develop an app called Pixek which used SSE to encrypt photos.

    Maybe there’s room for a cryptography competition on searchable encryption schemes in the future.

    You Can Have a Little HMAC, As a Treat

    Finally, I can’t talk about searchable encryption without discussing a technique that’s older than dirt by Internet standards, that has been independently reinvented by countless software developers tasked with encrypting database records.

    The oldest version I’ve been able to track down dates to 2006 by Raul Garcia at Microsoft, but I’m not confident that it didn’t exist before.

    The idea I’m alluding to goes like this:

    1. Encrypt your data, securely, using symmetric cryptography.
      (Hopefully your encryption addresses the considerations outlined in the relevant sections above.)
    2. Separately, calculate an HMAC over the unencrypted data with a separate key used exclusively for indexing.

    When you need to query your data, you can just recalculate the HMAC of your challenge and fetch the records that match it. Easy, right?

    Even if you rotate your keys for encryption, you keep your indexing keys static across your entire data set. This lets you have durable indexes for encrypted data, which gives you the ability to do literal lookups for the performance hit of a hash function.
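In code, the whole trick is about a dozen lines. This sketch stands in an array for the database table and uses placeholder strings where real ciphertexts would go. The blindIndex() and findByEmail() names and the column names are hypothetical.

```javascript
const { createHmac, randomBytes } = require('node:crypto');

// Used only for indexing, never for encryption
const indexKey = randomBytes(32);

function blindIndex(plaintext) {
  return createHmac('sha256', indexKey).update(plaintext, 'utf8').digest('hex');
}

// Stand-in for a table: each row keeps an opaque ciphertext plus its index
const table = [
  { id: 1, email_encrypted: '<ciphertext 1>', email_idx: blindIndex('alice@example.com') },
  { id: 2, email_encrypted: '<ciphertext 2>', email_idx: blindIndex('bob@example.com') },
];

// Roughly equivalent to: SELECT id FROM table WHERE email_idx = ?
function findByEmail(email) {
  const needle = blindIndex(email);
  return table.filter((row) => row.email_idx === needle).map((row) => row.id);
}

const hits = findByEmail('bob@example.com');
const misses = findByEmail('carol@example.com');
```

The database only ever sees the index value, so it can use an ordinary index on that column without learning the plaintext or holding any key.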

    Additionally, everyone has HMAC in their toolkit, so you don’t have to move around implementations of complex cryptographic building blocks. You can live off the land. What’s not to love?

    Hooray!

    However, if you stopped here, we regret to inform you that your data is no longer indistinguishable from random, which probably undermines the security proof for your encryption scheme.

    How annoying!

    Of course, you don’t have to stop with the addition of plain HMAC to your database encryption software.

    Take a page from Troy Hunt: Truncate the output to provide k-anonymity rather than a direct literal look-up.

    “K-What Now?”

    Imagine you have a full HMAC-SHA256 of the plaintext next to every ciphertext record with a static key, for searchability.

    Each HMAC output corresponds 1:1 with a unique plaintext.

    Because you’re using HMAC with a secret key, an attacker can’t just build a rainbow table like they would when attempting password cracking, but it still leaks duplicate plaintexts.

    For example, an HMAC-SHA256 output might look like this: 04a74e4c0158e34a566785d1a5e1167c4e3455c42aea173104e48ca810a8b1ae

    Art: CMYKat

    If you were to slice off most of those bytes (e.g. leaving only the last 3, which in the previous example yields a8b1ae), then with sufficient records, multiple plaintexts will now map to the same truncated HMAC tag.

    Which means if you’re only revealing a truncated HMAC tag to the database server (both when storing records or retrieving them), you can now expect false positives due to collisions in your truncated HMAC tag.

    These false positives give your data a discrete set of anonymity (called k-anonymity), which means an attacker with access to your database cannot:

    1. Distinguish between two encrypted records with the same short HMAC tag.
    2. Reverse engineer the short HMAC tag into a single possible plaintext value, even if they can supply candidate queries and study the tags sent to the database.
    Art: CMYKat

    As with SSE above, this short HMAC technique exposes a trade-off to users.

    • Too much k-anonymity (i.e. too many false positives), and you will have to decrypt-then-discard multiple mismatching records. This can make queries slow.
    • Not enough k-anonymity (i.e. insufficient false positives), and you’re no better off than a full HMAC.

    Even more troublesome, the right amount to truncate is expressed in bits (not bytes), and calculating this value depends on the number of unique plaintext values you anticipate in your dataset. (Fortunately, it grows logarithmically, so you’ll rarely if ever have to tune this.)
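Here is a rough sketch of bit-level truncation, plus the back-of-the-envelope arithmetic for choosing the length. It keeps the leading bits of the tag rather than the trailing bytes used in the example above (either end works, as long as you are consistent), and it assumes tags are spread uniformly. shortIndex() and expectedCandidatesPerTag() are hypothetical names.

```javascript
const { createHmac, randomBytes } = require('node:crypto');

// Keep only the first `bits` bits of an HMAC-SHA256 tag, returned as an integer
function shortIndex(plaintext, indexKey, bits) {
  if (!Number.isInteger(bits) || bits < 1 || bits > 32) {
    throw new RangeError('this sketch supports 1 to 32 bits');
  }
  const tag = createHmac('sha256', indexKey).update(plaintext, 'utf8').digest();
  return tag.readUInt32BE(0) >>> (32 - bits);
}

// With N distinct plaintexts and b bits, each tag value is shared by
// roughly N / 2^b plaintexts on average
function expectedCandidatesPerTag(distinctPlaintexts, bits) {
  return distinctPlaintexts / 2 ** bits;
}

const indexKey = randomBytes(32);
const tagA = shortIndex('alice@example.com', indexKey, 12);
const tagAgain = shortIndex('alice@example.com', indexKey, 12);

// One million distinct values behind a 16-bit tag: about 15 candidates per tag
const perTag = expectedCandidatesPerTag(1000000, 16);
```

Every extra bit halves the expected number of false positives, which is why the tag length only needs revisiting when the data set grows by orders of magnitude.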

    If you’d like to play with this idea, here’s a quick and dirty demo script.

    Intermission

    If you started reading this post with any doubts about Cendyne’s statement that “Database cryptography is hard”, by making it to this point, they’ve probably been long since put to rest.

    Art: Harubaki

    Conversely, anyone that specializes in this topic is probably waiting for me to say anything novel or interesting; their patience wearing thin as I continue to rehash a surface-level introduction of their field without really diving deep into anything.

    Thus, if you’ve read this far, I’d like to apply what I’ve covered so far to a real-world case study of a database cryptography product.

    Case Study: MongoDB Client-Side Encryption

    MongoDB is an open source schema-free NoSQL database. Last year, MongoDB made waves when they announced Queryable Encryption in their upcoming client-side encryption release.

    Taken from the press release, but adapted for dark themes.

    A statement at the bottom of their press release indicates that this isn’t clown-shoes:

    Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz, who are pioneers in the field of encrypted search. The Group conducts cutting-edge peer-reviewed research in cryptography and works with MongoDB engineering teams to transfer and deploy the latest innovations in cryptography and privacy to the MongoDB data platform.

    If you recall, I mentioned Seny Kamara in the SSE section of this post. They certainly aren’t wrong about Kamara and Moataz being pioneers in this field.

    So with that in mind, let’s explore the implementation in libmongocrypt and see how it stands up to scrutiny.

    MongoCrypt: The Good

    MongoDB’s encryption library takes key management seriously: They provide a KMS integration for cloud users by default (supporting both AWS and Azure).

    MongoDB uses Encrypt-then-MAC with AES-CBC and HMAC-SHA256, which is congruent to what Signal does for message encryption.

    How Is Queryable Encryption Implemented?

    From the current source code, we can see that MongoCrypt generates several different types of tokens, using HMAC (calculation defined here).

    According to their press release:

    The feature supports equality searches, with additional query types such as range, prefix, suffix, and substring planned for future releases.

    MongoDB Queryable Encryption Announcement

    Which means that most of the juicy details probably aren’t public yet.

    These HMAC-derived tokens are stored wholesale in the data structure, but most are encrypted before storage using AES-CTR.

    There are more layers of encryption (using AEAD), server-side token processing, and more AES-CTR-encrypted edge tokens. All of this is finally serialized (implementation) as one blob for storage.

    Since only the equality operation is currently supported (which is the same feature you’d get from HMAC), it’s difficult to speculate what the full feature set looks like.

    However, since Kamara and Moataz are leading its development, it’s likely that this feature set will be excellent.

    MongoCrypt: The Bad

    Every call to do_encrypt() includes at most the Key ID (but typically NULL) as the AAD. This means that the concerns over Confused Deputies (and NoSQL specifically) are relevant to MongoDB.

    However, even if they did support authenticating the fully qualified path to a field in the AAD for their encryption, their AEAD construction is vulnerable to the kind of canonicalization attack I wrote about previously.

    First, observe this code which assembles the multi-part inputs into HMAC.

    /* Construct the input to the HMAC */
    uint32_t num_intermediates = 0;
    _mongocrypt_buffer_t intermediates[3];
    // -- snip --
    if (!_mongocrypt_buffer_concat (
          &to_hmac, intermediates, num_intermediates)) {
       CLIENT_ERR ("failed to allocate buffer");
       goto done;
    }
    if (hmac == HMAC_SHA_512_256) {
       uint8_t storage[64];
       _mongocrypt_buffer_t tag = {.data = storage, .len = sizeof (storage)};
       if (!_crypto_hmac_sha_512 (crypto, Km, &to_hmac, &tag, status)) {
          goto done;
       }
       // Truncate sha512 to first 256 bits.
       memcpy (out->data, tag.data, MONGOCRYPT_HMAC_LEN);
    } else {
       BSON_ASSERT (hmac == HMAC_SHA_256);
       if (!_mongocrypt_hmac_sha_256 (crypto, Km, &to_hmac, out, status)) {
          goto done;
       }
    }

    The implementation of _mongocrypt_buffer_concat() can be found here.

    If either the implementation of that function, or the code I snipped from my excerpt, had contained code that prefixed every segment of the AAD with the length of the segment (represented as a uint64_t to make overflow infeasible), then their AEAD mode would not be vulnerable to canonicalization issues.
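For illustration only, here is what an Encrypt-then-MAC construction with length-prefixed HMAC inputs could look like in Node. This is not libmongocrypt's code or wire format: the function names, the segment order (AAD, IV, ciphertext), the little-endian prefix, and the use of two independent 32-byte keys are all assumptions of this sketch.

```javascript
const {
  createCipheriv, createDecipheriv, createHmac, randomBytes, timingSafeEqual,
} = require('node:crypto');

// Prefix a segment with its length as a 64-bit little-endian integer
function lp(bytes) {
  const len = Buffer.alloc(8);
  len.writeBigUInt64LE(BigInt(bytes.length));
  return Buffer.concat([len, bytes]);
}

function macInput(aad, iv, ciphertext) {
  return Buffer.concat([lp(aad), lp(iv), lp(ciphertext)]);
}

function encryptThenMac(plaintext, encKey, macKey, aad) {
  const iv = randomBytes(16);
  const cipher = createCipheriv('aes-256-cbc', encKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = createHmac('sha256', macKey).update(macInput(aad, iv, ciphertext)).digest();
  return { iv, ciphertext, tag };
}

function verifyThenDecrypt({ iv, ciphertext, tag }, encKey, macKey, aad) {
  const expected = createHmac('sha256', macKey).update(macInput(aad, iv, ciphertext)).digest();
  // Check the MAC before any decryption, so there is no padding oracle to poke at
  if (tag.length !== expected.length || !timingSafeEqual(tag, expected)) {
    throw new Error('invalid MAC');
  }
  const decipher = createDecipheriv('aes-256-cbc', encKey, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

// Plain concatenation cannot tell ("ab", "c") apart from ("a", "bc")
const ab = Buffer.from('ab'), c = Buffer.from('c');
const a = Buffer.from('a'), bc = Buffer.from('bc');
const plainAmbiguous = Buffer.concat([ab, c]).equals(Buffer.concat([a, bc]));
const prefixedAmbiguous = Buffer.concat([lp(ab), lp(c)]).equals(Buffer.concat([lp(a), lp(bc)]));

const encKey = randomBytes(32);
const macKey = randomBytes(32);
const aad = Buffer.from('customers.address', 'utf8');
const sealed = encryptThenMac('123 Example Street', encKey, macKey, aad);
const opened = verifyThenDecrypt(sealed, encKey, macKey, aad);

let wrongAadRejected = false;
try {
  verifyThenDecrypt(sealed, encKey, macKey, Buffer.from('customers.name', 'utf8'));
} catch (err) {
  wrongAadRejected = true;
}
```

With the prefixes in place, moving a byte from one segment to its neighbor changes the MAC input, so the boundaries between AAD, IV, and ciphertext are authenticated along with the contents.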

    Using TupleHash would also have prevented this issue.

    Silver lining for MongoDB developers: Because the AAD is either a key ID or NULL, this isn’t exploitable in practice.

    The first cryptographic flaw sort of cancels the second out.

    If the libmongocrypt developers ever want to mitigate Confused Deputy attacks, they’ll need to address this canonicalization issue too.

    MongoCrypt: The Ugly

    MongoCrypt supports deterministic encryption.

    If you specify deterministic encryption for a field, your application passes a deterministic initialization vector to AEAD.

    MongoDB documentation

    We already discussed why this is bad above.

    Wrapping Up

    This was not a comprehensive treatment of the field of database cryptography. There are many areas of this field that I did not cover, nor do I feel qualified to discuss.

    However, I hope anyone who takes the time to read this finds themselves more familiar with the subject.

    Additionally, I hope any developers who think “encrypting data in a database is [easy, trivial] (select appropriate)” will find this broad introduction a humbling experience.

    Art: CMYKat

    https://soatok.blog/2023/03/01/database-cryptography-fur-the-rest-of-us/

    #appliedCryptography #blockCipherModes #cryptography #databaseCryptography #databases #encryptedSearch #HMAC #MongoCrypt #MongoDB #QueryableEncryption #realWorldCryptography #security #SecurityGuidance #SQL #SSE #symmetricCryptography #symmetricSearchableEncryption

  10. Going Bark: A Furry’s Guide to End-to-End Encryption

    Governments are back on their anti-encryption bullshit again.

    Between the U.S. Senate’s “EARN IT” Act, the E.U.’s slew of anti-encryption proposals, and Australia’s new anti-encryption law, it’s become clear that the authoritarians in office view online privacy as a threat to their existence.

    Normally, when the governments increase their anti-privacy sabre-rattling, technologists start talking more loudly about Tor, Signal, and other privacy technologies (usually only to be drowned out by paranoid people who think Tor and Signal are government backdoors or something stupid; conspiracy theories ruin everything!).

    I’m not going to do that.

    Instead, I’m going to show you how to add end-to-end encryption to any communication software you’re developing. (Hopefully, I’ll avoid making any bizarre design decisions along the way.)

    But first, some important disclaimers:

    1. Yes, you should absolutely do this. I don’t care how banal your thing is; if you expect people to use it to communicate with each other, you should make it so that you can never decrypt their communications.
    2. You should absolutely NOT bill the thing you’re developing as an alternative to Signal or WhatsApp.
    3. The goal of doing this is to increase the amount of end-to-end encryption deployed on the Internet that the service operator cannot decrypt (even if compelled by court order) and make E2EE normalized. The goal is NOT to compete with highly specialized and peer-reviewed privacy technology.
    4. I am not a lawyer, I’m some furry who works in cryptography. The contents of this blog post are not legal advice, nor are they endorsed by any company or organization. Ask the EFF for legal questions.

    The organization of this blog post is as follows: First, I’ll explain how to encrypt and decrypt data between users, assuming you have a key. Next, I’ll explain how to build an authenticated key exchange and a ratcheting protocol to determine the keys used in the first step. Afterwards, I’ll explore techniques for binding authentication keys to identities and managing trust. Finally, I’ll discuss strategies for making it impractical to ever backdoor your software (and impossible to silently backdoor it), just to piss the creeps and tyrants of the world off even more.

    You don’t have to implement the full stack of solutions to protect users, but the further you can afford to go, the safer your users will be from privacy-invasive policing.

    (Art by Kyume.)

    Preliminaries

    Choosing a Cryptography Library

    In the examples contained on this page, I will be using the Sodium cryptography library. Specifically, my example code will be written with the Sodium-Plus library for JavaScript, since it strikes a good balance between performance and being cross-platform.

    const { SodiumPlus } = require('sodium-plus');
    (async function() {
        // Select a backend automatically
        const sodium = await SodiumPlus.auto();

        // Do other stuff here
    })();

    Libsodium is generally the correct choice for developing cryptography features in software, and is available in most programming languages.

    If you’re prone to choose a different library, you should consult your cryptographer (and yes, you should have one on your payroll if you’re doing things differently) about your design choices.

    Threat Modelling

    Remember above when I said, “You don’t have to implement the full stack of solutions to protect users, but the further you can afford to go, the safer your users will be from privacy-invasive policing”?

    How far you go in implementing the steps outlined on this blog post should be informed by a threat model, not an ad hoc judgment.

    For example, if you’re encrypting user data and storing it in the cloud, you probably want to pass the Mud Puddle Test:

    1. First, drop your device(s) in a mud puddle.
    2. Next, slip in said puddle and crack yourself on the head. When you regain consciousness you’ll be perfectly fine, but won’t for the life of you be able to recall your device passwords or keys.
    3. Now try to get your cloud data back.

    Did you succeed? If so, you’re screwed. Or to be a bit less dramatic, I should say: your cloud provider has access to your ‘encrypted’ data, as does the government if they want it, as does any rogue employee who knows their way around your provider’s internal policy checks.

    Matthew Green describes the Mud Puddle Test, which Apple products definitely don’t pass.

    If you must fail the Mud Puddle Test for your users, make sure you’re clear and transparent about this in the documentation for your product or service.

    (Art by Swizz.)

    I. Symmetric-Key Encryption

    The easiest piece of this puzzle is to encrypt data in transit between both ends (thus, satisfying the loosest definition of end-to-end encryption).

    At this layer, you already have some kind of symmetric key to use for encrypting data before you send it, and for decrypting it as you receive it.

    For example, the following code will encrypt/decrypt strings and return hexadecimal strings with a version prefix.

    const VERSION = "v1";

    /**
     * @param {string|Uint8Array} message
     * @param {CryptographyKey} key
     * @param {string|null} assocData
     * @returns {string}
     */
    async function encryptData(message, key, assocData = null) {
        const nonce = await sodium.randombytes_buf(24);
        const aad = JSON.stringify({
            'version': VERSION,
            'nonce': await sodium.sodium_bin2hex(nonce),
            'extra': assocData
        });
        const encrypted = await sodium.crypto_aead_xchacha20poly1305_ietf_encrypt(
            message,
            nonce,
            key,
            aad
        );
        return (
            VERSION +
            await sodium.sodium_bin2hex(nonce) +
            await sodium.sodium_bin2hex(encrypted)
        );
    }

    /**
     * @param {string} encrypted
     * @param {CryptographyKey} key
     * @param {string|null} assocData
     * @returns {string}
     */
    async function decryptData(encrypted, key, assocData = null) {
        const ver = encrypted.slice(0, 2);
        if (!await sodium.sodium_memcmp(ver, VERSION)) {
            throw new Error("Incorrect version: " + ver);
        }
        // 24-byte nonce = 48 hex characters, right after the 2-character version
        const nonce = await sodium.sodium_hex2bin(encrypted.slice(2, 50));
        const ciphertext = await sodium.sodium_hex2bin(encrypted.slice(50));
        const aad = JSON.stringify({
            'version': ver,
            'nonce': encrypted.slice(2, 50),
            'extra': assocData
        });

        const plaintext = await sodium.crypto_aead_xchacha20poly1305_ietf_decrypt(
            ciphertext,
            nonce,
            key,
            aad
        );
        return plaintext.toString('utf-8');
    }

    Under the hood, this is using XChaCha20-Poly1305, which is less sensitive to timing leaks than AES-GCM. However, like AES-GCM, this encryption mode doesn’t provide message- or key-commitment.

    If you want key commitment, you should derive two keys from $key using a KDF based on hash functions: One for actual encryption, and the other as a key commitment value.

    If you want message commitment, you can use AES-CTR + HMAC-SHA256 or XChaCha20 + BLAKE2b-MAC.
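    As a rough illustration of the AES-CTR + HMAC-SHA256 option, here is a minimal encrypt-then-MAC sketch. It uses Node.js’s built-in crypto module rather than sodium-plus, and the function names, the HKDF labels, and the iv || ciphertext || tag layout are all assumptions made for this example, not part of any standard.

```javascript
const crypto = require('node:crypto');

// Split one 32-byte key into independent encryption and MAC keys (HKDF-SHA256).
// The info labels are illustrative placeholders.
function splitKey(key) {
    const encKey = Buffer.from(crypto.hkdfSync('sha256', key, Buffer.alloc(0), 'example-enc', 32));
    const macKey = Buffer.from(crypto.hkdfSync('sha256', key, Buffer.alloc(0), 'example-mac', 32));
    return { encKey, macKey };
}

// Encrypt with AES-256-CTR, then authenticate iv || ciphertext with HMAC-SHA256.
function encryptThenMac(plaintext, key) {
    const { encKey, macKey } = splitKey(key);
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv('aes-256-ctr', encKey, iv);
    const ciphertext = Buffer.concat([
        cipher.update(Buffer.from(plaintext, 'utf8')),
        cipher.final()
    ]);
    const tag = crypto.createHmac('sha256', macKey).update(iv).update(ciphertext).digest();
    return Buffer.concat([iv, ciphertext, tag]);
}

// Verify the HMAC tag first; only decrypt if it matches.
function verifyThenDecrypt(message, key) {
    if (message.length < 48) {
        throw new Error('Message too short');
    }
    const { encKey, macKey } = splitKey(key);
    const iv = message.subarray(0, 16);
    const ciphertext = message.subarray(16, message.length - 32);
    const tag = message.subarray(message.length - 32);
    const expected = crypto.createHmac('sha256', macKey).update(iv).update(ciphertext).digest();
    if (!crypto.timingSafeEqual(tag, expected)) {
        throw new Error('Invalid authentication tag');
    }
    const decipher = crypto.createDecipheriv('aes-256-ctr', encKey, iv);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

    The tag is checked with a constant-time comparison before any decryption happens, and the encryption and MAC keys are derived separately so neither is reused across primitives.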

    If you want both, ask Taylor Campbell about his BLAKE3-based design.

    A modified version of the above code with key-commitment might look like this:

    const { CryptographyKey } = require('sodium-plus');
    const VERSION = "v2";

    /**
     * Derive an encryption key and a commitment hash.
     * @param {CryptographyKey} key
     * @param {Uint8Array} nonce
     * @returns {{encKey: CryptographyKey, commitment: Uint8Array}}
     */
    async function deriveKeys(key, nonce) {
        // Domain separation: prefix 0x01 for the encryption key, 0x02 for the commitment
        const encKey = new CryptographyKey(await sodium.crypto_generichash(
            Buffer.concat([Buffer.from([0x01]), nonce]),
            key
        ));
        const commitment = await sodium.crypto_generichash(
            Buffer.concat([Buffer.from([0x02]), nonce]),
            key
        );
        return {encKey, commitment};
    }

    /**
     * @param {string|Uint8Array} message
     * @param {CryptographyKey} key
     * @param {string|null} assocData
     * @returns {string}
     */
    async function encryptData(message, key, assocData = null) {
        const nonce = await sodium.randombytes_buf(24);
        const aad = JSON.stringify({
            'version': VERSION,
            'nonce': await sodium.sodium_bin2hex(nonce),
            'extra': assocData
        });
        const {encKey, commitment} = await deriveKeys(key, nonce);
        const encrypted = await sodium.crypto_aead_xchacha20poly1305_ietf_encrypt(
            message,
            nonce,
            encKey,
            aad
        );
        return (
            VERSION +
            await sodium.sodium_bin2hex(nonce) +
            await sodium.sodium_bin2hex(commitment) +
            await sodium.sodium_bin2hex(encrypted)
        );
    }

    /**
     * @param {string} encrypted
     * @param {CryptographyKey} key
     * @param {string|null} assocData
     * @returns {string}
     */
    async function decryptData(encrypted, key, assocData = null) {
        const ver = encrypted.slice(0, 2);
        if (!await sodium.sodium_memcmp(ver, VERSION)) {
            throw new Error("Incorrect version: " + ver);
        }
        const nonce = await sodium.sodium_hex2bin(encrypted.slice(2, 50));
        // The 32-byte commitment occupies hex characters 50 through 113
        const ciphertext = await sodium.sodium_hex2bin(encrypted.slice(114));
        const aad = JSON.stringify({
            'version': ver,
            'nonce': encrypted.slice(2, 50),
            'extra': assocData
        });
        const storedCommitment = await sodium.sodium_hex2bin(encrypted.slice(50, 114));
        const {encKey, commitment} = await deriveKeys(key, nonce);
        if (!(await sodium.sodium_memcmp(storedCommitment, commitment))) {
            throw new Error("Incorrect commitment value");
        }

        const plaintext = await sodium.crypto_aead_xchacha20poly1305_ietf_decrypt(
            ciphertext,
            nonce,
            encKey,
            aad
        );
        return plaintext.toString('utf-8');
    }

    Another design choice you might make is to encode ciphertext with base64 instead of hexadecimal. That doesn’t significantly alter the design here, but it does mean your decoding logic has to accommodate this.
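    To show what “accommodate this” means in practice, here is a small sketch of a base64url framing for the v1 format. The helper names (packMessage, unpackMessage) are hypothetical, and it relies on the 'base64url' Buffer encoding available in current Node.js releases. The main difference from the hex version is that you decode first and slice bytes afterwards, instead of slicing the string at fixed character offsets.

```javascript
const VERSION = 'v1';
const NONCE_BYTES = 24; // XChaCha20-Poly1305 nonce size

// Hypothetical helper: version prefix + base64url(nonce || ciphertext)
function packMessage(nonce, ciphertext) {
    return VERSION + Buffer.concat([nonce, ciphertext]).toString('base64url');
}

// Hypothetical helper: the inverse of packMessage()
function unpackMessage(packed) {
    const ver = packed.slice(0, VERSION.length);
    if (ver !== VERSION) {
        throw new Error('Incorrect version: ' + ver);
    }
    // Decode the whole payload, then split on byte (not character) boundaries
    const raw = Buffer.from(packed.slice(VERSION.length), 'base64url');
    if (raw.length < NONCE_BYTES) {
        throw new Error('Message too short');
    }
    return {
        nonce: raw.subarray(0, NONCE_BYTES),
        ciphertext: raw.subarray(NONCE_BYTES)
    };
}
```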

    You SHOULD version your ciphertexts, and include this in the AAD provided to your AEAD encryption mode. I used “v1” and “v2” as a version string above, but you can use your software name for that too.

    II. Key Agreement

    If you’re not familiar with Elliptic Curve Diffie-Hellman or Authenticated Key Exchanges, two of the earliest posts on this blog were dedicated to those topics.

    Key agreement in libsodium uses Elliptic Curve Diffie-Hellman over Curve25519, or X25519 for short.
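    If you’d like to see the raw primitive in isolation, the following sketch performs an X25519 key agreement between two parties. It uses Node.js’s built-in crypto module instead of sodium-plus (an assumption of this example, chosen so it runs without dependencies).

```javascript
const crypto = require('node:crypto');

// Each party generates an X25519 keypair
const fox = crypto.generateKeyPairSync('x25519');
const wolf = crypto.generateKeyPairSync('x25519');

// Each party combines their own secret key with the other's public key
const foxShared = crypto.diffieHellman({
    privateKey: fox.privateKey,
    publicKey: wolf.publicKey
});
const wolfShared = crypto.diffieHellman({
    privateKey: wolf.privateKey,
    publicKey: fox.publicKey
});

// Both sides arrive at the same 32-byte shared secret
console.log(foxShared.equals(wolfShared));
```

    The raw output of ECDH shouldn’t be used directly as an encryption key; it should be fed through a KDF first.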

    There are many schools of thought for extending ECDH into an authenticated key exchange protocol.

    We’re going to implement what the Signal Protocol calls X3DH instead of doing some interactive EdDSA + ECDH hybrid, because X3DH provides cryptographic deniability (see this section of the X3DH specification for more information).

    For the moment, I’m going to assume a client-server model. That may or may not be appropriate for your design. You can substitute “the server” for “the other participant” in a peer-to-peer configuration.

    Heads up: This section of the blog post is code-heavy.

    Update (November 23, 2020): I implemented this design in TypeScript, if you’d like something tangible to work with. I call my library Rawr X3DH.

    X3DH Pre-Key Bundles

    Each participant will need to upload an Ed25519 identity key once (which is a detail covered in another section), which will be used to sign bundles of X25519 public keys to use for X3DH.

    Your implementation will involve a fair bit of boilerplate, like so:

    /**
     * Generate an X25519 keypair.
     *
     * @returns {{secretKey: X25519SecretKey, publicKey: X25519PublicKey}}
     */
    async function generateKeyPair() {
        const keypair = await sodium.crypto_box_keypair();
        return {
            secretKey: await sodium.crypto_box_secretkey(keypair),
            publicKey: await sodium.crypto_box_publickey(keypair)
        };
    }

    /**
     * Generates some number of X25519 keypairs.
     *
     * @param {number} preKeyCount
     * @returns {{secretKey: X25519SecretKey, publicKey: X25519PublicKey}[]}
     */
    async function generateBundle(preKeyCount = 100) {
        const bundle = [];
        for (let i = 0; i < preKeyCount; i++) {
            bundle.push(await generateKeyPair());
        }
        return bundle;
    }

    /**
     * BLAKE2b( len(PK) | PK_0, PK_1, ... PK_n )
     *
     * @param {X25519PublicKey[]} publicKeys
     * @returns {Uint8Array}
     */
    async function prehashPublicKeysForSigning(publicKeys) {
        const hashState = await sodium.crypto_generichash_init();
        // First, update the state with the number of public keys
        const pkLen = new Uint8Array([
            (publicKeys.length >>> 24) & 0xff,
            (publicKeys.length >>> 16) & 0xff,
            (publicKeys.length >>> 8) & 0xff,
            publicKeys.length & 0xff
        ]);
        await sodium.crypto_generichash_update(hashState, pkLen);
        // Next, update the state with each public key
        for (let pk of publicKeys) {
            await sodium.crypto_generichash_update(
                hashState,
                pk.getBuffer()
            );
        }
        // Return the finalized BLAKE2b hash
        return await sodium.crypto_generichash_final(hashState);
    }

    /**
     * Signs a bundle. Returns the signature.
     *
     * @param {Ed25519SecretKey} signingKey
     * @param {X25519PublicKey[]} publicKeys
     * @returns {Uint8Array}
     */
    async function signBundle(signingKey, publicKeys) {
        return sodium.crypto_sign_detached(
            await prehashPublicKeysForSigning(publicKeys),
            signingKey
        );
    }

    /**
     * This is just so you can see how verification looks.
     *
     * @param {Ed25519PublicKey} verificationKey
     * @param {X25519PublicKey[]} publicKeys
     * @param {Uint8Array} signature
     */
    async function verifyBundle(verificationKey, publicKeys, signature) {
        return sodium.crypto_sign_verify_detached(
            await prehashPublicKeysForSigning(publicKeys),
            verificationKey,
            signature
        );
    }

    This boilerplate exists just so you can do something like this:

    /**
     * Generate some number of X25519 keypairs.
     * Persist the bundle.
     * Sign the bundle of publickeys with the Ed25519 secret key.
     * Return the signed bundle (which can be transmitted to the server.)
     *
     * @param {Ed25519SecretKey} signingKey
     * @param {number} numKeys
     * @returns {{signature: string, bundle: string[]}}
     */
    async function x3dh_pre_key(signingKey, numKeys = 100) {
        const bundle = await generateBundle(numKeys);
        const publicKeys = bundle.map(x => x.publicKey);
        const signature = await signBundle(signingKey, publicKeys);

        // This is a stub; how you persist it is app-specific:
        persistBundleNotDefinedHere(signingKey, bundle);

        // Hex-encode all the public keys
        const encodedBundle = [];
        for (let pk of publicKeys) {
            encodedBundle.push(await sodium.sodium_bin2hex(pk.getBuffer()));
        }

        return {
            'signature': await sodium.sodium_bin2hex(signature),
            'bundle': encodedBundle
        };
    }

    And then you can drop the output of x3dh_pre_key(secretKey) into a JSON-encoded HTTP request.

    In accordance with Signal’s X3DH spec, you want to use x3dh_pre_key(secretKey, 1) to generate the “signed pre-key” bundle and x3dh_pre_key(secretKey, 100) when pushing 100 one-time keys to the server.

    X3DH Initiation

    This section conforms to the Sending the Initial Message section of the X3DH specification.

    When you initiate a conversation, the server should provide you with a bundle containing:

    • Your peer’s Identity key (an Ed25519 public key)
    • Your peer’s current Signed Pre-Key (an X25519 public key)
    • (If any remain unburned) One of your peer’s One-Time Keys (an X25519 public key), which the server then deletes

    If we assume the structure of this response looks like this:

    {
        "IdentityKey": "...",
        "SignedPreKey": {
            "Signature": "...",
            "PreKey": "..."
        },
        "OneTimeKey": "..." // or NULL
    }

    Then we can write the initiation step of the handshake like so:

    /**
     * Get SK for initializing an X3DH handshake
     *
     * @param {object} r -- See previous code block
     * @param {Ed25519SecretKey} senderKey
     */
    async function x3dh_initiate_send_get_sk(r, senderKey) {
        const identityKey = new Ed25519PublicKey(
            await sodium.sodium_hex2bin(r.IdentityKey)
        );
        const signedPreKey = new X25519PublicKey(
            await sodium.sodium_hex2bin(r.SignedPreKey.PreKey)
        );
        const signature = await sodium.sodium_hex2bin(r.SignedPreKey.Signature);

        // Check signature
        const valid = await verifyBundle(identityKey, [signedPreKey], signature);
        if (!valid) {
            throw new Error("Invalid signature");
        }
        const ephemeral = await generateKeyPair();
        const ephSecret = ephemeral.secretKey;
        const ephPublic = ephemeral.publicKey;

        // Turn the Ed25519 keys into X25519 keys for X3DH:
        const senderX = await sodium.crypto_sign_ed25519_sk_to_curve25519(senderKey);
        const recipientX = await sodium.crypto_sign_ed25519_pk_to_curve25519(identityKey);

        // See the X3DH specification to really understand this part:
        const DH1 = await sodium.crypto_scalarmult(senderX, signedPreKey);
        const DH2 = await sodium.crypto_scalarmult(ephSecret, recipientX);
        const DH3 = await sodium.crypto_scalarmult(ephSecret, signedPreKey);
        let SK;
        if (r.OneTimeKey) {
            let DH4 = await sodium.crypto_scalarmult(
                ephSecret,
                new X25519PublicKey(await sodium.sodium_hex2bin(r.OneTimeKey))
            );
            // Buffer.concat() joins the raw bytes; [].concat() would not flatten Buffers
            SK = kdf(Buffer.concat([
                DH1.getBuffer(),
                DH2.getBuffer(),
                DH3.getBuffer(),
                DH4.getBuffer()
            ]));
            DH4.wipe();
        } else {
            SK = kdf(Buffer.concat([
                DH1.getBuffer(),
                DH2.getBuffer(),
                DH3.getBuffer()
            ]));
        }
        // Wipe keys
        DH1.wipe();
        DH2.wipe();
        DH3.wipe();
        ephSecret.wipe();
        senderX.wipe();
        return {
            IK: identityKey,
            EK: ephPublic,
            SK: SK,
            OTK: r.OneTimeKey // might be NULL
        };
    }

    /**
     * Initialize an X3DH handshake
     *
     * @param {string} recipientIdentity - Some identifier for the user
     * @param {Ed25519SecretKey} secretKey - Sender's secret key
     * @param {Ed25519PublicKey} publicKey - Sender's public key
     * @param {string} message - The initial message to send
     * @returns {object}
     */
    async function x3dh_initiate_send(recipientIdentity, secretKey, publicKey, message) {
        const r = await get_server_response(recipientIdentity);
        const {IK, EK, SK, OTK} = await x3dh_initiate_send_get_sk(r, secretKey);
        // Associated data: sender identity key || recipient identity key
        const assocData = await sodium.sodium_bin2hex(
            Buffer.concat([publicKey.getBuffer(), IK.getBuffer()])
        );

        /*
         * We're going to set the session key for our recipient to SK.
         * This might invoke a ratchet.
         *
         * Either SK or the output of the ratchet derived from SK
         * will be returned by getEncryptionKey().
         */
        await setSessionKey(recipientIdentity, SK);

        const encrypted = await encryptData(
            message,
            await getEncryptionKey(recipientIdentity),
            assocData
        );
        return {
            "Sender": my_identity_string,
            "IdentityKey": await sodium.sodium_bin2hex(publicKey.getBuffer()),
            "EphemeralKey": await sodium.sodium_bin2hex(EK.getBuffer()),
            "OneTimeKey": OTK,
            "CipherText": encrypted
        };
    }

    We didn’t define setSessionKey() or getEncryptionKey() above. They will be covered later.

    X3DH – Receiving an Initial Message

    This section implements the Receiving the Initial Message section of the X3DH Specification.

    We’re going to assume the structure of the request looks like this:

    {
        "Sender": "...",
        "IdentityKey": "...",
        "EphemeralKey": "...",
        "OneTimeKey": "...",
        "CipherText": "..."
    }

    The code to handle this should look like this:

    /**
     * Handle an X3DH initiation message as a receiver
     *
     * @param {object} r -- See previous code block
     * @param {Ed25519SecretKey} identitySecret
     * @param {Ed25519PublicKey} identityPublic
     * @param {X25519SecretKey} preKeySecret
     */
    async function x3dh_initiate_recv_get_sk(
        r,
        identitySecret,
        identityPublic,
        preKeySecret
    ) {
        // Decode strings
        const senderIdentityKey = new Ed25519PublicKey(
            await sodium.sodium_hex2bin(r.IdentityKey)
        );
        const ephemeral = new X25519PublicKey(
            await sodium.sodium_hex2bin(r.EphemeralKey)
        );

        // Ed25519 -> X25519
        const senderX = await sodium.crypto_sign_ed25519_pk_to_curve25519(senderIdentityKey);
        const recipientX = await sodium.crypto_sign_ed25519_sk_to_curve25519(identitySecret);

        // See the X3DH specification to really understand this part:
        const DH1 = await sodium.crypto_scalarmult(preKeySecret, senderX);
        const DH2 = await sodium.crypto_scalarmult(recipientX, ephemeral);
        const DH3 = await sodium.crypto_scalarmult(preKeySecret, ephemeral);
        let SK;

        if (r.OneTimeKey) {
            let DH4 = await sodium.crypto_scalarmult(
                await fetchAndWipeOneTimeSecretKey(r.OneTimeKey),
                ephemeral
            );
            // Buffer.concat() joins the raw bytes; [].concat() would not flatten Buffers
            SK = kdf(Buffer.concat([
                DH1.getBuffer(),
                DH2.getBuffer(),
                DH3.getBuffer(),
                DH4.getBuffer()
            ]));
            DH4.wipe();
        } else {
            SK = kdf(Buffer.concat([
                DH1.getBuffer(),
                DH2.getBuffer(),
                DH3.getBuffer()
            ]));
        }
        // Wipe keys
        DH1.wipe();
        DH2.wipe();
        DH3.wipe();
        recipientX.wipe();
        return {
            Sender: r.Sender,
            SK: SK,
            IK: senderIdentityKey
        };
    }

    /**
     * Initiate an X3DH handshake as a recipient
     *
     * @param {object} req - Request object
     * @returns {string} - The initial message
     */
    async function x3dh_initiate_recv(req) {
        const {identitySecret, identityPublic} = await getIdentityKeypair();
        const {preKeySecret} = await getPreKeyPair();
        const {Sender, SK, IK} = await x3dh_initiate_recv_get_sk(
            req,
            identitySecret,
            identityPublic,
            preKeySecret
        );
        // Associated data: sender identity key || recipient identity key
        const assocData = await sodium.sodium_bin2hex(
            Buffer.concat([IK.getBuffer(), identityPublic.getBuffer()])
        );
        try {
            await setSessionKey(Sender, SK);
            return decryptData(
                req.CipherText,
                await getEncryptionKey(Sender),
                assocData
            );
        } catch (e) {
            await destroySessionKey(Sender);
            throw e;
        }
    }

    And with that, you’ve successfully implemented X3DH and symmetric encryption in JavaScript.

    We abstracted some of the details away (i.e. kdf(), the transport mechanisms, the session key management mechanisms, and a few others). Some of them will be highly specific to your application, so it doesn’t make a ton of sense to flesh them out.
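    For reference, one plausible shape for kdf() follows the KDF described in the X3DH specification: HKDF over F || KM, where F is 32 bytes of 0xFF for X25519, the salt is a zero-filled block as long as the hash output, and the info parameter is an application-specific string. The sketch below uses Node.js’s built-in HKDF rather than sodium-plus, and the X3DH_INFO label is a made-up placeholder. With sodium-plus you would additionally wrap the result in a CryptographyKey.

```javascript
const crypto = require('node:crypto');

// Placeholder application label; the X3DH spec calls this the "info" parameter.
const X3DH_INFO = 'ExampleApp_X3DH_v1';

/**
 * Sketch of the X3DH KDF: HKDF-SHA512(F || KM) with a zero-filled salt.
 *
 * @param {Uint8Array} keyMaterial - DH1 || DH2 || DH3 [|| DH4]
 * @returns {Buffer} 32-byte shared key
 */
function kdf(keyMaterial) {
    const F = Buffer.alloc(32, 0xff);          // 32 0xFF bytes for X25519
    const ikm = Buffer.concat([F, keyMaterial]);
    const salt = Buffer.alloc(64, 0);          // SHA-512 output length, all zeroes
    const out = Buffer.from(crypto.hkdfSync('sha512', ikm, salt, X3DH_INFO, 32));
    ikm.fill(0);                               // best-effort wipe of the temporary copy
    return out;
}
```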

    One thing to keep in mind: According to the X3DH specification, participants should regularly (e.g. weekly) replace their Signed Pre-Key in the server with a fresh one. They should also publish more One-Time Keys when they start to run low.
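    A client could make that decision with a small helper like the one sketched here. The function name, the state fields, and the thresholds (rotate after one week, top up below 20 remaining keys, refill to 100) are assumptions picked for illustration; the X3DH specification leaves the exact policy to the application.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

/**
 * Decide what pre-key maintenance a client should perform.
 *
 * @param {{signedPreKeyCreatedAt: number, oneTimeKeysRemaining: number}} state
 * @param {number} now - Current time in milliseconds
 * @param {object} opts - Optional policy overrides
 * @returns {{rotateSignedPreKey: boolean, oneTimeKeysToUpload: number}}
 */
function planPreKeyMaintenance(state, now = Date.now(), opts = {}) {
    const maxAge = opts.maxSignedPreKeyAgeMs ?? WEEK_MS;
    const lowWaterMark = opts.lowWaterMark ?? 20;
    const targetCount = opts.targetCount ?? 100;
    return {
        // Replace the Signed Pre-Key once it is at least maxAge old
        rotateSignedPreKey: (now - state.signedPreKeyCreatedAt) >= maxAge,
        // Top the One-Time Keys back up to targetCount when running low
        oneTimeKeysToUpload: state.oneTimeKeysRemaining < lowWaterMark
            ? targetCount - state.oneTimeKeysRemaining
            : 0
    };
}
```

    The two outputs map onto x3dh_pre_key(secretKey, 1) and x3dh_pre_key(secretKey, n) respectively.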

    If you’d like to see a complete reference implementation of X3DH, as I mentioned before, Rawr-X3DH implements it in TypeScript.

    Session Key Management

    Using X3DH for every message is inefficient and unnecessary. Even the Signal Protocol doesn’t do that.

    Instead, Signal specifies a Double Ratchet protocol that combines a Symmetric-Key Ratchet on subsequent messages, and a Diffie-Hellman-based ratcheting protocol.

    Signal even specifies integration guidelines for the Double Ratchet with X3DH.

    It’s worth reading through the specification to understand their usages of Key-Derivation Functions (KDFs) and KDF Chains.

    Although it is recommended to use HKDF as the Signal protocol specifies, you can strictly speaking use any secure keyed PRF to accomplish the same goal.

    What follows is an example of a symmetric KDF chain that uses BLAKE2b with 512-bit digests of the current session key; the leftmost half of the BLAKE2b digest becomes the new session key, while the rightmost half becomes the encryption key.

    const SESSION_KEYS = {};

    /**
     * Note: In reality you'll want to have two separate sessions:
     * One for receiving data, one for sending data.
     *
     * @param {string} identity
     * @param {CryptographyKey} key
     */
    async function setSessionKey(identity, key) {
        SESSION_KEYS[identity] = key;
    }

    async function getEncryptionKey(identity) {
        if (!SESSION_KEYS[identity]) {
            throw new Error("No session key for " + identity);
        }
        // 64-byte (512-bit) BLAKE2b digest of the current session key
        const blake2bMac = await sodium.crypto_generichash(
            SESSION_KEYS[identity].getBuffer(),
            null,
            64
        );
        // Left half: next session key. Right half: encryption key for this message.
        SESSION_KEYS[identity] = new CryptographyKey(blake2bMac.slice(0, 32));
        return new CryptographyKey(blake2bMac.slice(32, 64));
    }

    In the interest of time, a full DHRatchet implementation is left as an exercise to the reader (since it’s mostly a state machine), but using the appropriate functions provided by sodium-plus (crypto_box_keypair(), crypto_scalarmult()) should be relatively straightforward.

    Make sure your KDFs use domain separation, as per the Signal Protocol specifications.
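    As a sketch of what domain separation looks like in a symmetric-key ratchet: the Double Ratchet specification recommends HMAC keyed by the chain key, with distinct constant inputs for each output (for example, 0x01 for the message key and 0x02 for the next chain key). The function name below is an assumption, and the example uses Node.js’s built-in HMAC-SHA256 rather than BLAKE2b.

```javascript
const crypto = require('node:crypto');

/**
 * One step of a symmetric-key ratchet with domain-separated outputs.
 *
 * @param {Buffer} chainKey - Current 32-byte chain key
 * @returns {{messageKey: Buffer, nextChainKey: Buffer}}
 */
function ratchetStep(chainKey) {
    // Same key, different constant input for each purpose
    const messageKey = crypto.createHmac('sha256', chainKey)
        .update(Buffer.from([0x01]))
        .digest();
    const nextChainKey = crypto.createHmac('sha256', chainKey)
        .update(Buffer.from([0x02]))
        .digest();
    return { messageKey, nextChainKey };
}

// Usage: derive keys for three consecutive messages
let chainKey = crypto.randomBytes(32);
const messageKeys = [];
for (let i = 0; i < 3; i++) {
    const step = ratchetStep(chainKey);
    messageKeys.push(step.messageKey);
    chainKey = step.nextChainKey; // the old chain key should be discarded
}
```

    Because each output comes from a different input constant, learning a message key tells an attacker nothing about the next chain key.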

    Group Key Agreement

    The Signal Protocol specified X3DH and the Double Ratchet for securely encrypting information between two parties.

    Group conversations are trickier, because you have to be able to encrypt information that multiple recipients can decrypt, add/remove participants to the conversation, etc.

    (The same complexity comes with multi-device support for end-to-end encryption.)

    The best design I’ve read to date for tackling group key agreement is the IETF Messaging Layer Security RFC draft.

    I am not going to implement the entire MLS RFC in this blog post. If you want to support multiple devices or group conversations, you’ll want a complete MLS implementation to work with.

    Brief Recap

    That was a lot of ground to cover, but we’re not done yet.

    (Art by Khia.)

    So far we’ve tackled encryption, initial key agreement, and session key management. However, we did not flesh out how Identity Keys (which are signing keys–Ed25519 specifically–rather than Diffie-Hellman keys) are managed. That detail was just sorta hand-waved until now.

    So let’s talk about that.

    III. Identity Key Management

    There’s a meme among technology bloggers to write a post titled “Falsehoods Programmers Believe About _____”.

    Fortunately for us, Identity is one of the topics that furries are positioned to understand better than most (due to fursonas): Identities have a many-to-many relationship with Humans.

    In an end-to-end encryption protocol, each identity will consist of some identifier (phone number, email address, username and server hostname, etc.) and an Ed25519 keypair (for which the public key will be published).

    But how do you know whether or not a given public key is correct for a given identity?

    This is where we segue into one of the hard problems in cryptography, where the solutions available are entirely dependent on your threat model: Public Key Infrastructure (PKI).

    Some common PKI designs include:

    1. Certificate Authorities (CAs) — TLS does this
    2. Web-of-Trust (WoT) — The PGP ecosystem does this
    3. Trust On First Use (TOFU) — SSH does this
    4. Key Transparency / Certificate Transparency (CT) — TLS also does this for ensuring CA-issued certificates are auditable (although it was originally meant to replace Certificate Authorities)

    And you can sort of choose-your-own-adventure on this one, depending on what’s most appropriate for the type of software you’re building and who your customers are.
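    To give a feel for the simplest of these, here is a minimal Trust On First Use sketch: the first public key seen for an identity gets pinned, and any later key that differs is flagged. The class name and return values are hypothetical, and a real client would persist the pins and give users a way to review and accept legitimate key changes.

```javascript
// In-memory TOFU pin store (illustrative only; pins are not persisted)
class TofuStore {
    constructor() {
        this.pins = new Map();
    }

    /**
     * @param {string} identity - e.g. "fox@example.com"
     * @param {Uint8Array} publicKey - The Ed25519 public key presented
     * @returns {'pinned'|'match'|'mismatch'}
     */
    check(identity, publicKey) {
        const pinned = this.pins.get(identity);
        if (pinned === undefined) {
            // First contact: remember a copy of this key
            this.pins.set(identity, Buffer.from(publicKey));
            return 'pinned';
        }
        // Public keys are not secret, so an ordinary comparison is fine here
        return pinned.equals(publicKey) ? 'match' : 'mismatch';
    }
}
```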

    One design I’m particularly fond of is called Gossamer, which is a PKI design without Certificate Authorities, originally designed for making WordPress’s automatic updates more secure (i.e. so every developer can sign their theme and plugin updates).

    Since we only need to maintain an up-to-date repository of Ed25519 identity keys for each participant in our end-to-end encryption protocol, this makes Gossamer a suitable starting point.

    Gossamer specifies a limited grammar of Actions that can be performed: AppendKey, RevokeKey, AppendUpdate, RevokeUpdate, and AttestUpdate. These actions are signed and published to an append-only cryptographic ledger.
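    To illustrate the general idea of signed actions on an append-only ledger (not Gossamer’s actual wire format or API), here is a toy hash-chained log. The entry layout, field names, and the single signing key are simplifying assumptions; only the action verbs come from Gossamer. It uses Ed25519 from Node.js’s built-in crypto module.

```javascript
const crypto = require('node:crypto');

const GENESIS = '0'.repeat(64);

// Append a signed action that commits to the hash of the previous entry
function appendEntry(ledger, action, signingKey) {
    const prevHash = ledger.length > 0 ? ledger[ledger.length - 1].hash : GENESIS;
    const body = JSON.stringify({ prevHash, action });
    const signature = crypto.sign(null, Buffer.from(body), signingKey).toString('hex');
    const hash = crypto.createHash('sha256').update(body).update(signature).digest('hex');
    ledger.push({ prevHash, action, signature, hash });
}

// Re-walk the chain: every link, signature, and hash must check out
function verifyLedger(ledger, verificationKey) {
    let prevHash = GENESIS;
    for (const entry of ledger) {
        if (entry.prevHash !== prevHash) {
            return false;
        }
        const body = JSON.stringify({ prevHash: entry.prevHash, action: entry.action });
        const sigOk = crypto.verify(
            null,
            Buffer.from(body),
            verificationKey,
            Buffer.from(entry.signature, 'hex')
        );
        const hash = crypto.createHash('sha256').update(body).update(entry.signature).digest('hex');
        if (!sigOk || hash !== entry.hash) {
            return false;
        }
        prevHash = entry.hash;
    }
    return true;
}
```

    Because each entry commits to its predecessor’s hash, rewriting or removing an earlier action invalidates everything after it.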

    I would propose a sixth action: AttestKey, so you can have WoT-like assurances and key-signing parties. (If nothing else, you should be able to attest that the identity keys of other cryptographic ledgers in the network are authentic at a point in time.)

    IV. Backdoor Resistance

    In the previous section, I proposed the use of Gossamer as a PKI for Identity Keys. This would provide Ed25519 keypairs for use with X3DH and the Double Ratchet, which would in turn provide session keys to use for symmetric authenticated encryption.

    If you’ve implemented everything preceding this section, you have a full-stack end-to-end encryption protocol. But let’s make intelligence agencies and surveillance capitalists even more mad by making it impractical to backdoor our software (and impossible to silently backdoor it).

    How do we pull that off?

    You want Binary Transparency.

    For us, the implementation is simple: Use Gossamer as it was originally intended (i.e. to secure your software distribution channels).

    Gossamer provides up-to-date verification keys and a commitment to a cryptographic ledger of every software update. You can learn more about its inspiration here.

    It isn’t enough to merely use Gossamer to manage keys and update signatures. You need independent third parties to use the AttestUpdate action to assert one or more of the following:

    1. That builds are reproducible from the source code.
    2. That they have reviewed the source code and found no evidence of backdoors or exploitable vulnerabilities.

    (And then you should let your users decide which of these independent third parties they trust to vet software updates.)
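    On the client side, that user choice could boil down to a check like the following sketch: hash the downloaded update, then count how many attestors the user trusts have vouched for that exact hash. The function name and the attestation record shape are hypothetical, and the sketch assumes the attestation signatures were already verified when they were read from the ledger.

```javascript
const crypto = require('node:crypto');

/**
 * @param {Buffer} artifact - The downloaded update file contents
 * @param {{attestor: string, sha256: string}[]} attestations - Already signature-verified
 * @param {Set<string>} trustedAttestors - Third parties this user has chosen to trust
 * @param {number} threshold - How many distinct trusted attestors are required
 * @returns {boolean}
 */
function isUpdateTrusted(artifact, attestations, trustedAttestors, threshold = 1) {
    const digest = crypto.createHash('sha256').update(artifact).digest('hex');
    // Collect distinct trusted attestors that vouched for this exact artifact
    const vouching = new Set(
        attestations
            .filter(a => a.sha256 === digest && trustedAttestors.has(a.attestor))
            .map(a => a.attestor)
    );
    return vouching.size >= threshold;
}
```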

    Closing Remarks

    The U.S. Government cries and moans a lot about “criminals going dark” and wonders a lot about how to solve the “going dark problem”.

    If more software developers implement end-to-end encryption in their communications software, then maybe one day they won’t be able to use dragnet surveillance to spy on citizens and they’ll be forced to do actual detective work to solve actual crimes.

    Y’know, like their job description actually entails?

    Let’s normalize end-to-end encryption. Let’s normalize backdoor-resistant software distribution.

    Let’s collectively tell the intelligence community in every sophisticated nation state the one word they don’t hear often enough:

    Especially if you’re a furry. Because we improve everything! :3

    Questions You Might Have

    What About Private Contact Discovery?

    That’s one of the major reasons why the thing we’re building isn’t meant to compete with Signal (and it MUST NOT be advertised as such):

    Signal is a privacy tool, and their servers have no way of identifying who can contact who.

    What we’ve built here isn’t a complete privacy solution, it’s only providing end-to-end encryption (and possibly making NSA employees cry at their desk).

    Does This Design Work with Federation?

    Yes. Each identifier string can be [username] at [hostname].

    What About Network Metadata?

    If you want anonymity, you want to use Tor.

    Why Are You Using Ed25519 Keys for X3DH?

    If you only read the key agreement section of this blog post and the fact that I’m passing around Ed25519 public keys seems weird, you might have missed the identity section of this blog post where I suggested piggybacking on another protocol called Gossamer to handle the distribution of Ed25519 public keys. (Gossamer is also beneficial for backdoor resistance in software update distribution, as described in the subsequent section.)

    Furthermore, we’re actually using birationally equivalent X25519 keys derived from the Ed25519 keypair for the X3DH step. This is a deviation from what Signal does (using X25519 keys everywhere, then inventing an EdDSA variant to support their usage).

    const publicKeyX = await sodium.crypto_sign_ed25519_pk_to_curve25519(foxPublicKey);
    const secretKeyX = await sodium.crypto_sign_ed25519_sk_to_curve25519(wolfSecretKey);

    (Using fox/wolf instead of Alice/Bob, because it’s cuter.)

    This design pattern has a few advantages:

    1. It makes Gossamer integration seamless, which means you can use Ed25519 for identities and still have a deniable X3DH handshake for 1:1 conversations while implementing the rest of the designs proposed.
    2. This approach to X3DH can be implemented entirely with libsodium functions, without forcing you to write your own cryptography implementations (i.e. for XEdDSA).

    The only disadvantages I’m aware of are:

    1. It deviates from Signal’s core design in a subtle way that means you don’t get to claim the exact same advantages Signal does when it comes to peer review.
    2. Some cryptographers are distrustful of the use of birationally equivalent X25519 keys from Ed25519 keys (although there isn’t a vulnerability any of them have been able to point me to that doesn’t involve torsion groups–which libsodium’s implementation already avoids).

    If these concerns are valid enough to decide against my implementation above, I invite you to talk with cryptographers about your concerns and then propose alternatives.

    Has Any of This Been Implemented Already?

    You can find implementations for the designs discussed on this blog post below:

    • Rawr-X3DH implements X3DH in TypeScript (added 2020-11-23)

    I will update this section of the blog post as implementations surface.

    #authenticatedEncryption #authenticatedKeyExchange #crypto #cryptography #encryption #endToEndEncryption #libsodium #OnlinePrivacy #privacy #SecurityGuidance #symmetricEncryption

  11. If you’re ever tasked with implementing a cryptography feature–whether a high-level protocol or a low-level primitive–you will have to take special care to ensure you’re not leaking secret information through side-channels.

    The descriptions of algorithms you learn in a classroom or textbook are not sufficient for real-world use. (Yes, that means your toy RSA implementation based on GMP from your computer science 101 class isn’t production-ready. Don’t deploy it.)

    But what are these elusive side-channels exactly, and how do you prevent them? And in cases where you cannot prevent them, how can you mitigate the risk to your users?

    Art by Swizz.

    Contents

    • Cryptographic Side-Channels
      • Timing Leaks
      • Power Usage
      • Electromagnetic Emissions
    • Side-Channel Prevention and Mitigation
      • Prevention vs. Mitigation
      • What is Constant-Time?
      • Malicious Environments and Algorithmic Constant-Time
      • Mitigation with Blinding Techniques
    • Design Patterns for Algorithmic Constant-Time Code
      • Constant-Time String Comparison
      • Alternative: “Double HMAC” String Comparison
      • Constant-Time Conditional Select
      • Constant-Time String Inequality Comparison
      • Constant-Time Integer Multiplication
      • Constant-Time Integer Division
      • Constant-Time Modular Inversion
      • Constant-Time Null-Byte Trimming
    • Further Reading and Online Resources
    • Errata

    Cryptographic Side-Channels

    The concept of a side-channel isn’t inherently cryptographic, as Taylor Hornby demonstrates, but a side-channel can be a game-over vulnerability in a system meant to maintain confidentiality (even if only for its cryptographic keys).

    Cryptographic side-channels allow an attacker to learn secret data from your cryptography system. To accomplish this, the attacker doesn’t necessarily study the system’s output (i.e. ciphertext); instead, they observe some other measurement, such as how much time or power was spent performing an operation, or what kind of electromagnetic radiation was emitted.

    Important: While being resistant to side-channels is a prerequisite for implementations to be secure, it isn’t in and of itself sufficient for security. The underlying design of the primitives, constructions, and high-level protocols needs to be secure first, and that requires a clear and specific threat model for what you’re building.

    Constant-time ECDSA doesn’t help you if you reuse k-values like it’s going out of style, but variable-time ECDSA still leaks your secret key to anyone who cares to probe your response times. Secure cryptography is very demanding.

    Art by Riley.

    Timing Leaks

    Timing side-channels leak secrets through how much time it takes for an operation to complete.

    There are many different flavors of timing leakage, including:

    • Fast-failing comparison functions (memcmp() in C)
    • Cache-timing vulnerabilities (e.g. software AES)
    • Memory access patterns
    • Conditional branches controlled by secrets

    The bad news about timing leaks is that they’re almost always visible to an attacker over the network (including over the Internet (PDF)).

    The good news is that most of them can be prevented or mitigated in software.
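
    To make the first bullet concrete, here is a minimal Python sketch of a fast-failing comparison. The function name leaky_equals and the extra "bytes examined" return value are assumptions made for illustration; real comparison functions don't report this, but their running time tracks it. The work done grows with the length of the matching prefix, which is the signal an attacker measures.

```python
from typing import Tuple


def leaky_equals(left: bytes, right: bytes) -> Tuple[bool, int]:
    """Fast-failing comparison; also returns how many bytes were examined."""
    if len(left) != len(right):
        return False, 0
    examined = 0
    for a, b in zip(left, right):
        examined += 1
        if a != b:
            # Early exit: the work done depends on where the first mismatch is.
            return False, examined
    return True, examined


secret = b"hunter2!"
# A guess with a longer correct prefix takes more loop iterations to reject.
print(leaky_equals(secret, b"xxxxxxxx"))  # (False, 1)
print(leaky_equals(secret, b"huntxxxx"))  # (False, 5)
```

    An attacker who can time many guesses can recover the secret one byte at a time by keeping whichever guess takes longest to reject.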

    Art by Kyume.

    Power Usage

    Different algorithms or processor operations may require different amounts of power.

    For example, squaring a large number may take less power than multiplying two different large numbers. This observation has led to the development of power analysis attacks against RSA.
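
    As a rough illustration of why that matters, the following Python sketch runs a textbook left-to-right square-and-multiply exponentiation and records which operation it performed at each step. The function name and the trace string are assumptions for this example; a real attacker would infer the same sequence from a power trace rather than from a log.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply; records the operation sequence."""
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":
            # This multiply only happens for 1-bits of the secret exponent.
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)


value, trace = square_and_multiply(7, 0b1011, 1000003)
print(trace)  # SMSSMSM
```

    Reading the trace back, every "SM" pair is a 1 bit and every lone "S" is a 0 bit, so the sequence of operations spells out the exponent.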

    Power analysis is especially relevant for embedded systems and smart cards, which are easier to extract a meaningful signal from than your desktop computer.

    Some information leakage through power usage can be prevented through careful engineering (for example: BearSSL, which uses Montgomery multiplication instead of square-and-multiply).

    But that’s not always an option, so generally these risks are mitigated.

    My reaction when I first learned of power leaks: WATT (Art by Swizz)

    Electromagnetic Emissions

    Your computer is a reliable source of electromagnetic emissions (such as radio waves). Some of these emissions may reveal information about your cryptographic secrets, especially to an attacker with physical proximity to your device.

    The good news is that research into EM emission side-channels isn’t as mature as side-channels through timing leaks or power usage. The bad news is that mitigations for breakthroughs will generally require hardware (e.g. electromagnetic shielding).

    Aren’t computers terrifying? (Art by Swizz)

    Side-Channel Prevention and Mitigation

    Now that we’ve established a rough sense of some of the types of side-channels that are possible, we can begin to identify what causes them and aspire to prevent the leaks from happening–and where we can’t, to mitigate the risk to a reasonable level.

    Note: To be clear, I didn’t cover all of the types of side-channels.

    Prevention vs. Mitigation

    Preventing a side-channel means eliminating the conditions that allow the information leak to occur in the first place. For timing leaks, this means making all algorithms constant-time.

    There are entire classes of side-channel leaks that aren’t possible or practical to prevent in software. When you encounter one, the best you can hope to do is mitigate the risk.

    Ideally, you want to make the attack more expensive to pull off than the reward an attacker will gain from it.

    What is Constant-Time?

    Toto, I don’t think we’re in Tanelorn Kansas anymore.

    When an implementation is said to be constant-time, what we mean is that the execution time of the code is not a function of its secret inputs.

    Vulnerable AES uses table look-ups to implement the S-Box. Constant-time AES is either implemented in hardware, or is bitsliced.
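
    The problem with a table look-up is that the memory address being read depends on a secret index. One generic way to avoid that (a different technique from bitslicing, and not taken from any particular AES library) is to read every entry and keep only the one you wanted. The sketch below assumes a table of at most 256 byte-sized entries; ct_lookup is a hypothetical name, and Python itself makes no timing guarantees, so treat this as the shape of the algorithm only.

```python
def ct_lookup(table: list, secret_index: int) -> int:
    """Read table[secret_index] while touching every entry of the table.

    Assumes len(table) <= 256 and byte-sized entries.
    """
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index
        # (diff - 1) >> 8 is -1 (all ones) only when diff == 0.
        mask = ((diff - 1) >> 8) & 0xFF
        result |= value & mask
    return result


sbox_fragment = [0x63, 0x7C, 0x77, 0x7B]  # first four AES S-box entries
print(hex(ct_lookup(sbox_fragment, 2)))  # 0x77
```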

    Malicious Environments and Algorithmic Constant-Time

    One of the greatest challenges with writing constant-time code is distinguishing between algorithmic constant-time and provably constant-time. The main difference between the two is that you cannot trust your compiler (especially a JIT compiler), which may attempt to optimize your code in a way that reintroduces the side-channel you aspired to remove.

    A sufficiently advanced compiler optimization is indistinguishable from an adversary.

    John Regehr, possibly with apologies to Arthur C. Clarke

    For compiled languages, this is a tractable but expensive problem to solve: You simply have to formally verify everything from the source code to the compiler to the silicon chips that the code will be deployed on, and then audit your supply chain to prevent malicious tampering from going undetected.

    For interpreted languages (e.g. PHP and JavaScript), this formal verification strategy isn’t really an option, unless you want to formally verify the runtime that interprets scripts and prove that the operations remain constant-time on top of all the other layers of distrust.

    Is this level of paranoia really worth the effort?

    For our cases, anyway! (Art by Khia.)

    For that reason, we’re going to assume that algorithmic constant-time is adequate for the duration of this blog post.

    If your threat model prevents you from accepting this assumption, feel free to put in the extra effort yourself and tell me how it goes. After all, as a furry who writes blog posts in my spare time for fun, I don’t exactly have the budget for massive research projects in formal verification.

    Mitigation with Blinding Techniques

    The best mitigation for some side-channels is called blinding: Obfuscating the inputs with some random data, then deobfuscating the outputs with the same random data, such that your keys are not revealed.

    Two well-known examples include RSA decryption and Elliptic Curve Diffie-Hellman. I’ll focus on the latter, since it’s not as widely covered in the literature (although several cryptographers I’ve talked with were somehow knowledgeable about it; I suspect gatekeeping is involved).
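
    For comparison, the RSA case can be sketched in a few lines of Python. This is textbook RSA with a toy key and no padding, purely to show the blind / compute / unblind structure; the function name blinded_rsa_decrypt is an assumption, and pow(r, -1, n) needs Python 3.8 or newer.

```python
import math
import secrets


def blinded_rsa_decrypt(c: int, d: int, e: int, n: int) -> int:
    """Textbook RSA decryption with base blinding (toy sketch, no padding)."""
    # Pick a random blinding factor r that is invertible modulo n.
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    blinded = (c * pow(r, e, n)) % n        # c' = c * r^e
    m_blinded = pow(blinded, d, n)          # (c')^d = m * r
    return (m_blinded * pow(r, -1, n)) % n  # strip r to recover m


n, e, d = 3233, 17, 2753  # toy key from p = 61, q = 53
c = pow(65, e, n)
print(blinded_rsa_decrypt(c, d, e, n))  # 65
```

    The private-key exponentiation only ever sees c * r^e, which changes on every call, so measurements taken across repeated decryptions of the same ciphertext no longer line up.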

    Blinded ECDH Key Exchange

    In typical ECDH implementations, you will convert a point $(x, y)$ on a Weierstrass curve to a Jacobian coordinate system $(X, Y, Z)$.

    The exact conversion formula is ($X = x \cdot Z^2$, $Y = y \cdot Z^3$). The conversion almost makes intuitive sense.

    Where does $Z$ come from though?

    Art by circuitslime

    It turns out, the choice for $Z$ is totally arbitrary. Libraries typically set it equal to 1 (for best performance), but you can also set it to a random number. (You cannot set it to 0, however, for obvious reasons.)

    Choosing a random number means the calculations performed over Jacobian coordinates will be obscured by a randomly chosen factor (and thus, if $Z$ is only used once per scalar multiplication, the bitwise signal the attackers rely on will be lost).
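
    Here is a small Python sketch of that idea, assuming the standard Jacobian relation x = X / Z^2, y = Y / Z^3. The prime is the secp256k1 field prime, chosen only because it is a well-known Weierstrass-curve field; the sample x and y are arbitrary field elements and are not checked against any curve equation. The point is just that every non-zero Z gives a different-looking representation of the same affine point.

```python
import secrets

P = 2**256 - 2**32 - 977  # secp256k1 field prime, used here only as an example


def to_jacobian(x: int, y: int, z: int = 1):
    """Affine (x, y) -> Jacobian (X, Y, Z) with X = x*Z^2, Y = y*Z^3."""
    if z % P == 0:
        raise ValueError("Z must be non-zero")
    return (x * z * z) % P, (y * z * z * z) % P, z % P


def to_affine(X: int, Y: int, Z: int):
    """Jacobian (X, Y, Z) -> affine (x, y) = (X / Z^2, Y / Z^3)."""
    z_inv = pow(Z, -1, P)
    return (X * z_inv**2) % P, (Y * z_inv**3) % P


x, y = 1234567, 7654321  # arbitrary field elements, not a real curve point
z = secrets.randbelow(P - 1) + 1  # random non-zero blinding factor
blinded = to_jacobian(x, y, z)
print(to_affine(*blinded) == (x, y))  # True
```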

    Blinding techniques are cool. (Art by Khia.)

    I think it’s really cool how one small tweak to the runtime of an algorithm can make it significantly harder to attack.

    Design Patterns for Algorithmic Constant-Time Code

    Mitigation techniques are cool, but preventing side-channels is a better value-add for most software.

    To that end, let’s look at some design patterns for constant-time software. Some of these are relatively common; others, not so much.

    Art by Scout Pawfoot.

    If you prefer TypeScript / JavaScript, check out Soatok’s constant-time-js library on GitHub / NPM.

    Constant-Time String Comparison

    Rather than using string comparison (== in most programming languages, memcmp() in C), you want to compare cryptographic secrets and/or calculated integrity checks with a secure compare algorithm, which looks like this:

    1. Initialize a variable (let’s call it D) to zero.
    2. For each byte of the two strings:
      1. Calculate (left[i] XOR right[i])
      2. Bitwise OR the current value of D with the result of the XOR, store the output in D
    3. When the loop has concluded, D will be equal to 0 if and only if the two strings are equal.

    In code form, it looks like this:

    <?php
    function ct_compare(string $left, string $right): bool
    {
        $d = 0;
        $length = mb_strlen($left, '8bit');
        if (mb_strlen($right, '8bit') !== $length) {
            return false; // Lengths differ
        }
        for ($i = 0; $i < $length; ++$i) {
            $leftCharCode = unpack('C', $left[$i])[1];
            $rightCharCode = unpack('C', $right[$i])[1];
            $d |= ($leftCharCode ^ $rightCharCode);
        }
        return $d === 0;
    }

    In this example, I’m using PHP’s unpack() function to avoid cache-timing leaks with ord() and chr(). Of course, you can simply use hash_equals() instead of writing it yourself (PHP 5.6.0+).

    Alternative: “Double HMAC” String Comparison

    If the previous algorithm won’t work (i.e. because you’re concerned your JIT compiler will optimize it away), there is a popular alternative to consider. It’s called “Double HMAC” because it was traditionally used with Encrypt-Then-HMAC schemes.

    The algorithm looks like this:

    1. Generate a random 256-bit key, K. (This can be cached between invocations, but it should be unpredictable.)
    2. Calculate HMAC-SHA256(K, left).
    3. Calculate HMAC-SHA256(K, right).
    4. Return true if the outputs of steps 2 and 3 are equal.

    This is provably secure, so long as HMAC-SHA256 is a secure pseudo-random function and the key K is unknown to the attacker.

    In code form, the Double HMAC compare function looks like this:

    <?php
    function hmac_compare(string $left, string $right): bool
    {
        static $k = null;
        if (!$k) $k = random_bytes(32);
        return (
            hash_hmac('sha256', $left, $k)
                ===
            hash_hmac('sha256', $right, $k)
        );
    }

    Constant-Time Conditional Select

    I like to imagine a conversation between a cryptography engineer and a Zen Buddhist, that unfolds like so:

    • CE: “I want to eliminate branching side-channels from my code.”
    • ZB: “Then do not have branches in your code.”

    And that is precisely what we intend to do with a constant-time conditional select: Eliminate branches by conditionally returning one of two strings, without an IF statement.

    Mind. Blown. (Art by Khia.)

    This isn’t as tricky as it sounds. We’re going to use XOR and two’s complement to achieve this.

    The algorithm looks like this:

    1. Convert the selection bit (TRUE/FALSE) into a mask value (-1 for TRUE, 0 for FALSE). Bitwise, -1 looks like 111111111…1111111111, while 0 looks like 00000000…00000000.
    2. Copy the right string into a buffer, call it tmp.
    3. Calculate left XOR right, call it x.
    4. Return (tmp XOR (x AND mask)).

    Once again, in code this algorithm looks like this:

    <?php
    function ct_select(
        bool $returnLeft,
        string $left,
        string $right
    ): string {
        $length = mb_strlen($left, '8bit');
        if (mb_strlen($right, '8bit') !== $length) {
            throw new Exception('ct_select() expects two strings of equal length');
        }

        // Mask byte
        $mask = (-$returnLeft) & 0xff;
        // X
        $x = (string) ($left ^ $right);

        // Output = Right XOR (X AND Mask)
        $output = '';
        for ($i = 0; $i < $length; $i++) {
            $rightCharCode = unpack('C', $right[$i])[1];
            $xCharCode = unpack('C', $x[$i])[1];
            $output .= pack(
                'C',
                $rightCharCode ^ ($xCharCode & $mask)
            );
        }
        return $output;
    }

    You can test this code for yourself here. The function was designed to read intuitively like a ternary operator.

    A Word of Caution on Cleverness

    In some languages, it may seem tempting to use the bitwise trickery to swap out pointers instead of returning a new buffer. But do not fall for this Siren song.

    If, instead of returning a new buffer, you just swap pointers, what you’ll end up doing is creating a timing leak through your memory access patterns. This can culminate in a timing vulnerability, but even if your data is too big to fit in a processor’s cache line (I dunno, Post-Quantum RSA keys?), there’s another risk to consider.

    Virtual memory addresses are just beautiful lies. Where your data lives on the actual hardware memory is entirely up to the kernel. You can have two blobs with contiguous virtual memory addresses that live on separate memory pages, or even separate RAM chips (if you have multiple).

    If you’re swapping pointers around, and they point to two different pieces of hardware, and one is slightly faster to read from than the other, you can introduce yet another timing attack through which pointer is being referenced by the processor.

    It’s timing leaks all the ways down! (Art by Swizz)

    If you’re swapping between X and Y before performing a calculation, where:

    • X lives on RAM chip 1, which takes 3 ns to read
    • Y lives on RAM chip 2, which takes 4 ns to read

    …then the subsequent use of the swapped pointers reveals whether you’re operating on X or Y in the timing: It will take slightly longer to read from Y than from X.

    The best way to mitigate this problem is to never design your software to have it in the first place. Don’t be clever on this one.

    Constant-Time String Inequality Comparison

    Sometimes you don’t just need to know if two strings are equal, you also need to know which one is larger than the other.

    To accomplish this in constant-time, we need to maintain two state variables:

    1. gt (initialized to 0, will be set to 1 at some point if left > right)
    2. eq (initialized to 1, will be set to 0 at some point if left != right)

    Endianness will dictate the direction our algorithm goes, but we’re going to perform two operations in each iteration:

    1. gt should be bitwise ORed with (eq AND ((right[i] - left[i]) right-shifted by 8 bits))
    2. eq should be bitwise ANDed with (((right[i] XOR left[i]) - 1) right-shifted by 8 bits)

    If right and left are ever different, eq will be set to 0.

    If, the first time they differ, the value of left[i] is greater than the value of right[i], then the subtraction will produce a negative number. Right-shifting a negative number by 8 places yields all 1 bits, so bitwise ANDing the result with eq (which is 1 until two bytes differ, and 0 from then on) yields 1, which is then ORed into gt. Thus, if (right[i] - left[i]) is negative while eq is still 1, gt will be set to 1. Otherwise, it remains 0.

    At the end of this loop, return (gt + gt + eq) – 1. This will result in the following possible values:

    • left < right: -1
    • left == right: 0
    • left > right: 1

    The arithmetic based on the possible values of gt and eq should be straightforward.

    • Different (eq == 0) but not greater (gt == 0) means left < right, -1.
    • Different (eq == 0) and greater (gt == 1) means left > right, 1.
    • If eq == 1, no bytes ever differed, so left == right, 0.

    A little endian implementation is as follows:

    <?php
    function str_compare(string $left, string $right): int
    {
        $length = mb_strlen($left, '8bit');
        if (mb_strlen($right, '8bit') !== $length) {
            throw new Exception('str_compare() expects two strings of equal length');
        }
        $gt = 0;
        $eq = 1;
        $i = $length;
        while ($i > 0) {
            --$i;
            $leftCharCode = unpack('C', $left[$i])[1];
            $rightCharCode = unpack('C', $right[$i])[1];
            $gt |= (($rightCharCode - $leftCharCode) >> 8) & $eq;
            $eq &= (($rightCharCode ^ $leftCharCode) - 1) >> 8;
        }
        return ($gt + $gt + $eq) - 1;
    }

    Demo for this function is available here.

    Constant-Time Integer Multiplication

    Multiplying two integers is one of those arithmetic operations that should be constant-time. But on many older processors, it isn’t.

    Of course there’s a microarchitecture timing leak! (Art by Khia.)

    Fortunately, there is a workaround. It involves an algorithm called Ancient Egyptian Multiplication in some places or Peasant Multiplication in others.

    Multiplying two numbers $a$ and $b$ this way looks like this:

    1. Determine the number of operations you need to perform. Generally, this is either known ahead of time or is the bit length of $b$, $\lfloor \log_2 b \rfloor + 1$.
    2. Set $c$ to 0.
    3. Until the operation count reaches zero:
      1. If the lowest bit of $b$ is set, add $a$ to $c$.
      2. Left shift $a$ by 1.
      3. Right shift $b$ by 1.
    4. Return $c$.

    The main caveat here is that you want to use bitwise operators in step 3.1 to remove the conditional branch.
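
    A minimal Python sketch of those steps follows. It is not the sodium_compat code; the names a, b, and c for the two operands and the accumulator, the function name ct_multiply, and the default width of 32 bits are all assumptions for illustration. The loop count is fixed by the width rather than by the operands, and the "if" in step 3.1 is replaced by a mask. Python integers are arbitrary-precision, so this only demonstrates the structure, not real timing behaviour.

```python
def ct_multiply(a: int, b: int, bits: int = 32) -> int:
    """Shift-and-add multiplication with a fixed iteration count.

    Assumes 0 <= a, b < 2**bits. Returns the full (2 * bits)-wide product.
    """
    c = 0
    for _ in range(bits):
        mask = -(b & 1)  # all ones if the lowest bit of b is set, else 0
        c += a & mask    # adds a or 0 without branching on b
        a <<= 1
        b >>= 1
    return c


print(ct_multiply(1234, 5678) == 1234 * 5678)  # True
```

    In a fixed-width language you would also need to decide how to handle the high half of the product, which this sketch sidesteps by letting Python's integers grow.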

    Rather than bundle example code in our blog post, please refer to the implementation in sodium_compat (a pure PHP polyfill for libsodium).

    For big number libraries, implementing Karatsuba on top of this integer multiplying function should be faster than attempting to multiply bignums this way.

    Constant-Time Integer Division

    Although some cryptography algorithms call for integer division, division isn’t usually expected to be constant-time.

    However, if you look up a division algorithm for unsigned integers with a remainder, you’ll likely encounter this algorithm, which is almost constant-time:

    if D = 0 then error(DivisionByZeroException) end
    Q := 0                  -- Initialize quotient and remainder to zero
    R := 0
    for i := n − 1 .. 0 do  -- Where n is number of bits in N
      R := R << 1           -- Left-shift R by 1 bit
      R(0) := N(i)          -- Set the least-significant bit of R equal to bit i of the numerator
      if R ≥ D then
        R := R − D
        Q(i) := 1
      end
    end

    If we use the tricks we learned from implementing constant-time string inequality with constant-time conditional selection, we can implement this algorithm without timing leaks.

    Our constant-time version of this algorithm looks like this:

    if D = 0 then error(DivisionByZeroException) end
    Q := 0                  -- Initialize quotient and remainder to zero
    R := 0
    for i := n − 1 .. 0 do  -- Where n is number of bits in N
      R := R << 1           -- Left-shift R by 1 bit
      R(0) := N(i)          -- Set the least-significant bit of R equal to bit i of the numerator
      -- Use constant-time inequality (a three-way result, as in str_compare above)
      compared := ct_compare(R, D)
      -- if R > D  then compared ==  1, swap = 1
      -- if R == D then compared ==  0, swap = 1
      -- if R < D  then compared == -1, swap = 0
      swap := (1 - ((compared >> 31) & 1))  -- Assumes 32-bit signed integers
      -- R' = R - D
      -- Q' = Q, Q[i] = 1
      Rprime := R - D
      Qprime := Q
      Qprime(i) := 1 -- The i'th bit is set to 1
      -- Replace (R with R', Q with Q') if swap == 1
      R := ct_select(swap, Rprime, R)
      Q := ct_select(swap, Qprime, Q)
    end

    It’s approximately twice as slow as the original, but it’s constant-time.
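
    For readers who want something runnable, here is a rough Python rendering of the branch-free loop for unsigned integers. It takes a shortcut compared with the pseudocode: instead of calling a separate comparison function, it reads the sign of R - D directly, and it inlines the select as XOR-and-mask. The name ct_divmod and the default 32-bit width are assumptions, and Python's big integers mean this shows the logic only, not actual constant-time execution.

```python
def ct_divmod(n: int, d: int, bits: int = 32):
    """Branch-free restoring division for 0 <= n < 2**bits and 0 < d < 2**bits."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q = 0
    r = 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)
        # swap is 1 when r >= d, else 0: the sign of (r - d) decides.
        swap = 1 - (((r - d) >> (bits + 1)) & 1)
        mask = -swap
        r = r ^ ((r ^ (r - d)) & mask)         # select r - d or r
        q = q ^ ((q ^ (q | (1 << i))) & mask)  # select q with bit i set, or q
    return q, r


print(ct_divmod(100, 7))  # (14, 2)
```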

    (Art by Khia.)

    Constant-Time Modular Inversion

    Modular inversion is the calculation of $x^{-1} \bmod p$ for some prime $p$. This is used in a lot of places, but especially in elliptic curve cryptography and RSA.

    Daniel J. Bernstein and Bo-Yin Yang published a paper on fast constant-time GCD and Modular Inversion in 2019. The algorithm in question is somewhat straightforward to implement (although determining whether or not that implementation is safe is left as an exercise to the rest of us).

    A simpler technique is to use Fermat’s Little Theorem: $x^{-1} \equiv x^{p-2} \pmod{p}$ for some prime $p$. This only works with prime fields, and is slower than a Binary GCD (which isn’t necessarily constant-time, as OpenSSL discovered).
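
    A minimal Python sketch of the Fermat approach, assuming p is prime and x is not a multiple of p. The function name ct_mod_inverse is made up for this example. The exponent p - 2 is public, so the loop length leaks nothing about x; the sketch still multiplies on every bit and selects with a mask, to show the shape you would want if the exponent were secret. The underlying big-integer arithmetic in Python is not constant-time, so this is the algorithm only.

```python
def ct_mod_inverse(x: int, p: int) -> int:
    """Compute x^(p-2) mod p, multiplying on every exponent bit.

    Assumes p is prime and x is not a multiple of p.
    """
    exponent = p - 2
    result = 1
    for i in range(exponent.bit_length() - 1, -1, -1):
        result = (result * result) % p
        bit = (exponent >> i) & 1
        multiplied = (result * x) % p
        mask = -bit
        # Always multiply, then keep the product only when the bit is set.
        result = result ^ ((result ^ multiplied) & mask)
    return result


p = 2**255 - 19
x = 123456789
print((x * ct_mod_inverse(x, p)) % p)  # 1
```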

    BearSSL provides an implementation (and accompanying documentation) for a constant-time modular inversion algorithm based on Binary GCD.

    (In the future, I may update this section of this blog post with an implementation in PHP, using the GMP extension.)

    Constant-Time Null-Byte Trimming

    Shortly after this guide first went online, security researchers published the Raccoon Attack, which used a timing leak in the number of leading 0 bytes in the pre-master secret–combined with a lattice attack to solve the hidden number problem–to break TLS-DH(E).

    To solve this, you need two components:

    1. A function that returns a slice of an array without timing leaks.
    2. A function that counts the number of significant bytes (i.e. ignores leading zero bytes, counts from the first non-zero byte).

    A timing-safe array resize function needs to do three things:

    1. Touch every byte of the input array once.
    2. Touch every byte of the output array at least once, linearly. The constant-time division algorithm is useful here (to calculate x mod n for the output array index).
    3. Conditionally select between input[x] and the existing output[x_mod_n], based on whether x >= target size.

    I’ve implemented this in my constant-time-js library.
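
    Independently of that library, the second component (counting significant bytes) can be sketched in Python as follows. This is an illustration under the assumption that the input is a big-endian byte string; count_significant_bytes is a hypothetical name, and the resize function is not shown. Every byte is visited, and the "have we seen a non-zero byte yet?" state is updated with bitwise operations instead of a branch.

```python
def count_significant_bytes(buf: bytes) -> int:
    """Count bytes from the first non-zero byte onward, visiting every byte."""
    seen_nonzero = 0  # becomes 1 at the first non-zero byte and stays 1
    count = 0
    for b in buf:
        # (b | -b) is negative exactly when b != 0, so the shift yields -1 or 0.
        nonzero = ((b | -b) >> 8) & 1
        seen_nonzero |= nonzero
        count += seen_nonzero
    return count


print(count_significant_bytes(bytes([0, 0, 1, 0, 2])))  # 3
```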

    Further Reading and Online Resources

    If you’re at all interested in cryptographic side-channels, your hunger for knowledge probably won’t be sated by a single blog post. Here’s a collection of articles, papers, books, etc. worth reading.

    Errata

    • 2020-08-27: The original version of this blog post incorrectly attributed Jacobian coordinate blinding to ECDSA hardening, rather than ECDH hardening. This error was brought to my attention by Thai Duong. Thanks Thai!
    • 2020-08-27: Erin correctly pointed out that omitting memory access timing was a disservice to developers, who might not be aware of the risks involved. I’ve updated the post to call this risk out specifically (especially in the conditional select code, which some developers might try to implement with pointer swapping without knowing the risks involved). Thanks Erin!

    I hope you find this guide to side-channels helpful.

    Thanks for reading!

    Follow my blog for more Defense Against the Bark Arts posts in the future.

    https://soatok.blog/2020/08/27/soatoks-guide-to-side-channel-attacks/

    #asymmetricCryptography #constantTime #cryptography #ECDH #ECDSA #ellipticCurveCryptography #RSA #SecurityGuidance #sideChannels #symmetricCryptography

  12. Question for the #fediverse: I use the official Mastodon app on my Mac, but I'm not really happy with it. I did a bit of research but didn't find anything that's actually better. Do you have any tips? Which #mastodonclient do you use on the #mac ?