home.social

#deanonymization — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #deanonymization, aggregated by home.social.

  1. Zero-Knowledge Proofs Evolve to Bypass Age-Verification Checks

    As the digital landscape continues to shift, it's only a matter of time before you'll have to face the music (and the cameras) when it comes to age verification checks. But what's really behind these on-camera checks: protecting kids, or giving governments a way to control access to online platforms?

    osintsights.com/zero-knowledge

    #AgeVerification #ZeroknowledgeProofs #Deanonymization #OnlinePrivacy #DigitalRights

  2. Well, we don't decide what the customers decide to ship, or where it goes. We don't enforce policy like that; our job is just to provide a neutral platform for others to use to accomplish their goals. We don't even know what they ship - we just build the train cars. We don't even know where Bergen-Belsen is.

    This is a post about #systemd.

    #AgeVerification #AgeGate #AgeGating #identification #deanonymization #anonymous #fascism #fascist #compliance #ComplyInAdvance #DoNotComply #DoNotComplyInAdvance #resistance #freedom #FreeSoftware #OpenSource #coconspirator #accomplice #DeathCamp #ConcentrationCamp

  7. [en] Paper: LLMs can be used to perform at-scale #deanonymization

    "With full Internet access, our #agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given #pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator."

    "Our results show that the practical #obscurity protecting pseudonymous users online no longer holds and that #threat models for online #privacy need to be reconsidered."

    "We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on #unstructured text at scale."

    Note: also see the sections "Potential harms" and "Potential benefits".

    arxiv.org/html/2602.16800

    #llm #research

  12. "TL;DR: We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates.

    While it has been known that individuals can be uniquely identified by surprisingly few attributes, this was often practically limited. Data is often only available in unstructured form and deanonymization used to require human investigators to search and reason based on clues. We show that from a handful of comments, LLMs can infer where you live, what you do, and your interests – then search for you on the web. In our new research, we show that this is not only possible but increasingly practical."

    simonlermen.substack.com/p/lar

    #AI #GenerativeAI #Anonymity #Deanonymization #LLMs

  17. Large-scale online deanonymization with LLMs
    From Cornell University Computer Science

    We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.

    Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, Florian Tramèr

    arxiv.org/abs/2602

    #computerscience #cornelluniversity #AiResearch #privacy #anonymity #llm #HackerNews #Anthropic #pseudonymity
    #deanonymization
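
Step (2) of the pipeline in the abstract above, searching for candidate matches by similarity, can be illustrated with a toy sketch. The paper uses LLM-extracted features and semantic embeddings; here, purely as an illustrative assumption, a plain bag-of-words cosine similarity stands in for the embedding model, and the profiles and function names (`bag_of_words`, `cosine`, `top_k_candidates`) are hypothetical. Steps (1) and (3), which rely on an LLM, are not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts: a crude stand-in (assumption) for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(query: str, candidates: dict, k: int = 2) -> list:
    """Rank hypothetical candidate profiles by similarity to the query and keep the top k."""
    q = bag_of_words(query)
    scored = [(name, cosine(q, bag_of_words(text))) for name, text in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy profiles, not taken from the paper's datasets.
candidates = {
    "profile_a": "rust compiler engineer who bikes to work and brews coffee",
    "profile_b": "pastry chef who loves jazz and sourdough",
    "profile_c": "backend engineer writing rust and go, coffee enthusiast",
}
print(top_k_candidates("I write rust all day and drink too much coffee", candidates))
```

In the design the abstract describes, a short list like this is then passed to an LLM that reasons over the candidates to confirm or reject a match (step 3), which is where false positives are meant to be pruned; this sketch stops at retrieval.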

  22. You've got nothing to hide, do you?

    »We show that large language models can be used to perform at-scale #deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline«

    arxiv.org/abs/2602.16800

    "#AI" #privacy #pseudonymity #anonymity #LLM

  27. RE: tldr.nettime.org/@remixtures/1

    #privacy #identity #deanonymization

    This is going to be more and more problematic as we go. And I guarantee many companies are already using this to identify you or any individual they are interested in.

    This is not surprising, but it reinforces that we need better methods for privacy protection and data anonymization. Changing names and removing PII is not enough.

    #metadata matters more than ever.

  32. Large-scale online deanonymization with LLMs

    "We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered."

    arxiv.org/html/2602.16800v1

    #AI #GenerativeAI #LLMs #Anonymity #Privacy #Deanonymization
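
The headline figure "68% recall at 90% precision" can be read as follows: over all thresholds on the attack's confidence score, keep only those where at least 90% of accepted matches are correct, and report the largest fraction of true matches recovered. Below is a minimal sketch of that metric under a generic definition (the paper's exact evaluation protocol may differ). The function name and toy scores are hypothetical, and it assumes distinct scores; tied scores would need to be grouped into a single threshold.

```python
def recall_at_precision(predictions, total_positives, min_precision=0.9):
    """Highest recall over all score thresholds whose precision is >= min_precision.

    predictions: list of (confidence score, is_correct_match) pairs, one per proposed match.
    total_positives: number of queries that truly have a match in the other dataset.
    Assumes distinct scores, so each prefix of the sorted list is one threshold.
    """
    best_recall = 0.0
    correct = 0
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    # Accept predictions from most to least confident; each prefix is one threshold.
    for accepted, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precision = correct / accepted
        recall = correct / total_positives
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Hypothetical toy scores: 5 queries truly have a match, 6 matches are proposed.
preds = [(0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.75, True), (0.70, False)]
print(recall_at_precision(preds, total_positives=5))  # 0.6
```

Relaxing the precision floor trades precision for recall: with `min_precision=0.75`, the same toy data yields a recall of 0.8.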

  33. Large-scale online deanonymization with LLMs

    "We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered."

    arxiv.org/html/2602.16800v1

    #AI #GenerativeAI #LLMs #Anonymity #Privacy #Deanonymization

  37. Especially with "generative AI" being available, I don't see how one can do any kind of objective "age verification" reliably without some entity (a government or a (paid) 3rd party) doing what is essentially "identity verification", i.e. de-anonymization, and without some kind of permanent audit trail being created outside your control and maintained by middlemen: the government or a third party

    (And likely more #DRM & #DMCA bullshit)

    #AgeVerification #DeAnonymization
