#deanonymization — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #deanonymization, aggregated by home.social.
-
Zero-Knowledge Proofs Evolve to Bypass Age-Verification Checks
As the digital landscape continues to shift, it's only a matter of time before you'll have to face the music - and the cameras - when it comes to age verification checks. But what's really behind these on-camera checks: protecting kids or creating a way for governments to control access to online platforms?
#AgeVerification #ZeroknowledgeProofs #Deanonymization #OnlinePrivacy #DigitalRights
-
The Risks Of Anonymity In The Age Of Generative AI
-
Well, we don't decide what the customers decide to ship, or where it goes. We don't enforce policy like that; our job is just to provide a neutral platform for others to use to accomplish their goals. We don't even know what they ship - we just build the train cars. We don't even know where Bergen-Belsen is.
This is a post about #systemd.
#AgeVerification #AgeGate #AgeGating #identification #deanonymization #anonymous #fascism #fascist #compliance #ComplyInAdvance #DoNotComply #DoNotComplyInAdvance #resistance #freedom #FreeSoftware #OpenSource #coconspirator #accomplice #DeathCamp #ConcentrationCamp
-
[en] Paper: LLMs can be used to perform at-scale #deanonymization
"With full Internet access, our #agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given #pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator."
"Our results show that the practical #obscurity protecting pseudonymous users online no longer holds and that #threat models for online #privacy need to be reconsidered."
"We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on #unstructured text at scale."
Note: also check paragraphs "Potential harms" and "Potential benefits".
-
"TL;DR: We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates.
While it has been known that individuals can be uniquely identified by surprisingly few attributes, this was often practically limited. Data is often only available in unstructured form and deanonymization used to require human investigators to search and reason based on clues. We show that from a handful of comments, LLMs can infer where you live, what you do, and your interests – then search for you on the web. In our new research, we show that this is not only possible but increasingly practical."
https://simonlermen.substack.com/p/large-scale-online-deanonymization
-
LLMs are good at deanonymization.
https://arxiv.org/abs/2602.16800
#AI #LLMs #deanonymization #cybersecurity
-
Large-scale online deanonymization with LLMs
From Cornell University Computer Science
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, Florian Tramèr
#computerscience #cornelluniversity #AiResearch #privacy #anonymity #llm #HackNews #athropic #pseudonymity
#deanonymization
-
You've got nothing to hide, do you?
»We show that large language models can be used to perform at-scale #deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline«
-
RE: https://tldr.nettime.org/@remixtures/116148578797801271
#privacy #identity #deanonymization
This is going to be more and more problematic as we go. And I guarantee many companies are already using this to identify you or any individual of interest to them.
This is not surprising, but it reinforces that we must have better methods for privacy protection and data anonymization. Changing names and removing PII is not enough.
#metadata matters more than ever.
-
Large-scale online deanonymization with LLMs
"We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered."
https://arxiv.org/html/2602.16800v1
#AI #GenerativeAI #LLMs #Anonymity #Privacy #Deanonymization
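For readers unsure how to read a figure like "68% recall at 90% precision", here is a minimal sketch of one common way such a number is computed. It assumes each attempted match comes with a confidence score and that the matcher may abstain on low-confidence cases. The function name, the inputs, and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def recall_at_precision(
    predictions: List[Tuple[float, bool]],
    total_queries: int,
    min_precision: float = 0.90,
) -> float:
    """Best recall over confidence cutoffs whose precision meets the target.

    predictions: one (confidence, is_correct) pair per attempted match
                 (hypothetical input format; assumes distinct confidences).
    total_queries: number of profiles that actually have a true match.
    """
    # Accept guesses from most to least confident, one cutoff at a time.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for accepted, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precision = correct / accepted      # correct among guesses made
        recall = correct / total_queries    # correct among all profiles
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Toy data: five profiles, five guesses, one of them wrong.
toy = [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.5, True)]
print(recall_at_precision(toy, total_queries=5))
```

In the toy data, only the three most confident guesses can be accepted before the wrong one drags precision below 90%, so three of five profiles are recovered at that precision level. Read this way, the quoted result means most guesses the method commits to are right, and it still commits on a large share of users.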
-
Large-scale online deanonymization with LLMs
"We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered."
https://arxiv.org/html/2602.16800v1
#AI #GenerativeAI #LLMs #Anonymity #Privacy #Deanonymization
-
Especially with "generative AI" available, I don't see how anyone can do any kind of objective "age verification" reliably without some party (government or a paid third party) doing what is essentially "identity verification", i.e. de-anonymization, and without a permanent audit trail being created outside your control and maintained by middlemen: government or a third party.