#datasets — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #datasets, aggregated by home.social.
-
UC Berkeley: New Federal Data Field Guide Helps Americans Navigate the Rich Diversity of Our Federal Data Ecosystem. “Denice Ross, who served as the nation’s second U.S. Chief Data Scientist, and her former White House colleague Christopher Marcum have launched the Federal Data Field Guide, a free, plain-language resource designed to help Americans understand, use, and advocate for the full […]
https://rbfirehose.com/2026/05/28/uc-berkeley-new-federal-data-field-guide-helps-americans-navigate-the-rich-diversity-of-our-federal-data-ecosystem/ -
ProtoThema: Data.gov.gr is Greece’s new national open data portal – what it means for citizens, businesses, and AI. “The new open data portal Data.gov.gr, which aspires to become the central access point to public sector data in Greece for citizens, researchers, universities, businesses, and organisations, was presented today at the Ministry of Digital Governance.”
https://rbfirehose.com/2026/05/15/protothema-data-gov-gr-is-greeces-new-national-open-data-portal-what-it-means-for-citizens-businesses-and-ai/ -
#Wirestock, a platform for photographers, pivoted to a #dataprovider for #AIlabs in 2023, supplying #datasets of images, videos, and design assets. The company raised $23 million in Series A funding to expand its data supply business, which currently provides multimodal data to six major foundation model makers. https://techcrunch.com/2026/05/14/wirestock-raises-23m-to-supply-multi-modal-data-to-ai-labs/?eicker.news #tech #media #news
-
I want speakers one can trust. Doorbells that don’t feed into massive #datasets. I want #algorithms that exploit human emotion to be gone. I want to save our planet and help the people in it. We get one life. We are living it with everyone here. Maybe I was meant to be born in the 60s.
#phone #AI #upscaled #datacenters #privacy -
UC San Diego: From Molecules to Meaning: A Search Engine for the Chemistry of Life. “An international team led by researchers at University of California San Diego and University of California, Riverside has developed a free, web-based platform designed to make public metabolomics data more accessible.”
https://rbfirehose.com/2026/05/14/from-molecules-to-meaning-a-search-engine-for-the-chemistry-of-life-uc-san-diego/ -
The Guardian: ‘Things were going dark left and right’: the race to save US government datasets before they’re deleted. “André is part of a group of ‘data rescuers’ who have banded together during Trump’s second term. They have been quietly racing to save hundreds of critical government datasets before they are no longer available. Now known as the Data Rescue Project, it’s a […]
https://rbfirehose.com/2026/05/09/things-were-going-dark-left-and-right-the-race-to-save-us-government-datasets-before-theyre-deleted-the-guardian/ -
"Ironically, several of the people who had been included in the set without any consent are known for their work critiquing surveillance and facial recognition itself, including filmmaker Laura Poitras, digital rights activist Jillian York, critic Evgeny Morozov, and author of Surveillance Capitalism Shoshana Zuboff."
(re Microsoft's MS-CELEB)
#AI #Surveillance #Datasets #ImageNet #Microsoft #MS-CELEB #KateCrawford
-
University of Edinburgh: AI fails to make inroads with cybercriminals. “Cybercriminals have been struggling to adopt AI in their work, reports the first of its kind study that analysed a dataset of 100 million posts from underground cybercrime communities.”
https://rbfirehose.com/2026/05/05/university-of-edinburgh-ai-fails-to-make-inroads-with-cybercriminals/ -
However good your intentions, what you might consider an «ethical and responsible» use amounts to endorsing and legitimizing the systematic violation of rights that underpins the entire commercial generative AI industry.
📌 No commercial generative AI model works without VIOLATING copyright.
#IA #IAgenerativa #AI #genAI #generativeAI #datasets #theft #technology #ethics
-
Scientific Data: Transcribing historical Canadian weather data. “Historical weather journals from across Canada, spanning 1768–1884, have been transcribed from handwritten records into machine readable formats. The NORTHERN (Nineteenth-century Overseas Records Transcribed for Historical Environmental Reconstruction in the North) project transcribed nearly 2 million weather observations from […]
https://rbfirehose.com/2026/05/01/scientific-data-transcribing-historical-canadian-weather-data/ -
JSONL and Its Importance in…
The JSONL (JSON Lines) format is a variant of JSON that allows large volumes of data to be stored in a single file, where each line represents one JSON object.
https://norvik.tech/news/analisis-jsonl-formato-linea-por-linea-datasets-ai
#Technology #Jsonl #Datasets #Ai #ManejoDeDatos #NorvikTech #DesarrolloSoftware #TechInnovation
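A minimal sketch of the line-per-object layout described in this post, using only Python's standard `json` module. The helper names `write_jsonl` and `read_jsonl` and the sample records are hypothetical and do not come from the linked article; skipping blank lines is a choice made for this sketch.

```python
import io
import json


def write_jsonl(records, fp):
    """Write each record as one JSON object on its own line."""
    for record in records:
        # json.dumps without indent never emits a raw newline,
        # so every record stays on a single line.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(fp):
    """Yield one parsed object per non-empty line, without loading the whole file."""
    for line in fp:
        line = line.strip()
        if line:  # tolerate blank lines (an assumption of this sketch)
            yield json.loads(line)


# Hypothetical sample data, round-tripped through an in-memory buffer.
sample = [{"id": 1, "tag": "datasets"}, {"id": 2, "tag": "jsonl"}]
buffer = io.StringIO()
write_jsonl(sample, buffer)
buffer.seek(0)
records = list(read_jsonl(buffer))
```

Because each line parses on its own, a reader can stream through a large file record by record instead of holding the whole dataset in memory.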
-
USGS: New Nationwide Tool Helps Answer: Do We Have Enough Water?. “The USGS National Water Availability Assessment Data Companion provides water managers, agricultural communities and researchers with detailed information about water supply and demand across approximately 80,000 watersheds nationwide.”
https://rbfirehose.com/2026/04/28/new-nationwide-tool-helps-answer-do-we-have-enough-water-usgs/ -
Arizona State University: Largest genomic dataset of Indigenous Americans to date sheds light on history, diversity and health. “In a new study published today in Nature, an international team led by the Institute of Evolutionary Biology, with partners at the University of São Paulo and Arizona State University, analyzed genomes from Indigenous populations spanning North America to Patagonia. […]
https://rbfirehose.com/2026/04/27/arizona-state-university-largest-genomic-dataset-of-indigenous-americans-to-date-sheds-light-on-history-diversity-and-health/ -
Max-Planck-Gesellschaft: Largest open dataset of great ape cognition. “A new publication introduces the EVApeCognition Dataset, a major open-access resource designed to advance research into the cognition of great apes. Compiling 262 experimental datasets from 150 scientific publications, the dataset was produced at the Wolfgang Köhler Primate Research Center in Leipzig, Germany, between 2004 […]
https://rbfirehose.com/2026/04/24/max-planck-gesellschaft-largest-open-dataset-of-great-ape-cognition/ -
MIT CSAIL: MIT researchers build the world’s largest collection of Olympiad-level math problems — and open it to everyone. “MathNet is the largest high-quality dataset of proof-based math problems ever created, and it is not close. Comprising more than 30,000 expert-authored problems and solutions spanning 47 countries, 17 languages, and 143 competitions, it is five times larger than the next […]
https://rbfirehose.com/2026/04/23/mit-csail-mit-researchers-build-the-worlds-largest-collection-of-olympiad-level-math-problems-and-open-it-to-everyone/ -
Nature: Dozens of AI disease-prediction models were trained on dubious data. “Dubious data sets are being used to train artificial-intelligence models that are designed to predict people’s risk of stroke and diabetes, researchers report in a preprint on medRxiv. Some of the models seem to have been used in clinical settings, although it’s not clear whether this has led to flawed diagnoses. […]
https://rbfirehose.com/2026/04/22/nature-dozens-of-ai-disease-prediction-models-were-trained-on-dubious-data/ -
Brazil Data Cube is an INPE technological RDI project that generates datasets from large volumes of remote sensing images, then processes and analyzes them. #datasets
-
Methodology matters 🔍
A new AAP article by Isaac Ullah shows how simulation modeling and microrefuse sampling impact archaeological interpretation. These findings highlight how research design and training help us build more accurate #datasets.🏺
FirstView article:
https://www.cambridge.org/core/journals/advances-in-archaeological-practice/article/sampling-matters-what-simulation-modeling-and-microrefuse-sampling-practice-reveal-about-archaeological-sampling-training-and-design/7DAB02D2650D5E86DACCC3B1CA9BC18D -
USGS: Land Treatment Digital Library Version 2.0 Launch. “The U.S. Geological Survey launched an updated version (2.0) of the LTDL to improve user experience, include additional data, and enhance BLM access. Notable additions to the website include interactive figures for each treatment polygon that display the monthly average temperature and precipitation from PRISM Climate Group at Oregon […]
https://rbfirehose.com/2026/04/07/usgs-land-treatment-digital-library-version-2-0-launch/