#datascraping — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #datascraping, aggregated by home.social.
-
Learn everything you need to know about Data Scraping via these 70 free HackerNoon blog posts. https://hackernoon.com/70-blog-posts-to-learn-about-data-scraping #datascraping
-
https://winbuzzer.com/2026/04/09/youtubers-sue-apple-for-scraping-videos-to-train-ai-xcxwbn/
YouTubers Sue Apple for Scraping Videos to Train AI Models
#AI #Apple #YouTube #GenAI #AITraining #AIModels #DataScraping #Copyright #Lawsuits #FairUse #BigTech #AIEthics
-
I'm slightly creeped out but not surprised. I was editing a music score on my laptop recently and I added an instruction to play the piece "robotic". The next time I logged into Indeed, the first job recommendation to come up was for Robotics Operator. Is Indeed scraping data from my recent documents for keywords?
Always check your firewall.
-
The “17.5 million Instagram user data leak” making the rounds in 2026? Old news.
The data from 2022 was already leaked in 2023.
We broke down all 3 dumps: same records.
Don’t fall for clickbait reports!
Read: https://hackread.com/instagram-user-data-leak-scraped-records-2022/
-
LinkedIn's 2025 Data Crisis: 4.3 Billion Records Leaked, Risks Rise https://www.webpronews.com/linkedins-2025-data-crisis-4-3-billion-records-leaked-risks-rise/ #cybersecurity #LinkedIn #DataTheft #scams #spam #DataScraping
-
Starting from a freelance project to scrape Substack, one person turned a one-off solution into a self-service tool, opening up a market opportunity. A story of moving from contract work to building a product. #FreelanceTips #ProductBuilding #Substack #DataScraping #StartupViecles #TaoSanPham #KinhNghiemTuDo
https://www.reddit.com/r/SideProject/comments/1pqwve3/a_peopleperhour_gig_taught_me_to_think/
-
**AI debt: How LLM training tools break the social contract of open source**
AI learns from open source but does not fulfil its obligations, which creates problems for the community. LLM (Large Language Model) projects build on public data but neglect their responsibilities for security, author attribution, and the long-term health of open software. A change of direction is needed for the technology to develop sustainably. #AI #Mãnguồnmở #Đàotạocôngnghệ #Bềnvững #ĐạođứcAI #OpenSource #SocialContract #TechEthics #AIdebt #DataScraping
h
-
New York Times Sues Perplexity AI for Copyright Infringement and ‘Trademark Tarnishment’
#AI #Copyright #PerplexityAI #NYT #GenAI #SearchEngines #RAG #MediaLaw #IntellectualProperty #Hallucinations #Journalism #DataScraping
-
Reddit Sues Perplexity and Data Scrapers for 'Industrial-Scale' AI Content Theft
#AI #Reddit #Perplexity #Lawsuit #DataScraping #Copyright #TechLaw #DMCA #AIEthics #BigTech #IntellectualProperty #SerpApi #Oxylabs #Litigation
-
Hiring: a job to scrape 300,000 PDF book titles from AbeBooks and find the files via the Wayback Machine/Anna's Archive. The 4 TB of data in total will be stored on 128 GB optical discs (Verbatim/Panasonic) to ensure it stays readable for 100 years. Budget: $700 (excluding materials).
#TuyểnDụng #Scraping #LưuTrữDữLiệu #PDF #AbeBooks
#Hiring #DataScraping #DataArchiving #PDF
-
Cloudflare Overhauls Web’s AI Rulebook with New Robots.txt ‘Content Signals’
#AI #Cloudflare #RobotsTxt #DataScraping #Publishing #GenerativeAI
-
LinkedIn, the social media titan known for its riveting inspirational #quotes and unsolicited connections, is now channeling its inner #superhero, battling the dastardly villains of data scraping. 🦸♂️💼 Apparently, charging $15k for harvested data is a crime—unless you're #LinkedIn, of course. 🤑🔍
https://therecord.media/linkedin-sues-data-scraping-company #DataScraping #SocialMedia #Crime #HackerNews #ngated
-
Cloudflare launches Content Signals Policy to fight AI crawlers and scrapers
https://web.brid.gy/r/https://nerds.xyz/2025/09/cloudflare-content-signals-policy-ai-crawlers/
-
Perplexity Fires Back at Cloudflare, Denying ‘Stealth Crawler’ Accusations
#AI #Cloudflare #Perplexity #WebCrawling #AIethics #DataScraping #SearchEngines #Web #AISearch
-
Cloudflare Accuses Perplexity of Using ‘Stealth Crawlers’ to Evade Web Standards
#AI #PerplexityAI #Cloudflare #DataScraping #AIEthics #WebSecurity
-
BBC Threatens Lawsuit against Perplexity AI over Verbatim Copying of Content
#AI #Copyright #PerplexityAI #BBC #TechLaw #Media #GenAI #DataScraping #FairUse #IntellectualProperty #AIethics
-
Anthropic Sued by Reddit for Unauthorized Use of AI Training Data
#AI #Reddit #Anthropic #AILawsuit #DataScraping #AIethics #TechLaw #DataRights #Copyright
-
AI Crawlers Overwhelm Open-Source Projects, Forcing Developers to Block Entire Countries
#AI #Web #Robotstxt #AIScraping #OpenSource #Cybersecurity #DataScraping #Scraping #WebScraping
-
Google's crackdown on data scrapers triggered immediate disruptions across the marketing landscape, particularly for organizations whose business models depend on SEO. The move represents the latest evolution in the ongoing battle between major websites and data scrapers. Read more at @TechRadar. #Google #SEO #DataScraping #Tech #Technology https://flip.it/F5M7-d
-
Many companies have already finished #datascraping everything on the internet, along with commercially available personal data from #Experian and other databases. The only thing left was government databases. #ElonMusk put himself first in line.
-
#AnotherQuickQuestion: What's #ThatThing that #Happens when a #BunchOfNobodies on a #DeadInstance keep #Crying about the #DataScraping of #OtherPeople's #Content, while #Simultaneously #Plagiarising #OtherPeople's #Fakebook / #RedditPosts... | #LookNoQuestionMark
#MassiveHypocrite(s) gonna #MassiveHypocrite... #DoubleStandardMuch
#DontForget... #Cuntards gonna #Cuntard...
🧙⚔️🤖:wolfparty:🤖⚔️🧙 | :fediverse:🦹🎈🦄🎈🦹:fediverse:
-
So first #TuneCore, now #YouTube. They are trying to scrape our art, creativity, and potential income to make datasets for sale, without giving us a single penny. Please be smart: do not let these predatory and misleading companies take a single step over your rights. #DistroKid and #CDBaby will probably try to do the same over the next few years.
@musicproduction #musicindustry #AI #IA #industriadelamusica #dataset #datascraping @radicalmusic
-
OpenAI Whistleblower Found Dead in San Francisco Apartment https://petapixel.com/2024/12/16/openai-whistleblower-found-dead-in-san-francisco-apartment/ #artificialintellgence #whistleblower #datascraping #suchirbalaji #openai #News #Law
-
“OpenAI’s data scraping wins big as Raw Story’s copyright lawsuit dismissed by NY court” https://venturebeat.com/ai/openais-data-scraping-wins-big-as-raw-storys-copyright-lawsuit-dismissed-by-ny-court/ #openai #ai #llm #data #datascraping
-
Former OpenAI Employee Condemns the Company’s Data Scraping Practices https://petapixel.com/2024/10/25/former-openai-employee-condemns-the-companys-data-scraping-practices/ #machinelearning #datascraping #Technology #aitraining #chatgpt #openai #dalle #News
-
📌⚖ NOW AVAILABLE TO WATCH:
#Conference “#DataScraping with #AI and the #risk-based approach as a limit on the uses of #ArtificialIntelligence.”
🔸️Dr. Johanna Caterina Faliero, PhD
▶️ https://youtu.be/lO4C3BpdSD8?si=CH8V8jyDVOjbPEPK
-
⚠️NOW AT 5 PM 🇦🇷⚠️ ⚖CONFERENCE
🔷️“#DataScraping with #AI and the #risk-based approach as a limit on the uses of #ArtificialIntelligence.”
🔹️Dr. Johanna C Faliero, PhD
📌JOIN LINK ➡️ https://usal-edu-ar.zoom.us/j/89116192945
-
Digital locusts (bots) across the web:
#openai #anthropic #ai #bot #internet #www #datascraping
:web:https://www.businessinsider.com/openai-anthropic-ai-bots-havoc-raise-cloud-costs-websites-2024-9
-
Some things seem so obvious and yet still need doing anyway:
https://arstechnica.com/ai/2024/09/new-ai-standards-group-wants-to-make-data-scraping-opt-in/
#dataHarvest #dataScraping #optin #privacy #DeepLearning
-
More than 330 Million Email Addresses Allegedly Scraped from Security Platform SOCRadar.io Exposed Online https://thecyberexpress.com/330-million-email-ids-scraped-from-socradar-io/ #TheCyberExpressNews #CybersecurityNews #CyberEssentials #TheCyberExpress #DataBreachNews #BreachForums #DataScraping #databreach #SOCRadario #Hackread #SOCRadar #USDoD
-
Ever wondered what a #bot really is? 🤖 From chatbots helping with customer service to web crawlers indexing the internet, bots are everywhere! 🌐
In our latest guide, we dive into:
🤖 The definition and evolution of bots
📜 Different types of bots
🌍 Impact of good & bad bots on our daily lives
🔍 Detecting and blocking malicious bots
With bot traffic accounting for almost 50% of all internet traffic and two-thirds of it coming from bad bots, it’s crucial to understand their influence on our online activities.
Discover more now! https://bit.ly/3KkgVtI
#badbots #webcrawlers #maliciousbots #ddosattacks #datascraping #malware #botmanagement #managedservices #waap #apptrana #indusface