home.social

#dataextraction — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #dataextraction, aggregated by home.social.

  1. X has officially open-sourced its recommendation algorithm, unlocking new possibilities for real-time data extraction, AI analytics, and smarter business intelligence. Discover how transparent feed ranking systems are shaping the future of data-driven strategies.
    #XAlgorithm #OpenSourceAI #WebScraping #AIAnalytics #DataExtraction #RealTimeData #DataIntelligence #SocialMediaAnalytics #AIInnovation #TagXData

  2. Need to quickly extract links and contact points from a URL? Uscrapper Vanta is the tool for you. Snag the repo from GitHub or throw your URL into our hosted version of the tool for free!

    Git: github.com/z0m31en7/Uscrapper

    #OSINT #OSINT4good #urlosint #dataextraction #python #selenium #tor #infosec #osintcabal #crawl

  3. Interesting read on social media addiction.

    I think the real underlying issue relates to the intention economy based on data extraction.

    Addiction or not, data is still extracted, and intentions are derived.

    But they are focused on the addiction angle.

    techdirt.com/2026/04/03/the-so

    #socialmedia #dataextraction #intentioneconomy #consent #privacy

  4. How many links are buried inside a large PDF — and where do they really go?

    I extracted every URL from a 291-page Voron assembly manual, isolated shortlinks, resolved redirects, and built a TSV [tab-delimited] manifest with video duration + titles using:

    pdfgrep
    awk
    curl
    yt-dlp

    A practical method for auditing technical PDFs and embedded media.

    Full walk-through:
    salemdata.net/johnpress/?p=523

    #PDF #Linux #OpenSource #CommandLine #DataExtraction #UnixTools
    #Documentation #DigitalPreservation
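
The steps named in post 4 (extract URLs, isolate shortlinks, build a tab-delimited manifest) can be sketched in Python instead of the shell tools the post lists. This is a minimal sketch under stated assumptions, not the author's method: it assumes the PDF text has already been extracted to a string, and the shortener host list, the column names, and the `resolve` callback are hypothetical placeholders. Redirect resolution and video title/duration lookups are left to the caller.

```python
import csv
import io
import re
from urllib.parse import urlparse

# Hypothetical shortener hosts; the post does not say which ones appeared.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}

# Match http(s) URLs up to whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return unique URLs in first-seen order, trimming trailing punctuation."""
    seen = {}
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")
        seen.setdefault(url, None)  # dict keeps insertion order
    return list(seen)


def is_shortlink(url):
    """True when the URL's host is in the assumed shortener list."""
    host = urlparse(url).hostname or ""
    return host.lower() in SHORTENER_HOSTS


def build_manifest(urls, resolve=lambda u: u):
    """Return a TSV string: original URL, shortlink flag, resolved URL.

    `resolve` is a caller-supplied function (assumption) that follows
    redirects; the default leaves the URL unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["url", "is_shortlink", "resolved"])
    for url in urls:
        short = is_shortlink(url)
        resolved = resolve(url) if short else url
        writer.writerow([url, "yes" if short else "no", resolved])
    return out.getvalue()
```

Passing `resolve` as a parameter keeps the network step separate, so the extraction and manifest logic can be checked offline before any redirects are followed.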

  5. The Website-Crawler tool collects data from websites as JSON or CSV, suitable for use with large language models (LLMs). It supports crawling or scraping an entire website quickly and is easy to use. #WebCrawler #DataExtraction #LLM #AI #CôngCụ #WebScraping #MachineLearning

    reddit.com/r/LocalLLaMA/commen

  6. 🔥 Just launched: Divparser, an AI scraper tool that turns any web page into clean JSON with a single prompt. Google indexed it right away and people are already trying it out. If you are interested in scraping, automation, or AI data extraction, please share your feedback! #AI #Scraping #Automation #DataExtraction #TríTuệNhânTạo #ThuThậpDữLiệu #TựĐộng #CôngCụ

    reddit.com/r/SaaS/comments/1qo

  7. Maxun v0.0.32 has been released with real-time recording that stays in sync with the actual website state and supports live interactions such as typing, clicking, scrolling, and navigating. It supports SDK integrations with LlamaIndex, Google Sheets, Airtable, LangChain, OpenAI, and more. An AI mode automatically finds and extracts data without needing a URL. Open source and self-hosted. #Maxun #WebScraping #OpenSource #SelfHosted #AI #LlamaIndex #LangChain #NoCode #DataExtraction #CôngCụMãNguồnMở #TríchXuấtDữLiệu #TựHost

    reddit.com/r/selfh

  8. New tool: pdfparse.net extracts data from PDFs according to a schema and saves it directly to SQLite or exports it as CSV. It supports nested tables, foreign keys, and batch processing. Well suited to automation and data cleanup. Looking for the first users to try it. Try it at: pdfparse.net/demo #pdfparse #SQLite #CSV #dataextraction #tool #earlyadopter #côngcụ #tríchxuấtdữliệu

    reddit.com/r/SideProject/comme

  9. 📌 Need quick sentiment analysis on thousands of customer comments? With (Un)Perplexed Spready, categorize feedback automatically by sentiment and topic and save time for strategic thinking! Let AI do the tedious job for you while you drink your coffee! 💡 matasoft.hr/QTrendControl/inde

    #SentimentAnalysis #AIComparison #NaturalLanguageAI #EfficiencyBoost #WorkSmarter #DataInsights #DataManagement #AIforBusiness #DataExtraction #SmartData #AItools #ProductComparison #DataCategorization #SmartSpreadsheets #Data

  10. Social Media Apps: LoL! WhatsApp, Facebook, Twitter, LinkedIn - they're all in the same surveillance/data extracting business. At least, when you use their services through a web browser, you're in control. You can install PrivacyBadger and uBlock Origin. But with a smartphone app? You're being continuously exploited.

    "Plaintiffs in a class-action case proved by a preponderance of evidence that Meta intentionally eavesdropped on and/or recorded conversations using an electronic device, said a verdict form released yesterday in US District Court for the Northern District of California. Plaintiffs also proved that they had a reasonable expectation of privacy and that Meta did not have consent from all parties to eavesdrop on and/or record the conversations, the jury found.

    The lawsuit was filed in 2021 against Flo Health, maker of an app for tracking periods, ovulation, and pregnancy. Facebook owner Meta, Google, and app analytics company Flurry were added as defendants later. The plaintiffs settled with Flo Health, Google, and Flurry before the trial, leaving Meta as the only remaining defendant.

    The plaintiffs' trial brief said that "Flo allowed Google and Meta to eavesdrop on users' private in-app communications" between November 2016 and February 2019. Flo app users had to complete an onboarding survey requiring them "to select a 'goal' indicating whether they are pregnant, want to be pregnant, or want to track their period, as well as input other information about their pregnancy or menstrual cycle," the brief said."

    arstechnica.com/tech-policy/20

    #SocialMedia #Flo #DataProtection #Privacy #Meta #Facebook #Google #Surveillance #DataExtraction

  11. Feedly launches AI Actions: New AI-powered features streamline article summarization, data extraction, and report creation for market intelligence professionals. ppc.land/feedly-launches-ai-ac #Feedly #AI #ArtificialIntelligence #MarketIntelligence #DataExtraction

  12. Scientific Data: TreeHub: a comprehensive dataset of phylogenetic trees. “In this study, we present a novel approach for automatically extracting phylogenetic data and integrating relevant species information from scientific papers and public databases. On this basis, we constructed a dataset TreeHub, including 135,502 corresponding phylogenetic trees from 7,879 phylogenetic research […]

    https://rbfirehose.com/2025/06/04/treehub-a-comprehensive-dataset-of-phylogenetic-trees-scientific-data/

  13. Need to grab specific info from a webpage regularly? 🤔 Browser Actions can help! Create a Shortcut to: Open URL ➡️ Wait for data element ➡️ Run JavaScript to extract text ➡️ Pass it back to Shortcuts!

    If you need help with that, just follow the Forum link on the site!

    actions.work/browser-actions?r

    #macOS #Shortcuts #WebScraping #DataExtraction #BrowserAutomation

  14. Fascinating how people use AI to generate cute images while businesses waste hours on manual data extraction. (Un)Perplexed Spready lets you connect directly to AI models through Ollama to extract, categorize, and analyze data right in your spreadsheet.
    matasoft.hr/qtrendcontrol/inde

    #PracticalAI #DataManagement #AI #Spreadsheets #DataExtraction #DataLabeling #DataAnotation #DataCategorization #DataClassification #SmartData #AItools #ProductComparison #SmartSpreadsheets #DataStandardization #BI #MDM