#apachenutch — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #apachenutch, aggregated by home.social.
-
@digitalpebble sadly, a homegrown one-off: https://github.com/tballison/file-observatory/tree/main/commoncrawl-fetcher
If I were to do it again, I’d use #ApacheNutch or #StormCrawler
-
@elan also see #Heritrix and of course #StormCrawler as alternatives to #ApacheNutch