home.social

#scrapy — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #scrapy, aggregated by home.social.

  1. Released scrapy-contrib-bigexporter 1.0.0 (codeberg.org/ZuInnoTe/scrapy-c): additional export formats for the web scraping framework Scrapy.

    Migrated the Parquet export from fastparquet to pyarrow, as fastparquet is deprecated (docs.dask.org/en/stable/change).

    Migrated the ORC export from pyorc to pyarrow to reduce the number of dependencies.

    #scrapy #crawling #python #parquet #orc #pyarrow #webcrawling #scraping

  2. In the latest release, auto-throttling* is enabled by default. The intervals between requests are dynamically adjusted to ensure you are not overwhelming servers.

    Check it out here:
    bit.ly/49kHBp4

    #SEO #TechSEO #DataScience #Python #digitalanalytics

    *magic provided by #scrapy
    2/2
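The second post says the interval between requests is adjusted dynamically. The sketch below shows how that kind of adjustment can work. It is a simplified, standalone re-implementation of the throttling algorithm described in Scrapy's AutoThrottle documentation, not Scrapy's own code. The names `ThrottleSettings` and `next_delay` are hypothetical. The field values are only illustrative, and each one is mapped in a comment to the Scrapy setting it stands for.

The update works as follows. The target delay is the response latency divided by the target concurrency. The new delay is the average of the old delay and that target, clamped between the minimum and maximum delay. Non-200 responses are never allowed to lower the delay. In a plain Scrapy project the real extension is switched on with `AUTOTHROTTLE_ENABLED = True` in `settings.py`. Scrapy tracks such a delay per download slot (roughly per domain).

```python
from dataclasses import dataclass


@dataclass
class ThrottleSettings:
    # Hypothetical container; comments name the Scrapy setting each field mirrors.
    start_delay: float = 5.0          # AUTOTHROTTLE_START_DELAY
    max_delay: float = 60.0           # AUTOTHROTTLE_MAX_DELAY
    target_concurrency: float = 1.0   # AUTOTHROTTLE_TARGET_CONCURRENCY
    min_delay: float = 0.0            # DOWNLOAD_DELAY acts as the lower bound


def next_delay(current: float, latency: float, status: int, s: ThrottleSettings) -> float:
    """Return the download delay to use after a response (simplified sketch)."""
    # If the server needs `latency` seconds per response, sending a request
    # every latency / N seconds keeps roughly N requests in flight.
    target = latency / s.target_concurrency
    # Move halfway from the current delay toward the target.
    new = (current + target) / 2.0
    # Keep the delay inside [min_delay, max_delay].
    new = min(max(s.min_delay, new), s.max_delay)
    # Error pages and redirects are usually small and fast, so a non-200
    # response may raise the delay but is never allowed to lower it.
    if status != 200 and new < current:
        return current
    return new


if __name__ == "__main__":
    settings = ThrottleSettings()
    delay = settings.start_delay
    # Simulated (latency in seconds, HTTP status) pairs for one domain.
    for latency, status in [(0.4, 200), (0.4, 200), (0.1, 500), (2.0, 200)]:
        delay = next_delay(delay, latency, status, settings)
        print(f"latency={latency:.1f}s status={status} -> delay={delay:.3f}s")
```

In the simulated run, the fast 500 response leaves the delay unchanged. Letting quick error pages shorten the delay would speed up requests exactly when the server is struggling.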