#advertools: Public Fediverse posts
Live and recent posts from across the Fediverse tagged #advertools, aggregated by home.social.
-
Using a proxy while crawling
This is another feature of using the meta parameter while crawling with #advertools.
It's as simple as providing a proxy URL.
There is also a link to using rotating proxies, if you're interested.
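As a sketch (assuming advertools >= 0.16, where the `meta` dict is forwarded to each scrapy request, and scrapy's standard `proxy` meta key sets the proxy; the URLs and file name below are placeholders):

```python
def crawl_with_proxy(url_list, output_file, proxy_url):
    """Route every request in a crawl through one proxy by setting
    scrapy's 'proxy' request meta key via advertools' meta parameter."""
    import advertools as adv  # imported lazily; assumes advertools >= 0.16
    adv.crawl(url_list, output_file, meta={'proxy': proxy_url})

# Hypothetical usage:
# crawl_with_proxy(['https://example.com'], 'crawl.jsonl',
#                  'http://user:pass@proxy.example:8080')
```

Rotating proxies are typically handled by a dedicated scrapy downloader middleware rather than a single proxy URL.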
-
Happy to share a new release of #advertools v0.16
This release adds a new parameter "meta" to the crawl function.
Ways to use it:
• Set arbitrary metadata about the crawl
• Set custom request headers per URL
• Limited support for crawling some JavaScript websites
Details and example code:
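For the first option, a minimal sketch (the `meta` parameter name comes from this release; the dict key 'purpose' is made up for illustration):

```python
def crawl_with_label(url_list, output_file, label):
    """Attach arbitrary metadata to a crawl via the new meta parameter."""
    import advertools as adv  # imported lazily; assumes advertools >= 0.16
    adv.crawl(url_list, output_file, meta={'purpose': label})

# Hypothetical usage:
# crawl_with_label(['https://example.com'], 'crawl.jsonl', 'weekly-audit')
```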
#SEO #crawling #scraping #python #DataScience #advertools #scrapy
-
Day 75 of #100DaysOfCode
User-agent parser app refactor and update
• Upload a list of user-agent strings
• Get them split into their components (OS, family, device, brand, version...)
• Download the parsed UAs to a CSV file
• Interactively visualize the UAs on multiple levels using any of the components
-
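A rough idea of the splitting step in the app described above, using a small regex (a real app would use a dedicated user-agent parser library; this sketch covers only a few browser families):

```python
import re

UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')

def parse_user_agent(ua):
    """Split a UA string into a few components (rough simplification)."""
    os_match = re.search(r'\(([^)]+)\)', ua)  # first parenthesized group ~ platform/OS
    family = re.search(r'(Chrome|Firefox|Edg|Safari)/([\d.]+)', ua)
    return {
        'os': os_match.group(1).split(';')[0].strip() if os_match else None,
        'family': family.group(1) if family else None,
        'version': family.group(2) if family else None,
    }

print(parse_user_agent(UA))
# → {'os': 'Windows NT 10.0', 'family': 'Chrome', 'version': '120.0.0.0'}
```
-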
Day 15 of #100DaysOfCode:
Created a bunch of custom #advertools crawlers, with one line of code each.
You can set your own defaults (e.g. follow_links=True by default?)
Examples:
• Exploratory crawler: spider mode on. Stop after 2,000 URLs.
• Rude crawler: Don't obey robots.txt rules.
• Polite crawler: Obey robots.txt (the default), crawl very slowly, with long pauses between crawled pages.
#DataScience #crawling #scraping #scrapy #SEO #Python #data
1/2
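One way to get one-line custom crawlers like these is `functools.partial`. The sketch below uses a stand-in function that echoes part of `advertools.crawl`'s signature so it runs offline; with advertools installed you would pass `adv.crawl` to `partial` instead. The settings names are standard scrapy settings:

```python
from functools import partial

def crawl(url_list, output_file, follow_links=False, custom_settings=None):
    """Stand-in echoing part of advertools.crawl's signature, so the
    sketch runs without a network; swap in adv.crawl for real crawls."""
    return {'follow_links': follow_links, 'custom_settings': custom_settings or {}}

# One line of code per custom crawler:
exploratory_crawl = partial(crawl, follow_links=True,
                            custom_settings={'CLOSESPIDER_PAGECOUNT': 2000})
rude_crawl = partial(crawl, custom_settings={'ROBOTSTXT_OBEY': False})
polite_crawl = partial(crawl, custom_settings={'ROBOTSTXT_OBEY': True,
                                               'DOWNLOAD_DELAY': 10})
```

Each partial keeps your chosen defaults, and you can still override any of them per call.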
-
Day 14 of #100DaysOfCode:
Created a tutorial on analyzing millions of URLs:
• 2.4M URLs from a web server log file
• Splitting them into their components creates a 5.7 GB (gigabyte) DataFrame
• Using the new output_file parameter saves the same data in a 67 MB (megabyte) file
• Read only the columns you want, while filtering for a subset of rows
• Enjoy!
Notebook and video:
-
Day 8 of #100DaysOfCode:
Added the option to specify custom date formats for log files:
• advertools.logs_to_df will attempt to convert datetime columns to a datetime type using its default formats
• Supply your own date format if your logs use a different one (or if you decide to change it)
• The date format uses the strftime format spec
• Coming to adv v0.15.0
-
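The custom date-format option from the post above might be used like this (a sketch assuming advertools >= 0.15 and its `date_format` parameter; the strftime string and file names are examples, not defaults):

```python
def parse_logs_custom_dates(log_file, output_file, errors_file):
    """Parse a server log whose datetime field uses a non-default format,
    by passing an explicit strftime spec."""
    import advertools as adv  # imported lazily; assumes advertools >= 0.15
    adv.logs_to_df(
        log_file=log_file,
        output_file=output_file,
        errors_file=errors_file,
        log_format='common',
        date_format='%d/%b/%Y:%H:%M:%S %z',  # strftime spec for e.g. 01/Jan/2024:12:00:00 +0000
    )
```
-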
Day 1 of #100DaysOfCode:
Added the ability to supply request headers while fetching sitemaps with #advertools
(available in the next release)
This can help in changing the User-Agent, for example. It can also be used to fetch only sitemaps that have changed, using the If-None-Match header: keep a fresh set of sitemaps, check continuously, and download only the updated ones.
You can use any other header of course.
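A sketch of that If-None-Match workflow (assuming the release's `request_headers` parameter of `sitemap_to_df`; the ETag value would come from a previous response's headers, and the User-Agent string is made up):

```python
def sitemap_if_changed(sitemap_url, etag):
    """Fetch a sitemap only if it changed since last time: the server
    returns 304 Not Modified when the ETag still matches."""
    import advertools as adv  # imported lazily; assumes the new release
    return adv.sitemap_to_df(
        sitemap_url,
        request_headers={
            'If-None-Match': etag,
            'User-Agent': 'my-sitemap-checker/1.0',  # hypothetical custom UA
        },
    )

# Hypothetical usage:
# sitemap_if_changed('https://example.com/sitemap.xml', '"abc123"')
```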
-
Finding internal broken links is much easier than external ones with the crawlytics links function.
After crawling a website:
• Get the link summary table with adv.crawlytics.links
• Filter the error pages from the crawl table by any status code rule you want, e.g. >= 400, != 200, etc.
• Merge the two tables
• Done!
Here is a notebook if you want to test it out:
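With toy stand-ins for the two tables (the real ones come from the crawl output and the links function; the column names here are illustrative), the filter-and-merge steps look like:

```python
import pandas as pd

# Stand-in link table (page -> link) and crawl status table:
links = pd.DataFrame({
    'url': ['https://site.com/a', 'https://site.com/a', 'https://site.com/b'],
    'link': ['https://site.com/x', 'https://site.com/y', 'https://site.com/x'],
})
crawl = pd.DataFrame({
    'url': ['https://site.com/x', 'https://site.com/y'],
    'status': [404, 200],
})

# Filter the error pages by any status rule you want (here >= 400):
errors = crawl[crawl['status'].ge(400)]

# Merge: which pages link to the broken URLs, and from where?
broken = links.merge(errors, left_on='link', right_on='url',
                     suffixes=('_source', '_broken'))
print(broken[['url_source', 'link', 'status']])
```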
#advertools #DataScience #SEO #Python #DigitalAnalytics #DigitalMarketing
-
External link analysis with the #advertools crawlytics module
• Use the links() function to map all links on a website (URL, anchor text, nofollow, internal/external)
• Count the most linked-to domains
• Crawl the external links and get their status codes
• Locate broken external links on the website using their location and anchor text
• Enjoy!
Get a copy of the HTML report (includes a link to the code repo):
https://bit.ly/48OowL5
#DataScience #SEO #Crawling #Python #DigitalAnalytics #DigitalMarketing
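The domain-counting step above can be sketched with the standard library (the link list is a made-up stand-in for the external links a crawl would surface):

```python
from collections import Counter
from urllib.parse import urlsplit

# Stand-in list of external links found by a crawl:
external_links = [
    'https://twitter.com/someuser',
    'https://github.com/org/repo',
    'https://github.com/org/other-repo',
    'https://en.wikipedia.org/wiki/SEO',
]

# Count the most linked-to external domains:
domain_counts = Counter(urlsplit(link).netloc for link in external_links)
print(domain_counts.most_common(2))
# → [('github.com', 2), ('twitter.com', 1)]
```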
-
Data Science with Python for SEO Course: This Monday!
Get the full details and join here:
If you have any questions, let me know, and if you think others might benefit, please let them know.
#DataScience #SEO #Python #DigitalMarketing #DigitalAnalytics #advertools #pandas #plotly #DataVisualization
-
Internal links: How interlinked are the different sections of a website?
• Using adv.crawlytics.links we get a mapping of all links (source -> destination)
• Using adv.url_to_df we get each component of those links (scheme, domain, path, etc.)
• Count the combinations of the first directories to get the number of links from/to each section of the website
What do you think?
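The counting idea can be sketched with the standard library (the link pairs and section names are made-up stand-ins for what the two functions above would produce):

```python
from collections import Counter
from urllib.parse import urlsplit

# Stand-in (source, destination) link pairs:
link_pairs = [
    ('https://site.com/blog/a', 'https://site.com/shop/x'),
    ('https://site.com/blog/b', 'https://site.com/shop/y'),
    ('https://site.com/shop/x', 'https://site.com/blog/a'),
]

def first_dir(url):
    """First path directory of a URL, e.g. '/blog/a' -> 'blog'."""
    path = urlsplit(url).path.strip('/')
    return path.split('/')[0] if path else ''

# Count links between sections (first directories) of the site:
section_links = Counter((first_dir(src), first_dir(dst)) for src, dst in link_pairs)
print(section_links)
# → Counter({('blog', 'shop'): 2, ('shop', 'blog'): 1})
```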
-
#GSC analysis report template - 1st version
To be discussed in #advertools office hours tomorrow.
Here's a copy of the current report. Would love any recommendations, issues, suggestions...
#GoogleSearchConsole #DigitalAnalytics #SEO #DataScience #Python #advertools
Report created using @Posit's Quarto
-
Google Search Console Animated Monthly Clicks Chart
Download it here: https://bit.ly/41BUc2X
Code & sample data: http://bit.ly/4050W8h
Coming soon...
#DataScience #DataVisualization #Python #advertools #adviz #SEO #SEM #DigitalMarketing #DigitalAnalytics
-
The split of topics that The New York Times covered in 2022.
Interactive HTML chart & code:
https://bit.ly/3zSxbNh
You can check other years and see how/if their publishing has changed.
I removed the dates from URLs in this case (YYYY/MM/DD) to get a better overview. Note that you can include links* in the chart:
• more than one link
• links using a URL shortener like bit.ly
• links containing UTM codes
#DataScience #DataVisualization #Python #treemap #advertools #adviz #SEO