#archiveteam — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #archiveteam, aggregated by home.social.
-
@AppleVis @DavidGoldfield Hey @textfiles, how can we make sure this website is fully archived? This is 13 years of blindness history, and losing it would be an unimaginable tragedy. #archive #archiveteam #datahoarders
-
@darnell General Web Search is ... sort of its own thing. That's manageable through robots.txt or permissive / exclusive in-page tags.
(Those will generally prevent content from being presented in search results, but may not prevent crawling. On-page headers in particular cannot, by the very mechanism through which they work: the spider has to crawl the page and read the header to determine what's being said.)
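To illustrate why in-page tags cannot stop crawling: a crawler only learns a page's robots directives after it has downloaded and parsed the page. This is a minimal sketch using Python's standard `html.parser`, assuming the common `<meta name="robots" content="...">` convention; the class and function names (`RobotsMetaParser`, `robots_directives`) are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() != "robots":
            return
        for directive in values.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html_text: str) -> set:
    """Return the robots directives declared in an already-downloaded page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives
```

A crawler that sees `noindex` here has, by definition, already fetched the page; the tag can only shape what happens afterwards (indexing, link following).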
There are groups such as the #ArchiveTeam who explicitly ignore robots.txt: https://wiki.archiveteam.org/index.php/Robots.txt
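Robots.txt compliance is a check the crawler itself chooses to make. This is a minimal sketch with Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleArchiveBot` user agent, and the `honor_robots` switch are hypothetical. The switch stands in for a crawler that, like ArchiveTeam's, deliberately skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot everywhere, keep /private/ from everyone.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    policy = RobotFileParser()
    policy.parse(robots_txt.splitlines())
    return policy


def should_fetch(policy: RobotFileParser, user_agent: str, url: str,
                 honor_robots: bool = True) -> bool:
    """robots.txt is advisory: the crawler decides whether to consult it."""
    if not honor_robots:
        return True  # a crawler that ignores robots.txt simply never asks
    return policy.can_fetch(user_agent, url)
```

Nothing on the server side enforces the answer. The same URL is "forbidden" or "allowed" depending only on the flag the crawler's operator sets.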
Then there's the more recently recognised issue of AI (LLM) training data and derived works.
Other than those, what is your threat model here?
- What risks do you see?
- What are you trying to avoid?
- What would you specifically like to see?
My view is that online content is ... online. It's published, in the sense of public. If you want closed content, you need some way of disclosing it only to a limited group, and that has tremendous impacts on reach and influence.
That contrasts with community and interaction: a Fediverse that is crawled by Google is very different from one that Google and Facebook interface with directly, in parallel with their existing social networks (FB, Instagram, YouTube, Blogger, say).
#Meta #Metablock #DefederateMeta #ThreatModels #Risk #GeneralWebSearch #LLM #ArtificialIntelligence #TrainingData
-
Continued Community Migration tips:
The #InternetArchive and the independent but closely cooperating #ArchiveTeam are a blessing if you want your content permanently archived online. (And if you don't, they'll readily disable public access on request; email [email protected].) This can be self-service or through an Archive Team project; see https://wiki.archiveteam.org/
To save any one page at the Wayback Machine, use a URL of the form
https://web.archive.org/save/<ORIGINAL_URL>. This can be scripted or automated if you have a list of URLs, say, from a downloaded archive. I've saved many thousands of my own pages across multiple sites this way.
There's also #ArchiveToday, which is not a charity and is pretty opaque about its operations, operators, financing, and goals, but does a good job of capturing today's Web as it exists (IA can have ... issues with this). It has no automated bulk-save option, but you can streamline the process by generating sets of URLs to save and clicking through them one by one. Depending on what you're trying to save and how motivated you are, this is also an option (and yes, I've also saved a few thousand of my own pages this way).
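Scripting the Wayback save URL over a list of pages might look like the following minimal Python sketch. It assumes that a plain GET to `https://web.archive.org/save/<ORIGINAL_URL>` is enough to request a capture, and that pausing between requests keeps you within the service's rate limits. The 10-second delay, the User-Agent string, and the helper names (`build_save_url`, `read_url_list`, `save_all`) are hypothetical choices, not documented values.

```python
import sys
import time
import urllib.request

# URL form from the post: https://web.archive.org/save/<ORIGINAL_URL>
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_url(original_url: str) -> str:
    """Prefix one original URL with the Wayback Machine save endpoint."""
    return SAVE_PREFIX + original_url.strip()


def read_url_list(lines) -> list:
    """Keep non-blank lines that are not '#' comments, stripped of whitespace."""
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


def save_all(urls: list, delay_seconds: float = 10.0) -> dict:
    """Request a save for each URL in turn and record the HTTP status or error.

    delay_seconds is an assumed politeness pause, not a documented limit.
    """
    results = {}
    for i, url in enumerate(urls):
        request = urllib.request.Request(
            build_save_url(url),
            headers={"User-Agent": "save-list-sketch/0.1"},  # hypothetical UA string
        )
        try:
            with urllib.request.urlopen(request, timeout=120) as response:
                results[url] = response.status
        except OSError as err:  # HTTPError and URLError are both OSError subclasses
            results[url] = getattr(err, "code", None) or str(err)
        if i < len(urls) - 1:
            time.sleep(delay_seconds)  # pause between requests, not after the last
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python save_urls.py urls.txt  (one URL per line)
    with open(sys.argv[1], encoding="utf-8") as fh:
        for url, outcome in save_all(read_url_list(fh)).items():
            print(outcome, url)
```

The script only reports the HTTP status of each save request. Confirming that a capture actually landed is a separate check.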
Keep in mind that archive sites may not be as accessible or functional as the original. For example, Google+ URLs archived at the Internet Archive carry only a subset of comments, and profile pages don't allow the listed posts to be opened.
This G+ post, for instance, shows only 6 of its 82 comments:
https://web.archive.org/web/20190319215226/https://plus.google.com/104092656004159577193/posts/hEjRbVQmYSD
And my G+ profile page shows posts, but those cannot be opened through the Wayback Machine. Heck, you can't even determine the URLs to request archived copies:
https://web.archive.org/web/20190331094038/https://plus.google.com/104092656004159577193/posts/
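If you still have the original URLs or a URL prefix, the Wayback Machine's CDX server can list the captures it holds. This is one way to find replay URLs without clicking through an archived page. The following is a minimal sketch, assuming the CDX endpoint `https://web.archive.org/cdx/search/cdx` with its `matchType`, `output=json`, `fl`, `collapse`, and `limit` parameters. The helper names are hypothetical, and whether any given prefix (the G+ posts above, say) returns captures is not guaranteed.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix: str, limit: int = 1000) -> str:
    """Build a CDX query listing one capture per distinct URL under a prefix."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "collapse": "urlkey",
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_json(text: str) -> list:
    """JSON output is a list of rows; the first row holds the field names."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def replay_url(record: dict) -> str:
    """Same form as the archived links above: /web/<timestamp>/<original>."""
    return f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"


def list_captures(prefix: str) -> list:
    """Fetch and parse the capture list (performs a network request)."""
    with urllib.request.urlopen(cdx_query_url(prefix), timeout=60) as response:
        return parse_cdx_json(response.read().decode("utf-8"))
```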
The most useful thing you can do is note on a profile or landing page where you've gone, so that people can, with luck, track you down there. My G+ profile page above does so.
#TwitterExodus #Plexodus #CommunityMigration #CommunityContinuity #SwitchingPlatforms #WaybackMachine