Archival of top ~100k G+ Community homepages (only): process completed
I'd posted a listing of the top 103,000 or so Google+ communities, selected by two criteria: more than 100 members, and activity within the 30 days before the analysis was made. This is the bulk of active communities on G+, though a small fraction of the 8.1 million total communities.
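The selection criteria above can be sketched as a simple filter. This is only an illustration, not the script actually used: the `Community` record, its field names, the analysis date, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Community:
    # Hypothetical record; field names are assumptions for illustration.
    community_id: str
    members: int
    last_active: date


def is_selected(c, analysis_date, min_members=100, window_days=30):
    """True if the community has more than min_members members and was
    active within window_days before analysis_date."""
    cutoff = analysis_date - timedelta(days=window_days)
    return c.members > min_members and c.last_active >= cutoff


analysis_date = date(2019, 3, 1)  # made-up date, not from the post
sample = [
    Community("a", 250, date(2019, 2, 20)),   # meets both criteria
    Community("b", 100, date(2019, 2, 28)),   # exactly 100 members, not > 100
    Community("c", 5000, date(2018, 11, 1)),  # large, but inactive too long
]
selected = [c.community_id for c in sample if is_selected(c, analysis_date)]
print(selected)
```

Only community "a" passes here, since the member threshold is strict and the activity window is counted back from the analysis date.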
This is NOT a full Community archive. Rather, it is a breadcrumb (or set of breadcrumbs) which can be used to join SignalFlare posts to future homes, by creating a permanent archive of the community homepage.
What will be archived is only the homepage of the community, up to ten recent posts, and some of their contents. Enough to provide a stepping stone (or reference point) to future homes.
As an example of the archive format, see the G+MM archive here:
https://web.archive.org/web/20190330071102/https://plus.google.com/communities/112164273001338979772
(Note that I've also been archiving this about 2x daily for the past few months, so there are a lot of saves of this.)
By including forwarding information in the Community "About" description (added earlier today), late-comers can find future points of contact. I strongly recommend other communities follow this practice.
The full 100k set should take about 4 hours to run, if the stars align. I may run a subsequent archive again on the 1st.
(Actual runtime: 103 minutes.)
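As a rough sanity check on that runtime, the implied throughput can be worked out directly. The page count is the approximate list size mentioned earlier (about 103,000), so the result is only an estimate.

```python
pages = 103_000            # approximate number of community homepages
estimated_minutes = 4 * 60  # the "about 4 hours" estimate
actual_minutes = 103        # the reported actual runtime

per_minute = pages / actual_minutes
per_second = per_minute / 60
speedup = estimated_minutes / actual_minutes

print(f"{per_minute:.0f} pages/min, {per_second:.1f} pages/s, "
      f"{speedup:.1f}x faster than estimated")
# prints: 1000 pages/min, 16.7 pages/s, 2.3x faster than estimated
```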
I may also expand the selection criteria to communities active over a longer period -- say, six months -- if that yields a reasonable number of communities.
The task has completed. (Though I should check my work ;-)
Edward Morbius bravo! You deserve a medal and a planet of your own.
Edward Morbius Any way to do this for a whole community (https://plus.google.com/u/0/communities/113478898218396403991) or collection (https://plus.google.com/u/0/collection/cFBeTB) in one go? I tried it the other day; it only has 5 saves, of which the first 2-3 don't work, even though they show on the calendar. Also, no save goes beyond 2016. And I don't have a VM or the coding skills to run this on a large scale.
Deepak Ravlani For personal archives, try the Friends+Me Google+ Exporter, or possibly Webrecorder:
webrecorder.io - Webrecorder
I think that the URL lists from those can then be saved to the Archive. Alois Bělaška said he'd build IA submissions into the Exporter, though I'm not sure that happened.
Also: strip the "/u/0" from the URLs you're referencing; that's a G+ mechanism for distinguishing multiple account sessions. (Though my stripped counts match yours, so IA may be accounting for this.)
So try:
https://plus.google.com/communities/113478898218396403991
https://plus.google.com/collection/cFBeTB
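For a longer list of URLs, the same stripping could be done with a small helper. This is a minimal sketch: the function name is made up, and matching any "/u/<number>" segment (not just "/u/0") is an assumption about how G+ numbers account sessions.

```python
import re

# Matches a "/u/<digits>" segment directly after the host.
# Assumption: session indexes other than 0 follow the same pattern.
_SESSION_SEGMENT = re.compile(r"^(https?://plus\.google\.com)/u/\d+(?=/)")


def strip_session_segment(url):
    """Remove the "/u/<n>" account-session segment from a G+ URL, if present."""
    return _SESSION_SEGMENT.sub(r"\1", url)


for url in [
    "https://plus.google.com/u/0/communities/113478898218396403991",
    "https://plus.google.com/u/0/collection/cFBeTB",
]:
    print(strip_session_segment(url))
```

URLs that have no session segment pass through unchanged, so the helper is safe to run over a mixed list.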
First has 6 saves:
https://web.archive.org/web/*/https://plus.google.com/communities/113478898218396403991
(One is mine, from earlier today.)
2nd has 5 saves:
https://web.archive.org/web/*/https://plus.google.com/collection/cFBeTB
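The Wayback Machine links in this thread follow two patterns: a "*" wildcard for the calendar of all saves, and a 14-digit timestamp for one specific save. A small sketch of building and reading them follows; the helper names are hypothetical, and the timestamp is treated as UTC.

```python
from datetime import datetime, timezone

WAYBACK = "https://web.archive.org/web"


def calendar_url(url):
    """Listing of all saves of a URL (the "*" wildcard form)."""
    return f"{WAYBACK}/*/{url}"


def snapshot_url(url, timestamp):
    """Link to one save, identified by a YYYYMMDDhhmmss timestamp."""
    return f"{WAYBACK}/{timestamp}/{url}"


def parse_timestamp(timestamp):
    """Parse a 14-digit Wayback timestamp into an aware UTC datetime."""
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


community = "https://plus.google.com/communities/113478898218396403991"
print(calendar_url(community))
print(snapshot_url("https://plus.google.com/communities/112164273001338979772",
                   "20190330071102"))
print(parse_timestamp("20190330071102").isoformat())
```

These helpers only build and parse links; they make no network requests and say nothing about whether a given save actually loads.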