
Enhanced G+Communities Takeout is available

Hooray! The enhanced G+Communities Takeout is available, with the full content of Public communities available to Owners and Moderators. The Posts section generates a single file for each post, just as with g+Stream.posts and in the same format. Posts.JSON correctly produces a JSON file for each post.

The one oddity is that Community.Summary is always in HTML even if you specify JSON.

https://takeout.google.com/settings/takeout/custom/plus_communities

Thank you Google, for delivering on the promise under the wire. It's still just early March!

If anyone successfully uses this for a big community, please report back.

Comments

  1. My net2o importer can deal with the new format; only one new tag was added. I had to make sure that all the duplicates really are perfectly deduplicated, and that the attribution signature is correct.

    Fortunately, the summary isn't needed; if you import all the postings, you get the summary.

    What's missing is the category inside the community.

    ReplyDelete
  2. Nasreen Malik Collections are part of your normal takeout.

    ReplyDelete
  3. Nasreen Malik as Bernd Paysan indicated, Collections are indeed part of the Google+ Stream Takeout.
    To be precise, in your Google+ Stream Takeout archive you'll find a separate JSON file for each post you've made, in the Takeout/Google+ Stream/Posts folder.
    In this JSON file there is an item named "postAcl" (where Acl stands for Access Control List), which contains one of several other Acl items that indicate what kind of audience setting(s) it had:
    visibleToStandardAcl: the original basic visibility control that is used when a post is public to all, or limited to one or more specific circles, all your circles, extended circles, or specific people.
    eventAcl: used for posts made within Events
    communityAcl: used for posts made within Communities
    collectionAcl: used for posts made to Collections

    This last item also contains sub-items that indicate the resourceName (a sort of unique ID, the part of the URL which indicates which Collection it's part of) and the displayName.

    So, if you want to collect all your posts that were made to a specific Collection, you'll have to find all the JSON files that have the right collectionAcl; my Plexodus-Tools has instructions on how to achieve that with the command-line tool `jq`, and the library of jq functions I wrote: https://github.com/FiXato/Plexodus-Tools/blob/master/README.md#examples

    More details about the data structure in these Post.json files can be found at: https://github.com/FiXato/Plexodus-Tools/blob/master/activity_data_structure.md
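The same filtering can be sketched in Python as an alternative to the jq route. The key names (postAcl, collectionAcl, displayName) come from the structure described above, but the exact nesting of displayName inside collectionAcl is an assumption; check one of your own Post.json files first.

```python
import glob
import json
import os

def posts_in_collection(posts_dir, collection_name):
    """Yield paths of per-post JSON files whose postAcl carries a
    collectionAcl matching the given Collection displayName."""
    for path in glob.glob(os.path.join(posts_dir, "*.json")):
        with open(path, encoding="utf-8") as f:
            post = json.load(f)
        acl = post.get("postAcl", {}).get("collectionAcl")
        # Assumption: displayName sits directly inside collectionAcl.
        if acl and acl.get("displayName") == collection_name:
            yield path
```

For large archives, the jq approach in the README above avoids loading every file into Python.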


    ReplyDelete
  4. Nasreen Malik Your own Collections are included in your takeout, if you've included Streams data.

    You cannot directly archive other's Collections.

    ReplyDelete
  5. Is there a way to get the comments on the Community posts? If not, it's pretty much useless to me. The purpose of a community is the interaction.

    ReplyDelete
  6. So if we do this, must we learn how to convert JSON to HTML? Does this require a tech learning curve?

    ReplyDelete
  7. Wi aM hEFF! ... good catch!
    I hadn't actually noticed yet that the comments were missing from the Google+ Communities/$communityName/Posts/*.json files...

    I agree with you that it's fairly essential for comments to actually be included in a Community export...

    These are the top-level keys that I found in the G+MM community's JSON post files:
    "activityId",
    "album",
    "author",
    "collectionAttachment",
    "communityAttachment",
    "content",
    "creationTime",
    "link",
    "location",
    "media",
    "poll",
    "postAcl",
    "resharedPost",
    "resourceName",
    "updateTime",
    "url"

    "comments" indeed seems to be missing :(

    ReplyDelete
  8. Filip H.F. Slagter GAH! Oh, come on Google. This is ridiculous.

    ReplyDelete
  9. I've sent in Feedback through the Send Feedback option in the vertical ellipsis (⋮) menu on https://takeout.google.com

    It might be a good idea for some others to leave similar, polite and constructive, feedback to show it's a missing feature that more people than just me care about.

    Likewise, I just found out that the comments (.primaryText) in Google+ Stream/ActivityLog/*.json (Comments, +1s on comments/posts and Poll Votes) don't include any kind of formatting; no HTML and even no newlines. This makes a lot of the more insightful/complex comments rather illegible, especially those with bold/italic formatting or lists.

    ReplyDelete
  10. Filip H.F. Slagter Julian Bond Wi aM hEFF! Yes, no comments section in my community takeout, either. Just the plain postings.

    Google, WTF? I'll probably send them a GDPR request on March 31, with all the missing parts requested.

    ReplyDelete
  11. Leathur Rokk If you just want to quickly read the archive with minimal effort, or post the whole directory somewhere, the HTML format is good enough and requires very little work. If you want to import the contents into some other system, you probably need to start with JSON and then write code to do the import. However, actually writing the code is left as a project for the student. There's not a great deal of help from anyone official, although people here are building a body of code to help.
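To give an idea of what "writing code" means here: a minimal, hypothetical JSON-to-HTML converter for one post file, using only keys that appear in the export (content, url, creationTime). It assumes the content field is already HTML, as in the Stream export; this is a sketch, not a finished importer.

```python
import html
import json

def post_to_html(post):
    """Render one takeout post dict as a minimal HTML fragment."""
    parts = ["<article>"]
    if "creationTime" in post:
        parts.append("<time>%s</time>" % html.escape(post["creationTime"]))
    # Assumption: "content" is pre-rendered HTML, so it is not escaped.
    parts.append(post.get("content", ""))
    if "url" in post:
        parts.append('<a href="%s">original post</a>'
                     % html.escape(post["url"], quote=True))
    parts.append("</article>")
    return "\n".join(parts)

def convert_file(json_path):
    """Convert a single per-post JSON file to an HTML fragment."""
    with open(json_path, encoding="utf-8") as f:
        return post_to_html(json.load(f))
```

Wrapping the fragments in a page template and looping over the Posts folder is the remaining work.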

    ReplyDelete
  12. Bernd Paysan not sure though if original formatting would fall under the requirements of GDPR... I have a feeling that even returning plain text would be sufficient for them.

    ReplyDelete
  13. Filip H.F. Slagter Machine-readable, in a common format. JSON is OK, but losing an important part of the contents is a real problem.

    We don't have the +1s on comments in normal posts, either. The +1s are important if you have a lot of comments and want to see the relevant ones.

    ReplyDelete
  14. Bernd Paysan do you mean your own +1s on comments? If so, there is Google+ Stream/ActivityLog/+1s on comments.json

    If you mean +1s on the posts in Google+ Communities/Posts/*.json, then that indeed also seems to be missing. No 'resharers', no 'replies' and no 'plusOners' are included in the JSON files for Community posts either...

    ReplyDelete
  15. Filip H.F. Slagter There are no +1s from others on comments in the normal stream takeout (so you can only extract your own +1s from the activities), and in the Communities/Posts there is nothing at all.

    ReplyDelete
  16. Bernd Paysan huh, you're right, I actually also hadn't noticed that before...
    The Google+ Comments API did include a .items[] .plusoners .totalItems count, but not a list of the actual people who gave it a plus one.

    I guess the only way to retrieve that information now would be to scrape the pages.

    ReplyDelete
  17. Filip H.F. Slagter The takeout does not even include the "totalItems" field. There simply is no plusOners for any comment.

    So far, I think, all these takeouts were more of a theoretical option, mandated by laws like the GDPR or offered voluntarily before that, without anyone seriously using and analyzing them. Google+ is the first case where people take the takeouts seriously. That's due to the large amount of nerdy content here.

    ReplyDelete
  18. Just launched the archive process now on my communities, largest being linked below. Will report back when it's done. Edit: Done, 20 megs. Apparently I'm not very popular :P
    Thorium Now

    ReplyDelete
  19. Johnny Stork, MSc Look at the index.html, it will give you some more details.

    In general, you need to “warm up” the cache to get everything, so the first two or three takeouts won't be complete.

    ReplyDelete
  20. Johnny Stork, MSc as Bernd Paysan wrote (and as is mentioned in the red text), open the index.html file from the archive in your browser. It will list the files that are missing/incomplete, and often will include links that you can click to manually try to download them.
    It's likely some of the files are in 'cold storage', and thus need a bit longer to be retrieved than the Takeout process is set to use for retrieval.
    Once those files have successfully been requested again, it's likely a next Takeout request will be more complete.

    ReplyDelete
  21. Bernd Paysan In general, you need to “warm up” the cache to get everything, so the first two or three takeouts won't be complete.

    What complete bullshit! Not your comment, but that professionals in the biggest IT company the world has ever seen could design a system like that. "Run it down the hill again and let's see if the brakes fail again.", "Just kick it a couple of times and maybe it will work on the 3rd time".

    ReplyDelete
  22. Julian Bond Yes, Google+ is a mess behind the scenes...

    ReplyDelete
  23. Johnny Stork, MSc It does download, but it saves even pages u don't want, and u need to hv enough space in GDrive...

    ReplyDelete
  24. Nasreen Malik
    Thanks :) I made another archive and it was again only like 21 megs, guess that's accurate /shrugs

    ReplyDelete
  25. Brandon Sergent Try copying collections using CorelDRAW... I hv managed to save my poems on Poemia... And a friend transferred all my poems on Collections to CorelDRAW...

    ReplyDelete
  26. That's excellent, I'm glad you're not going to lose any work, pretty sure all my stuff is captured as well :) Sidenote: Fuck Google for closing gplus >:(

    ReplyDelete

Post a Comment

New comments on this blog are moderated. If you do not have a Google identity, you are welcome to post anonymously. Your comments will appear here after they have been reviewed. Comments with vulgarity will be rejected.
