Continuing the takeout data migration process with a first look at the data that is in the archive.
https://blog.kugelfish.com/2018/10/google-migration-part-ii-understanding.html
What I hate for automatic processing: they localize the Takeout directory names. I get something like “Takeout/Stream in Google+/Beiträge” instead of “Takeout/Google+ Stream/Posts”.
Bernd Paysan - thanks for pointing that out! This is indeed rather unhelpful for developing easily reusable tools...
Bernhard Suter I would probably deal with that by collecting their localization database (which probably requires setting up a test user and changing the language of this test user), and using an i18n package to localize the Takeout directory names, too… but yes, this will be more error-prone, and will require user interaction if it doesn't quite work.
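A minimal sketch of the lookup-table idea, assuming the localized names have already been collected somehow. The table `LOCALIZED_TO_CANONICAL` and the function `canonical_path` are hypothetical names for illustration; only the German pair quoted above is filled in, and the Unicode normalization step is an assumption about how archive tools may store umlauts.

```python
import unicodedata
from pathlib import PurePosixPath

# Hypothetical lookup table: localized directory name -> English name.
# Only the German example from this thread is known; other languages
# would have to be collected separately.
_RAW_TABLE = {
    "Stream in Google+": "Google+ Stream",
    "Beiträge": "Posts",
}

# Normalize the keys so that composed and decomposed umlauts both match.
LOCALIZED_TO_CANONICAL = {
    unicodedata.normalize("NFC", name): canonical
    for name, canonical in _RAW_TABLE.items()
}


def canonical_path(path: str) -> str:
    """Replace every known localized path component with its English name."""
    parts = []
    for part in PurePosixPath(path).parts:
        key = unicodedata.normalize("NFC", part)
        # Unknown components are passed through unchanged.
        parts.append(LOCALIZED_TO_CANONICAL.get(key, part))
    return str(PurePosixPath(*parts))


print(canonical_path("Takeout/Stream in Google+/Beiträge"))
```

Components that are missing from the table pass through untouched, so a tool built on this could at least report the unmapped names and ask the user about them.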
"Once we the takeout archive" - the first five words.
An example of Collections JSON info for a post in a single collection.
"postAcl": {
  "collectionAcl": {
    "collection": {
      "resourceName": "collections/AB2YX",
      "displayName": "Politics"
    }
  }
}
Here's the HTML
https://voidstar.com/Takeout/Google+/Posts/20160216%20-%20I%20wonder%20how%20much%20easier%20travelling%20by_.html
Shared to the collection Politics (https://plus.google.com/collection/AB2YX) - Private
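To make the link between the JSON and the HTML concrete, here is a small sketch that pulls the collection out of a post. It assumes `postAcl` is a top-level key of the post object (the fragment above is wrapped in braces for that reason). The helper names `collection_of` and `collection_url` are made up, and the URL pattern is only inferred from this single example, so treat it as a guess.

```python
import json

# The fragment from the comment above, wrapped in braces to form valid JSON.
post = json.loads("""
{
  "postAcl": {
    "collectionAcl": {
      "collection": {
        "resourceName": "collections/AB2YX",
        "displayName": "Politics"
      }
    }
  }
}
""")


def collection_of(post: dict):
    """Return (resourceName, displayName) of the post's collection, or None."""
    collection = post.get("postAcl", {}).get("collectionAcl", {}).get("collection")
    if collection is None:
        return None
    return collection.get("resourceName"), collection.get("displayName")


def collection_url(resource_name: str) -> str:
    """Guess the public URL; the pattern is inferred from one example only."""
    identifier = resource_name.split("/", 1)[1]
    return f"https://plus.google.com/collection/{identifier}"


resource_name, display_name = collection_of(post)
print(display_name, collection_url(resource_name))
```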
Nice start already! :)
You might also find my analysis of the JSON activity files useful as reference: https://social.antefriguserat.de/index.php/Data_Migration_Process_and_Considerations#Takeout_Data_Structure (location within the wiki will change, but I'll make sure to leave a reference to the new location once I've extracted it to a page of its own).
The detailed example there might still be missing some data (I noticed for instance that I haven't included Location example data in it), but I do think the flat structure of hash keys is quite complete. Perhaps you can run the jq-command as well on your json files, and diff it against mine to see if there are any more keys I'm missing?
In the comments of https://plus.google.com/104092656004159577193/posts/VXFuh7kJFyd?fscid=z13mffthxm3jghkpe04cjjgi3ti3zp1gz0c.1540739963224711 you'll find some additional jq library method definitions to quickly filter down the contents of the JSON files to public posts, with or without comments, media (or even narrowed down to images, video or audio), and interactions with specific people. I'll release this as well as a public git repo on GitHub and/or GitLab once I've finished writing documentation for it.
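For anyone without jq at hand, the flat key structure can also be approximated in Python. This is a sketch and not the jq command from the wiki; the function names `key_paths` and `collect_key_paths` are made up, and the `[]` notation simply mimics the style of the key listings in this thread.

```python
def key_paths(value, prefix=""):
    """Yield flattened key paths such as 'comments[].author.displayName'."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(child, path)
    elif isinstance(value, list):
        # All list elements share one path, marked with '[]'.
        path = f"{prefix}[]"
        yield path
        for child in value:
            yield from key_paths(child, path)


def collect_key_paths(documents):
    """Return the sorted union of key paths over several parsed JSON files."""
    paths = set()
    for document in documents:
        paths.update(key_paths(document))
    return sorted(paths)


# Made-up sample data, only to show the output format.
sample = {
    "author": {"displayName": "A"},
    "comments": [
        {"content": "x"},
        {"content": "y", "link": {"url": "u"}},
    ],
}
for path in collect_key_paths([sample]):
    print(path)
```

Two such lists can then be compared with a plain set difference, for example `set(mine) - set(yours)`, to spot keys that only one archive contains.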
The takeout data structures seem different, but related to the API structures.
developers.google.com - Activities | Google+ Platform for Web | Google Developers
Have you matched them off? Is there data available in the API that doesn't appear in the Takeout?
Julian Bond years ago, at least back in 2013, the JSON files actually matched the API's Activity resource structure. Its google-api-client models were actually what I initially used to try and load in my JSON files, as I was developing against an old takeout backup. It's also why I was quite disappointed to find out they'd changed part of their Takeout data structure (and not updated their api-client, nor provided docs).
As for changes: the Access Control List (Acl) structure has changed significantly. It's no longer a single type identifier that decides access, but split up into visibleToStandardAcl (which controls circle and individual user visibility), communityAcl (which community it is posted to), eventAcl (for Events, and who has been invited to them), collectionAcl (giving access to those following a collection), and there is an 'isLegacyAcl' key, of which I'm not quite sure yet what the purpose is.
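As a rough illustration of the split described above, a post can be classified by which of these keys appear under `postAcl`. This is only a sketch: the function name `acl_kinds` is made up, and whether more than one of these keys can occur on the same post is not something this thread establishes, so the function returns a list.

```python
# The four ACL keys named above; 'isLegacyAcl' is left out on purpose
# because its meaning is unclear.
ACL_KINDS = ("visibleToStandardAcl", "communityAcl", "eventAcl", "collectionAcl")


def acl_kinds(post: dict) -> list:
    """Return the ACL keys present under 'postAcl', in a fixed order."""
    post_acl = post.get("postAcl", {})
    return [kind for kind in ACL_KINDS if kind in post_acl]


# Made-up minimal post, shaped like the collection example in this thread.
example = {
    "postAcl": {
        "collectionAcl": {
            "collection": {
                "resourceName": "collections/AB2YX",
                "displayName": "Politics",
            }
        }
    }
}
print(acl_kinds(example))
```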
Another significant change is that the current format no longer contains an originalContent key, which used to contain the contents of the Activity unformatted; that is, it would contain the exact same text as you'd submit, complete with asterisks, underscores and dashes, without them being interpreted as HTML formatting. What is left is just 'content' keys, which contain the HTML-formatted content.
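Since originalContent is gone, something close to it would have to be reconstructed from the HTML. The sketch below assumes that bold, italic and strikethrough appear as `<b>`, `<i>` and `<s>` tags and line breaks as `<br>`. That tag set is an assumption and should be checked against real files. The names `MarkupRestorer` and `restore_markup` are made up, and links are reduced to their text.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML tags to the plain-text markers; the tags that
# actually occur in Takeout 'content' values need to be verified.
MARKERS = {"b": "*", "i": "_", "s": "-"}


class MarkupRestorer(HTMLParser):
    """Collect text and turn known formatting tags back into markers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag in MARKERS:
            self.parts.append(MARKERS[tag])

    def handle_endtag(self, tag):
        if tag in MARKERS:
            self.parts.append(MARKERS[tag])

    def handle_data(self, data):
        self.parts.append(data)


def restore_markup(html_content: str) -> str:
    """Approximate the unformatted text from an HTML 'content' value."""
    parser = MarkupRestorer()
    parser.feed(html_content)
    parser.close()
    return "".join(parser.parts)


print(restore_markup("<b>Hello</b> <i>world</i><br>done &amp; <s>gone</s>"))
```

The result is only an approximation: text that originally contained literal asterisks or underscores cannot be told apart from formatting after the round trip.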
Julian Bond as a reference, compare this flat structure of all the possible keys (at least as found in my own json files) of the old format:
access
access.description
access.items
access.items[]
access.items[].type
access.kind
actor
actor.displayName
actor.id
actor.image
actor.image.url
actor.url
annotation
etag
id
kind
location
location.address
location.address.formatted
location.displayName
location.kind
location.position
location.position.latitude
location.position.longitude
object
object.actor
object.actor.displayName
object.actor.id
object.actor.image
object.actor.image.url
object.actor.url
object.attachments
object.attachments[]
object.attachments[].categories
object.attachments[].categories[]
object.attachments[].categories[].schema
object.attachments[].categories[].term
object.attachments[].content
object.attachments[].displayName
object.attachments[].embed
object.attachments[].embed.type
object.attachments[].embed.url
object.attachments[].fullImage
object.attachments[].fullImage.height
object.attachments[].fullImage.type
object.attachments[].fullImage.url
object.attachments[].fullImage.width
object.attachments[].id
object.attachments[].image
object.attachments[].image.height
object.attachments[].image.type
object.attachments[].image.url
object.attachments[].image.width
object.attachments[].objectType
object.attachments[].thumbnails
object.attachments[].thumbnails[]
object.attachments[].thumbnails[].description
object.attachments[].thumbnails[].image
object.attachments[].thumbnails[].image.height
object.attachments[].thumbnails[].image.type
object.attachments[].thumbnails[].image.url
object.attachments[].thumbnails[].image.width
object.attachments[].thumbnails[].url
object.attachments[].url
object.content
object.id
object.objectType
object.originalContent
object.plusoners
object.plusoners.items
object.plusoners.items[]
object.plusoners.items[].displayName
object.plusoners.items[].etag
object.plusoners.items[].id
object.plusoners.items[].image
object.plusoners.items[].image.url
object.plusoners.items[].kind
object.plusoners.items[].url
object.plusoners.totalItems
object.replies
object.replies.items
object.replies.items[]
object.replies.items[].actor
object.replies.items[].actor.displayName
object.replies.items[].actor.id
object.replies.items[].actor.image
object.replies.items[].actor.image.url
object.replies.items[].actor.url
object.replies.items[].etag
object.replies.items[].id
object.replies.items[].kind
object.replies.items[].object
object.replies.items[].object.content
object.replies.items[].object.objectType
object.replies.items[].object.originalContent
object.replies.items[].plusoners
object.replies.items[].plusoners.totalItems
object.replies.items[].published
object.replies.items[].updated
object.replies.items[].verb
object.replies.totalItems
object.resharers
object.resharers.items
object.resharers.items[]
object.resharers.items[].displayName
object.resharers.items[].etag
object.resharers.items[].id
object.resharers.items[].image
object.resharers.items[].image.url
object.resharers.items[].kind
object.resharers.items[].url
object.resharers.totalItems
object.statusForViewer
object.statusForViewer.canComment
object.statusForViewer.canPlusone
object.statusForViewer.isPlusOned
object.statusForViewer.resharingDisabled
object.url
provider
provider.title
published
title
updated
url
verb
to the current format:
album
album.media
album.media[]
album.media[].contentType
album.media[].description
album.media[].height
album.media[].resourceName
album.media[].url
album.media[].width
author
author.avatarImageUrl
author.displayName
author.profilePageUrl
author.resourceName
comments
comments[]
comments[].author
comments[].author.avatarImageUrl
comments[].author.displayName
comments[].author.profilePageUrl
comments[].author.resourceName
comments[].content
comments[].creationTime
comments[].link
comments[].link.imageUrl
comments[].link.title
comments[].link.url
comments[].media
comments[].media.contentType
comments[].media.height
comments[].media.resourceName
comments[].media.url
comments[].media.width
comments[].postUrl
comments[].resourceName
comments[].updateTime
communityAttachment
communityAttachment.coverPhotoUrl
communityAttachment.displayName
communityAttachment.resourceName
content
creationTime
link
link.imageUrl
link.title
link.url
location
location.displayName
location.latitude
location.longitude
location.physicalAddress
media
media.contentType
media.description
media.height
media.resourceName
media.url
media.width
plusOnes
plusOnes[]
plusOnes[].plusOner
plusOnes[].plusOner.avatarImageUrl
plusOnes[].plusOner.displayName
plusOnes[].plusOner.profilePageUrl
plusOnes[].plusOner.resourceName
postAcl
postAcl.communityAcl
postAcl.communityAcl.community
postAcl.communityAcl.community.displayName
postAcl.communityAcl.community.resourceName
postAcl.communityAcl.users
postAcl.communityAcl.users[]
postAcl.communityAcl.users[].displayName
postAcl.communityAcl.users[].resourceName
postAcl.eventAcl
postAcl.eventAcl.event
postAcl.eventAcl.event.resourceName
postAcl.isLegacyAcl
postAcl.visibleToStandardAcl
postAcl.visibleToStandardAcl.circles
postAcl.visibleToStandardAcl.circles[]
postAcl.visibleToStandardAcl.circles[].displayName
postAcl.visibleToStandardAcl.circles[].resourceName
postAcl.visibleToStandardAcl.circles[].type
postAcl.visibleToStandardAcl.users
postAcl.visibleToStandardAcl.users[]
postAcl.visibleToStandardAcl.users[].displayName
postAcl.visibleToStandardAcl.users[].resourceName
resharedPost
resharedPost.album
resharedPost.album.media
resharedPost.album.media[]
resharedPost.album.media[].contentType
resharedPost.album.media[].description
resharedPost.album.media[].height
resharedPost.album.media[].resourceName
resharedPost.album.media[].url
resharedPost.album.media[].width
resharedPost.author
resharedPost.author.avatarImageUrl
resharedPost.author.displayName
resharedPost.author.profilePageUrl
resharedPost.author.resourceName
resharedPost.content
resharedPost.link
resharedPost.link.imageUrl
resharedPost.link.title
resharedPost.link.url
resharedPost.media
resharedPost.media.contentType
resharedPost.media.description
resharedPost.media.height
resharedPost.media.resourceName
resharedPost.media.url
resharedPost.media.width
resharedPost.resourceName
resharedPost.url
reshares
reshares[]
reshares[].resharer
reshares[].resharer.avatarImageUrl
reshares[].resharer.displayName
reshares[].resharer.profilePageUrl
reshares[].resharer.resourceName
resourceName
updateTime
url
Julian Bond - thanks for providing the collection example. This seems to be another ACL, which is not how I would have expected collections to work...
Filip H.F. Slagter - I have a subset of the keys listed in the wiki. In particular no communityAttachment/... and postAcl.communityAcl.users/...
What is a communityAttachment? Sharing a community through a post? What is the communityAcl.users? Limiting a community post only to a few users? Can this be done for circles as well?
The communityAcl.users[] key is probably present when the post, shared to a community, also mentions other users, thus automatically including them in the audience. I'd have to double-check the actual JSON file from which it was grabbed to be sure, though.
As for communityAttachment, that's exactly what it is. See https://plus.google.com/112064652966583500522/posts/5YqznvvKu7c as an example of such a post.