Has anybody actually had any luck with Takeout exporting their Google+ Stream ActivityLog, JSON or otherwise?

It fails consistently for me for at least two and usually all three of these:

+1s on comments
+1s on posts
Comments

Everything else I can get just fine, but not these three. In nine attempts across roughly a week, I managed to get one that claimed to contain "Comments.json", but actually didn't, and one that contained a "+1s on comments.json" with some sane-looking data even though the report overview claimed that it didn't.

Comments

  1. I hear from many people who are wrapped around the axle over difficulties exporting +1s. So here's my question. What is the plan for using these on a new platform?

  2. Hm, the earliest entries in my "+1s on comments" are from April 2012.

    I've looked through earlier posts on here that I commented on, and none of them show any +1s on comments, even though I know for a fact that certain comments had a ton of them.

    Could be that the same reason that Takeout trips up also makes G+ trip up and not show older +1s...

  3. David C. Frier they're not really critical I guess, but they do have some semantics.

    For example, we regularly used +1s on comments as a polling mechanism before polls were a thing.

    On the other hand, since you apparently only get your own +1s, but not the number of +1s on comments of your own posts, it's a bit of a moot point...

    Also, just technical curiosity and a bit of OCD...

  4. The ActivityLog always fails for me. I've already posted feedback.

  5. I just sent a consolidated version of this thread as feedback as well:

    Takeout for the Google+ ActivityLog results in errors on the following items all the time:

    +1s on comments
    +1s on posts
    Comments

    Also, comments before April 2012 seem to have lost all +1s in Google+ itself, which could be related to the Takeout failure.

    Last but not least, it would be useful to get all +1 data on comments of one's own posts, similar to what we already get for the post itself.

    Rationale: especially in the earlier days before native polling capabilities, +1s on comments were often used as a poor man's poll. And even later, opinions expressed in comments were often weighted by the +1s they received. Also, just completeness.

  6. Could you kindly explain the steps that you used for successfully exporting what you did? TIA. :-)
    https://plus.google.com/collection/UtTceE

  7. I'm wondering if recent export failures are because they time out. Google Plus has been extremely slow for me since a week or two before they made the 8/2019 announcement. Are they adding insult to injury here?

  8. Mike Waters

    1. Go to https://takeout.google.com
    2. Click "Select All" button to deselect everything
    3. Scroll down to the entries starting with "Google+" (Ignore the "+1s" entry at the very beginning of the list)
    4. Select what you want from the three Google+ sections by switching the buttons on the right to "on" (For me Circles and Streams, since I don't own any Communities)
    5. Expand the selected sections and configure export formats (I exported everything under Stream as JSON, and my Circles once as vCard and once as CSV)
    6. Click Next
    7. Choose an archive format and max size (I used TGZ and 50GB since I didn't want it to split, and zip can be problematic with larger archives)
    8. Choose delivery method (I used "Send download link via email")
    9. Press "Create Archive"
    10. On the next page, you should get the option to download the archive directly after a while. Otherwise check your mail for a link.
    -----
    11. If there were errors: The downloaded archive will have an index.html in the Takeout folder. Open that in a browser and you should see which parts failed exactly. Follow all steps from above, but only choose the failing bits in order to try again.
    (Hint: If you expand the "Google+ Stream" section, you get the option to "Select specific data". That way, you can select the specific bits of your stream that failed, like ActivityLog)

  9. Mike Waters I prefer to think of this as stress-testing.

    I don't have any access to Google Internal data, but presume:

    1. That the GDTO volume has increased.
    2. That it is a small fraction of the anticipated volume as the Sunset nears.

    If nothing else, Google are discovering the hot spots within the system. I'm going to guess that there are possibly search, retrieval, and working-set bottlenecks. And some data that's not been touched in a long while.

  10. Carsten Reckord Given non-ASCII filenames (Google use content data to create filenames, caution advised), .tgz formats may fail.

    Filip H.F. Slagter is suggesting ZIP formats on that basis, though I think the problem may be worst on Mac platforms.

  11. Edward Morbius I had non-ASCII chars in content. Takeout replaces their UTF-8 bytes with "?". So that should be safe even on systems that can't handle UTF-8 strings in tar (which any halfway decent one should).

  12. Edward Morbius Carsten Reckord it's not so much that tgz will fail, but rather that non-ASCII content can cause illegible filenames, since (as Edward pointed out) the first x characters of the content determine the filename. Theoretically you could probably convert those with another tool, but it's best to have it right from the get-go.

    Also, since non-ASCII characters can take up more than a single byte per character, the maximum filename length (counted in bytes) might not match the actual character count, potentially causing issues with cropped file extensions (.jso or even .j rather than .json, for instance; see the sketch after the example below).

    I haven't yet verified that a zip rather than tgz export solves this last issue, though it does seem to have fixed the non-ASCII filenames for me.

    Example from my G+ Photos archives:
    .tgz: Kafe__ Belgie__ - beertasting - 06.jpg.metadata.json
    .zip: Kafé België - beertasting - 06.jpg.metadata.json
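
    A quick PHP illustration of that byte-versus-character mismatch (a sketch only: it assumes the mbstring extension, and the truncation scenario is illustrative, not verified against Takeout's actual logic):

    $name = 'Kafé België - beertasting - 06.jpg.metadata.json';
    echo strlen($name), "\n";             // 50: length in bytes (é and ë take two UTF-8 bytes each)
    echo mb_strlen($name, 'UTF-8'), "\n"; // 48: length in characters
    // A limit computed in characters but enforced in bytes (or the other way
    // around) can crop the tail of a name, e.g. ".jso" instead of ".json".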

  13. Filip H.F. Slagter Unfinished initial paragraph?

  14. eh, yeah. Baby duty called and I didn't want to lose progress, lol

  15. Filip H.F. Slagter Your progress, or baby's ;-)

  16. Speaking of other tools, iconv is very good at restricting content to specified charactersets and/or converting between them.

    Or you could just use tr or sed to drop everything outside a specific set.

    IMO Google should Be Somewhat Less Clever About Extended Characters in Filenames.

    Curious as to the arguments for not sticking to the 128 ASCII characters, or even a subset of that.

    [A-Za-z0-9_-]

    No space, no quotes, no symbols. 64 chars. They're just bloody files.
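
    A minimal PHP sketch of that kind of clean-up (my choices, not a spec: //TRANSLIT output varies by libiconv build and locale, and I keep "." as well so file extensions survive):

    $name = 'Kafé België - beertasting - 06.jpg';
    // Transliterate what we can, silently drop what we can't.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $name);
    // Spaces to underscores, then strip anything outside the safe set.
    $safe = preg_replace('/[^A-Za-z0-9_.-]/', '', str_replace(' ', '_', $ascii));
    echo $safe, "\n"; // e.g. Kafe_Belgie_-_beertasting_-_06.jpg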

  17. Edward Morbius to make them more legible, and to quickly see what's in the post without having to open it?

    Also, not all languages use US-ASCII for their alphabet. :)
    åæø for instance are quite common in Norwegian, and umlauts are really common in German, though all of those can fairly well be TRANSLITerated with iconv and the like.
    Languages with a completely different alphabet, such as Hebrew, are a whole other cookie to break y'r teeth on.

  18. Repeating myself here. This sequence failed on any filename with a non-US-ASCII character:
    Takeout
    Zip
    MS Windows unzip with 7-Zip
    dir /b /O-D *.html > dir.txt
    notepad++ convert to $FILENAME for each row

    That's about the minimum viable tech solution for turning the Takeout into a static archive site on some webhosting. Which means writing code (bash, PHP, etc.) to iconv-rename the files prior to listing them. If it comes to that, then you might as well use the code to create the alternate index.html completely (see the sketch below), for something that one is only really going to do once. And the reason for doing it is that the index.html provided by Google is *SO F*CKING FULL OF CR*P*.

    Why do Google's programmers like obfuscated javascript libraries and dense, random CSS classes so much? Do they get paid according to how impenetrable their web pages are? Index.html is a web page you're giving to your user, FFS!

    Feedback sent.
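
    A bare-bones PHP sketch of that "create the alternate index.html" step (the newest-first ordering and the index-clean.html output name are my assumptions, and there's no error handling):

    // Build a plain list over the exported posts, newest first by mtime.
    $files = glob('*.html');
    usort($files, function ($a, $b) { return filemtime($b) - filemtime($a); });
    $out = "<!DOCTYPE html>\n<html><body><ul>\n";
    foreach ($files as $f) {
        // Encode the href, but show the raw name as the link text.
        $out .= '<li><a href="' . rawurlencode($f) . '">' . htmlspecialchars($f) . "</a></li>\n";
    }
    $out .= "</ul></body></html>\n";
    // (A rerun would list index-clean.html itself; filter it out if that matters.)
    file_put_contents('index-clean.html', $out);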

  19. Julian Bond when dealing with the Windows command prompt and/or batch files, you might also want to look into changing the codepage with chcp
    1252 is for Latin-1
    65001 is for UTF-8
    1200 is for UTF-16 LE-BOM
    1201 for UTF-16 BE-BOM aka unicodeFFFE
    see https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers for a full list

    As well as using CMD /U

    https://stackoverflow.com/questions/32182619/chcp-65001-and-a-bat-file might have some useful solutions, especially https://stackoverflow.com/a/32183229 and https://stackoverflow.com/a/33158980

    Alternatively using PowerShell rather than CMD might also be a good idea.

  20. Filip H.F. Slagter What is native to computing environments, though?

    Windows, Mac, Linux, BSDs, Android, ... are all ASCII-centric.

    As I just went through with Christian Conrad a day or so back, for exchange formats you want limited-option standards. Be expressive inside the files.

    Is the Latin characterset inaccessible from elsewhere?

    Because Hebrew, Greek, Arabic, Chinese, Korean, Thai, Japanese, Cyrillic, and Indonesian charactersets do not flow freely from my fingertips. Nor do wingdings, line-drawing symbols, emoji, maths notations, astronomical, astrological, or other symbols.

    Or all but a scant fraction of Unicode.

  21. Carsten Reckord Thanks! However, I think that I see at least two things that might be missing.
    1. The login method. Don't we have to use our G+ URL as the login name? I think that's where I failed.
    2. Possibly, what to select. It seems to me that I saw more details some time back.
    Sorry if I sound unappreciative. :-)

  22. Mike Waters no problem at all.

    1. No, Takeout is for everything in your Google profile, including but not limited to G+. So you just need to log in on the Takeout site with the same Google account as you use for G+ (if you're logged in at G+, that should usually already be the case).

    2. Just selecting the three "Google+" items (Circles, Communities, Stream) should give you everything G+ related and is a safe choice. Just make sure to expand them afterwards and change the format from HTML to JSON everywhere.

    Limiting to sub-sections like ActivityLog could just come in handy if specific parts fail to export, so you don't have to download everything again and again (for me that was around 2.5 GB of data after all...)

  23. Something else to watch out for in Takeout filenames: if they somehow end up in HTML links, you'll run into trouble. I had to go through and remove # / \ characters.

    Here's some minimal PHP code (with no error handling or dupe-filename checking!):

    foreach (glob('*.html') as $filename) {
        // Strip everything outside ASCII from the filename.
        $fileNameClean = iconv('UTF-8', 'ASCII//IGNORE', $filename);
        // Drop the characters that break HTML links.
        $fileNameClean = str_replace(array('#', '\\', '/'), '', $fileNameClean);
        if ($filename != $fileNameClean) {
            echo "$filename -> $fileNameClean\n";
            rename($filename, $fileNameClean);
        }
    }
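
    (To try it: save it as e.g. clean-names.php in the folder with the exported .html files and run "php clean-names.php" from there. The script name is arbitrary, and glob() only looks at the current directory, so repeat per Takeout subfolder.)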

