Has anybody actually had any luck with Takeout exporting their Google+ Stream ActivityLog, JSON or otherwise? It fails consistently for me for at least two and usually all three of these:
+1s on comments
+1s on posts
Comments
Everything else I can get just fine, but not these three. In nine attempts across roughly a week, I managed to get one that claimed to contain "Comments.json", but actually didn't, and one that contained a "+1s on comments.json" with some sane-looking data even though the report overview claimed that it didn't.
I hear from many people who are wrapped around the axle over difficulties exporting +1s. So here's my question. What is the plan for using these on a new platform?
Hm, the earliest entries in my "+1s on comments" are from April 2012.
I've looked through earlier posts on here that I commented on, and none of them show any +1s on comments, even though I know for a fact that certain comments had a ton of them.
Could be that whatever makes Takeout trip up also makes G+ itself trip up and not show older +1s...
I haven't gone thru this in detail, but maybe there's something here to help?
blog.thatagency.com - How to Download Your Google+ Data Using Google Takeout
David C. Frier they're not really critical I guess, but they do have some semantics.
For example, we regularly used +1s on comments as a polling mechanism before polls were a thing.
On the other hand, since you apparently only get your own +1s, but not the number of +1s on comments of your own posts, it's a bit of a moot point...
Also, just technical curiosity and a bit of OCD...
The ActivityLog always fails for me. Already posted feedback.
I just sent an accumulated version of this thread as feedback as well:
Takeout for the Google+ ActivityLog results in errors on the following items all the time:
+1s on comments
+1s on posts
Comments
Also, comments before April 2012 seem to have lost all +1s in Google+ itself, which could be related to the Takeout failure.
Last but not least, it would be useful to get all +1 data on comments of one's own posts, similar to what we already get for the post itself.
Rationale: especially in the earlier days, before native polling capabilities, +1s on comments were often used as poor man's polls. And even later, opinions expressed in comments were often weighted by the +1s they received. Also, just for completeness.
Could you kindly explain the steps that you used for successfully exporting what you did? TIA. :-)
https://plus.google.com/collection/UtTceE
I'm wondering if recent export failures are because they time out. Google Plus has been extremely slow for me since a week or two before they made the 8/2019 announcement. Are they adding insult to injury here?
Mike Waters
1. Go to https://takeout.google.com
2. Click "Select All" button to deselect everything
3. Scroll down to the entries starting with "Google+" (Ignore the "+1s" entry at the very beginning of the list)
4. Select what you want from the three Google+ sections by switching the buttons on the right to "on" (For me Circles and Streams, since I don't own any Communities)
5. Expand the selected sections and configure export formats (I exported everything under Stream as JSON, and my Circles once as vCard and once as CSV)
6. Click Next
7. Choose an archive format and max size (I used TGZ and 50GB since I didn't want it to split, and zip can be problematic with larger archives)
8. Choose delivery method (I used "Send download link via email")
9. Press "Create Archive"
10. On the next page, you should get the option to download the archive directly after a while. Otherwise check your mail for a link.
-----
11. If there were errors: The downloaded archive will have an index.html in the Takeout folder. Open that in a browser and you should see which parts failed exactly. Follow all steps from above, but only choose the failing bits in order to try again.
(Hint: If you expand the "Google+ Stream" section, you get the option to "Select specific data". That way, you can select the specific bits of your stream that failed, like ActivityLog)
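Not one of the official steps, but since a few archives in this thread claimed to contain files that weren't actually there: here's a small PHP sketch (PHP because that's what comes up later in the thread) to sanity-check an extracted archive. The folder layout and the file names are guesses based on what's been discussed here, not a documented Takeout structure, so adjust the paths to whatever your export actually contains.

<?php
// Hypothetical sanity check for an extracted Takeout archive.
// $base and the file names below are assumptions taken from this thread,
// not a guaranteed Takeout layout; adjust them to match your export.
$base = 'Takeout/Google+ Stream/ActivityLog';
$expected = array('Comments.json', '+1s on comments.json', '+1s on posts.json');
foreach ($expected as $name) {
    $path = $base . '/' . $name;
    if (!file_exists($path)) {
        echo "MISSING: $name\n";
    } elseif (filesize($path) === 0) {
        echo "EMPTY:   $name\n";
    } else {
        echo "OK:      $name (" . filesize($path) . " bytes)\n";
    }
}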
Mike Waters I prefer to think of this as stress-testing.
I don't have any access to Google Internal data, but presume:
1. That the GDTO volume has increased.
2. That it is a small fraction of the anticipated volume as the Sunset nears.
If nothing else, Google are discovering the hot spots within the system. I'm going to guess that there are possibly search, retrieval, and working-set bottlenecks. And some data that's not been touched in a long while.
Carsten Reckord Given non-ASCII filenames (Google use content data to create filenames, caution advised), .tgz formats may fail.
Filip H.F. Slagter is suggesting ZIP formats on that basis, though I think the problem may be worst on Mac platforms.
Edward Morbius I had non-ASCII chars in content. Takeout replaces their UTF-8 bytes with "?". So that should be safe even on systems that can't handle UTF-8 strings in tar (which any halfway decent ones should).
Edward Morbius Carsten Reckord It's not so much that tgz will fail, but rather that non-ASCII content could cause illegible filenames, since (as Edward pointed out) the first x characters of the content determine the filename. Theoretically you could probably convert those with another tool, but it's best to have it right from the get-go.
Also, since non-ASCII characters can use more than a single byte per character, the maximum filename length (in bytes) might not match the actual character count, potentially causing issues with cropped file extensions (.jso or even .j rather than .json, for instance).
I haven't yet verified that a zip rather than tgz export solves this last issue, though it does seem to have fixed the non-ASCII filenames for me.
Example from my G+ Photos archives:
.tgz: Kafe__ Belgie__ - beertasting - 06.jpg.metadata.json
.zip: Kafé België - beertasting - 06.jpg.metadata.json
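To illustrate the byte-vs-character mismatch with the filename above (just a sketch; this isn't Takeout's actual truncation logic, which I can only guess at):

<?php
// Multi-byte UTF-8 characters make the byte count exceed the character count,
// so a limit applied in bytes can end up cropping the extension.
$name = 'Kafé België - beertasting - 06.jpg.metadata.json';
echo strlen($name) . " bytes vs " . mb_strlen($name, 'UTF-8') . " characters\n";

// Spot files whose .json extension may have been cropped to .jso or .j
foreach (array_merge(glob('*.jso'), glob('*.j')) as $cropped) {
    echo "Possibly truncated: $cropped\n";
}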
Filip H.F. Slagter Unfinished initial paragraph?
eh, yeah. Baby duty called and I didn't want to lose progress, lol
Filip H.F. Slagter Your progress, or baby's ;-)
Speaking of other tools, iconv is very good at restricting content to specified character sets and/or converting between them.
Or you could just use tr or sed to drop everything outside a specific set.
IMO Google should Be Somewhat Less Clever About Extended Characters in Filenames.
Curious as to the arguments for not sticking to 128 ASCII characters, or a subset of that even.
[A-Za-z0-9_-]
No space, no quotes, no symbols. 64 chars. They're just bloody files.
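If anyone wants that restriction as code rather than policy, a one-off PHP sketch (same idea as tr; I've kept the dot as well so extensions survive, which is one character more than the set above):

<?php
// Keep only A-Z a-z 0-9 _ - and the dot; everything else is dropped.
// Lossy by design, of course, for any non-Latin content.
$filename = 'Kafé België - beertasting - 06.jpg.metadata.json';
$safe = preg_replace('/[^A-Za-z0-9_.-]/', '', $filename);
echo $safe . "\n"; // e.g. KafBelgi-beertasting-06.jpg.metadata.json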
Edward Morbius to make them more legible, and to quickly see what's in the post without having to open it?
Also, not all languages use US-ASCII for their alphabet. :)
åæø for instance are quite common in Norwegian, and umlauts are really common in German, though all of those can fairly well be TRANSLITerated with iconv and the like.
Languages with a completely different alphabet, such as Hebrew, are a whole other cookie to break y'r teeth on.
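A quick, hedged PHP sketch of that transliteration; //TRANSLIT output depends on the iconv implementation and locale, so the exact result will vary:

<?php
// //TRANSLIT tries to approximate (e.g. æ -> ae), //IGNORE would silently drop.
// Results are locale- and implementation-dependent, hence the setlocale().
setlocale(LC_CTYPE, 'en_US.UTF-8');
echo iconv('UTF-8', 'ASCII//TRANSLIT', 'åæø and äöü') . "\n";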
Repeating myself here. This sequence failed on any filename with a non-US-ASCII character.
Takeout
Zip
MS Windows unzip with Zip7
dir /b /O-D *.html > dir.txt
notepad++ convert to $FILENAME for each row
That's like the minimum viable tech solution to turning the Takeout into a static archive site on some webhosting. Which means writing code (bash, php, etc.) to iconv-rename the files prior to listing them. If it comes to that then you might as well use the code to create the alternate index.html completely. For something that one is only really going to do once. And the reason for doing it is because the index.html provided by Google is *SO F*CKING FULL OF CR*P*.
Why do Google's programmers like obfuscated JavaScript libraries and dense, random CSS classes so much? Do they get paid according to how impenetrable their web pages are? Index.html is a web page you're giving to your user, FFS!
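For the record, a rough sketch of what that relisting step could look like in PHP (untested, assumes the filenames have already been cleaned up, no error handling):

<?php
// Untested sketch: write a plain replacement index that just links to every
// exported post. Assumes it runs inside the folder with the exported .html
// files and that their names are already ASCII-safe.
$items = '';
foreach (glob('*.html') as $file) {
    if ($file === 'index.html' || $file === 'index-plain.html') {
        continue; // skip Google's own index and our output file
    }
    $items .= '<li><a href="' . htmlspecialchars($file) . '">'
            . htmlspecialchars($file) . "</a></li>\n";
}
file_put_contents('index-plain.html',
    "<!doctype html>\n<title>G+ archive</title>\n<ul>\n" . $items . "</ul>\n");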
Feedback sent.
Julian Bond when dealing with the Windows command prompt and/or batch files, you might also want to look into changing the codepage with chcp
1252 is for Latin-1
65001 is for UTF-8
1200 is for UTF-16 LE-BOM
1201 for UTF-16 BE-BOM aka unicodeFFFE
see https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers for a full list
As well as using CMD /U
https://stackoverflow.com/questions/32182619/chcp-65001-and-a-bat-file might have some useful solutions, especially https://stackoverflow.com/a/32183229 and https://stackoverflow.com/a/33158980
Alternatively using PowerShell rather than CMD might also be a good idea.
Filip H.F. Slagter What is native to computing environments, though?
Windows, Mac, Linux, BSDs, Android, ... are all ASCII-centric.
As I just went through with Christian Conrad a day or so back, for exchange formats you want limited-option standards. Be expressive inside the files.
Is the Latin character set inaccessible from elsewhere?
Because Hebrew, Greek, Arabic, Chinese, Korean, Thai, Japanese, Cyrillic, and Indonesian character sets do not flow freely from my fingertips. Nor do wingdings, line-drawing symbols, emoji, maths notations, astronomical, astrological, or other symbols.
Or all but a scant fraction of Unicode.
Carsten Reckord Thanks! However, I think that I see at least two things that might be missing.
1. The login method. Don't we have to use our G+ URL as the login name? I think that's where I failed.
2. Possibly, what to select. It seems to me that I saw more details some time back.
Sorry if I sound unappreciative. :-)
Mike Waters no problem at all.
1. No, Takeout is for everything in your Google profile, including but not limited to G+. So you just need to log in on the Takeout site with the same Google account as you use for G+ (if you're logged in at G+ that should usually already be the case).
2. Just selecting the three "Google+" items (Circles, Communities, Stream) should give you everything G+ related and is a safe choice. Just make sure to expand them afterwards and change the format from HTML to JSON everywhere.
Limiting to sub-sections like ActivityLog could just come in handy if specific parts fail to export, so you don't have to download everything again and again (for me that was around 2.5 GB of data after all...)
Something else to watch out for in Takeout filenames if they somehow end up in HTML Links. I had to go through and remove # / \ characters.
Here's some minimal PHP code (with no error handling or dupe filename checking!):
foreach (glob('*.html') as $filename) {
    // Drop any characters that don't survive conversion to ASCII
    $fileNameClean = iconv('UTF-8', 'ASCII//IGNORE', $filename);
    // Strip the characters that break HTML links
    $fileNameClean = str_replace(array('#', '\\', '/'), '', $fileNameClean);
    if ($filename != $fileNameClean) {
        echo "$filename -> $fileNameClean\n";
        rename($filename, $fileNameClean);
        $filename = $fileNameClean;
    }
}
Alternatively, URI-encode them :)
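Something like this, for example (leaving the file itself untouched and only encoding the link; the filename is just the earlier example from this thread):

<?php
// Keep the original (UTF-8) filename on disk; only the href gets encoded.
$original = 'Kafé België - beertasting - 06.jpg.metadata.json';
echo '<a href="' . rawurlencode($original) . '">'
   . htmlspecialchars($original) . "</a>\n";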