
JPGs in Takeout.G+Stream.Posts.JSON

I've just gone down a rabbit hole and now I'm confused. I was puzzled about the location of images that are currently stored at https://$XX.googleusercontent.com/ and referenced in the JSON. But then I noticed that my Takeout download has 1000 or so img9999.jpg files that are from somebody else. I do kind of recognise them, and I think what might have happened is that somebody commented on one of my posts with a link to a large Google Photos album. But why should those end up in MY Takeout file?

There's also a large number of images that do belong to me, but they're stored in Google Photos; I only ever linked to them in various G+ posts. They weren't uploaded to G+ specifically, so why are they in the G+ Posts Takeout archive?

None of these photos are visible to me in https://get.google.com/albumarchive/, in my Photos store, or in my local synced directory.

I wonder if this is part of the source of the HUGE Takeout files some people are getting.

Any ideas on what the hell is going on here? It feels like the Takeout code is attempting to de-reference image links in posts and comments and include the actual files in the download, but it's being much too liberal about it: it's doing this for image files that are really not part of G+.
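
If anyone wants to poke at their own archive, here's a rough Python sketch of the check I mean: regex-scan the Posts JSON for googleusercontent URLs, then flag downloaded images whose names never appear in any of them. The directory layout and the name-matching heuristic are my guesses, so adjust both for your own download.

    import re
    from pathlib import Path

    # Assumed layout and matching heuristic - adjust for your own archive.
    posts = Path("Takeout/Google+ Stream/Posts")

    # Collect every googleusercontent URL mentioned in any post's JSON.
    url_re = re.compile(r'https://[a-z0-9]+\.googleusercontent\.com/[^"\s]+')
    referenced = set()
    for j in posts.glob("*.json"):
        referenced.update(url_re.findall(j.read_text(encoding="utf-8")))
    print(len(referenced), "image URLs referenced in my posts")

    # Flag downloaded images whose base name never appears in any
    # referenced URL - candidates for the "extraneous" files.
    for img in posts.glob("*.jpg"):
        stem = img.name.split(".")[0].lower()
        if not any(stem in url.lower() for url in referenced):
            print("never referenced:", img.name)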

PS: I'm also getting really irritated by just how chaotic the Google ecosystem is, especially around photos. It's exhausting just trying to find the URL for the other, other display of the same files.

PPS: We really need a statement about image files that may be somehow connected to G+ but are actually currently located at https://$XX.googleusercontent.com/. Will they be deleted when all the G+ content disappears?

Comments

  1. Yeah, I think this is what happened to my Takeout archive too. I'm also afraid that it might have made duplicates of albums if you shared the same album multiple times. I did that quite a lot during the first few years of G+, back when Photos was still part of G+: whenever I added new pictures to an album, I often reshared it.

  2. Just done another Takeout to check. Now all the image files have been renamed to a hash, something like 1e6o5zfta2b0i.jpg. However, all the same extraneous files are in there. They're definitely from a circled contact/friend, and sharing their whole Google Photos archive with me is just the sort of thing they'd do.

  3. I don't see a lot of evidence of duplicates. There's a small number of my own photos that are repeated like this: 17217utaxisc2.JPG.jpg, 17217utaxisc2[1].JPG.jpg, 17217utaxisc2[2].JPG.jpg. Not entirely sure yet why they're referenced three times.

    And note the .JPG.jpg: this has broken all the links in the HTML files that point at the plain .jpg filename. Dammit, Google!
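
    For anyone else tidying up, here's a minimal de-dup sketch in Python: hash each file's contents and delete the extra copies. The path is a guess, and it deletes files, so try it on a copy of the archive first.

        import hashlib
        from pathlib import Path

        # Group files by content hash; keep the first copy, delete the rest.
        # The path is an assumption - and run this on a COPY of the archive.
        seen = {}
        for f in sorted(Path("Takeout/Google+ Stream/Posts").glob("*.jpg")):
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            if digest in seen:
                print("duplicate of", seen[digest], "- deleting", f.name)
                f.unlink()
            else:
                seen[digest] = f.name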

  4. Three items of feedback posted:

    1) Third-party-owned images appearing in my Takeout archive when they're not even linked in my posts, making the archive MUCH bigger than it needs to be.

    2) Unnecessary duplication of the same image served from the same URL, bloating the archive, e.g. 18mav32gy1s1e.JPG.jpg, 18mav32gy1s1e.JPG(1).jpg.

    3) Broken filenames, e.g. 18mav32gy1s1e.JPG.jpg, which is then referenced in the HTML file as 18mav32gy1s1e.JPG, leaving a broken image.
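
    In the meantime, a possible workaround for 3), assuming your HTML points at the .JPG name like mine does: strip the doubled extension so the files match the links again. A rough sketch:

        from pathlib import Path

        # Rename X.JPG.jpg -> X.JPG so the HTML links resolve again.
        # Check what YOUR HTML actually references before running this.
        for f in Path("Takeout/Google+ Stream/Posts").glob("*.JPG.jpg"):
            target = f.with_suffix("")  # "X.JPG.jpg" -> "X.JPG"
            if not target.exists():
                f.rename(target)
                print("renamed", f.name, "->", target.name)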

  5. There was a problem with file names getting truncated, so they switched to hashes.

  6. John Lewis: Well, that's good, because it's now consistent. Bad, because it's apparently broken occasionally (.JPG.jpg). And a bit of a problem if you previously did a Takeout, did work on it, and then downloaded the same data again.

    Hey ho. I hope they make any other necessary changes quickly.

    Does anyone know of a Takeout changelog?

  7. Not that I'm aware of. When I get back home in a week or so, I need to do a new Takeout. I'd do one sooner, but my in-laws' internet is not suitable for downloading anything, let alone 50+ GB Takeout archives...
    I really wish they'd do something about making the archives significantly smaller...

  8. If they can't deduplicate the files before adding them to an archive, then perhaps they could include a non-authenticated download URL plus an MD5 or SHA checksum for each file instead of the actual file, so I could just download the ones I want with wget afterwards.
    That would significantly reduce the archive size and let me download only the images I'm interested in.
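
    The consuming end could be as simple as this sketch (Python rather than wget; the tab-separated URL-plus-checksum manifest format is entirely made up, since Takeout offers nothing like it today):

        import hashlib
        import urllib.request
        from pathlib import Path

        # Hypothetical manifest: one "url<TAB>sha256" pair per line.
        # This format is made up - Takeout offers nothing like it today.
        for line in Path("manifest.tsv").read_text().splitlines():
            url, expected = line.split("\t")
            data = urllib.request.urlopen(url).read()
            if hashlib.sha256(data).hexdigest() != expected:
                print("checksum mismatch, skipping:", url)
                continue
            Path(url.rsplit("/", 1)[-1]).write_bytes(data)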

  9. No change. The three related bugs are still present. IMHO Takeout.G+Stream.Posts is still broken.

