I own the Surfing community here, which has 139,000+ members. I put up a notice that I'm moving the community to MeWe and started a new community there, and there aren't even 100 members there now. My point is that no matter what the owner does or says, everyone is going to do their own thing.
MeWe is still cool though. I already have over 300 contacts there, and the platform works OK. Better than going back to Facebook anyway.
Ethan Boyle I'm in the middle of an assessment of Community vitality. My read is that in most cases, you're lucky to have 1 in 1,000 members active.
G+MM, created 8 October 2018, has about a 25% weekly activity rate among its 3,500 members. And that's on a brand-spanking-new Community.
Older communities, particularly those dating from before 2015 or so, likely have a huge attrition rate, whether from tyre-kickers, spammers, or ... ?
A related concept is that explicit measures (users joining / plussing / friending, etc.) are at best a weak proxy for true interaction. What's actually done matters far more.
I'm trying to get some action metrics, if I can.
That all sounds accurate to me.
How do you collect your action metrics?
Ethan Boyle Working on that.
The first page scraped of a profile includes a few posts -- I'm trying to sort out the count.
From that I can get the age ([0-9][0-9]?[smhdw] or date), the +1s, the comments, and the reshares. Also the author's ID and a few of the commenters. Text of the post is also available. And pin status.
My thought is to get a general sense of the post rate (posts / elapsed time = posting rate) and the engagement level. Also possibly of users who show up in multiple Communities, in the event of overlap. Either of those is going to be highly significant, due to the sampling.
I'm working on the data extraction right now -- I've got 36,000 HTML files waiting to have the marrow extracted from them, if I can find a reliable way to parse the HTML itself.
Title, description, public/private status, member count (for public communities), membership policy ("join" vs. "ask to join"), sections, links, and comments are all available. Possibly some textual analysis as well.
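As a rough illustration of the sort of extraction involved (not Edward's actual pipeline; the filename is a stand-in, and anything much beyond the page title and description depends on Google's obfuscated class names), stock tools can already pull the easy fields out of a saved page:

$ xmllint --html --xpath 'string(//title)' community.html 2>/dev/null
$ xmllint --html --xpath 'string(//meta[@name="description"]/@content)' community.html 2>/dev/null

The stderr redirect just silences the HTML parser's complaints about the markup; the second line assumes the page actually carries a description meta tag.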
Edward Morbius so we are clear, are you doing that with the Surfing community? I don't care if you do, although I would be curious about the final analysis.
Ethan Boyle No idea. I've randomly selected the communities by URL, so I won't know what I've grabbed until I parse the HTML to get labels.
You can just look at the community in a console-mode browser (I prefer w3m for this).
For this particular surfing community, I'd get something like:
$ w3m -dump 'https://plus.google.com/communities/106294999376739081268' | grep '^[0-9][0-9]*[smhdw]' | cat -n
1 12h
2 2d
3 3d
4 4d
5 4d
6 4d
7 6d
8 5d
9 6d
10 5d
So: ten posts over about six days, or roughly 1.7/day.
Two of those have comments. Plus ones:
4 plus ones
4 plus ones
36 plus ones
14 plus ones
10 plus ones
3 plus ones
25 plus ones
8 plus ones
8 plus ones
11 plus ones
(This is just using grep on the dumped output, I'm working on parsing the HTML itself to avoid match collisions with text, and more general robustness.)
The idea is to come up with some index like:
posts: 10
newest post: 12h
oldest post: 5d
post rate: 2/d
plussed posts: 10/10
plusses/post: 12.3
... etc., on a community-by-community basis. Then I can look to see which communities have engagement -- plus ones, reshares, comments -- and how those are distributed across the whole set.
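A rough sketch of how an index like that could be computed straight from the dump, reusing the grep patterns and community URL from above (the unit conversion and the assumption that ages and plus-one counts each sit on their own line are mine; singular forms such as "one share" would need extra handling):

$ w3m -dump 'https://plus.google.com/communities/106294999376739081268' |
    egrep '^[0-9][0-9]*[smhdw]|plus ones' |
    awk '
      BEGIN { m["s"] = 1/3600; m["m"] = 1/60; m["h"] = 1; m["d"] = 24; m["w"] = 168 }
      /^[0-9]+[smhdw]/ {                          # an age token such as 12h or 2d
        posts++
        h = ($1 + 0) * m[substr($1, length($1), 1)]    # age in hours
        if (h > oldest) oldest = h
        if (newest == 0 || h < newest) newest = h
      }
      /plus ones/ { plussed++; plusses += $1 }    # e.g. "36 plus ones"
      END {
        if (posts == 0 || oldest == 0) exit
        printf "posts: %d\nnewest post: %.0fh\noldest post: %.0fh\n", posts, newest, oldest
        printf "post rate: %.1f/d\nplussed posts: %d/%d\nplusses/post: %.1f\n",
               posts / (oldest / 24), plussed, posts, plusses / posts
      }'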
Edward Morbius you run Linux?
Ethan Boyle Frequently.
Edward Morbius I just did a manual count and came up with roughly 13 posts in the last 24 hours. How did you come up with only 10?
Never mind, I see you are looking at a different surfing community. Mine is this one: https://plus.google.com/communities/117086044705199039350
Edward Morbius Linux is all I use and have used for maybe 15 years. Anyway, not to get too off topic.
Ethan Boyle w3m reports only the ten most recent posts. You can try the command I'd run. You might also care to confirm that we're talking about the same community. I guessed at "the surfing community" and called up the largest of the set Google+ Search offered me.
Not to cut this short, but ... to cut this short:
1. The goal isn't to report on any one specific community, but to provide an overall sense of communities. Statistics offers a way to do this, through sampling and methodology: methodology so that the reports are repeatable, sampling to avoid bias.
2. If you want to talk about some specific instance, please specify it. Specifically. (Your profile, this thread, etc., don't make clear just what community you're talking about, as an example.) The handy thing with command-line tools is that they're both specific and repeatable. Unless Google are gaslighting both of us, running the command I'd given should (at least at a resolution of several minutes or hours) offer roughly the same result for any URL thrown at it.
3. The challenge for me is systematically snarfing the signal I'm looking for out of Google's crufty HTML. And frankly, I need to get back to that.
Hold off a bit on the questions, thanks.
Ethan Boyle Using the same community as you'd had in mind, a quick-and-dirty check:
$ w3m -dump 'https://plus.google.com/communities/117086044705199039350' | egrep '(comments|plus ones|share|[0-9][0-9]*[smhdw])'
Do not post and disable comments or reshares.
Do not post and disable comments or reshares.
2h
2 plus ones
no shares
Post has shared content
9h
Originally shared by Reef Master
6 plus ones
no shares
Post has shared content
9h
Originally shared by Reef Master
12 plus ones
no shares
10h
20 plus ones
2 shares
23h
41 plus ones
2 shares
21h
26 plus ones
2 shares
15h
12 plus ones
no shares
21h
21 plus ones
one share
22h
16 plus ones
no shares
Post has shared content
15h
Originally shared by Reef Master
12 plus ones
no shares
Ten posts over roughly 23 hours, or about one every two hours; 10:10 have plus ones, mean 16.8; 4:10 reshared, 0.7 mean shares, 7 total. No comments.
What my larger set of data will give is a comparison of this to a larger baseline, as well as indications of what high-engagement communities look like. Textual analysis to suss out substantive comments (say, like this thread) vs. "wow!" "nice!" "call me" "best bangalore visa tuk tuk", etc., would also be nice. I probably won't get to that though.
Can you do a wc on the comments?
Any specific reason you're doing a w3m -dump rather than just curl? Not that it matters much, just curious about your reasoning :)
Also, for parsing the HTML using XPath queries, you could consider any of the CLI tools mentioned in this thread: https://stackoverflow.com/questions/15461737/how-to-execute-xpath-one-liners-from-shell
If you prefer CSS selectors instead, there are the W3C tools such as hxselect: https://www.w3.org/Tools/HTML-XML-utils/
or Keegan Street's Element Finder: https://github.com/keeganstreet/element-finder/blob/master/readme.md
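As one concrete, hedged example of the CSS-selector route (the filename is a placeholder, and any selector more specific than the page title would have to be discovered by inspecting Google's generated markup), the html-xml-utils pairing is usually hxnormalize followed by hxselect:

$ hxnormalize -x community.html | hxselect -c 'title'

The -x flag makes hxnormalize emit well-formed XML, and -c tells hxselect to print just the element contents rather than the surrounding tags.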
Shame Cruiser Motorcycles went to MeWe; didn't Cake.co's founder run a big motorcycle community on Cake?
Filip H.F. Slagter I'd curled the actual pages, with a head dump as well (37 or so 3xx errors).
w3m for quick and dirty analysis.
xmllint and a local HTML formatter for prelim analysis / dev.
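In concrete terms that presumably looks something like the following (a guess at the shape of the commands, not the exact invocations):

$ curl -sL -D headers.txt -o page.html 'https://plus.google.com/communities/117086044705199039350'
$ xmllint --html --format page.html > page.pretty.html 2>/dev/null

curl's -D saves the response headers (the "head dump") and -L follows the 3xx redirects; xmllint --html --format produces a readable, indented copy for eyeballing, with the stderr redirect silencing its complaints about the markup.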
Filip H.F. Slagter Element Finder looks useful.
Filip H.F. Slagter And HTML-XML-utils too. Thanks.
For my own scraping scripts I tend to use Ruby.
ReplyDeleteFor the downloading of pages I use either Faraday gem or OpenURI standard library, wrapped in a bit of caching logic that will check a local on-disk storage directory and read from that, or if a local file is not present or outdated, will download and store a fresh copy.
For the actual parsing I tend to use the Nokogiri gem, usually with CSS selector queries, though with an occasional XPath query where CSS selectors just don't cut it.
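For anyone following the thread in shell rather than Ruby, the same cache-or-fetch idea looks roughly like this (the cache directory, the hashed filename, and the one-day staleness window are illustrative choices, not Filip's actual setup):

# fetch a URL via a local cache; re-download only if the copy is missing or stale
fetch() {
  url="$1"
  cache="cache/$(printf '%s' "$url" | md5sum | cut -d' ' -f1).html"
  mkdir -p cache
  if [ ! -f "$cache" ] || [ -n "$(find "$cache" -mtime +0)" ]; then
    curl -sL "$url" -o "$cache"    # fetch and store a fresh copy
  fi
  cat "$cache"
}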
Installing html-xml-utils, Debian. Beautiful Soup (Python) already installed, didn't realise it.
This article fails to mention that MeWe charges monthly fees for storage exceeding 8 gigs, and there's also a fee-based chat. Click the Cloud icon at the top of your MeWe page to see the monthly fees. If you post a lot of images and GIFs, you might hit the 8-gig limit over time. Found this at:
Kathie “Kat” Gifford
Alternate Sites & Platforms