
GitLab Pages: publish static websites directly from a repository in GitLab


...To publish a website with Pages, you can use any Static Site Generator (SSG), such as Jekyll, Hugo, Middleman, Harp, Hexo, and Brunch, just to name a few. You can also publish any website written directly in plain HTML, CSS, and JavaScript.

Pages does not support dynamic server-side processing, for instance, as .php and .asp require. See this article to learn more about static versus dynamic websites: https://about.gitlab.com/2016/06/03/ssg-overview-gitlab-pages-part-1-dynamic-x-static/

If you’re using GitLab.com, your website will be publicly available on the internet. If you’re using a self-managed instance (Core, Starter, Premium, or Ultimate), your websites will be published on your own server, according to the Pages settings chosen by your sysadmin, who can opt to make them public or internal to your server.


Disclaimer: this approach is the one closest to my own plans for future online activity, and I'd had a migration in progress before Google announced the G+ sunset.

The idea isn't to run all my online presence through here, but to make this the hub. My content will be published through the blog and syndicated elsewhere by various means (in particular to Mastodon, Diaspora, and other Fediverse / The Federation / decentralised platforms, possibly also proprietary ones), with feedback and interactions mediated back to the blog by ... as yet undetermined mechanisms.

A massive gain is that by using Git, I own and control all my data directly. The act of "publishing" is the same as the act of migrating to a new platform. It is also the same as backing up the site to a new location. That is, the entire headache of being captured by Google+'s, or any other site's, possession of my data and content, and of needing to extract, transfer, transform, and republish it, is gone.

Dust. Dead. Finito. A non-problem. An ex-Parrot. Pining for the data silos.

And you can host either on GitLab (~10GB storage -- well above what I've accrued on G+ over the past 7.5 years, if images are de-duped), or on any device of your choosing via a private GitLab instance.

And of course there are RSS and Atom feeds, plus other syndication options, built in. Creating an RSS/Atom feed syndicator on Mastodon, Diaspora, or other platforms is bog simple.
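As a sketch of what such a syndicator might look like, assuming a feed URL, a Mastodon instance, and a token in an environment variable (all placeholders), and glossing over state-keeping that a real version would need so entries aren't re-posted:

```typescript
// syndicate.ts -- naive Atom-to-Mastodon cross-poster (sketch only).
// Assumes Node 18+ (global fetch); FEED_URL and INSTANCE are hypothetical.

const FEED_URL = "https://example.gitlab.io/blog/atom.xml"; // hypothetical feed
const INSTANCE = "https://mastodon.example";                // hypothetical instance
const TOKEN = process.env.MASTODON_TOKEN ?? "";

interface Entry { title: string; link: string; }

// Crude Atom parsing: good enough for a sketch, not for production.
function parseAtom(xml: string): Entry[] {
  const entries: Entry[] = [];
  for (const chunk of xml.split("<entry>").slice(1)) {
    const title = /<title[^>]*>([^<]+)<\/title>/.exec(chunk)?.[1] ?? "";
    const link = /<link[^>]*href="([^"]+)"/.exec(chunk)?.[1] ?? "";
    if (title && link) entries.push({ title, link });
  }
  return entries;
}

async function syndicate(): Promise<void> {
  const xml = await (await fetch(FEED_URL)).text();
  for (const entry of parseAtom(xml).slice(0, 5)) {   // newest few entries only
    // Mastodon's status-posting endpoint; a real run would skip already-posted entries.
    await fetch(`${INSTANCE}/api/v1/statuses`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ status: `${entry.title}\n${entry.link}` }),
    });
  }
}

syndicate().catch(console.error);
```

Run from cron or a CI schedule, that's the whole job.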


There are a few hurdles.

Yes, you've got to know and use Git. For technical types, this is pretty much a basic employment criterion at this stage. For the rest of you, it's not nearly as bad as it looks, and there are tools that make it easier (with more very likely on the way). And as noted, the advantages are tremendous.

The sites are "static", in that there is no back-end server majick going on. This has the advantage of tremendously reducing the serving complexity and resources. Static sites do not need to be entirely non-dynamic: any processing that can happen in the browser is still available, such as in-page interactions, typically via JavaScript or CSS. But without some other means of drawing this back into the site itself, those changes don't persist.
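A minimal sketch of the kind of in-browser behaviour that survives on a static host (the element ID is hypothetical); note that the stored preference lives only in that visitor's browser and never flows back into the published site:

```typescript
// theme-toggle.ts -- client-side interactivity on a static page (sketch).
// The preference persists only in this visitor's browser (localStorage);
// nothing flows back to the published site itself.

// Hypothetical button with id="theme-toggle" in the page's HTML.
const button = document.getElementById("theme-toggle");

function applyTheme(theme: string): void {
  document.body.dataset.theme = theme;      // CSS can key off [data-theme="dark"]
  localStorage.setItem("theme", theme);     // per-browser persistence only
}

applyTheme(localStorage.getItem("theme") ?? "light");

button?.addEventListener("click", () => {
  const next = document.body.dataset.theme === "dark" ? "light" : "dark";
  applyTheme(next);
});
```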

The biggest impacts are on search and comments. I've been looking at both of these ... well, not very actively since 8 October 2018, because, um, distractions (thanks, Google). But there are tools for handling both, manual or automatic.

Lunr.js and several other tools effectively allow publishing the entire search index in a web page and letting the browser find matches. Experiments suggest that this works for single keywords but not compound searches. E.g., you could find "static" or "site", but not "static site" and not "static -site" (that is, pages matching static but not including site). Fuzzy, stemmed, and other forms of indirect matches are also funky. I'm thinking of other approaches which might work, though most involve either shipping the search archive directly or allowing piecemeal assembly, with the heavy lifting occurring in the browser.
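For reference, a minimal Lunr-based setup looks roughly like this. The documents.json file and its shape are assumptions on my part, not anything Lunr mandates, and whether the compound-query syntax behaves as hoped is exactly what the experiments above call into question:

```typescript
// search.ts -- in-browser search over a pre-built document list (sketch).
import lunr from "lunr";

interface Doc { url: string; title: string; body: string; }

async function main(): Promise<void> {
  // A list of page records shipped alongside the site (assumed name and shape).
  const docs: Doc[] = await (await fetch("/documents.json")).json();

  // lunr binds the index builder to `this`, so a regular function is required here.
  const idx = lunr(function () {
    this.ref("url");
    this.field("title");
    this.field("body");
    docs.forEach((d) => this.add(d));
  });

  const byUrl = new Map(docs.map((d): [string, Doc] => [d.url, d]));

  // Plain multi-term queries are OR-ed; "+static +site" asks for both terms.
  for (const hit of idx.search("+static +site")) {
    console.log(hit.score.toFixed(2), byUrl.get(hit.ref)?.title, hit.ref);
  }
}

main().catch(console.error);
```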

Comments can be provided by various mechanisms, including third-party tools (Disqus, etc.), ye olde backstoppe of having people email their responses and posting them manually, or others. Again, I'm exploring the space. My general thought is that making comments a little bit harder for users might be an overall win.
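One sketch of the "issues as comments" idea raised in the comment thread below: pull an issue's discussion from GitLab's Notes API onto the static page at load time. The project path, the per-post issue number, and the #comments element here are all hypothetical, and this only works unauthenticated for a public project:

```typescript
// comments.ts -- render comments from a GitLab issue onto a static page (sketch).

const GITLAB = "https://gitlab.com/api/v4";
const PROJECT = encodeURIComponent("example-user/example-blog"); // hypothetical project
const ISSUE_IID = 1;                                             // hypothetical issue per post

interface Note {
  body: string;
  created_at: string;
  author: { name: string };
}

async function loadComments(): Promise<void> {
  const res = await fetch(`${GITLAB}/projects/${PROJECT}/issues/${ISSUE_IID}/notes`);
  const notes: Note[] = await res.json();

  const container = document.getElementById("comments");
  if (!container) return;

  for (const note of notes) {
    const item = document.createElement("div");
    // textContent (not innerHTML) so remote input is never interpreted as markup.
    item.textContent = `${note.author.name} (${note.created_at}): ${note.body}`;
    container.appendChild(item);
  }
}

loadComments().catch(console.error);
```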


You don't have to go this route. But if you're looking at a blog-central-type future presence, you might want to give it a really hard look.



https://docs.gitlab.com/ee/user/project/pages/index.html

Comments

  1. if your site is well-indexed by search engines, I would outsource the searching of your site to one or more search engines anyway...

  2. I'm not sure about Gitlab but for comments in GitHub pages I've been exploring the option of using the GitHub issues because I really don't want to rely on Disqus. I wonder if similar hacks also exist for Gitlab.

  3. https://github.com/imsun/gitment for GitHub looks interesting, though having to manually initialise it for every page you create sounds tedious.

  4. There's also Staticman, which seems to do that automatically, but I haven't tried it yet.

  5. Ideally though the comments should also live in the Git repository, so they're also portable.
    Theoretically you could use the pull requests system for it, though that would be rather tedious as well on the maintainer's side.

  6. Filip H.F. Slagter Depends.

    Do you have any other industries you'd like to disrupt for Boxing Day?

    As a sketch, suppose you could come up with a distributed, self-service, reasonably reliable (availability, integrity, non-gaming), relevance-based alternative to Web search. What would that take?


    Right now, if you want to create a search engine, you've got to:

    * Build a spidering infrastructure. This alone is immense.

    * Define your search logic.

    * Come up with a relevance metric that is supportable by available data.

    * Deal with adversaries. Principally black-hat SEO and various DoS / DDoS threats.


    How could we pick this apart, and why does the concept exist in the first place?

    Web spidering exists in part because there are no standards for publishing and verifying site indices. There's robots.txt and sitemaps, but those just tell the spiders where to search. There's no sense of "show me what's on your site".

    Hand wave: most websites are small, anywhere from a single page to, maybe, a few score pages. They're effectively static, and could be reasonably readily indexed.

    Though many significant websites are large. A Wikipedia or Facebook is not going to ship a full site index to you. Some other method of incrementally requesting the relevant index portions quickly is required. Something like a cascade of Bloom filters, perhaps? (A rough sketch of that idea appears at the end of this thread.) Prefetch might be used to reduce latency.

    The search logic is moved to the browser, though that interacts with the site search system. Exact, single-term, fuzzy / folded search, phrase match, mispeled terms, Boolean logic and/or/not terms, metadata / field search, date ranges.

    Ranking becomes an interesting problem, though sites' own sets of references to other sites might be used here, plus higher-order checks. Or several Arbiters of Truth might be designated, say, the set of existing fact-checking sites, Wikipedia, and others.

    Black-hat SEO defeat would be based both on ranking and on various forms of audit. Effectively, there might be a role for search validators confirming that stated results match actual page contents, to a specific level of accuracy. If you say that you have "xyzzy" but the term does not appear on the specified page, credibility takes a hit.


    All of this is reasonably doable for single site search. The question is whether or not such a system could expand to cover a substantial portion of the Web. I'm not convinced that's possible, but I'm not convinced it's not, either.

    One approach would be to effectively ask some set of sites "what are the most authoritative sources you have for a query X?" and to forward queries on according to this. A big question is whether you're looking for all results on X or a sufficiently suitable result for X.

    Because results are forwarded as elements of the search corpus itself, clients could cache and store query results to be re-accessed locally. Potentially for an extended period of time.


    In thinking through this, one of the clearest conclusions I'm coming to is that a large reason for the existence of a Web search industry itself is lack of standardisation for search data formats such that sites could self-provision. Though trust and anti-fraud aspects are also significant. The fact that many sites would have to be back-engineered to provide search is another factor.

    These may not be tractable challenges, but it's an interesting question to explore.

  7. Filip H.F. Slagter Automating (or semi-automating) portions of the comment pulls might be viable. From trusted sources, say, and with source sanitation.

    NEVER trust remote user inputs.

    (Never trust local user inputs.)

    (Never trust user inputs.)

    (Never trust users.)

    (Never trust inputs.)

    (Never trust.)

    (Never.)

  8. DO ИOꓕ DꓵbΓICⱯꓕE Filip H.F. Slagter

    It's been a while since I played around on GitHub but each project has its own wiki. Just click on the wiki button near the top. GitHub uses its own version of Markdown for creating pages. It is a wiki: if you're not familiar with them, you may find them difficult to use.
    help.github.com - Basic writing and formatting syntax - User Documentation

  9. DO ИOꓕ DꓵbΓICⱯꓕE Positive: GitLab includes its own wiki (as Shawn H Corey mentioned), comments, and issues-tracking system. Those are available and may be subject to various forms of abuse / creative reapplication.

    Negative: their contents are not directly under Git versioning AFAIAA. So these are intrinsically less portable than your primary code / site.

    I believe they're still largely extractable / archiveable, and re-hostable.


    For Wikis, there are several static-site wiki engines which might be deployed. These lack the features of MediaWiki, the domain leader, but may be suitable for some projects.

    I'm kicking around ideas as to how a ticketing system might be abused as some sort of a scheduling system. It's effectively a planning / progress tool, which is a superset of scheduling for the most part.

  10. My experience with wikis is that they have to be locked down to prevent spam, which tends to defeat the purpose of a wiki...
    Anti-spam measures might work, though I usually end up bashing my head against the spamfilter walls because my code-related edits look too much like spam... Edward Morbius's plexodus wiki is not the first wiki where I've had my contributions refused by an overzealous anti-spam system...

  11. Filip H.F. Slagter Agreed that a limited access Wiki is generally preferable. For most sites, a grant-on-request policy is reasonably safe.

    Wikipedia can get away with (mostly) open access due to the huge number of eyes on the project. Even there, access is not universal, and there are tiers of control.

  12. Edward Morbius The wikis on GitHub are under Git revision control. You can use them the same way you use the project, that is, use Git to access them remotely. So maintaining a local copy is as easy as using Git.

    And AFAIK, they can only be modified by those who can modify the project, which you have full control over.

  13. DO ИOꓕ DꓵbΓICⱯꓕE GitHub has had GitHub Pages for years already, so GitLab is copying them here, it seems

  14. Dima Pasechnik Yes, and generally, GitLab is a Free Software challenger to the now-Microsoft-owned GitHub.

    But there are key differences. Principally, that you can operate and host your own GitLab instance. That is not an option with GitHub.

  15. Edward Morbius well.. there is GitHub Enterprise.

  16. Edward Morbius I think you can self-host GitHub Pages. After all, I know I can "host" them on localhost (and I did when I needed to debug GitHub Pages sites). It's just open-source Ruby on Rails code...

  17. You don't say, but I presume Git is a paid-for choice?

  18. Diana Studer no, git is free open-source software: git-scm.com - Git

  19. Diana Studer No. Git is a free-software version control tool.

    The cost isn't monetary, but in learning the skillset.

    There are several project hosting sites that are based on Git. GitHub and GitLab are two of the largest of these. Both offer free tiers of service. They recoup costs through corporate sales, largely.

  20. The learning curve to become a giter is steep; there are even Jewish folk songs about it:
    "ekh vel zayn a khosidl a giter, a khosidl a getrayer..." ;-)

  21. Dima Pasechnik The Klezmer music will continue until the GitSite improves!

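As a rough illustration of the Bloom-filter idea floated in comment 6 above: a site could publish a small filter over its index terms, and a client could test a query term locally before fetching any heavier per-term index data. This is a toy single filter with arbitrary sizes and hash choices; the cascade discussed above would layer several such filters, tuned for false-positive rates:

```typescript
// bloom.ts -- toy Bloom filter for the "does this site possibly contain term X?"
// test sketched in comment 6 (illustration only; not anyone's actual implementation).

class BloomFilter {
  private bits: Uint8Array;

  constructor(private size = 1 << 16, private hashes = 4) {
    this.bits = new Uint8Array(size); // one byte per bit slot: wasteful, but simple
  }

  // Simple FNV-1a variant, salted per hash function; not cryptographic.
  private hash(term: string, seed: number): number {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < term.length; i++) {
      h ^= term.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.size;
  }

  add(term: string): void {
    for (let s = 0; s < this.hashes; s++) this.bits[this.hash(term, s)] = 1;
  }

  // False positives are possible; false negatives are not.
  mightContain(term: string): boolean {
    for (let s = 0; s < this.hashes; s++) {
      if (!this.bits[this.hash(term, s)]) return false;
    }
    return true;
  }
}

// A site publishes a filter over its index terms; a client tests terms locally
// and only fetches the per-term index shards on a "maybe".
const filter = new BloomFilter();
["static", "site", "gitlab", "pages"].forEach((t) => filter.add(t));
console.log(filter.mightContain("static"));   // true
console.log(filter.mightContain("dynamic"));  // false (almost certainly)
```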


New comments on this blog are moderated. If you do not have a Google identity, you are welcome to post anonymously. Your comments will appear here after they have been reviewed. Comments with vulgarity will be rejected.
