Show HN: Makesite – A static site generator in 125 lines of Python

vram22 · on March 17, 2018

A while ago I had written a tool called PySiteCreator.

It lets you create simple web sites (I originally thought of it for creating wikis, but later realized it was more general) by writing them purely in Python, using Python function calls to generate various HTML elements.

It uses a make-like approach (file timestamp checking between input and output files) and relies on a simple convention that users have to follow, of defining a Python function called create() in each .py file they write, where each .py file will be used by the tool to generate a corresponding .html file. Other than that, it places no restrictions on the user, and any arbitrary Python code can be used to generate or pull in data from anywhere, to be included in the web pages it creates.

Blog post about PySiteCreator here:

https://jugad2.blogspot.in/2009/11/early-release-of-pysitecr...

Source code here:

https://bitbucket.org/vasudevram/pysitecreator
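
For readers who want to see the shape of the make-like convention described above, here is a minimal sketch. It is not PySiteCreator's actual code: the names `is_stale`, `load_create`, and `build_site` are made up for illustration, and it assumes each page's `create()` returns the finished HTML as a string.

```python
import importlib.util
from pathlib import Path


def is_stale(source: Path, target: Path) -> bool:
    """Return True if target is missing or older than source (make-style check)."""
    return not target.exists() or target.stat().st_mtime < source.stat().st_mtime


def load_create(source: Path):
    """Import a page module from its file path and return its create() function."""
    spec = importlib.util.spec_from_file_location(source.stem, source)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.create


def build_site(src_dir: Path, out_dir: Path) -> list:
    """Regenerate only the stale pages and return the list of files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(src_dir.glob("*.py")):
        target = out_dir / (source.stem + ".html")
        if is_stale(source, target):
            # Assumption: create() returns the page's HTML as a string.
            target.write_text(load_create(source)(), encoding="utf-8")
            written.append(target)
    return written
```

Running `build_site` twice in a row should write every page the first time and nothing the second time, which is the whole point of the timestamp check.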

charlesdaniels · on March 17, 2018

Personally, I used Jekyll for a while before switching to a simple Makefile based approach. I write everything in Markdown, which gets compiled via Pandoc, concatenated with a header and footer. I also recently started using pygments to generate colorized inline HTML to syntax highlight code blocks.

If anyone is interested, I wrote up the process[1], although I have not updated the post with the pygments script yet.

[1] - http://cdaniels.net//2017-11-22_make-static-site.html
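
The Markdown, Pandoc, header and footer pipeline described above is driven by a Makefile in the linked write-up. As a rough illustration only, the same steps can be sketched in Python. The function names are hypothetical, and the sketch assumes the `pandoc` executable is on the PATH.

```python
import subprocess
from pathlib import Path


def markdown_to_html(markdown_text: str) -> str:
    """Convert Markdown to an HTML fragment by piping it through the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "html"],
        input=markdown_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def assemble_page(header: str, body: str, footer: str) -> str:
    """Concatenate the shared header, the page body, and the shared footer."""
    return "\n".join(part.strip("\n") for part in (header, body, footer)) + "\n"


def build_page(source: Path, header: Path, footer: Path, target: Path) -> None:
    """Render one Markdown file into a complete page on disk."""
    body = markdown_to_html(source.read_text(encoding="utf-8"))
    page = assemble_page(
        header.read_text(encoding="utf-8"),
        body,
        footer.read_text(encoding="utf-8"),
    )
    target.write_text(page, encoding="utf-8")
```

In a Makefile the equivalent of `build_page` would be one pattern rule, which is what makes incremental and parallel builds come for free.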

sigil · on March 17, 2018

Fellow Jekyll refugee here. I also switched to the Makefile SSG approach for the much faster incremental rebuilds. Parallel speedup with make -j is also nice.

My SSG is called tinysite [1] [2]. Templates are jinja2. Posts are written in markdown with Pygments highlighting for fenced code blocks. Posts also have JSON frontmatter which can #include other JSON data files, and this is where it gets Make-y: all affected pages get rebuilt when a data file changes with the help of a `gcc -M`-style file dependency scanner. I use these frontmatter "data includes" for site wide metadata and for data driven pages like post indexes. There's also a dev server so you can quickly preview changes.

The biggest frustration I have with the Makefile + interpreted language approach is the slow startup time of interpreters these days. When every page requires a new interpreter process it really starts to add up, if not in incremental builds, then definitely in full rebuilds of larger sites. Some quick tests I did recently of `time $INTERPRETER -[c|e] ""`:

  node 6   ... 70ms
  ruby 2   ... 58ms
  python 2 ... 27ms
  perl 5   ...  5ms
  sh       ...  3ms

Perl compares quite favorably here, but I just can't bring myself to go back to it. I wish these other interpreters would get their startup time act together! I guess this is an argument for go / Hugo?
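
For anyone who wants to repeat the measurement on their own machine, here is a small sketch of the same idea in Python. It is an approximation rather than a replacement for `time`: the helper name `startup_time` is made up, and the numbers include the cost of spawning a process from Python, so they will read a little higher than the shell figures above.

```python
import shutil
import subprocess
import sys
import time


def startup_time(command: list, runs: int = 10) -> float:
    """Return the best wall-clock time in seconds to start and exit `command`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Each entry runs an empty program, like `time $INTERPRETER -c ""`.
    candidates = {
        "python": [sys.executable, "-c", ""],
        "sh": ["sh", "-c", ""],
    }
    for name, command in candidates.items():
        if shutil.which(command[0]) is None:
            continue  # interpreter not installed here, skip it
        print(f"{name:8s} {startup_time(command) * 1000:6.1f} ms")
```

Taking the minimum over several runs filters out most of the noise from disk caches and other processes.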

[1] https://github.com/acg/tinysite

[2] https://github.com/acg/alangrow.com
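
To make the `gcc -M`-style dependency scanning idea concrete, here is a hedged sketch in Python. It is not tinysite's code. The directive syntax (`#include "file.json"` on its own line) is an assumption made for illustration, as are the names `scan_dependencies` and `make_rule`.

```python
import re
from pathlib import Path

# Hypothetical directive syntax, e.g.:  #include "site.json"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"', re.MULTILINE)


def scan_dependencies(source: Path, seen=None) -> list:
    """Return every file reachable from `source` through include directives."""
    seen = {source} if seen is None else seen
    deps = []
    for name in INCLUDE_RE.findall(source.read_text(encoding="utf-8")):
        dep = source.parent / name
        if dep in seen or not dep.exists():
            continue  # already visited (or missing), so do not recurse
        seen.add(dep)
        deps.append(dep)
        deps.extend(scan_dependencies(dep, seen))  # follow nested includes
    return deps


def make_rule(target: str, source: Path) -> str:
    """Format a Makefile dependency rule in the style of `gcc -M` output."""
    prerequisites = [source] + scan_dependencies(source)
    return f"{target}: " + " ".join(str(p) for p in prerequisites)
```

The rules this prints can be written to a `.d` file and pulled into the Makefile with `include`, so that touching a shared data file marks every page that uses it as out of date.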

charlesdaniels · on March 17, 2018

This is very impressive, and definitely more elegant than my approach. Getting make -j for free was definitely a major plus; before I started using a Python script to syntax highlight inline code, my site built in under a second... now it takes around 3.

I am honestly surprised to find Python is so slow in terms of start up time. This made me wonder if it was generating / searching for .pyc files, but a quick test revealed this is not the case:

  sh-4.3$ ls
  nop.py	nop.sh
  sh-4.3$ hexdump -C nop.py
  sh-4.3$ hexdump -C nop.sh
  sh-4.3$ time python3 nop.py

  real	0m0.064s
  user	0m0.048s
  sys	0m0.016s
  sh-4.3$ time sh nop.sh

  real	0m0.002s
  user	0m0.002s
  sys	0m0.000s
  sh-4.3$ python3 -m compileall .
  Listing '.'...
  Compiling './nop.py'...
  sh-4.3$ time python3 nop.py

  real	0m0.062s
  user	0m0.057s
  sys	0m0.004s

  sh-4.3$ python3 __pycache__/nop.cpython-35.pyc 
  sh-4.3$ time python3 __pycache__/nop.cpython-35.pyc

  real	0m0.060s
  user	0m0.048s
  sys	0m0.012s

I would add that shell can be surprisingly fast for certain tasks when used correctly. It's very easy to write slow (ba)sh code though, so most shell scripts wind up being fairly non-performant.

Edit: formatting.

dfox · on March 17, 2018

My (nowadays totally non-updated) website is built on a similar approach with make+python (but with xml+xslt, which was the technology of the day in the mid-00s), and CPython's startup time was always an issue, even to the point that incremental rebuilds essentially didn't make much sense (but writing the machinery was at least a fun experiment).

bArray · on March 17, 2018

I do something similar, except with a `bash` script and hand writing the titles + dates. I thought about automating that, but work in different time zones so prefer to choose.

One thing I might have of interest to you is a small DNT (DoNotTrack) JS file [1] and some CSS that adds some visual markdown features to the page after the fact [2].

P.S. Your output looks like it's missing `<HTML>` and other tags? And how are you getting automated code colour highlighting?

[1] http://coffeespace.org.uk/dnt.js

[2] http://coffeespace.org.uk/style.css

charlesdaniels · on March 17, 2018

Since I may not have time to update my article for a while, here is a link to the script[1].

I am unclear on your dnt script? What does it do? I don't have any facilities on my site to track users at all as it stands.

I will probably be mooching parts of your CSS though, thanks for posting it here!

Edit: you are correct, I seem to have forgotten to include <HTML> tags... I should probably fix that.

[1] https://gist.github.com/charlesdaniels/ac5fa6e3e77aef5ff5c85...

bArray · on March 18, 2018

>Since I may not have time to update my article for a while, here is a link to the script[1].

Thank you. I was hoping you had figured out how to get it going in `pandoc`, tried it myself to no avail but just assumed I had done something wrong.

>I am unclear on your dnt script? What does it do? I don't have any facilities on my site to track users at all as it stands.

No, but I can imagine it useful if you did want to put a relevant YouTube video or Tweet embedded in, but wanted to respect DNT. It's worked pretty well so far.

>I will probably be mooching parts of your CSS though, thanks for posting it here!

No problem! Some parts to make the font size a little better for mobiles too at the top :)

>Edit: you are correct, I seem to have forgotten to include <HTML> tags... I should probably fix that.

Yeah, not a massive deal but who knows, could break something in the future!

mwambua · on March 17, 2018

This is cool, and if I had the time I'd want to do something like this for automatically generating a google-photos style site from my Pictures directory.

Also, if you're looking for a stable Python-based static site generator... I've been pretty happy with Pelican (https://blog.getpelican.com/).

edwinksl · on March 17, 2018

Agree, Pelican is easy to use and configure using Python.

dfee · on March 17, 2018

Practically speaking, flask + jinja2 + wget [0] seems like a much more robust and approachable solution.

[0] http://www.dheinemann.com/2011/archiving-with-wget/

almata · on March 17, 2018

After using several different blogging platforms/tools, I finally decided to use a GitHub repo for my notes instead. Something really simple that lets me 1) create a note in plain Markdown, and 2) run a publi.sh script. That's all. After that, the note is already in my GitHub repo, ready to be ctrl-f'ed when I'm looking for something I know I documented. In case anyone is interested I used this tool [0], but it was the first time in my life creating a Bash script, so you can fairly assume the code is, well... you know :)

[0]: https://github.com/almata/BlogGit/blob/master/README.md

thisacctforreal · on March 17, 2018

I'd recommend shellcheck[1] to help avoid common pitfalls with bash/sh, particularly in regards to word splitting.

[1] https://www.shellcheck.net/

almata · on March 17, 2018

Many thanks, I will.

Animats · on March 18, 2018

Maybe browsers should just understand Markdown. Cut out the middleman.

adamisntdead · on March 17, 2018

Writing static site generators is my favourite way of learning a language. Usually covers all the basics, libraries, FS, error checking and so on

pvinis · on March 18, 2018

Do you have a good example of this, yours or someone else's? I've never thought of that as a learning-a-language project.

adventured · on March 18, 2018

Conveniently there seem to be a dozen or more static site generators written for every popular language, often with varying levels of complexity to learn from.

For example, Go: https://gohugo.io/

List: https://github.com/myles/awesome-static-generators

nkantar · on March 20, 2018

Additional resource: https://www.staticgen.com/

gabrielcsapo · on March 18, 2018

Shameless plug: I have been a little upset about how convoluted the static site offerings have been, so I have been building https://www.gabrielcsapo.com/sweeney/ in my free time. It needs a ton of work, but I'm trying to keep it compact and 100% tested. The main source of inspiration is making static sites configurable enough for people to stop using Wordpress and making their clients' lives incredibly difficult. Really cool project @makesite!

Philipp__ · on March 17, 2018

Hugo is still the most convenient, at least for me and my workflow. I just pull it with homebrew, create my theme/template, write blog posts in Emacs org-mode (using ox-hugo), host it on GH pages for free, and use a custom domain and free-tier Cloudflare.

I am really enjoying bare essentials I got, very streamlined process that does the job for me. This looks interesting, I really like minimalist aspects of software, but what ain't broken don't fix it.

devposter · on March 17, 2018

Hugo is nice and extremely fast. But one thing that I find unintuitive in Hugo is how the layout files are arranged. I can never remember them without referring to the documentation or the source files of an existing Hugo site every time I need to create a new Hugo site.

For example, the base layout template needs to go to themes/<THEME>/layouts/_default/base.html but the layout for a blog needs to go to themes/<theme>/layouts/<TYPE>/single.html. Then there is list type layout too to define the blog index pages. Is the home page a single page or a list page? Can the entire home page be defined as a base template? It gets confusing.

Then the whole {{ define <BLOCK> }} and {{ block <BLOCK> }} syntax to embed one template in another is quite unintuitive as well. I think Jekyll has a much saner layout that is easy to keep in your head. Also in Jekyll one can define list pages without any special naming convention just by using its templating for-loops. I find Hugo less intuitive than others but the fact that Hugo does not require me to learn Ruby is a win.

Custom written shell commands or Python code or even plain SSI includes are a great way to host static content too.

paulgb · on March 17, 2018

Another free solution that may involve even less setup is Netlify's free tier, which includes 1-click HTTPS setup on a custom domain and reruns Hugo every time you push to the repo.

chiefalchemist · on March 17, 2018

Moi? I love the idea of minimal. KISS is a wonderful thing.

That said, anything more than a handful of pages (and especially something blog-y) needs search. Else the UX could take on too much friction.

A couple weeks ago I saw something about a tool Facebook OS'ed for doing project documentation. I believe that was static and had search. I think. Unfortunately, I didn't go so far as to see if you could fake a (not for docs) website with it.

seanwilson · on March 17, 2018

> That said, anything more and a handful of pages (and especially something blog-y) needs search. Else the UX could take on too much friction.

I've added search functionality to static sites before and the various JavaScript libraries available for this like lunrjs.com are super fast and easy to integrate. You can create a static search index at build time from a JSON file of your core content (probably a few 100KB compressed for many sites) and the search as you type functionality is instant which can be a better UX than other solutions.
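
As a rough sketch of the build-time half of this approach: the generator can dump a compact JSON file of page records, which a client-side library such as lunr.js then loads and indexes. The record fields (`id`, `title`, `body`) and the function names here are assumptions for illustration, not anything lunr.js requires by those names.

```python
import json
from pathlib import Path


def build_search_documents(pages: dict) -> list:
    """Turn {url: (title, body)} into a list of flat records for indexing."""
    return [
        # Collapse runs of whitespace so the JSON stays small.
        {"id": url, "title": title, "body": " ".join(body.split())}
        for url, (title, body) in sorted(pages.items())
    ]


def write_search_index(pages: dict, target: Path) -> None:
    """Write the records as compact JSON for the browser to fetch."""
    documents = build_search_documents(pages)
    target.write_text(json.dumps(documents, separators=(",", ":")), encoding="utf-8")
```

The browser side would fetch this file once and feed the records to the search library, so no server-side search endpoint is needed.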

aoeusnth1 · on March 17, 2018

Hey, thanks for pointing this out. I didn’t even know to go looking for something like this, but it will be perfect for a site I have.

chiefalchemist · on March 18, 2018

Yea. Thanks.

IanCal · on March 17, 2018

There's lunr.js, this collection of bits of docs might be useful to look at about search in a Hugo site : https://discourse.gohugo.io/t/how-to-add-lunr-js-to-your-sit...

kaushalmodi · on March 18, 2018

I'm not very (I mean, at all) conversant with Javascript. So stuff like Gulp, Grunt, etc is outside my comfort zone. But one fine day, someone posts step-by-step tutorial[1] on how to set up static site search using Fuse.js on Hugo sites without Gulp, Grunt, etc., and I am finally able to set up search[2] on my Hugo generated blog in no time.

[1]: https://discourse.gohugo.io/t/site-search-implemented-using-... [2]: https://scripter.co/search/

citilife · on March 17, 2018

Alright - I like your website layout: https://defphil.com/

But to be fair, some people may want different features than the bare minimum. Also, even if Hugo is the most convenient as an org-mode user, I highly doubt it'd be the most convenient for others (who don't use org-mode lol).

Also, you should write more posts!

kaushalmodi · on March 18, 2018

The Hugo generated site features can vary with the theme used (I believe there are 200+ options), and the features can get as detailed as a user wants if they just template their own site.

Ox-hugo, which the OP mentioned, is just an Emacs package that exports Org to Markdown + front-matter (without having to manually specify much of the front matter data in the Org source; most of it is auto-derived).

Philipp__ · on March 18, 2018

I still haven’t written any posts, I just finished the website a few days ago. But thanks for stopping by. :)

kaushalmodi · on March 18, 2018

Hey! Ox-hugo dev here. Do you have your site source (Org source) public? I am just collecting Ox-hugo Org source repos that I plan to publish on the ox-hugo doc site (of course, that won't happen if you don't wish, even if your source is public).

For folks, if interested in seeing an example of the Org source, and ox-hugo + Hugo generated site, here's the source[1] of my blog[2].

More: https://ox-hugo.scripter.co/#demo

[1]: https://gitlab.com/kaushalmodi/kaushalmodi.gitlab.io/raw/mas... [2]: https://scripter.co

mojoe · on March 17, 2018

I use Jinja2 to template all my static sites -- it helps me host compellingsciencefiction.com straight from AWS S3 very, very cheaply.

est · on March 17, 2018

generate static site with 1 line of shell command

    echo "<html>" > index.html

secura · on March 17, 2018

Is this valid HTML? https://validator.w3.org/ seems to require this as the minimal HTML to validate successfully.

    <!DOCTYPE html><html><title>0</title>

I would thus fix your 1 line shell static site generator like this.

    echo '<!DOCTYPE html><html><title>0</title>' > index.html

butz · on March 17, 2018

You can trim <html> tag and still get valid html: <!DOCTYPE html><title>0</title>

cryptoz · on March 17, 2018

It's still a site even if it doesn't validate. HN (news.ycombinator.com) generates more than 100 errors on that validator. Doesn't mean HN isn't a site.

rambojazz · on March 17, 2018

You can even go more minimalist. An empty file is a site too.

52-6F-62 · on March 17, 2018

Still takes one line, but it would be smaller in size

    touch i

e12e · on March 17, 2018

No need to use such a bloated tool as "touch" to create a site. Your shell can do it:

  > index.html

And there's of course:

  <<eof > index.html
  <html>
  <p>hello, world! from
  $(whoami)</p>
  </html>
  eof

See also: m4.

ariofrio · on March 17, 2018

    > index.html

You win.

52-6F-62 · on March 17, 2018

Agreed. I concede!

petercooper · on March 18, 2018

    >index.htm

Two characters shorter and will work fine on most HTTP servers ;-)

52-6F-62 · on March 18, 2018

Now you're just being pedantic, Peter!

petercooper · on March 18, 2018

Yes, but it looked like this was turning into a golfing thread ;-)

noobermin · on March 17, 2018

Snark aside, does it matter if most of the "site" is already in templates? Why shouldn't the templates be included in the metric?

foo101 · on March 17, 2018

This seems to use the commonmark library to render Markdown? Can it render Pandoc flavour of Markdown?

How widespread is CommonMark, really? Are any popular sites using it? If I write my blog posts in CommonMark, is it safe to assume site generation tools 10 years from now will render it correctly?

reificator · on March 17, 2018

The main competitor to CommonMark is GFM, which is now based on CommonMark.

https://githubengineering.com/a-formal-spec-for-github-markd...

vram22 · on March 17, 2018

I had checked out CommonMark a bit and blogged about it here, including an example of using it, and a bit about their goals:

CommonMark, a pure Python Markdown parser and renderer:

https://jugad2.blogspot.in/2014/09/commonmark-pure-python-ma...

One interesting point from them was this:

[ Reddit user bracewel, who seems to be a CommonMark team member, said on the Py Reddit thread:

eventually we'd like to add a few more renderers, PDF/RTF being the first.... ]

reificator · on March 18, 2018

Interesting, I have a little sideproject at work that would really benefit from markdown -> rtf.

Avamander · on March 18, 2018

I absolutely love Pelican with M.CSS[1], automatically deployed by my already existing TeamCity instance that triggers a rebuild on git repository changes.

[1] http://mcss.mosra.cz

whalesalad · on March 17, 2018

Can’t speak to the usefulness of the code but really admire the style. Very clean, concise and lots of composition. Love it.

CydeWeys · on March 18, 2018

I did something similar in C++ way back in high school. Even called it makeSite too: http://www.cydeweys.com/archive/makeSite3.cpp

Funny how they're fundamentally not that different.

jitans · on March 17, 2018

How stupid is it to measure code in terms of lines?

Make all that a library then you can claim:

A static site generator in only 3 lines of Python

Measuring lines of code is completely wrong. See the Scala trend where people race to find the shortest way to express something, generating hard-to-understand code. 1) Code has to be written to be maintainable. 2) Code has to be written to be read by your coworkers. 3) The bottleneck while coding is not the keyboard.

z3t4 · on March 17, 2018

Unless you have put effort into squeezing as much logic as possible into each line, the less code the easier it is to maintain. With more code, more stuff can go wrong. It's not just because we are bad at writing code, we have probability against us. If the bug average is 1 bug per 100 LOC, then 100 LOC will have fewer bugs than 10k LOC (roughly 1 versus 100). We also have physics against us: reading and comprehending 10k LOC will take much longer than reading 100 LOC. With that said, I do agree with you that LOC doesn't say much about anything. It's a stupid metric. With higher level languages, a 100 LOC file might actually be millions of LOC if you include all the libraries. Them being libraries means they are mostly decoupled, which makes things better, because coupling is the root of most complexity.

jitans · on March 17, 2018

That's not the point. Trying to squeeze the code into as few lines as possible is not the way to go if you lose clarity. Python coders keep repeating the mantra "fewer lines of code" but in the meantime they keep typing: self. self. self.

make3 · on March 17, 2018

Please don't use the number of lines as a metric for simplicity, it really doesn't mean anything

zeep · on March 18, 2018

For two programs that do the same thing coded in the same language, I would pick the one with the fewest lines, but that doesn't mean that less work went into it... Probably the opposite.

jitans · on March 17, 2018

As you can see, we are the only ones thinking this is absurd :D