Our Engineering Blog was launched in June 2020 after the previous tech blog had been on a long break. This post describes the technical setup behind it.
You will learn:
- Which static site generator we selected and why.
- What customizations we applied to design the blog and the publishing process.
- How we serve static HTML using Skipper and S3.
Static Site Generator
Our previous tech blog used a CMS that only a limited number of people had access to. The CMS also lacked a workflow to propose and review drafts. As authors of the Engineering Blog will (mostly) be software engineers, we decided to switch to a git-based workflow and a static site generator.
StaticGen provides a nice overview of many different static site generators. Nearly all of them provide the necessary features to generate a static HTML site from blog posts written in Markdown. So which static site generator to choose?
We needed to customize the blog engine, e.g. with custom templates and features like author titles, so the main criterion was that the static site generator uses a familiar programming language for templating and for plugins. It should also generate plain HTML and not carry unnecessary features we won't use. The winner was Pelican:
- Pelican is written in Python. Python is the language most people at Zalando are familiar with, so it's a safe bet.
- Templates are written in Jinja. Jinja is a popular templating system; it's used in Zalando Open Source and I use it in my own OSS projects.
- Atom/RSS feeds are supported out-of-the-box.
- There are many existing plugins and it's easy to write your own in Python.
- It's actively developed. The last git commit was 16 days ago at the time of writing.
Other customizations we did:
- Enable the Atom feed via the
- Generate the sitemap XML with the sitemap plugin.
- Add author titles with the pelican-metadataparsing plugin.
- Minify generated HTML with the pelican-htmlmin plugin.
In addition to the above, we wanted to make sure that automatic linting is in place for blog posts:
- Required meta keys must be present, e.g. title, summary, and author names.
- The blog post Markdown file must be in the right year/month folder.
- Article tags should be curated via an explicit allowlist. We want to avoid introducing many unnecessary tags and different tags for the same concept, e.g. "Postgres" vs. "PostgreSQL".
Linting is done via pre-commit which calls a custom Python script to validate blog post Markdown files. The .pre-commit-config.yaml looks something like this:

    minimum_pre_commit_version: 1.21.0
    repos:
    - repo: meta
      hooks:
      - id: check-hooks-apply
      - id: check-useless-excludes
    - repo: local
      hooks:
      - id: validate-content
        name: Validate blog content
        language: system
        # run with poetry to get dependencies (Pelican)
        entry: poetry run ./validate-content.py
        types: [markdown]
        exclude: ^content/pages/.*.md$
    - repo: https://github.com/pre-commit/pre-commit-hooks
      rev: v3.1.0
      hooks:
      - id: check-added-large-files
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: mixed-line-ending
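The actual validation script is not shown in this post. As a rough illustration of the three linting rules listed above, a minimal sketch could look like the following. The required keys, the tag allowlist, the `Key: value` header format, and the function names are all assumptions made for this example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical values: the real required keys and curated tag list are not shown in the post.
REQUIRED_KEYS = {"title", "summary", "authors", "date"}
ALLOWED_TAGS = {"Kubernetes", "PostgreSQL", "Python"}

META_LINE = re.compile(r"^(\w[\w ]*):\s*(.*)$")


def parse_meta(text):
    """Read 'Key: value' header lines until the first blank line."""
    meta = {}
    for line in text.splitlines():
        if not line.strip():
            break
        match = META_LINE.match(line)
        if match:
            meta[match.group(1).strip().lower()] = match.group(2).strip()
    return meta


def validate_post(path, text):
    """Return a list of problems found; an empty list means the post passes."""
    errors = []
    meta = parse_meta(text)

    # Rule 1: required meta keys must be present.
    for key in sorted(REQUIRED_KEYS - set(meta)):
        errors.append(f"missing meta key: {key}")

    # Rule 2: the file must live in content/<year>/<month>/ matching its date.
    post_date = meta.get("date", "")
    if re.match(r"^\d{4}-\d{2}", post_date):
        expected = ("content", post_date[:4], post_date[5:7])
        if PurePosixPath(path).parts[:3] != expected:
            errors.append(f"expected folder {'/'.join(expected)}")

    # Rule 3: every tag must be on the allowlist.
    tags = [t.strip() for t in meta.get("tags", "").split(",") if t.strip()]
    for tag in tags:
        if tag not in ALLOWED_TAGS:
            errors.append(f"tag not in allowlist: {tag}")

    return errors
```

With this sketch, a post tagged "Postgres" instead of "PostgreSQL" would be reported as `tag not in allowlist: Postgres`, which is the kind of duplicate-tag drift the allowlist is meant to prevent.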
Zalando's CI/CD system automatically lints all files by executing
Writing a blog post
Anybody in Zalando can pitch a blog post idea by creating an issue in the git repo:
Bootstrapping a new blog post looks like this:
    hjacobs@ZALANDO-123:~/workspace/engineering-blog$ make new
    poetry run ./scripts/new-post.py
    This will create a new blog post, please answer a few questions..
    Title of blog post: Launching the Engineering Blog
    Slug [launching-the-engineering-blog]:
    Date (estimated) of publishing [2020-07-04]:
    Author names (separate with semicolon) [Henning Jacobs]:
    Author titles (separate with semicolon) [Senior Principal Engineer]:
    ========================================
    Title:         Launching the Engineering Blog
    Slug:          launching-the-engineering-blog
    Authors:       Henning Jacobs
    Author Titles: Senior Principal Engineer
    Date:          2020-07-04
    URL:           /posts/2020/07/launching-the-engineering-blog.html
    ========================================
    Does this look correct? Answer 'y' or 'n': y
    Creating content/2020/07/launching-the-engineering-blog/2020-07-04-launching-the-engineering-blog.md ..

    Useful commands:
    - make devserver    Start local webserver, find your draft on http://localhost:8000/drafts/
    - make lint         Validate content and formatting.

    Please edit your article in content/2020/07/launching-the-engineering-blog/2020-07-04-launching-the-engineering-blog.md
    and don't forget to open a PR :-)
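The source of new-post.py is not part of this post, but the naming scheme visible in the session above is easy to reproduce. The following sketch derives the slug, file path, and URL from a title and a date; the function names and the slug rule (lowercase alphanumeric runs joined by hyphens) are assumptions that happen to match the example shown.

```python
import re
from datetime import date


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def post_locations(title, published):
    """Return (markdown_path, url) following the layout shown in the session above."""
    slug = slugify(title)
    year, month = f"{published:%Y}", f"{published:%m}"
    path = f"content/{year}/{month}/{slug}/{published.isoformat()}-{slug}.md"
    url = f"/posts/{year}/{month}/{slug}.html"
    return path, url


path, url = post_locations("Launching the Engineering Blog", date(2020, 7, 4))
print(path)
print(url)
```

Keeping the year and month in both the folder and the URL is what lets the linter check that a post sits in the right year/month folder.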
Opening a PR to the Engineering Blog repository will trigger a build (make html) on our Zalando Continuous Delivery Platform. The PR build will publish a preview of the blog under a private (authenticated) URL.
After the blog post PR is merged, the post is automatically published on the live site.
Serving static HTML
Zalando's Continuous Delivery Platform has a built-in feature to upload files to a given S3 bucket. This feature is used to upload all files from the output directory (generated by Pelican) to the blog's S3 bucket. The S3 bucket is created via CloudFormation which also configures the S3 website:

    AWSTemplateFormatVersion: 2010-09-09
    Metadata:
      StackName: "engineering-blog"
      Tags:
        application: "engineering-blog"
    Resources:
      S3Bucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: "<BUCKET-NAME>"
          AccessControl: PublicRead
          WebsiteConfiguration:
            IndexDocument: index.html
            ErrorDocument: error.html
        DeletionPolicy: Retain
      BucketPolicy:
        Type: AWS::S3::BucketPolicy
        Properties:
          PolicyDocument:
            # ...
The WebsiteConfiguration property will make the bucket contents available on http://<BUCKET-NAME>.s3-website.<REGION>.amazonaws.com. The S3 website only provides an HTTP endpoint (no SSL) and not a domain we would want to use publicly.
One way to serve the contents with a custom domain and SSL is to create a CloudFront web distribution. I decided to not use CloudFront as all the required infrastructure for domain+SSL is already in place.
We have Skipper as the Kubernetes Ingress proxy running for all our 140+ Kubernetes clusters. External DNS automatically configures the DNS name and the Kubernetes Ingress Controller for AWS configures the AWS ALB with the right ACM SSL certificate. So let's reuse this infrastructure and let Skipper proxy all requests to the S3 website bucket endpoint. This can be achieved by adding a default Skipper route as Ingress annotation:
    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: "engineering-blog"
      labels:
        application: "engineering-blog"
      annotations:
        zalando.org/skipper-routes: |
          redirect_app_default:
            *
            -> compress()
            -> setDynamicBackendUrl("http://<BUCKET-NAME>.s3-website.<REGION>.amazonaws.com")
            -> <dynamic>;
    spec:
      rules:
      - host: "engineering.zalando.com"
        http:
          paths:
          - backend:
              serviceName: "engineering-blog"
              servicePort: 80
The compress() filter enables gzip compression as the S3 endpoint does not provide response compression out-of-the-box. The ACM certificate, HTTP/2 support, the S3 website response, and the enabled compression are visible when doing a curl request (output shortened):
    $ curl -v --compressed https://engineering.zalando.com -o /dev/null
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
    * Server certificate:
    *  subject: CN=engineering.zalando.com
    *  subjectAltName: host "engineering.zalando.com" matched cert's "engineering.zalando.com"
    *  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
    *  SSL certificate verify ok.
    > GET / HTTP/2
    > Host: engineering.zalando.com
    > user-agent: curl/7.68.0
    > accept: */*
    > accept-encoding: deflate, gzip, br
    < HTTP/2 200
    < content-type: text/html
    < content-encoding: deflate
    < etag: "304fcc9c31aac19255bf1d84669059df"
    < last-modified: Sat, 27 Jun 2020 07:23:19 GMT
    < server: AmazonS3
    < vary: Accept-Encoding
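To get a feel for why compressing HTML at the proxy is worthwhile, here is a small local sketch using Python's standard gzip module. It says nothing about how Skipper implements its filter, and the sample payload is made up for illustration; real pages will compress less dramatically than this repetitive markup.

```python
import gzip


def compression_ratio(body: bytes) -> float:
    """Return compressed size divided by original size for a gzip-encoded body."""
    if not body:
        raise ValueError("empty body")
    return len(gzip.compress(body)) / len(body)


# Illustrative payload only: repetitive markup stands in for a rendered blog page.
html = (
    b"<html><body>"
    + b"<p>Hello from the Engineering Blog.</p>" * 200
    + b"</body></html>"
)
ratio = compression_ratio(html)
print(f"gzip shrinks {len(html)} bytes to {ratio:.1%} of the original size")
```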
The static website should be fast, so let's test it. We can use Vegeta for some basic HTTP load testing. A p99 latency of about 60 ms looks good:
    $ echo "GET https://engineering.zalando.com/" | vegeta attack -duration=60s | vegeta report
    Requests      [total, rate, throughput]         3000, 50.02, 50.00
    Duration      [total, attack, wait]             59.995s, 59.98s, 15.246ms
    Latencies     [min, mean, 50, 90, 95, 99, max]  12.418ms, 19.751ms, 17.049ms, 25.05ms, 38.382ms, 59.958ms, 244.094ms
    Bytes In      [total, mean]                     51441000, 17147.00
    Bytes Out     [total, mean]                     0, 0.00
    Success       [ratio]                           100.00%
    Status Codes  [code:count]                      200:3000
    Error Set:
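For readers less familiar with the latency columns: the p99 value is the latency that 99% of requests stayed at or below. The sketch below computes this with the simple nearest-rank method on made-up sample data. Vegeta may use a different estimator internally, so treat this as an illustration of the concept rather than a reproduction of its report.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
print("p50:", percentile(latencies_ms, 50))
print("p99:", percentile(latencies_ms, 99))
```

The gap between the p99 (about 60 ms) and the maximum (about 244 ms) in the report above shows why a percentile is a more useful headline number than the single slowest request.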
The user experience with a real browser is much more interesting. Chrome Lighthouse can be used to assess the page performance. Google's PageSpeed Insights uses Lighthouse for its score calculation. Running PageSpeed Insights for the blog reports a nice score of 100 out of 100 (desktop):
Thanks go out to our Employer Branding colleagues who created the design and implemented the responsive HTML/CSS layout!
I hope this blog post gives you some inspiration for setting up your own blog with Pelican or some other static site generator. After re-launching our Engineering Blog, our main focus will be providing regular, high-quality content. We still have to figure out the best way to source, review, and schedule blog posts.