
Using the k1 cluster

This document describes the differences between the k0 and k1 clusters. Familiarity with the cookbook is assumed.

k1 is meant to replace k0 as the main production cluster. After years of failing to bring k0 entirely up-to-date (due to the difficulty and risk of breaking prod), we’re starting from scratch with up-to-date dependencies, new hardware, better tooling, and architectural learnings from k0. Once k1 is productionized, we intend to move services over from k0, eventually add k2 as the crash/test cluster, and retire k0.

Important: k1 is currently work-in-progress. Do not use it for serious production services just yet.

Jsonnet differences

Instead of:

local kube = import "../../../kube/hscloud.libsonnet";

use:

local kube = import "../../../kube/k1.libsonnet";

There are many minor differences in Kubernetes manifests due to the large version gap. For example, custom Ingress rule paths now require a pathType field (e.g. pathType: 'ImplementationSpecific').
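
For illustration, a raw-manifest-level sketch of a custom rule path under the networking.k8s.io/v1 schema; the path, service name and port are placeholders, and your jsonnet may build this through library helpers instead:

{
    path: '/api',
    pathType: 'ImplementationSpecific',  // <-- now required
    backend: {
        service: {
            name: 'mything',
            port: { number: 8080 },
        },
    },
},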

Using jsonnet libraries

When using .libsonnets that are shared between clusters, the cluster-specific kube library (here, k1.libsonnet imported as kube) needs to be passed in as a parameter, for example:

local kube = import "../../../kube/k1.libsonnet";

// ...

pki: ns.Contain(hspki) {
    kube: kube, // <-- new
    cfg+: {
       // ...
    },
}

Some libraries might also need a cluster parameter now. For example, hspki now takes cfg.cluster, which should be set to k1.
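
Building on the example above, that would look roughly like this (the exact cfg layout of hspki may differ):

pki: ns.Contain(hspki) {
    kube: kube,
    cfg+: {
        cluster: "k1",  // <-- new cluster parameter
        // ...
    },
},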

Cluster tooling

To authenticate, use prodaccess -cluster k1.

Once authenticated, you can see the currently selected cluster with kubectl config current-context and switch between clusters with kubectl config use-context {k0,k1}.hswaw.net.

You can also pass --context {k0,k1}.hswaw.net to all standard kube tooling like kubectl, kubecfg or stern.

DNS

For test Ingress domains, you can use these as equivalents of *.cloud.of-a.cat:

  • *.k8.s-a.cat
  • *.kubernete.s-a.cat
  • *.kartongip.s-a.cat

For your own domains, use CNAME to ingress.k1.hswaw.net.

Block Storage

As of writing this, there’s no long-term storage on k1 yet, sorry!

As a temporary workaround, you can use host path volumes:

kube.HostPathVolume("/var/lib/yourthing"),

Be careful though: talk to k1 ops beforehand, and be prepared to either migrate this data to Ceph later or have it destroyed.
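
For reference, a hedged sketch of wiring such a volume into a Deployment; the volumes_/volumeMounts_ map convention is assumed from the usual kubecfg-style kube.libsonnet, and cfg.image plus the mount path are placeholders:

deployment: ns.Contain(kube.Deployment(cfg.name)) {
    spec+: { template+: { spec+: {
        volumes_+: {
            // Host path on the k1 node that ends up running this pod.
            data: kube.HostPathVolume("/var/lib/yourthing"),
        },
        containers_+: {
            main: kube.Container("main") {
                image: cfg.image,
                volumeMounts_+: {
                    // Where the container sees that host path.
                    data: { mountPath: "/data" },
                },
            },
        },
    } } },
},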

Object Storage

For now, use k0’s S3/radosgw as per the cookbook.

CockroachDB

For now, use k0’s CockroachDB as per the cookbook.

Cross-cluster services

Both clusters are on the same network, so kube services can talk to each other between clusters. You can also use cluster DNS across clusters: instead of service.namespace or service.namespace.svc.cluster.local, use service.namespace.svc.{k0,k1}.hswaw.net.
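
For example, a deployment on k1 could keep talking to a database that still lives on k0 by using the cross-cluster name (the service and namespace names below are hypothetical):

// (k1)
cfg:: {
    // A service "mydb" in namespace "mydb", still running on k0:
    dbHost: "mydb.mydb.svc.k0.hswaw.net",
},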

TODO: cross-sign hspki certificates so that hspki mTLS authentication works cross-cluster as well

Dual-stack IP

(work in progress)

k1 supports IPv6. To use dual-stack LoadBalancer services, instead of the deprecated:

kube.Service("name") {
    spec+: {
        type: "LoadBalancer",
        loadBalancerIP: "185.236.242.137", // <-- old
    }
}

Use:

kube.Service("name") {
    metadata+: {
        annotations+: {
            "metallb.universe.tf/loadBalancerIPs": "185.236.242.137, 2a0d:eb00:2137::abcd:1", // <-- new
        },
    },
}

User Namespaces

k1 supports user namespaces, a security hardening feature not available on k0. Please set hostUsers: false on all your Pod specs, and only opt out if it causes issues. (Note that this will likely become an opt-out default later on.)
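
In jsonnet this is a single field on the pod template’s spec; a minimal sketch, assuming the usual kube.Deployment wrapper:

deployment: ns.Contain(kube.Deployment(cfg.name)) {
    spec+: { template+: { spec+: {
        hostUsers: false,  // run this pod's containers in a user namespace
    } } },
},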

Migrating web services to k1 without downtime

It’s possible to move services from k0 to k1 while avoiding the downtime that DNS propagation (the ingress address change) would otherwise cause. It’s a bit involved, so you probably want to ignore this tutorial unless you’re dealing with a high-traffic or otherwise critical service.

Here’s how:

Step 1: Deploy to k1 under a test domain

This is to check that the service will continue to work on the new cluster.

// (k1)
cfg:: {
    // ...
    domains: ['mything.k8.s-a.cat'],
}
// ...
ingress: ns.Contain(kube.TLSIngress(cfg.name)) {
    hosts:: cfg.domains,
    target:: top.service,
},

Step 2: Add original domain to k1 deployment

This needs to be a separate step: since the original domain does not yet point to k1, adding it from the start would also prevent TLS certificates for the new test domain from being issued.

// (k1)
cfg:: {
    // ...
    domains: ['mything.k8.s-a.cat', 'mything.hackerspace.pl'],
}

If you run kubectl -n mything get pod,ing, you should see a cm-acme-http-solver pod running, trying (for now, unsuccessfully) to obtain a certificate for mything.hackerspace.pl.

Now, run this sanity check:

curl -i --resolve "mything.hackerspace.pl:443:185.236.240.161" "https://mything.hackerspace.pl"

This tries to access the service (whose domain still points to ingress.k0.hswaw.net in DNS) while forcing curl to contact ingress.k1.hswaw.net instead. You should see something like curl: (60) SSL certificate problem. If it doesn’t fail, you probably messed up the --resolve option.

Step 3: Forward ACME requests from k0 to k1

Back on k0, modify the Ingress so that ACME challenges (TLS certificate verification) on mything.hackerspace.pl get forwarded to k1, like so:

// (k0)
k1IngressProxy: ns.Contain(kube.Service("k1-ingress")) {
    spec: {
        type: 'ExternalName',
        ports: [
            { port: 80, name: 'http', targetPort: 80 }
        ],
        externalName: "ingress.k1.hswaw.net",
    },
},
ingress: ns.Contain(kube.TLSIngress(cfg.name)) {
    hosts:: cfg.domains,
    target:: top.service,
    extraPaths:: [
        { path: '/.well-known/acme-challenge', backend: top.k1IngressProxy.name_port }
    ]
},

After applying, observe cm-acme-http-solver on k1. It should quickly succeed in obtaining the certificate. Now we can serve HTTPS content on this domain on k1.

To be sure, you can check kubectl -n mything get cert to see whether the READY status changed to True. You can also re-do the curl check; now it should successfully return content through k1’s ingress. (This is why we did the sanity check earlier: to rule out a broken curl command that was actually hitting k0 all along. TODO: come up with a simpler way of reliably verifying this.)

Step 4: Forward all traffic to k1

Now we’re ready to switch all traffic from k0’s Deployment to k1, with zero downtime.

To do that, we’ll modify k0’s Service to point to k1’s Service (using ExternalName and cross-cluster DNS) instead of k0’s Pods:

// (k0)
service: ns.Contain(kube.Service(cfg.name)) {
    // target:: top.deployment, // <-- comment this out
    spec: {
        type: 'ExternalName',
        clusterIP: null,
        ports: [
            { port: 8080, name: 'http', targetPort: 8080 }
        ],
        externalName: "%s.k1.hswaw.net" % top.service.host,
    },
},

Please note:

  • clusterIP: null is needed when modifying an existing Service (otherwise the kube apiserver will reject the change)
  • make sure that targetPort is the correct port on which k1’s service serves content
  • externalName assumes that the k1 version uses the same service and namespace names as on k0

If we didn’t make any mistakes, we should get the final result immediately after applying. Otherwise, you might see 502 errors, or a long delay followed by a 504. Most likely, externalName or the ports are wrong.

Step 5: Clean up

  • Scale the k0 Deployment down to 0 replicas (see the sketch after this list)
  • Remove test domain from k1’s Ingress
  • Switch DNS from CNAME ingress.k0.hswaw.net to CNAME ingress.k1.hswaw.net
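
For the first item, a hedged jsonnet sketch of scaling the old Deployment down on k0:

// (k0)
deployment: ns.Contain(kube.Deployment(cfg.name)) {
    spec+: {
        replicas: 0,  // keep the definition around, just stop running pods
    },
},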

TODO: Can we avoid the ACME ingress shenanigans by just copying certificates&secrets?

TODO: Come up with and write up advice for PVC/database migration

k1 ops and architecture differences

k1 stuff resides at //cluster/k1/. For applying changes, instead of multiple per-“view” jsonnets there is a single entry point parameterized by view name: kubecfg diff cluster/k1/k1-view.jsonnet -A view=NAME.

We have a NixOS integration test for the k1 cluster. Use it to test cluster/node-level changes before applying to the live cluster. See //cluster/k1/test.nix for details.

Networking: the Calico/MetalLB interaction is now simplified. MetalLB more-or-less only serves as a kube operator for IP allocation, while BGP announcements are now handled by Calico.

Storage: we plan a new Ceph cluster managed directly on NixOS nodes, rather than inside the kube cluster by Rook.

Dependencies: For cluster-level dependencies (like calico, coredns, cert-manager, etc.), we now try to use vendored yaml manifests as much as possible, applying jsonnet “patches” as needed, instead of recreating them entirely in jsonnet. This is meant to simplify updates.
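
A hedged illustration of that pattern, using kubecfg’s parseYaml native function (the file name and patched field are made up, not the actual k1 code):

local parseYaml = std.native("parseYaml");
// Load every object from the vendored upstream manifest...
local vendored = [o for o in parseYaml(importstr "cert-manager.yaml") if o != null];
{
    // ...and patch a single object, passing the rest through untouched.
    objects: [
        if o.kind == "Deployment" && o.metadata.name == "cert-manager"
        then o { spec+: { replicas: 2 } }
        else o
        for o in vendored
    ],
}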

Wanna help? Talk to k1 ops (radex, informatic, mikedlr, et al) or ask on #hswaw-infra what you can do to help productionize k1 and migrate services to it from k0/boston-packets.