Using the k1 cluster
This document describes the differences between the k0 and k1 clusters. Familiarity with the cookbook is assumed.
k1 is meant to replace k0 as the main production cluster. After years of failing to bring k0 entirely up-to-date (due to the difficulty and risk of breaking prod), we’re starting from scratch with up-to-date dependencies, new hardware, better tooling, and architectural lessons learned from k0. Once k1 is productionized, we intend to move services over from k0, eventually add k2 as the crash/test cluster, and retire k0.
Important: k1 is currently work-in-progress. Do not use it for serious production services just yet.
Jsonnet differences
Instead of:
local kube = import "../../../kube/hscloud.libsonnet";
use:
local kube = import "../../../kube/k1.libsonnet";
There are many minor differences in Kubernetes manifests due to the large version difference. For example, custom Ingress rule paths now require a pathType: 'ImplementationSpecific'.
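As an illustration, here is roughly what a raw rule looks like under the new networking.k8s.io/v1 API; the host, service name, and port below are made up. With kube.TLSIngress (used in the examples further down) you typically only set hosts:: and target:: and don’t write rules by hand.
// Hypothetical sketch of an Ingress spec fragment - names are illustrative only.
spec+: {
  rules: [{
    host: "mything.k8.s-a.cat",
    http: {
      paths: [{
        path: "/",
        pathType: "ImplementationSpecific",  // <-- now required on custom paths
        backend: {
          service: { name: "mything", port: { number: 8080 } },
        },
      }],
    },
  }],
},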
Using jsonnet libraries
When using .libsonnets that are shared between clusters, kube.libsonnet needs to be passed as a parameter, for example:
local kube = import "../../../kube/k1.libsonnet";
// ...
pki: ns.Contain(hspki) {
kube: kube, // <-- new
cfg+: {
// ...
},
}
Some libraries might also need a cluster parameter now. For example, hspki will take cfg.cluster set to k1.
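Building on the snippet above, that looks something like this (other cfg fields elided):
pki: ns.Contain(hspki) {
  kube: kube,
  cfg+: {
    cluster: "k1",  // <-- new: cluster parameter for shared, cluster-aware libraries
    // ...
  },
},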
Cluster tooling
To authenticate, use prodaccess -cluster k1.
Once authenticated, you can see the currently selected cluster using kubectl config current-context and switch between them using kubectl config use-context {k0,k1}.hswaw.net.
You can also pass --context {k0,k1}.hswaw.net to all standard kube tooling like kubectl, kubecfg, or stern.
DNS
For test Ingress domains, you can use these as equivalents of *.cloud.of-a.cat:
*.k8.s-a.cat
*.kubernete.s-a.cat
*.kartongip.s-a.cat
For your own domains, use a CNAME to ingress.k1.hswaw.net.
Block Storage
As of writing this, there’s no long-term storage on k1 yet, sorry!
Temporary workaround: you can use host path volumes:
kube.HostPathVolume("/var/lib/yourthing"),
Be careful though: talk to k1 ops beforehand, and be prepared to either migrate this data to Ceph later or have it destroyed.
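A rough sketch of mounting such a volume, assuming the usual kube.libsonnet underscore-map conventions (volumes_, containers_, volumeMounts_) still apply on k1; the deployment layout, container name, and mount path here are made up:
// Sketch only - adapt the names to your actual Deployment.
deployment: ns.Contain(kube.Deployment(cfg.name)) {
  spec+: { template+: { spec+: {
    volumes_+: {
      data: kube.HostPathVolume("/var/lib/yourthing"),
    },
    containers_+: {
      main+: {  // hypothetical container name
        volumeMounts_+: {
          data: { mountPath: "/data" },
        },
      },
    },
  } } },
},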
Object Storage
For now, use k0’s S3/radosgw as per the cookbook.
CockroachDB
For now, use k0’s CockroachDB as per the cookbook.
Cross-cluster services
Both clusters are on the same network, so kube services can talk to each other across clusters. You can also use cluster DNS: instead of service.namespace or service.namespace.svc.cluster.local, use service.namespace.svc.{k0,k1}.hswaw.net.
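For example, a workload on k1 that still uses a database running on k0 could be pointed at it like this (the service and namespace names below are hypothetical):
// Hypothetical sketch: reaching a CockroachDB public Service in the "mydb"
// namespace on k0 from a k1 deployment.
cfg:: {
  // ...
  dbAddress: "mydb-public.mydb.svc.k0.hswaw.net:26257",
},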
TODO: cross-sign hspki certificates so that hspki mTLS authentication works cross-cluster as well
Dual-stack IP
(work in progress)
k1 supports IPv6. To use dual-stack LoadBalancer services, instead of the deprecated:
kube.Service("name") {
spec+: {
type: "LoadBalancer",
loadBalancerIP: "185.236.242.137", // <-- old
}
}
Use:
kube.Service("name") {
metadata+: {
annotations+: {
"metallb.universe.tf/loadBalancerIPs": "185.236.242.137, 2a0d:eb00:2137::abcd:1", // <-- new
},
},
}
User Namespaces
k1 supports user namespaces, a security hardening feature not available on k0. Please set hostUsers: false on all your Pod specs and only opt out if it causes issues. (Note that this will likely become an opt-out default later on.)
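For a typical Deployment this is a one-field addition to the pod template, sketched below (the deployment name and layout are illustrative):
// Sketch: opting a Deployment's pods into user namespaces on k1.
deployment: ns.Contain(kube.Deployment(cfg.name)) {
  spec+: { template+: { spec+: {
    hostUsers: false,  // <-- run containers in a user namespace
  } } },
},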
Migrating web services to k1 without downtime
It’s possible to move services from k0 to k1 while avoiding downtime due to DNS propagation (ingress address change). It’s a bit involved, so you probably want to ignore this tutorial unless dealing with a high-traffic or otherwise critical service.
Here’s how:
Step 1: Deploy to k1 under a test domain
This is to check that the service will continue to work on the new cluster.
// (k1)
cfg:: {
// ...
domains: ['mything.k8.s-a.cat'],
}
// ...
ingress: ns.Contain(kube.TLSIngress(cfg.name)) {
hosts:: cfg.domains,
target:: top.service,
},
Step 2: Add original domain to k1 deployment
This needs to be a separate step: the original domain does not yet point to k1, so if it were included from the start, its failing ACME challenge would also block the TLS certificate for the new test domain from being issued.
// (k1)
cfg:: {
// ...
domains: ['mything.k8.s-a.cat', 'mything.hackerspace.pl'],
}
If you run kubectl -n mything get pod,ing, you should see a cm-acme-http-solver running, trying (for now, unsuccessfully) to obtain a certificate for mything.hackerspace.pl.
Now, run this sanity check:
curl -i --resolve "mything.hackerspace.pl:443:185.236.240.161" "https://mything.hackerspace.pl"
This will try to access the service (which, in DNS, still points to ingress.k0.hswaw.net) by contacting ingress.k1.hswaw.net directly. You should see something like curl: (60) SSL certificate problem. If it doesn’t fail, you probably messed up the --resolve option.
Step 3: Forward ACME requests from k0 to k1
Back on k0, modify the Ingress so that ACME challenges (TLS certificate verification) on mything.hackerspace.pl get forwarded to k1, like so:
// (k0)
k1IngressProxy: ns.Contain(kube.Service("k1-ingress")) {
spec: {
type: 'ExternalName',
ports: [
{ port: 80, name: 'http', targetPort: 80 }
],
externalName: "ingress.k1.hswaw.net",
},
},
ingress: ns.Contain(kube.TLSIngress(cfg.name)) {
hosts:: cfg.domains,
target:: top.service,
extraPaths:: [
{ path: '/.well-known/acme-challenge', backend: top.k1IngressProxy.name_port }
]
},
After applying, observe cm-acme-http-solver on k1. It should quickly succeed in obtaining the certificate. Now we can serve HTTPS content for this domain from k1.
To be sure, you can check kubectl -n mything get cert to see whether the READY status changed to True. You can also re-run the curl check - now it should successfully return content through k1’s ingress. (This is why we did the sanity check before: to rule out that the curl command is bad and we’ve been checking k0 all along. TODO: come up with a simpler way of reliably verifying this.)
Step 4: Forward all traffic to k1
Now we’re ready to switch all traffic from k0’s Deployment to k1, with zero downtime.
To do that, we’ll modify k0’s Service to point to k1’s Service (using ExternalName and cross-cluster DNS) instead of k0’s Pods:
// (k0)
service: ns.Contain(kube.Service(cfg.name)) {
// target:: top.deployment, // <-- comment this out
spec: {
type: 'ExternalName',
clusterIP: null,
ports: [
{ port: 8080, name: 'http', targetPort: 8080 }
],
externalName: "%s.k1.hswaw.net" % top.service.host,
},
},
Please note:
- clusterIP: null is needed when modifying an existing Service (otherwise the kube apiserver will reject the change)
- Make sure that targetPort is the correct port on which k1’s service serves content
- externalName assumes that the k1 version has the same service and namespace names as on k0
If we didn’t make any mistakes, we should see the final result immediately after applying. Otherwise, you might see 502 errors, or a long delay followed by a 509 - most likely, externalName or the ports are wrong.
Step 5: Clean up
- Scale the k0 Deployment down to 0 replicas (see the sketch after this list)
- Remove the test domain from k1’s Ingress
- Switch DNS from CNAME ingress.k0.hswaw.net to CNAME ingress.k1.hswaw.net
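For the first item, a minimal sketch of what scaling the old deployment down looks like in jsonnet (assuming the usual Deployment layout on k0):
// (k0) Sketch: keep the Deployment object around, but run no replicas.
deployment: ns.Contain(kube.Deployment(cfg.name)) {
  spec+: {
    replicas: 0,  // <-- all traffic is served from k1 now
  },
},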
TODO: Can we avoid the ACME ingress shenanigans by just copying certificates & secrets?
TODO: Come up with and write up advice for PVC/database migration
k1 ops and architecture differences
k1 stuff resides at //cluster/k1/. For applying changes, instead of multiple “view” jsonnets, use: kubecfg diff cluster/k1/k1-view.jsonnet -A view=NAME.
We have a NixOS integration test for the k1 cluster. Use it to test cluster/node-level changes before applying to the live cluster. See //cluster/k1/test.nix for details.
Networking: the Calico/MetalLB interaction is now simplified. MetalLB more or less only serves as a kube operator for IP address allocation, while BGP announcements are now handled by Calico.
Storage: we plan a new Ceph cluster that is managed directly on NixOS nodes, rather than inside the kube cluster by Rook.
Dependencies: For cluster-level dependencies (like calico, coredns, cert-manager, etc.), we now try to use vendored yaml manifests as much as possible, applying jsonnet “patches” as needed, instead of recreating them entirely in jsonnet. This is meant to simplify updates.
Wanna help? Talk to k1 ops (radex, informatic, mikedlr, et al) or ask on #hswaw-infra what you can do to help productionize k1 and migrate services to it from k0/boston-packets.