#kubernetes etcd backup and restore
codeonedigest · 2 years ago
Youtube Short - Kubernetes Cluster Master Worker Node Architecture Tutorial for Beginners | Kubernetes ETCD Explained
Hi, a new #video on #kubernetes #cluster #architecture #workernode #masternode is published on #codeonedigest #youtube channel. Learn kubernetes #cluster #etcd #controllermanager #apiserver #kubectl #docker #proxyserver #programming #coding with
Kubernetes is a popular open-source platform for container orchestration. Kubernetes follows a client-server architecture, and a Kubernetes cluster consists of one master node with a set of worker nodes. Let’s understand the key components of the master node: etcd is a configuration database that stores configuration data for the cluster, and the API server performs operations on the cluster using the API…
dockerdummy · 2 years ago
Kubernetes etcd backup and restore – cheat sheet
This is a cheat sheet on how to quickly perform a backup & restore of the etcd server in Kubernetes. Test this on Killercoda or Play with Kubernetes.

tl;dr Find the reference: https://kubernetes.io –> Documentation –> Search “etcd backup restore” –> you will find: Operating etcd clusters for Kubernetes | Kubernetes

# get params
cat /var/lib/kubelet/config.yaml | grep static
cat…
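The excerpt is cut off; the core commands behind it follow this pattern. A minimal sketch, assuming a kubeadm-style cluster where etcd runs as a static pod and the certificates live under /etc/kubernetes/pki/etcd (adjust paths for your setup):

# Take a snapshot of etcd (certificate paths are kubeadm defaults)
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Inspect the snapshot
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-backup.db -w table

# Restore into a fresh data directory, then point the etcd static pod
# (the --data-dir flag and hostPath volume in /etc/kubernetes/manifests/etcd.yaml) at it
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-from-backup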
computingpostcom · 2 years ago
In recent years, the popularity of Kubernetes and its ecosystem has increased immensely thanks to its declarative behavior, its design patterns, and the workload types it supports. Kubernetes, also known as k8s, is open-source software used to orchestrate deployments and to scale and manage containerized applications across a server farm. This is achieved by distributing the workload across a cluster of servers. Furthermore, it works continuously to maintain the desired state of container applications, allocating storage, persistent volumes, etc.

The cluster of servers in Kubernetes has two types of nodes:

Control plane: it makes decisions about the cluster (scheduling, etc.) and detects and responds to cluster events such as starting up a new pod. It consists of several components:
- kube-apiserver: exposes the Kubernetes API
- etcd: stores the cluster data
- kube-scheduler: watches for newly created Pods with no assigned node and selects a node for them to run on

Worker nodes: they run the containerized workloads. They host the pods that are the basic components of an application. A cluster must consist of at least one worker node.

The smallest deployable unit in Kubernetes is a pod. A pod may be made up of one or many containers, each with its own configuration. There are three different resources provided when deploying pods in Kubernetes:

- Deployments: the most used and easiest resource to deploy, usually used for stateless applications. However, an application can be made stateful by attaching a persistent volume to it.
- StatefulSets: this resource manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
- DaemonSets: ensures a copy of the pod runs on all nodes of the cluster. When a node is added to or removed from the cluster, the DaemonSet automatically adds or removes the pod.

There are several methods to deploy a Kubernetes cluster on Linux, including tools such as Minikube, Kubeadm, Kubernetes on AWS (Kube-AWS), Amazon EKS, etc. In this guide, we will learn how to deploy a k0s Kubernetes cluster on Rocky Linux 9 using k0sctl.

What is k0s?

K0s is an open-source, simple, solid, and certified Kubernetes distribution that can be deployed on any infrastructure. It offers the simplest way, with all the features required, to set up a Kubernetes cluster. Due to its design and flexibility, it can be used on bare metal, cloud, edge, and IoT. K0s exists as a single binary with no dependencies apart from the host OS kernel, which reduces the complexity and time involved when setting up a Kubernetes cluster. The other features associated with k0s are:

- It is certified and 100% upstream Kubernetes
- It has multiple installation methods: single-node, multi-node, airgap, and Docker
- It offers automatic lifecycle management with k0sctl, where you can upgrade, backup, and restore
- Flexible deployment options, with control plane isolation as the default
- It offers scalability from a single node to large, highly available clusters
- Supports a variety of datastore backends: etcd is the default for multi-node clusters, SQLite for single-node clusters; MySQL and PostgreSQL can be used as well
- Supports x86-64, ARM64, and ARMv7
- It includes the Konnectivity service, CoreDNS, and Metrics Server
- Minimal hardware requirements (1 vCPU, 1 GB RAM)

k0sctl is a command-line tool used for bootstrapping and managing k0s clusters.
Normally, it connects to the hosts using SSH and collects information about them. The information gathered is then used to create a cluster by configuring the hosts, deploying k0s, and then connecting them together. Using k0sctl is the recommended way to create a k0s cluster for production, since you can create multi-node clusters in an easy and automatic manner.
Now let’s dive in!

Environment Setup

For this guide, we will have four Rocky Linux 9 servers configured as shown:

Role           Hostname                    IP Address
Workspace      workspace                   192.168.204.12
Control plane  master.computingpost.com    192.168.205.16
Worker Node    worker1.computingpost.com   192.168.205.17
Worker Node    worker2.computingpost.com   192.168.205.18

The Workspace server is my working space, on which I will install k0sctl and build the cluster on the above nodes. Once the hostnames have been set, edit /etc/hosts on the Workspace as shown:

$ sudo vi /etc/hosts
192.168.205.16 master.computingpost.com master
192.168.205.17 worker1.computingpost.com worker1
192.168.205.18 worker2.computingpost.com worker2

Since k0sctl uses SSH to access the hosts, we will generate SSH keys on the Workspace as shown:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/rocky9/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/rocky9/.ssh/id_rsa
Your public key has been saved in /home/rocky9/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:wk0LRhNDWM1PA2pm9RZ1EDFdx9ZXvhh4PB99mrJypeU rocky9@workspace

Ensure root login is permitted on the 3 nodes by editing /etc/ssh/sshd_config as below:

# Authentication:
PermitRootLogin yes

Save the file and restart the SSH service:

sudo systemctl restart sshd

Copy the keys to the 3 nodes:

ssh-copy-id root@master
ssh-copy-id root@worker1
ssh-copy-id root@worker2

Once copied, verify that you can log in to any of the nodes without a password:

$ ssh root@worker1
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Sat Aug 20 11:38:29 2022
[root@worker1 ~]# exit

Step 1 – Install the k0sctl tool on Rocky Linux 9

The k0sctl tool can be installed on the Rocky Linux 9 Workspace by downloading the binary from the GitHub releases page; you can use wget to pull it. First, obtain the latest version tag:

VER=$(curl -s https://api.github.com/repos/k0sproject/k0sctl/releases/latest | grep tag_name | cut -d '"' -f 4)
echo $VER

Now download the latest file for your system:

### For 64-bit ###
wget https://github.com/k0sproject/k0sctl/releases/download/$VER/k0sctl-linux-x64 -O k0sctl

### For ARM ###
wget https://github.com/k0sproject/k0sctl/releases/download/$VER/k0sctl-linux-arm -O k0sctl

Once the file has been downloaded, make it executable and copy it to your PATH:

chmod +x k0sctl
sudo cp k0sctl /usr/local/bin/

Verify the installation:

$ k0sctl version
version: v0.13.2
commit: 7116025

To enable shell completions, use the commands:

### Bash ###
sudo sh -c 'k0sctl completion > /etc/bash_completion.d/k0sctl'

### Zsh ###
sudo sh -c 'k0sctl completion > /usr/local/share/zsh/site-functions/_k0sctl'

### Fish ###
k0sctl completion > ~/.config/fish/completions/k0sctl.fish

Step 2 – Configure the k0s Kubernetes Cluster

We will create a configuration file for the cluster.
To generate the default configuration, we will use the command:

k0sctl init > k0sctl.yaml

Now modify the generated config file to work for your environment:

vim k0sctl.yaml

Update the config file as shown:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: master.computingpost.com
      user: root
      port: 22
      keyPath: /home/$USER/.ssh/id_rsa
    role: controller
  - ssh:
      address: worker1.computingpost.com
      user: root
      port: 22
      keyPath: /home/$USER/.ssh/id_rsa
    role: worker
  - ssh:
      address: worker2.computingpost.com
      user: root
      port: 22
      keyPath: /home/$USER/.ssh/id_rsa
    role: worker
  k0s:
    dynamicConfig: false

We now have a configuration file with 1 control plane and 2 worker nodes. It is also possible to have a single-node deployment, where a single server acts as both control plane and worker. In that case, the configuration file would appear as shown:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: IP_Address
      user: root
      port: 22
      keyPath: /home/$USER/.ssh/id_rsa
    role: controller+worker
  k0s:
    dynamicConfig: false

Step 3 – Create the k0s Kubernetes Cluster on Rocky Linux 9 using k0sctl

Once the configuration has been made, you can start the cluster by applying the configuration file. First, allow the API server port through the firewall on the control plane:

sudo firewall-cmd --add-port=6443/tcp --permanent
sudo firewall-cmd --reload

Now apply the config:

k0sctl apply --config k0sctl.yaml

Sample output (the k0s ASCII banner is omitted here):

k0sctl v0.13.2
Copyright 2021, k0sctl authors.
Anonymized telemetry of usage will be sent to the authors.
By continuing to use k0sctl you agree to these terms: https://k0sproject.io/licenses/eula
INFO ==> Running phase: Connect to hosts
INFO [ssh] master:22: connected
INFO [ssh] worker1:22: connected
INFO [ssh] worker2:22: connected
INFO ==> Running phase: Detect host operating systems
INFO [ssh] master:22: is running Rocky Linux 9.0 (Blue Onyx)
INFO [ssh] worker1:22: is running Rocky Linux 9.0 (Blue Onyx)
INFO [ssh] worker2:22: is running Rocky Linux 9.0 (Blue Onyx)
INFO ==> Running phase: Acquire exclusive host lock
INFO ==> Running phase: Prepare hosts
INFO ==> Running phase: Gather host facts
.........
INFO [ssh] worker2:22: validating api connection to https://192.168.205.16:6443
INFO [ssh] master:22: generating token
INFO [ssh] worker1:22: writing join token
INFO [ssh] worker2:22: writing join token
INFO [ssh] worker1:22: installing k0s worker
INFO [ssh] worker2:22: installing k0s worker
INFO [ssh] worker1:22: starting service
INFO [ssh] worker2:22: starting service
INFO [ssh] worker1:22: waiting for node to become ready
INFO [ssh] worker2:22: waiting for node to become ready

Install kubectl

You may need to install kubectl on the workspace to help you manage the cluster with ease. Download the binary file and install it with the commands:

curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/

Verify the installation:

$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4

To be able to access the cluster with kubectl, you need to get the kubeconfig file and set the environment.
k0sctl kubeconfig > kubeconfig
export KUBECONFIG=$PWD/kubeconfig

Now get the nodes in the cluster:

$ kubectl get nodes
NAME                        STATUS   ROLES    AGE     VERSION
worker1.computingpost.com   Ready             7m59s   v1.24.3+k0s
worker2.computingpost.com   Ready             7m59s   v1.24.3+k0s

The above command only lists the worker nodes. This is because k0s ensures that the controllers and workers are isolated.
Get all the pods running:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-88b745646-djcjh           1/1     Running   0          11m
kube-system   coredns-88b745646-v9vfn           1/1     Running   0          9m34s
kube-system   konnectivity-agent-8bm85          1/1     Running   0          9m36s
kube-system   konnectivity-agent-tsllr          1/1     Running   0          9m37s
kube-system   kube-proxy-cdvjv                  1/1     Running   0          9m37s
kube-system   kube-proxy-n6ncx                  1/1     Running   0          9m37s
kube-system   kube-router-fhm65                 1/1     Running   0          9m37s
kube-system   kube-router-v5srj                 1/1     Running   0          9m36s
kube-system   metrics-server-7d7c4887f4-gv94g   0/1     Running   0          10m

Step 4 – Advanced K0sctl File Configurations

Once a cluster has been deployed, the default configuration file for the cluster is created at /etc/k0s/k0s.yaml. It can be regenerated on the control plane with:

# k0s default-config > /etc/k0s/k0s.yaml

The file looks as shown:

# cat /etc/k0s/k0s.yaml
# generated-by-k0sctl 2022-08-20T11:57:29+02:00
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: k0s
spec:
  api:
    address: 192.168.205.16
    k0sApiPort: 9443
    port: 6443
    sans:
    - 192.168.205.16
    - fe80::e4f8:8ff:fede:e1a5
    - master
    - 127.0.0.1
    tunneledNetworkingMode: false
  controllerManager: {}
  extensions:
    helm:
      charts: null
      repositories: null
    storage:
      create_default_storage_class: false
      type: external_storage
  images:
    calico:
      cni:
        image: docker.io/calico/cni
        version: v3.23.3
      kubecontrollers:
        image: docker.io/calico/kube-controllers
        version: v3.23.3
      node:
        image: docker.io/calico/node
        version: v3.23.3
    coredns:
      image: k8s.gcr.io/coredns/coredns
      version: v1.7.0
    default_pull_policy: IfNotPresent
    konnectivity:
      image: quay.io/k0sproject/apiserver-network-proxy-agent
      version: 0.0.32-k0s1
    kubeproxy:
      image: k8s.gcr.io/kube-proxy
      version: v1.24.3
    kuberouter:
      cni:
        image: docker.io/cloudnativelabs/kube-router
        version: v1.4.0
      cniInstaller:
        image: quay.io/k0sproject/cni-node
        version: 1.1.1-k0s.0
    metricsserver:
      image: k8s.gcr.io/metrics-server/metrics-server
      version: v0.5.2
    pushgateway:
      image: quay.io/k0sproject/pushgateway-ttl
      version: edge@sha256:7031f6bf6c957e2fdb496161fe3bea0a5bde3de800deeba7b2155187196ecbd9
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  konnectivity:
    adminPort: 8133
    agentPort: 8132
  network:
    calico: null
    clusterDomain: cluster.local
    dualStack: {}
    kubeProxy:
      mode: iptables
    kuberouter:
      autoMTU: true
      mtu: 0
      peerRouterASNs: ""
      peerRouterIPs: ""
    podCIDR: 10.244.0.0/16
    provider: kuberouter
    serviceCIDR: 10.96.0.0/12
  podSecurityPolicy:
    defaultPolicy: 00-k0s-privileged
  scheduler: {}
  storage:
    etcd:
      externalCluster: null
      peerAddress: 192.168.205.16
    type: etcd
  telemetry:
    enabled: true
status: {}

You can modify the file as desired and then apply the changes with the command (pointing -c at the config file):

sudo k0s install controller -c /etc/k0s/k0s.yaml

The file can also be modified while the cluster is running, but for the changes to apply, restart the cluster with the commands:

sudo k0s stop
sudo k0s start

Configure Cloud Providers

K0s-managed Kubernetes doesn’t include a built-in cloud provider service; you need to configure and add its support manually. There are two ways of doing this:
Using the K0s Cloud Provider

K0s provides its own lightweight cloud provider that can be used to assign static external IPs to expose the worker nodes. This can be done using either of the commands:

# worker
sudo k0s worker --enable-cloud-provider=true

# controller
sudo k0s controller --enable-k0s-cloud-provider=true

After this, you can add the IPv4 or IPv6 static node IPs (node name and address are placeholders):

kubectl annotate node <node> k0sproject.io/node-ip-external=<external-ip>

Using a Built-in Cloud Manifest

Manifests allow one to run the cluster with preferred extensions. Normally, the controller reads the manifests from /var/lib/k0s/manifests. This can be verified from the control node:

$ ls -l /var/lib/k0s/
total 12
drwxr-xr-x.  2 root root  120 Aug 20 11:57 bin
drwx------.  3 etcd root   20 Aug 20 11:57 etcd
-rw-r--r--.  1 root root  241 Aug 20 11:57 konnectivity.conf
drwxr-xr-x. 15 root root 4096 Aug 20 11:57 manifests
drwxr-x--x.  3 root root 4096 Aug 20 11:57 pki

With this option, you need to create a manifest with the below syntax:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: cloud-controller-manager
  template:
    metadata:
      labels:
        k8s-app: cloud-controller-manager
    spec:
      serviceAccountName: cloud-controller-manager
      containers:
      - name: cloud-controller-manager
        # for in-tree providers we use k8s.gcr.io/cloud-controller-manager
        # this can be replaced with any other image for out-of-tree providers
        image: k8s.gcr.io/cloud-controller-manager:v1.8.0
        command:
        - /usr/local/bin/cloud-controller-manager
        - --cloud-provider=[YOUR_CLOUD_PROVIDER]  # Add your own cloud provider here!
        - --leader-elect=true
        - --use-service-account-credentials
        # these flags will vary for every cloud provider
        - --allocate-node-cidrs=true
        - --configure-cloud-routes=true
        - --cluster-cidr=172.17.0.0/16
      tolerations:
      # this is required so CCM can bootstrap itself
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      # this is to have the daemonset runnable on master nodes
      # the taint may vary depending on your cluster setup
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      # this is to restrict CCM to only run on master nodes
      # the node selector may vary depending on your cluster setup
      nodeSelector:
        node-role.kubernetes.io/master: ""

Step 5 – Deploy an Application on k0s

To test if the cluster is working as desired, we will create a deployment for the Nginx application. The command below can be used to create and apply a manifest supplied on stdin:

kubectl apply -f -
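The source post is cut off before the manifest itself; a minimal sketch of a deployment piped to kubectl on stdin (the nginx-test name and image tag are illustrative assumptions, not from the original):

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
      - name: nginx
        image: nginx:1.23
        ports:
        - containerPort: 80
EOF

# check that both replicas come up on the worker nodes
kubectl get deploy nginx-test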
leanesch · 2 years ago
Kubernetes must know:
First thing to know: Kubernetes has many competitors, such as Docker Swarm and Nomad, and Kubernetes is not the solution for every architecture. Please define your requirements and check other alternatives before starting with Kubernetes, as it can be complex or not that beneficial in your case, and an easier orchestrator may do the job.
If you are using a cloud provider and want a managed Kubernetes service, you can check EKS for AWS, GKE for Google Cloud, or AKS for Azure.
Make sure to have proper monitoring and alerting for your cluster, as this gives more visibility and eases the management of containerized infrastructure by tracking utilization of cluster resources, including memory, CPU, storage, and networking performance. It is also recommended to monitor pods and applications in the cluster. The most common tools used for Kubernetes monitoring are ELK/EFK, Datadog, and Prometheus with Grafana (which will be my topic for the next article).
Please make sure to back up your cluster’s etcd data regularly.
In order to ensure that your Kubernetes cluster resources are only accessed by certain people, it's recommended to use RBAC in your cluster to build roles with the right level of access, as in the sketch below.
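A hedged illustration of such a role (the namespace, role, and user names are invented for the example):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a        # example namespace
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
- kind: User
  name: jane               # example user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io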
Scalability, and what's more important than scalability? Three types of autoscaling you must know and include in your cluster architecture are the Cluster Autoscaler, HPA (Horizontal Pod Autoscaler), and VPA (Vertical Pod Autoscaler).
Resource management is important as well: setting and rightsizing resource requests and limits on your workloads helps avoid issues like OOM kills and pod eviction, and saves you money (see the sketch below).
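For illustration, requests and limits are set per container in the pod spec; the numbers below are placeholders to adapt to measured usage:

apiVersion: v1
kind: Pod
metadata:
  name: rightsized-app     # example name
spec:
  containers:
  - name: app
    image: nginx:1.23      # placeholder image
    resources:
      requests:
        cpu: 250m          # what the scheduler reserves for the container
        memory: 256Mi
      limits:
        cpu: 500m          # the container is throttled above this
        memory: 512Mi      # the container is OOM-killed above this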
You may want to check the Kubernetes CIS Benchmark, which is a set of recommendations for configuring Kubernetes to support a strong security posture.
Try to always run the latest stable GA Kubernetes version for newer functionality and, if using the cloud, to stay on supported versions.
Scanning containers for security vulnerabilities is very important as well; here we can talk about tools like Kube Hunter, Kube Bench, etc.
Make use of admission controllers when possible: they intercept and process requests to the Kubernetes API prior to persistence of the object, but after the request is authenticated and authorized. This is useful when you have a set of constraints or behaviors to check before a resource is deployed; they can also block vulnerable images from being deployed.
Speaking of admission controllers, you can also enforce policies in Kubernetes using a tool like OPA, which lets you define sets of security and compliance policies as code, as sketched below.
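For example, with OPA Gatekeeper, and assuming the K8sRequiredLabels ConstraintTemplate from the Gatekeeper library is installed, a policy requiring an owner label on every namespace could be declared like this:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
  parameters:
    labels: ["owner"]      # namespaces missing this label are rejected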
Use a tool like Falco for auditing the cluster; this is a nice way to log and monitor real-time activities and interactions with the API.
Another thing to look at is how to handle logging of applications running in containers (I recommend checking logging agents such as fluentd/fluentbit), and especially how to set up log rotation to reduce storage growth and avoid performance issues.
In case you have multiple microservices running in the cluster, you can also implement a service mesh solution in order to have a reliable and secure architecture, plus other features such as encryption, authentication, authorization, routing between services and versions, and load balancing. One of the most popular service mesh solutions is Istio.
One of the most important production-ready cluster features is to have a backup & restore solution, and especially a solution to take snapshots of your cluster’s Persistent Volumes. There are multiple tools for this that you might check and benchmark, like Velero, Portworx, etc.
You can use quotas and limit ranges to control the amount of resources in a namespace for multi-tenancy, as in the sketch below.
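A sketch of what that can look like per tenant namespace (all names and values are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
  - type: Container
    default:              # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
    defaultRequest:       # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi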
For multi-cluster management, you can check Rancher, Weave Flux, Lens, etc.
for-the-user · 6 years ago
heptio ark k8s cluster backups
How do we use it? (In this example, I am using Microsoft Azure's cloud.)
Prepare some cloud storage
Create a storage account and a blob container in the same subscription and resource group as the k8s cluster you want to be running backups on.
$ az storage account create \
    --name $AZURE_STORAGE_ACCOUNT_ID \
    --resource-group $RESOURCE_GROUP \
    --sku Standard_LRS \
    --encryption-services blob \
    --https-only true \
    --kind BlobStorage \
    --access-tier Cool \
    --subscription $SUBSCRIPTION

$ az storage container create \
    -n $STORAGE_RESOURCE_NAME \
    --public-access off \
    --account-name $AZURE_STORAGE_ACCOUNT_ID \
    --subscription $SUBSCRIPTION
Get the storage account access key.
$ AZURE_STORAGE_KEY=`az storage account keys list \
    --account-name $AZURE_STORAGE_ACCOUNT_ID \
    --resource-group $RESOURCE_GROUP \
    --query '[0].value' \
    --subscription $SUBSCRIPTION \
    -o tsv`
Create a service principal with appropriate permissions for heptio ark to use to read and write to the storage account.
$ az ad sp create-for-rbac \
    --name "heptio-ark" \
    --role "Contributor" \
    --password $AZURE_CLIENT_SECRET \
    --subscription $SUBSCRIPTION
Finally, get the service principal's id, called a client id.
$ AZURE_CLIENT_ID=`az ad sp list \
    --display-name "heptio-ark" \
    --query '[0].appId' \
    --subscription $SUBSCRIPTION \
    -o tsv`
Provision ark
Next we provision an ark instance to our Kubernetes cluster with a custom namespace. First, clone the ark repo:
$ git clone https://github.com/heptio/ark.git
You will need to edit 3 files.
ark/examples/common/00-prereqs.yaml
ark/examples/azure/00-ark-deployment.yaml
ark/examples/azure/10.ark-config.yaml
In these yamls, it tries to create a namespace called "heptio-ark" and then put things into that namespace. Change all of these references to a namespace you prefer. I called it "my-groovy-system".
In 10.ark-config.yaml, you also need to replace the placeholders YOUR_TIMEOUT and YOUR_BUCKET with actual values. In our case, we use 15m and the value of $STORAGE_RESOURCE_NAME, which in this case is ark-backups.
Create the pre-requisites.
$ kubectl apply -f examples/common/00-prereqs.yaml
customresourcedefinition "backups.ark.heptio.com" created
customresourcedefinition "schedules.ark.heptio.com" created
customresourcedefinition "restores.ark.heptio.com" created
customresourcedefinition "configs.ark.heptio.com" created
customresourcedefinition "downloadrequests.ark.heptio.com" created
customresourcedefinition "deletebackuprequests.ark.heptio.com" created
customresourcedefinition "podvolumebackups.ark.heptio.com" created
customresourcedefinition "podvolumerestores.ark.heptio.com" created
customresourcedefinition "resticrepositories.ark.heptio.com" created
namespace "my-groovy-system" created
serviceaccount "ark" created
clusterrolebinding "ark" created
Create a secret object, which contains all of the azure ids we gathered in part 1.
$ kubectl create secret generic cloud-credentials \
    --namespace my-groovy-system \
    --from-literal AZURE_SUBSCRIPTION_ID=$SUBSCRIPTION \
    --from-literal AZURE_TENANT_ID=$TENANT_ID \
    --from-literal AZURE_RESOURCE_GROUP=$RESOURCE_GROUP \
    --from-literal AZURE_CLIENT_ID=$AZURE_CLIENT_ID \
    --from-literal AZURE_CLIENT_SECRET=$AZURE_CLIENT_SECRET \
    --from-literal AZURE_STORAGE_ACCOUNT_ID=$AZURE_STORAGE_ACCOUNT_ID \
    --from-literal AZURE_STORAGE_KEY=$AZURE_STORAGE_KEY
secret "cloud-credentials" created
Provision everything.
$ kubectl apply -f examples/azure/

$ kubectl get deployments -n my-groovy-system
NAME   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
ark    1         1         1            1           1h

$ kubectl get pods -n my-groovy-system
NAME                   READY   STATUS    RESTARTS   AGE
ark-7b86b4d5bd-2w5x7   1/1     Running   0          1h

$ kubectl get rs -n my-groovy-system
NAME             DESIRED   CURRENT   READY   AGE
ark-7b86b4d5bd   1         1         1       1h

$ kubectl get secrets -n my-groovy-system
NAME                  TYPE                                  DATA   AGE
ark-token-b5nm8       kubernetes.io/service-account-token   3      1h
cloud-credentials     Opaque                                7      1h
default-token-xg6x4   kubernetes.io/service-account-token   3      1h
At this point the ark server is running. To interact with it, we need to use a client.
Install the Ark client locally
Download one from here, unzip it, and add it to your path. Here's a Mac example:
$ wget https://github.com/heptio/ark/releases/download/v0.9.3/ark-v0.9.3-darwin-amd64.tar.gz
$ tar -xzvf ark-v0.9.3-darwin-amd64.tar.gz
$ mv ark /Users/mygroovyuser/bin/ark
$ ark --help
Take this baby for a test drive
Deploy an example app. Ark provides something to try with.
$ kubectl apply -f examples/nginx-app/base.yaml
This creates a namespace called nginx-example and creates a deployment and service inside with a couple of nginx pods.
Take a backup.
$ ark backup create nginx-backup --include-namespaces nginx-example --namespace my-groovy-system
Backup request "nginx-backup" submitted successfully.
Run `ark backup describe nginx-backup` for more details.

$ ark backup get nginx-backup --namespace my-groovy-system
NAME           STATUS      CREATED                          EXPIRES   SELECTOR
nginx-backup   Completed   2018-08-21 15:57:59 +0200 CEST   29d
We can see in our Azure storage account container a backup has been created by heptio ark.
If we look inside the folder, we see some json and some gzipped stuff
Let's simulate a disaster.
$ kubectl delete namespace nginx-example namespace "nginx-example" deleted
And try to restore from the Ark backup.
$ ark restore create --from-backup nginx-backup --namespace my-groovy-system
Restore request "nginx-backup-20180821160537" submitted successfully.
Run `ark restore describe nginx-backup-20180821160537` for more details.

$ ark restore get --namespace my-groovy-system
NAME                          BACKUP         STATUS      WARNINGS   ERRORS   CREATED                          SELECTOR
nginx-backup-20180821160537   nginx-backup   Completed   0          0        2018-08-21 16:05:38 +0200 CEST
Nice.
And to delete backups...
$ ark backup delete nginx-backup --namespace my-groovy-system
Are you sure you want to continue (Y/N)? Y
Request to delete backup "nginx-backup" submitted successfully.
The backup will be fully deleted after all associated data (disk snapshots, backup files, restores) are removed.

$ ark backup get nginx-backup --namespace my-groovy-system
An error occurred: backups.ark.heptio.com "nginx-backup" not found
And it's gone.
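One-off backups like the above can also be put on a schedule; a sketch, assuming the schedule subcommand shipped with ark 0.9 (cron syntax plus a retention TTL):

$ ark schedule create nginx-daily \
    --schedule="0 1 * * *" \
    --include-namespaces nginx-example \
    --ttl 168h \
    --namespace my-groovy-system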
digital-dynasty · 4 years ago
Cloud-native: Kubermatic Kubernetes Platform 2.17 offers automated backups
Kubermatic uses a new meta-plug-in for Container Networking Interfaces, and new etcd controllers enable automated backups as well as restores. Read more: www.heise.de/news/…... www.digital-dynasty.net/de/teamblogs/…
http://www.digital-dynasty.net/de/teamblogs/cloud-native-kubermatic-kubernetes-platform-2-17-bietet-automatisierte-backups
adhocmitteilung · 4 years ago
Kubernetes Tips: Backup and Restore Etcd
New post published on https://www.adhocmitteilung.de/kubernetes-tips-backup-and-restore-etcd/
medium.com - Launch a VM on a cloud provider. There are many choices out there. Some of my favorite ones are DigitalOcean, Civo, and Scaleway. Also, we need to set up SSH access to this machine using an ssh ke… More on Kubernetes services, Kubernetes training, and Rancher dedicated as a Service...
Read the full article on Kubernetes Tips: Backup and Restore Etcd at https://www.adhocmitteilung.de/kubernetes-tips-backup-and-restore-etcd/
holytheoristtastemaker · 5 years ago
The new release is focused on providing the scalability, management and security capabilities required to support Kubernetes at edge scale. A headline enhancement is support for one million clusters (currently available in preview). For general availability, the product now supports two thousand clusters and one hundred thousand nodes.

Another enhancement is limited connectivity maintenance with K3s. Designed for cluster management, upgrades and patches where clusters may not have a fixed or stable network connection, Rancher 2.4 can kick off an upgrade remotely, but the process is managed on local K3s clusters, allowing users to manage upgrades and patches locally and then synchronise with the management server once connectivity is restored.

Rancher 2.4 also enables zero downtime maintenance, allowing organisations to upgrade Kubernetes clusters and nodes without application interruption. Additionally, users can select and configure their upgrade strategy for add-ons so that DNS and Ingress do not experience service disruption.

Rancher 2.4 introduces CIS Scan, which allows users to run ad-hoc security scans of their RKE clusters against CIS benchmarks published by the Centre for Internet Security. Users can create custom test configurations and generate reports illustrating pass/fail information from which they can take corrective action to ensure their clusters meet security requirements.

Rancher 2.4 is available in a hosted Rancher deployment, in which each customer has a dedicated AWS instance of a Rancher Server management control plane. The hosted offering includes a full-featured Rancher server, delivers a 99.9% SLA and automates upgrades, security patches and backups. Downstream clusters (e.g. GKE, AKS) are not included in the SLA and continue to be operated by the respective distribution provider. Several best practices were followed during the hosted Rancher build, including infrastructure as code (IaC), immutable infrastructure and a 'Shift Left' approach. Packer, Terraform and GitHub were chosen for tooling.

Rancher delivers a consistent Kubernetes management experience for all certified distributions, including RKE, K3s, AKS, EKS, and GKE on-premise, cloud and/or edge. InfoQ spoke to Sheng Liang, CEO and co-founder of Rancher Labs, about the announcement:

InfoQ: What is 'the edge'?

Sheng Liang: When talking about the edge, people typically mean small and standalone computing resources like set-top boxes, ATM machines, and IoT gateways. In the broadest sense, however, you can think of the edge as any computing resource that is not in the cloud. So, not only do branch offices constitute part of your edge locations, developer laptops are also part of the device edge, and legacy on-premises systems could be considered the data centre edge.

InfoQ: What's the difference between K3s and K8s?

Liang: K3s adds specialised configurations and components to K8s so that it can be easily deployed and managed on edge devices. For example, K3s introduces a number of configuration database options beyond the standard etcd key-value store to make Kubernetes easier to operate in resource-constrained environments. K8s is often operated by dedicated DevOps engineers or SREs, whereas K3s is packaged as a single binary and can be deployed with applications or embedded in servers.

InfoQ: Please can you explain the RKE strategy?

Liang: RKE is Rancher's Kubernetes distribution for data centre deployments. It is a mature, stable, enterprise grade, and easy-to-use Kubernetes distribution. It has been in production and used by large enterprise customers for years. Going forward, we plan to incorporate many of the more modern Kubernetes operations enhancements developed in K3s into RKE 2.0.

InfoQ: Why are people concerned about security in Kubernetes?

Liang: As a new layer of software running between the applications and the underlying infrastructure, Kubernetes has a huge impact on the overall security of the system. On one hand, Kubernetes brings enhanced security by introducing opportunities to check, validate, encrypt, control, and lockdown application workload and the underlying infrastructure. On the other hand, a misconfigured Kubernetes could introduce additional security holes in the overall technology stack. It is therefore essential for Kubernetes management platforms like Rancher to ensure 1) Kubernetes clusters are configured securely (using, for example, CIS benchmarks) and 2) applications take advantage of the numerous security enhancements offered by Kubernetes.

InfoQ: What are the typical security requirements a Kubernetes cluster needs to comply with?

Liang: At the most basic level, every Kubernetes cluster needs to have proper authentication, role-based access control, and secret management. When an enterprise IT organisation manages many different clusters, they need to make sure to have centralised policy management across all clusters. An enterprise IT organisation, for example, can mandate a policy that all production Kubernetes clusters have the necessary security tools (e.g., Aqua or Twistlock) installed.

InfoQ: If teams want Rancher hosted on Azure or GCP can they have that?

Liang: As open source software, Rancher can be installed on any infrastructure, including AWS, Azure, and GCP. In that case though the users have to operate Rancher themselves. The initial launch of hosted Rancher in Rancher 2.4 only runs on AWS. We plan to launch hosted Rancher in Azure and GCP in the future.

InfoQ: How is it that Rancher is able to support such a wide range of Kubernetes distributions?

Liang: Rancher is able to support any Kubernetes distribution because Kubernetes is the standard for computing. All Kubernetes distribution vendors today commit to running the same upstream Kubernetes code and to passing the same CNCF-defined compliance tests. Rancher is then able to take advantage of the portability guarantee of Kubernetes to create a seamless computing experience that spans the data centre, cloud, and edge. Rancher does not attempt to create a vertically locked-in technology stack that ties Rancher Kubernetes management with Rancher Kubernetes distribution.

InfoQ: What are the geographies that Rancher is targeting for expansion and how will this happen?

Liang: As an open source project, Rancher is adopted by Kubernetes users worldwide. Rancher today has commercial operations in fourteen countries across the Americas, Europe, Africa, and the Asia Pacific region. Our geographic presence will continue to grow as we generate significant amounts of enterprise subscription business in more countries.

InfoQ: What proportion of enterprise applications currently run on Kubernetes and what's the forecast for growth?

Liang: Despite the rapidly rising popularity of Kubernetes, the proportion of enterprise applications running on Kubernetes is still small among Rancher customers. Rancher customers have reported low single digit percentages of applications running on Kubernetes, which represents tremendous upside growth potential for Rancher.
http://damianfallon.blogspot.com/2020/04/rancher-24-supports-1-million.html
faizrashis1995 · 5 years ago
What Is a Kubernetes Operator?
Using a Kubernetes operator means you can operate a stateful application by writing a custom controller with domain-specific knowledge built into it. If you are new to Kubernetes, the idea of an operator can be confusing, so let's look at a simple example.
Say you have a Java app that connects to a database. You want to deploy it to your k8s cluster. Ideally, you'd run something like a Deployment for the Java app, exposed with a Service, and for the backend run a StatefulSet for the database application. There are two parts to this setup:
The stateless part: the Java app
The stateful part: the database
To understand an operator, think of the stateful part of the setup: the database or any application that stores data, like etcd. So, in our example, we can apply what we know about how the application relates to the database and create a controller that will do certain things when the application behaves in a certain way.
 Site reliability engineers and operational engineers are often tasked with, or interested in, automating things like backup, updates, data restore, etc. How these tasks are achieved varies depending on the application itself and the business use case (domain knowledge). This is exactly what k8s operators do: act on behalf of the user when you need to perform certain tasks that an SRE/Ops engineer normally has to perform.
 Operators follow K8S patterns
An operator is built on two key principles of Kubernetes: a custom resource and a custom controller.
 Custom resource
In Kubernetes, a resource is an endpoint in the k8s API that stores a bunch of API objects of a specific kind. It allows us to extend k8s by adding more objects of a kind to the cluster. After that, we can use kubectl to access our object just like any other built-in object.
 Take, for example, a pod or deployment. When you write a manifest, you have to specify a kind (pod or deployment) in the yaml file. A custom resource is simply a resource that does not come bundled with k8s out of the box.
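As a hedged sketch, defining a custom resource means registering a CustomResourceDefinition and then creating objects of that kind; the backup-related names here are invented for illustration:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databasebackups.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: DatabaseBackup
    plural: databasebackups
    singular: databasebackup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string
              retention:
                type: integer
---
# An instance the custom controller would watch and act on
# (apply the CRD first and wait for it to be Established before creating this)
apiVersion: example.com/v1
kind: DatabaseBackup
metadata:
  name: nightly
spec:
  schedule: "0 2 * * *"
  retention: 7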
 Custom controller
A controller is a control loop that watches the cluster for changes to a specific resource (a custom resource, in the operator case) and makes sure that the current state matches the desired state. In fact, we are already using controllers built into k8s.
 A good example is a deployment wherein you kill a pod, and another one spins up. The controller sees that the number of pods you desire does not match the current state, so it spins another one up to match the desired state.
 So, why aren’t those built-in controllers called operators? Because those controllers are not specific to a particular application; they are upstream controllers used with built-in resources, like deployment, jobs, etc.
 When to use an operator
It is important to know that all operators are controllers but not all controllers are operators. For a controller to be considered an operator, it must have application domain knowledge in it to perform automated tasks on behalf of the user (SRE/Ops engineer).
 Use an operator whenever you need to encapsulate your stateful application business logic controlling everything with Kubernetes API. This allows automation around your application built into the k8s ecosystem.
Use an operator whenever you need to build a tool that watches your applications for changes and perform certain SRE/Ops tasks when certain things happen.
How to build an operator
There are several ways to build an operator:
client-go connects to the Kubernetes API.
The benefit: It uses the same resources as your built-in resources so you can rest easy knowing that the code is tested and versioned according to k8s standards.
The downside: It has a steep learning curve if you don’t understand the Go programming language.
Kubebuilder, part of the Kubernetes SIGs organization, is written in Go and uses the controller-runtime.
Operator SDK, originally written by CoreOS and now run by Red Hat, is a framework that comes with helper functions to create operators in Go, Helm, or Ansible.
How to deploy an operator
You can deploy an operator in two ways:
 Using yaml just like any other Kubernetes manifest.
Using a Helm chart to deploy both the CRD and the controller as a package.
Best practices for creating an operator
K8s controllers are for the cluster itself; operators are controllers for your deployed stateful applications.
 When creating an operator, follow these best pattern practices:
 Take advantage of built-in kinds to create your custom kinds. That way, you are leveraging already tested and proven kinds.
Ensure no outside code is needed for your controller to function, so that running kubectl apply is all you need to deploy the controller.
If the operator is stopped, make sure your application can still function as expected.
Employ sufficient tests for your controller code.
When you're ready to create your own application-specific custom resources that can be reconciled with your custom controller, which lets you extend the normal behavior of Kubernetes, you are ready to use operators. (Source: https://www.bmc.com/blogs/kubernetes-operator/)
mobilenamic · 6 years ago
OpenShift project backups
Dr Jekyll’s potion famously owes its effectiveness to an ‘unknown impurity’. This is why, at the end of Stevenson’s tale, the protagonist has to confess to himself and the world that he will never regain control of his destructive alter ego. Some configuration errors are hard to spot; but it is much harder to figure out why an earlier, throwaway version of a service worked when our painstaking attempts to recreate it fail. As I hope to show, creating regular backups of our projects can help.
I’d like to distinguish between two kinds of backup here. On the one hand, there’s a spare vial in the fridge. Its contents match the original potion exactly. This is essentially a database snapshot. On the other hand, there’s a laboratory analysis of the original potion, which represents our only chance of identifying the ‘unknown impurity’.
In many cases, the vial in the fridge is what is needed. Its direct equivalent in the Kubernetes world is a database backup of the master’s etcd store. I want to concentrate instead on the laboratory analysis. It is less convenient when time is short, but it does offer a clear, human-readable glimpse of a particular moment in time when our service was working correctly.
While this approach will probably not allow you to restore the entire cluster to a working state, it enables you to look at an individual project, dissect its parts and hopefully identify the tiny, inadvertent configuration change that separates a failed deployment from a successful one.
There is no need to lock the database prior to taking the backup. We are exporting individual objects to pretty-printed JSON, not dumping a binary blob.
Why, considering our infrastructure is expressed in code, should we go to the trouble of requesting laboratory analyses? Surely the recipe will suffice as everything of consequence is persisted in Git? The reason is that too often the aspiration to achieve parity between code and infrastructure is never realised. Few of us can say that we never configure services manually (a port changed here, a health check adjusted there); even fewer can claim that we regularly tear down and rebuild our clusters from scratch. If we consider ourselves safe from Dr Jekyll’s error, we may well be deluding ourselves.
Project export
Our starting point is the script export_project.sh in the repository openshift/openshift-ansible-contrib. We will use a substantially modified version (see fork and pull request).
One of the strengths of the Kubernetes object store is that its contents are serialisable and lend themselves to filtering using standard tools. We decide which objects we deem interesting and we also decide which fields can be skipped. For example, the housekeeping information stored in the .status property is usually a good candidate for deletion.
oc export has been deprecated, so we use oc get -o json (followed by jq pruning) to export object definitions. Take pods, for example. Most pod properties are worth backing up, but some are dispensable: they include not only a pod’s .status, but also its .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.creationTimestamp and .metadata.generation fields.
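A hedged sketch of that export-and-prune step (the pod name is a placeholder, and the jq filter mirrors the fields listed above; the script's actual filter may differ):

oc get pod mypod -o json | jq 'del(
  .status,
  .metadata.uid,
  .metadata.selfLink,
  .metadata.resourceVersion,
  .metadata.creationTimestamp,
  .metadata.generation
)' > pod-mypod.json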
Some caveats are in order. We store pod and replication controller definitions, yet we also store deployment configurations. Clearly the third is perfectly capable of creating the first two. Still, rather than second-guess a given deployment sequence, the backup comprises all three. It is after all possible that the pod definition (its replicas property, for example) has been modified. The resulting history may be repetitive, but we cannot rule out the possibility of a significant yet unseen change.
Another important caveat is that this approach does not back up images or application data (whether stored ephemerally or persistently on disk). It complements full disk backups, but it cannot take their place.
Why not use the original export script? The pull request addresses three central issues: it continues (with a warning) when the cluster does not recognise a resource type, thus supporting older OpenShift versions. It also skips resource types when the system denies access to the user or service account running the export, thus adding support for non-admin users. (Usually the export will be run by a service account, and denying the service account access to secrets is a legitimate choice.) Finally, it always produces valid JSON. The stacked JSON output of the original is supported by jq and indeed oc, but expecting processors to accept invalid, stacked JSON is a risky choice for backup purposes. python -m json.tool, for instance, requires valid JSON input and rejects the output of the original script. Stacked JSON may be an excellent choice for chunked streaming (log messages come to mind) but here it seems out of place.
Backup schedule
Now that the process of exporting the resources is settled, we can automate it. Let’s assume that we want the export to run nightly backups. We want to zip up the output, add a date stamp and write it to persistent storage. If that succeeds we finish by rotating backup archives, that is, deleting all exports older than a week. The parameters (when and how often the export runs, the retention period, and so on) are passed to the template at creation time.
Let’s say we are up and running. What is happening in our backup project?
Fig. 1 Backup service
A nightly CronJob object instantiates a pod that runs the script project_export.sh. Its sole dependencies are oc and jq. It’s tempting at first glance to equip this pod with the ability to restore the exported object definitions, but that would require sweeping write access to the cluster. As mentioned earlier, the pod writes its output to persistent storage. The storage mode is ReadWriteMany, so we can access our files whether an export is currently running or not. Use the spare pod deployed alongside the CronJob object to retrieve the backup archives.
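A sketch of what such a CronJob can look like (the schedule, image, service account, and claim names are illustrative assumptions, not the template's actual values):

apiVersion: batch/v1beta1        # CronJob API group at the time of writing
kind: CronJob
metadata:
  name: openshift-backup
spec:
  schedule: "0 2 * * *"          # nightly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: backup           # example service account
          containers:
          - name: backup
            image: openshift-backup:latest     # image shipping oc and jq
            command: ["/bin/sh", "-c", "project_export.sh"]
            volumeMounts:
            - name: backups
              mountPath: /openshift-backup     # ReadWriteMany volume
          restartPolicy: OnFailure
          volumes:
          - name: backups
            persistentVolumeClaim:
              claimName: openshift-backup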
Policy
The permissions aspect is crucial here. The pod’s service account is granted cluster reader access and an additional, bespoke cluster role secret-reader. It is defined as follows:
kind: ClusterRole
apiVersion: v1
metadata:
  name: ${NAME}-secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
Perhaps the greatest benefit of custom cluster roles is that they remove the temptation to grant cluster-admin rights to a service account.
The export should not fail just because we decide that a given resource type (e.g. secrets or routes) is out of bounds. Nor should it be necessary to comment out parts of the export script. To restrict access, simply modify the service account’s permissions. For each resource type, the script checks whether access is possible and exports only resources the service account can view.
Fig. 2 Permissions
Administrator permissions are required only to create the project at the outset. The expectation is that this would be done by an authenticated user rather than a service account. As Fig. 2 illustrates, the pod that does the actual work is given security context constraint ‘restricted’ and security context ‘non-privileged’. For the most part, the pod’s service account has read access to the etcd object store and write access to its persistent volume.
How to get started, and why
To set up your own backup service, enter:
$ git clone https://github.com/gerald1248/openshift-backup
$ make -C openshift-backup
If you’d rather not wait until tomorrow, store the permanent pod’s name in variable pod and enter:
$ oc exec ${pod} openshift-backup
$ oc exec ${pod} -- ls -l /openshift-backup
Please check that the output has been written to /openshift-backup as intended. You can use the script project_import.sh (found next to project_export.sh in the openshift/openshift-ansible-contrib repository) to restore one project at a time. However, in most cases it will be preferable to use this backup as an analytical tool, and restore individual objects as required.
It’s worth considering the sheer number of objects the object store holds for a typical project. Each of them could have been edited manually or patched programmatically. It could also lack certain properties that are present in the version that is stored in Git. Kubernetes is prone to drop incorrectly indented properties at object creation time.
In short, there is ample scope for ‘unknown impurities’. Given how few computing resources are required, and how little space a week’s worth of project backups takes up, I would suggest that there is every reason to have a laboratory analysis to hand when the vials in the fridge run out.
iyarpage · 6 years ago
Text
OpenShift project backups
Dr Jekyll’s potion famously owes its effectiveness to an ‘unknown impurity’. This is why, at the end of Stevenson’s tale, the protagonist has to confess to himself and the world that he will never regain control of his destructive alter ego. Some configuration errors are hard to spot; but it is much harder to figure out why an earlier, throwaway version of a service worked when our painstaking attempts to recreate it fail. As I hope to show, creating regular backups of our projects can help.
I’d like to distinguish between two kinds of backup here. On the one hand, there’s a spare vial in the fridge. Its contents match the original potion exactly. This is essentially a database snapshot. On the other hand, there’s a laboratory analysis of the original potion, which represents our only chance of identifying the ‘unknown impurity’.
In many cases, the vial in the fridge is what is needed. Its direct equivalent in the Kubernetes world is a database backup of the master’s etcd store. I want to concentrate instead on the laboratory analysis. It is less convenient when time is short, but it does offer a clear, human-readable glimpse of a particular moment in time when our service was working correctly.
While this approach will probably not allow you to restore the entire cluster to a working state, it enables you to look at an individual project, dissect its parts and hopefully identify the tiny, inadvertent configuration change that separates a failed deployment from a successful one.
There is no need to lock the database prior to taking the backup. We are exporting individual objects to pretty-printed JSON, not dumping a binary blob.
Why, considering our infrastructure is expressed in code, should we go to the trouble of requesting laboratory analyses? Surely the recipe will suffice as everything of consequence is persisted in Git? The reason is that too often the aspiration to achieve parity between code and infrastructure is never realised. Few of us can say that we never configure services manually (a port changed here, a health check adjusted there); even fewer can claim that we regularly tear down and rebuild our clusters from scratch. If we consider ourselves safe from Dr Jekyll’s error, we may well be deluding ourselves.
Project export
Our starting point is the script export_project.sh in the repository openshift/openshift-ansible-contrib. We will use a substantially modified version (see fork and pull request).
One of the strengths of the Kubernetes object store is that its contents are serialisable and lend themselves to filtering using standard tools. We decide which objects we deem interesting and we also decide which fields can be skipped. For example, the housekeeping information stored in the .status property is usually a good candidate for deletion.
oc export has been deprecated, so we use oc get -o json (followed by jq pruning) to export object definitions. Take pods, for example. Most pod properties are worth backing up, but some are dispensable: they include not only a pod’s .status, but also its .metadata.uid, .metadata.selfLink, .metadata.resourceVersion, .metadata.creationTimestamp and .metadata.generation fields.
Some caveats are in order. We store pod and replication controller definitions, yet we also store deployment configurations. Clearly the third is perfectly capable of creating the first two. Still, rather than second-guess a given deployment sequence, the backup comprises all three. It is after all possible that the pod definition (its replicas property, for example) has been modified. The resulting history may be repetitive, but we cannot rule out the possibility of a significant yet unseen change.
Another important caveat is that this approach does not back up images or application data (whether stored ephemerally or persistently on disk). It complements full disk backups, but it cannot take their place.
Why not use the original export script? The pull request addresses three central issues: it continues (with a warning) when the cluster does not recognise a resource type, thus supporting older OpenShift versions. It also skips resource types when the system denies access to the user or service account running the export, thus adding support for non-admin users. (Usually the export will be run by a service account, and denying the service account access to secrets is a legitimate choice.) Finally, it always produces valid JSON. The stacked JSON output of the original is supported by jq and indeed oc, but expecting processors to accept invalid, stacked JSON is a risky choice for backup purposes. python -m json.tool, for instance, requires valid JSON input and rejects the output of the original script. Stacked JSON may be an excellent choice for chunked streaming (log messages come to mind) but here it seems out of place.
Backup schedule
Now that the process of exporting the resources is settled, we can automate it. Let’s assume we want to run nightly backups. We want to zip up the output, add a date stamp and write it to persistent storage. If that succeeds, we finish by rotating backup archives, that is, deleting all exports older than a week. The parameters (when and how often the export runs, the retention period, and so on) are passed to the template at creation time.
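Instantiating the template could then look something like this. The file and parameter names here are illustrative guesses, not necessarily the template’s actual ones; check the repository before relying on them.

$ oc new-app -f openshift-backup.yaml \
    -p SCHEDULE='0 2 * * *' \
    -p RETENTION_DAYS=7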
Let’s say we are up and running. What is happening in our backup project?
Fig. 1 Backup service
A nightly CronJob object instantiates a pod that runs the script project_export.sh. Its sole dependencies are oc and jq. It’s tempting at first glance to equip this pod with the ability to restore the exported object definitions, but that would require sweeping write access to the cluster. As mentioned earlier, the pod writes its output to persistent storage. The storage mode is ReadWriteMany, so we can access our files whether an export is currently running or not. Use the spare pod deployed alongside the CronJob object to retrieve the backup archives.
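For orientation, a CronJob of this kind might be sketched as follows. Names, image and schedule are illustrative; the template in the repository is authoritative.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup
spec:
  schedule: "0 2 * * *"              # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: backup
          restartPolicy: Never
          containers:
          - name: backup
            image: backup:latest     # image must provide oc and jq
            command: ["project_export.sh"]
            volumeMounts:
            - name: backup-storage
              mountPath: /openshift-backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-claim  # ReadWriteMany PVC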
Policy
The permissions aspect is crucial here. The pod’s service account is granted cluster reader access and an additional, bespoke cluster role secret-reader. It is defined as follows:
kind: ClusterRole
apiVersion: v1
metadata:
  name: ${NAME}-secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
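The role then has to be bound to the pod’s service account. The template presumably does this with a ClusterRoleBinding object; done by hand, it would look roughly like this (project and service account names are placeholders, and the role name assumes NAME=backup):

$ oc adm policy add-cluster-role-to-user cluster-reader system:serviceaccount:backup:backup
$ oc adm policy add-cluster-role-to-user backup-secret-reader system:serviceaccount:backup:backup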
Perhaps the greatest benefit of custom cluster roles is that they remove the temptation to grant cluster-admin rights to a service account.
The export should not fail just because we decide that a given resource type (e.g. secrets or routes) is out of bounds. Nor should it be necessary to comment out parts of the export script. To restrict access, simply modify the service account’s permissions. For each resource type, the script checks whether access is possible and exports only resources the service account can view.
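Such a check can be expressed with oc auth can-i. A simplified sketch of the per-type guard follows; the actual loop in project_export.sh is more involved.

for kind in pod replicationcontroller deploymentconfig service route secret; do
  if oc auth can-i list "$kind" >/dev/null 2>&1; then
    # export only what the service account is allowed to see
    oc get "$kind" -o json | jq 'del(.items[].status)' > "${kind}s.json"
  else
    echo "Skipping ${kind}s: access denied" >&2
  fi
done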
Fig. 2 Permissions
Administrator permissions are required only to create the project at the outset. The expectation is that this would be done by an authenticated user rather than a service account. As Fig. 2 illustrates, the pod that does the actual work is given security context constraint ‘restricted’ and security context ‘non-privileged’. Beyond that, the pod’s service account has read access to the etcd object store and write access to its persistent volume.
How to get started, and why
To set up your own backup service, enter:
$ git clone https://github.com/gerald1248/openshift-backup
$ make -C openshift-backup
If you’d rather not wait until tomorrow, store the permanent pod’s name in variable pod and enter:
$ oc exec ${pod} -- openshift-backup
$ oc exec ${pod} -- ls -l /openshift-backup
Please check that the output has been written to /openshift-backup as intended. You can use the script project_import.sh (found next to project_export.sh in the openshift/openshift-ansible-contrib repository) to restore one project at a time. However, in most cases it will be preferable to use this backup as an analytical tool, and restore individual objects as required.
It’s worth considering the sheer number of objects the object store holds for a typical project. Any of them could have been edited manually or patched programmatically, or could lack certain properties that are present in the version stored in Git. Kubernetes is prone to drop incorrectly indented properties at object creation time.
In short, there is ample scope for ‘unknown impurities’. Given how few computing resources are required, and how little space a week’s worth of project backups takes up, I would suggest that there is every reason to have a laboratory analysis to hand when the vials in the fridge run out.
computingpostcom · 2 years ago
Text
You have a running OpenShift cluster powering your production microservices and are worried about etcd data backup? In this guide we show you how to back up etcd and push the backup data to an AWS S3 object store. etcd is the key-value store for OpenShift Container Platform; it persists the state of all resource objects. In OpenShift cluster administration it is good and recommended practice to back up your cluster’s etcd data regularly and store it in a secure location, ideally outside the OpenShift Container Platform environment. This can be an NFS share, a secondary server in your infrastructure, or a cloud environment. It is also recommended to take etcd backups during non-peak usage hours, as the operation is blocking in nature. Make sure an etcd backup is taken after every OpenShift cluster upgrade. This matters because during cluster restoration an etcd backup taken from the same z-stream release must be used: for example, an OpenShift Container Platform 4.6.3 cluster must use an etcd backup that was taken from 4.6.3.

Step 1: Login to one Master Node in the Cluster
The etcd cluster backup has to be performed with a single invocation of the backup script on one master host. Do not take a backup on each master host. Log in to one master node either through SSH or a debug session:

# SSH access
$ ssh core@<master_node>

# Debug session
$ oc debug node/<master_node>

For a debug session you need to change your root directory to the host:

sh-4.6# chroot /host

If the cluster-wide proxy is enabled, be sure that you have exported the NO_PROXY, HTTP_PROXY, and HTTPS_PROXY environment variables.

Step 2: Perform etcd Backup on OpenShift 4.x
Cluster access as a user with the cluster-admin role is required for this operation. Before you proceed, check whether the proxy is enabled:

$ oc get proxy cluster -o yaml

If the proxy is enabled, the httpProxy, httpsProxy, and noProxy fields will have values set. Run the cluster-backup.sh script to initiate the etcd backup, passing a path where the backup is to be saved:

$ mkdir /home/core/etcd_backups
$ sudo /usr/local/bin/cluster-backup.sh /home/core/etcd_backups

Here is my command execution output:

3e53f83f3c02b43dfa8d282265c1b0f9789bcda827c4e13110a9b6f6612d447c
etcdctl version: 3.3.18
API version: 3.3
found latest kube-apiserver-pod: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-115
found latest kube-controller-manager-pod: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-24
found latest kube-scheduler-pod: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-26
found latest etcd-pod: /etc/kubernetes/static-pod-resources/etcd-pod-11
Snapshot saved at /home/core/etcd_backups/snapshot_2021-03-16_134036.db
snapshot db and kube resources are successfully saved to /home/core/etcd_backups

List the files in the backup directory:

$ ls -1 /home/core/etcd_backups/
snapshot_2021-03-16_134036.db
static_kuberesources_2021-03-16_134036.tar.gz

$ du -sh /home/core/etcd_backups/*
1.5G /home/core/etcd_backups/snapshot_2021-03-16_134036.db
76K  /home/core/etcd_backups/static_kuberesources_2021-03-16_134036.tar.gz

There will be two files in the backup:
snapshot_<timestamp>.db: the etcd snapshot itself.
static_kuberesources_<timestamp>.tar.gz: the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.

Step 3: Push the Backup to AWS S3 (From Bastion Server)
Log in from the bastion server and copy the backup files:
$ scp -r core@serverip:/home/core/etcd_backups ~/

Next install the AWS CLI. Download the installer:

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

Install the unzip tool:

$ sudo yum -y install unzip

Extract the downloaded file:

$ unzip awscliv2.zip

Install the AWS CLI:

$ sudo ./aws/install
You can now run: /usr/local/bin/aws --version

Confirm the installation by checking the version:
$ aws --version
aws-cli/2.1.30 Python/3.8.8 Linux/3.10.0-957.el7.x86_64 exe/x86_64.rhel.7 prompt/off

Create the OpenShift backups bucket:

$ aws s3 mb s3://openshiftbackups
make_bucket: openshiftbackups

Create an IAM user:

$ aws iam create-user --user-name backupsonly

Create an AWS policy for the backups user – a user able to write to S3 only:

$ cat >aws-s3-uploads-policy.json
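The post breaks off here. A write-only policy of the kind described might look roughly as follows; this is a sketch that assumes the openshiftbackups bucket above is the target, since the author’s actual policy document is not shown:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBackupUploads",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::openshiftbackups",
        "arn:aws:s3:::openshiftbackups/*"
      ]
    }
  ]
}

Presumably the remaining steps attach the policy to the user and upload the backup, along these lines:

$ aws iam put-user-policy --user-name backupsonly --policy-name s3-uploads --policy-document file://aws-s3-uploads-policy.json
$ aws s3 cp ~/etcd_backups s3://openshiftbackups/ --recursive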