Kops tasks and effort estimation
List of open issues and features and their effort estimation as of .
TODO Issues listed below require proper labels to be assigned, especially P0.
Priorities
P0: Must have fixes and features, needed to make existing vSphere support in kops work.
P1: Important fixes and features, required to give vSphere users a more useful kubernetes cluster deployment experience, with multiple masters and HA.
P2: Rarely occurring issues and features that will bring vSphere support closer to AWS and GCE support in kops.
Notes:
- Effort estimation includes fixing an issue or implementing a feature, testing, and generating a PR.
- A few issues are related to startup and the base image. If we resolve the "Use PhotonOS for vSphere node template" issue first and replace cloud-init with guestinfo, those issues might get resolved automatically. Further investigation is needed, and any issues fixed this way will still need verification and testing.
Priority | Task | Type (bug, feature, test) | Effort estimate (in days) | Remarks |
---|---|---|---|---|
P0 | Kops for vSphere is broken kubernetes/kops#2729 | Bug | 1 | |
P0 | AWS EBS is set as default volume provisioner, instead of vSphere kubernetes/kops#2732 | Bug | 1 | Looks like a fix is available; it needs to be tested again before committing https://github.com/vmware/kops/pull/70/files |
P0 | Package installation through nodeup causing delay in cluster deployment kubernetes/kops#2742 | Bug | 2 | If we get the "Use PhotonOS for vSphere node template" done first, this one may just be avoided. Verification required. |
P0 | Connection to api.clustername.skydns.local is failing kubernetes/kops#2744 | Bug | 1 | This might be fixed by Abrar's PR in kubernetes already. Just need to verify. |
P0 | Make update command, that includes scale up and down, work kubernetes/kops#2738. There are two possible ways to implement this: without an auto scaling group (ASG) or with one | Feature | 4 (assuming ASG is available); 4 (if ASG is not available) | This effort estimation needs more analysis. |
P0 | Make end-to-end CI/CD work on vmware/kops kubernetes/kops#2730 | Bug, Test | 3 | |
P1 | Use PhotonOS for vSphere node template kubernetes/kops#2735. Use guestinfo instead of cloud-init kubernetes/kops#2726 | Feature | 5 | This was originally a P2 issue. However, several other issues might be affected by this one, so it has been raised to P1. Problem to solve: cloud-init on PhotonOS is not working properly. Alternatively, get rid of cloud-init and use guestinfo instead. |
P1 | Default image name extraction for vSphere needs to be fixed kubernetes/kops#2740 | Bug | 2 | |
P1 | Multi-master HA setup kubernetes/kops#2734 | Feature | 5 | |
P1 | Add unit tests for vSphere workflows kubernetes/kops#2745 | Test | 8 | This effort estimation needs more analysis as estimator is not familiar with all the components for which tests need to be written. |
P1 | Documentation of: 1) Existing commands and usage, blog kubernetes/kops#2739, 2) Behavior for all flags for 'kops create cluster' command kubernetes/kops#2741 | | 3 | |
P2 | vCenter running out of HTTP sessions kubernetes/kops#2747 | Bug | 3 | |
P2 | Long boot time for template VM kubernetes/kops#2746 | Bug | 3 | This estimation needs more analysis. As above, if "Use PhotonOS for vSphere node template" can be resolved first, this issue might no longer be valid. Verification is still needed. |
P2 | Improve methods that create user-data and meta-data for ISO kubernetes/kops#2748, or use guestinfo instead of cloud-init for passing in VM-specific information kubernetes/kops#2726 | Feature | 2 | We need to decide whether we should use guestinfo instead of cloud-init. |
P2 | Support for rolling upgrade, normal upgrade | Feature | 7 | This estimation needs more analysis. Presence of ASG might simplify the implementation of this feature. Some changes might be needed in core kops code, as current rolling upgrade implementation is very much AWS specific. |
P2 | Make ETCD volumes re-attachable for vSphere (AWS and GCE already support this) kubernetes/kops#2736 | Feature | 7 | This task needs more analysis for design and implementation. Estimate might change accordingly. |
P2 | Security and isolation- 1) Networking for master, worker nodes kubernetes/kops#2731. 2) Credentials in plain text kubernetes/kops#2743 | Feature | — | Don't have enough information on this to give an estimate for effort involved. |
P2 | Explore vSphere DRS cluster’s anti-affinity rules to achieve better master HA and meaningful zone allotment for masters kubernetes/kops#2733 | Feature | 8 | This estimation needs more analysis. DRS anti-affinity rule needs to be explored, to see if it's even fit for this problem. |
P2 | Enable user-defined dns zone name kubernetes/kops#2727 | Feature | 2 | |
Total | | | 67 | |
Kops commands behavior for vSphere
List of all kops commands and how they behave for vSphere cloud provider, as of .
Column explanation
- Command, option and usage example are self-explanatory.
- vSphere support: whether or not the command is supported for vSphere cloud provider (Yes/No), followed by current status of that command and explanation of any failures.
- Graceful termination needed: If the command will not be supported, does it need additional code to fail gracefully for the vSphere provider?
- Remark: Miscellaneous comments about the command.
Command | Option | Usage example | vSphere support | Graceful termination needed (if not fixed) | Remark |
---|---|---|---|---|---|
completion | bash | kops completion bash | Yes | No | Output shell completion code for the given shell (bash), which can easily be incorporated in a bash script to run kops commands as bash functions. |
create | cluster | | Yes. Supported/tested command flags: cloud, dns, dns-zone, image, networking, node-count, vsphere-server, vsphere-datacenter, vsphere-resource-pool, vsphere-datastore, vsphere-coredns-server, yes, zones. | Yes. Check for unsupported flags. Terminate command, if needed, with appropriate message. | Creates cluster spec and configs. If --yes is specified then creates resources as well. |
create | instancegroup | kops create ig --name=v1c1.skydns.local --role=Node --subnet=vmw-zone nodes2 | No. InstanceGroup spec gets created in object store. However, the command shows this error even after setting the 'image' value in the spec: I0412 11:08:23.025842 80677 populate_instancegroup_spec.go:257] Cannot set default Image for CloudProvider="vsphere" | Yes. Either add a check for vSphere, or fix the issue causing the failure. | |
create | secret | kops create secret sshpublickey test_key -i ~/.ssh/git_rsa.pub | Yes | No | The create and delete secret commands can be used in combination to replace existing secrets. Justin's explanation: "k8s in theory supports multiple certificates but it was not working until 1.5 so I don't think we actually enable it in kops. This will be how we do certificate rotation though - add a certificate, roll that out, roll out a new key and switch to the new key" |
create | -f FILENAME | Three YAML files are required: cluster: kops create -f ~/kops.yaml, master IG: kops create -f ~/kops.masterig.yaml, node IG: kops create -f ~/kops.nodeig.yaml | Yes | No | |
delete | cluster | kops delete cluster v2c1.skydns.local --yes | Yes | No | |
delete | instancegroup | kops delete instancegroup --name=v2c1.skydns.local nodes.v2c1.skydns.local | No. No implementation is available to list resources. The method written for AWS gets called and crashes with a panic, without any useful message. | Yes | |
delete | secret | | Yes | - | |
delete | -f FILENAME | kops delete -f config.yaml --name=v2c1.skydns.local | No. Cluster deletion works. Instance group deletion is failing with error: panic: interface conversion: *vsphere.VSphereCloud is not awsup.AWSCloud: missing method AddAWSTags goroutine 1 [running]: panic(0x26fbd20, 0xc420770780) /usr/local/go/src/runtime/panic.go:500 +0x1a1 k8s.io/kops/upup/pkg/kutil.FindCloudInstanceGroups | Yes | Deletes the cluster and instance groups specified by the file. |
describe | secrets | kops describe secrets | Yes | No | Describe secrets, based on the kubectl context. |
edit | cluster | kops edit cluster --name=v2c1.skydns.local nodes | Yes. Edited spec gets updated in object store. | Yes | Edit works. However, it would be a bad user experience to let users edit the spec if a subsequent 'kops update' then fails and there is no way to go back to the older spec. |
edit | ig | kops edit ig --name=v2c1.skydns.local nodes | Yes. Edited spec gets updated in object store. | Yes | Edit works. However, it would be a bad user experience to let users edit the spec if a subsequent 'kops update' then fails and there is no way to go back to the older spec. |
edit | federation | | No | Yes | Federation is a group of k8s clusters. This doesn't look like an important goal for vSphere in the near future. Q: "How is a federation getting created? I see update and edit methods for a federation but I am not clear how to get a federation in the first place." A: Justin's reply: I'm chatting with the federation folk about kubefed & kops and whether we should integrate them etc. The federation stuff was very alpha and I believe is (trivially) broken right now, but I'm debating integrating with kubefed vs fixing kops federation. kubefed worked fine when I tried it the other day. |
export | kubecfg | kops export kubecfg v1c1.skydns.local | Yes | - | Sets kubectl context to given cluster. |
get | clusters | | Yes | - | Gets list of clusters. If YAML output is specified, this output can be modified and used for the 'kops replace' command. |
get | federations | | Yes | - | Gets list of federations. For now, this is an empty list. |
get | instancegroups | | Yes | - | Gets list of instancegroups. If YAML output is specified, this output can be modified and used for the 'kops replace' command. |
get | secrets | | Yes | - | Gets list of secrets. |
import | cluster | kops import cluster --region=us-west-2 --name=v2c1.skydns.local nodes | No. Current implementation is very AWS specific. Multiple AWS services are queried to construct the api.Cluster object. | Yes | Imports spec for an existing cluster into the object store. While this functionality is good for importing and managing existing k8s clusters using kops, it doesn't seem like a high priority at this point in time. |
replace | | kops replace -f FILENAME | No | Yes | Output of 'kops get cluster name -oyaml' or 'kops get ig name -oyaml' can be updated and passed to the 'kops replace' command. |
rolling-update | cluster | | No. Current implementation is AWS specific. | Yes | |
secrets | create | | - | - | Legacy command, points to 'kops create secrets'. |
secrets | describe | | - | - | Legacy command, points to 'kops describe secrets'. |
secrets | expose | | - | - | Legacy command, points to 'kops get secrets -oplaintext'. |
secrets | get | | - | - | Legacy command, points to 'kops get secret'. |
toolbox | dump | | No. Current implementation is AWS specific. | Yes | Dumps cloud information for the given cluster. This looks like good-to-have functionality. Once resource listing is available for vSphere, which will be needed for the deletion operation anyway, this command should become easier to implement. |
toolbox | convert-imported | | No. Current implementation is AWS specific. | Yes | Doesn't look like high-priority functionality. |
update | cluster | kops update cluster --name=v2c1.skydns.local --yes | No. 1) Works for a new cluster. 2) Existing cluster scale up: vSphere provisioning code tries to provision all master and node VMs from scratch. New nodes get created and registered successfully. Existing resources keep failing with 'already exists' error. 3) Existing cluster scale down: Won't work, as no resource listing or deletion logic is available for vSphere. On top of that, creation is attempted for all listed resources (masters and workers), and each attempt fails with 'already exists' error. | Yes | |
update | federation | | No | Yes | Federation is a group of k8s clusters. This doesn't look like an important goal for vSphere in the near future. |
upgrade | cluster | kops upgrade cluster --name=v1c1.skydns.local --yes | No. Seeing this error: W0413 11:48:52.216116 15456 upgrade_cluster.go:202] No matching images specified in channel; cannot prompt for upgrade | Yes | Find out more about 'channel' in the context of kops. Note that no --channel argument is specified. |
validate | cluster | kops validate cluster —name=v1c1.skydns.local | Yes. Not working right now. Failing with this error: cannot get nodes for "v1c1.skydns.local": Get https://api.v1c1.skydns.local/api/v1/nodes: dial tcp: lookup api.v1c1.skydns.local: no such host | - | Investigation is already going on- https://github.com/kubernetes/kops/issues/2744. This issue will most likely get fixed by a fix in cloud-provider code that is not returning appropriate internal and external IP for the node. |
version | | kops version | Yes | - | Prints client version information. Eg: Version 1.6.0-alpha.1 (git-500cb69) |
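
The 'update cluster' row describes existing VMs failing with 'already exists' on scale up, and scale down being impossible without resource listing. One way to reason about the missing logic is a diff between desired and existing resources. The Go sketch below is only an illustration under that assumption: `planChanges` and the VM names are hypothetical and are not part of kops or the vSphere provider.

```go
package main

import (
	"fmt"
	"sort"
)

// planChanges compares the VM names the spec asks for with the VM names that
// already exist. Names present in both are left untouched, so nothing is
// re-created and no 'already exists' error is triggered.
func planChanges(desired, existing []string) (toCreate, toDelete []string) {
	have := make(map[string]bool, len(existing))
	for _, n := range existing {
		have[n] = true
	}
	want := make(map[string]bool, len(desired))
	for _, n := range desired {
		// Skip duplicates and anything that already exists.
		if !want[n] && !have[n] {
			toCreate = append(toCreate, n)
		}
		want[n] = true
	}
	for n := range have {
		// Anything that exists but is no longer wanted is a scale-down target.
		if !want[n] {
			toDelete = append(toDelete, n)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	desired := []string{"master-1", "node-1", "node-2", "node-3"}
	existing := []string{"master-1", "node-1", "node-2"}
	create, del := planChanges(desired, existing)
	fmt.Println("create:", create, "delete:", del)
}
```

The sketch depends on being able to list existing VMs, which is the same resource listing the table notes as missing for delete and toolbox dump.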