Pitfalls I hit while installing Kubeflow 1.1

TL;DR

What I installed

$ kfctl version
kfctl v1.1.0-0-g9a3621e

How I installed it

Environment

OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
Kubernetes: v1.15.7

Pitfalls

PV

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
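If a StorageClass already exists in the cluster, one way to apply the annotation above without editing the manifest is `kubectl patch`. The following is a minimal sketch, not the exact steps used in this setup: the helper name `mark_default_storageclass` and the class name `local-path` are placeholders chosen for illustration.

```shell
# Mark an existing StorageClass as the cluster default.
# Assumption: the function name and the example class name
# ("local-path") are hypothetical; substitute your own class.
mark_default_storageclass() {
  sc_name="$1"
  kubectl patch storageclass "$sc_name" \
    -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
}

# Usage (needs a reachable cluster):
#   mark_default_storageclass local-path
#   kubectl get storageclass   # the default class is listed with "(default)"
```

Only one class should carry the annotation at a time; if another class is already marked as default, set its annotation to "false" first.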

Insufficient resources

$ kubectl describe node
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                5520m (46%)       45900m (382%)
  memory             7408896512 (22%)  30141130Ki (91%)
  ephemeral-storage  0 (0%)            0 (0%)
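In the table above, CPU limits are at 382% of the node's capacity (overcommitted) and memory limits at 91%. To watch these numbers while pods are being scheduled, the percentages can be pulled out of the `kubectl describe node` output with a small filter. This is a sketch; the helper name `allocated_percentages` is made up for this example.

```shell
# Read `kubectl describe node` output on stdin and print
# "<resource> <requests%> <limits%>" for each row of the
# "Allocated resources" table.
# Assumption: the helper name is hypothetical.
allocated_percentages() {
  awk '
    /^ *(cpu|memory|ephemeral-storage) / {
      # Fields look like: cpu 5520m (46%) 45900m (382%)
      gsub(/[()%]/, "", $3)   # requests percentage
      gsub(/[()%]/, "", $5)   # limits percentage
      print $1, $3, $5
    }'
}

# Usage (needs a reachable cluster):
#   kubectl describe node | allocated_percentages
```

The pattern requires a space after the resource name, so the `cpu:` and `memory:` lines in the Capacity and Allocatable sections are skipped.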

Error: istio-token cannot be mounted

GUI errors

$ kubectl get svc -n istio-system istio-ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway NodePort 10.105.148.54 <none> 15020:31842/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:31654/TCP,15030:30459/TCP,15031:30005/TCP,15032:31611/TCP,15443:32149/TCP 2d20h
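The PORT(S) column above shows `80:31380/TCP`, so port 80 of the ingress gateway is exposed on node port 31380. Assuming the Kubeflow dashboard is served through that gateway over plain HTTP, the URL to open in a browser can be built as sketched below. The helper name `dashboard_url` and the node address `192.0.2.10` are placeholders, not values from this cluster.

```shell
# Build the URL for the istio-ingressgateway HTTP port.
# Assumptions: the helper name is hypothetical, and the caller
# passes the IP address of any node in the cluster.
dashboard_url() {
  node_ip="$1"
  # Look up the node port that maps to service port 80.
  node_port="$(kubectl get svc -n istio-system istio-ingressgateway \
    -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}')"
  printf 'http://%s:%s\n' "$node_ip" "$node_port"
}

# Usage (needs a reachable cluster):
#   dashboard_url 192.0.2.10
```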

Tips

$ kubectl config set-context --current --namespace=kubeflow
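The command above changes the default namespace of the current context, so `-n kubeflow` can be dropped from later commands. To confirm which namespace is active, something like the following should work; the helper name `current_namespace` is only for illustration.

```shell
# Print the namespace configured for the current kubectl context.
# Assumption: the helper name is hypothetical. The output is
# empty if no namespace has been set on the context.
current_namespace() {
  kubectl config view --minify --output 'jsonpath={..namespace}'
}

# Usage:
#   current_namespace
```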

The end result

$ kubectl get all -n kubeflow
NAME READY STATUS RESTARTS AGE
pod/admission-webhook-bootstrap-stateful-set-0 1/1 Running 4 2d20h
pod/admission-webhook-deployment-795bb748-pxcwx 1/1 Running 0 45h
pod/application-controller-stateful-set-0 1/1 Running 2 2d20h
pod/argo-ui-657d964995-8t4vg 1/1 Running 2 2d20h
pod/cache-deployer-deployment-867cf86c64-cjnxv 2/2 Running 3 2d20h
pod/cache-server-65596854d-wb42d 2/2 Running 0 2d20h
pod/centraldashboard-54c547bd7f-d2c42 1/1 Running 6 2d20h
pod/jupyter-web-app-deployment-56dc859fdd-l2gqn 1/1 Running 2 2d20h
pod/katib-controller-6fc96fddf8-xxpxv 1/1 Running 3 2d20h
pod/katib-db-manager-78d458db46-gpnqc 1/1 Running 257 2d20h
pod/katib-mysql-7f9cfccb98-45zxr 1/1 Running 2 2d20h
pod/katib-ui-74768457d5-8cvx5 1/1 Running 2 2d20h
pod/kfserving-controller-manager-0 2/2 Running 2 2d20h
pod/kubeflow-pipelines-profile-controller-588884d9bb-dk8jz 1/1 Running 2 2d20h
pod/metacontroller-0 1/1 Running 2 2d20h
pod/metadata-db-7fc598bbb5-kfr7b 1/1 Running 1 2d20h
pod/metadata-deployment-7578c6bc46-4wzbs 1/1 Running 497 2d20h
pod/metadata-envoy-deployment-75df6688bb-vx9w8 1/1 Running 2 2d20h
pod/metadata-grpc-deployment-76d44cfd88-czl2c 1/1 Running 222 2d20h
pod/metadata-ui-794f6dcc5b-7nw5b 1/1 Running 2 2d20h
pod/metadata-writer-694c48ccdc-qmvc5 2/2 Running 0 2d20h
pod/minio-655ddb4d95-ccqsx 1/1 Running 1 2d20h
pod/ml-pipeline-5df444d46d-65rgq 2/2 Running 0 2d20h
pod/ml-pipeline-persistenceagent-9f5c875d-dxvpp 2/2 Running 0 2d20h
pod/ml-pipeline-scheduledworkflow-768c4d65d4-gltdl 2/2 Running 0 2d20h
pod/ml-pipeline-ui-8589d58598-tcffh 2/2 Running 0 2d20h
pod/ml-pipeline-viewer-crd-5dd6cc5f56-wsj78 2/2 Running 1 2d20h
pod/ml-pipeline-visualizationserver-9b67b8b68-6cq76 2/2 Running 0 2d20h
pod/mpi-operator-55457d5f54-5f74v 1/1 Running 5 2d20h
pod/mxnet-operator-68bf5b4fbc-gdnc2 1/1 Running 4 2d20h
pod/mysql-56f64cfcc-z2kgq 2/2 Running 0 45h
pod/notebook-controller-deployment-6f789d748-5wbcv 1/1 Running 2 2d20h
pod/profiles-deployment-6fffd9c9-fwbt8 2/2 Running 4 2d20h
pod/pytorch-operator-d449c769b-hqm55 1/1 Running 9 2d20h
pod/seldon-controller-manager-68f9f7bff6-jkb57 1/1 Running 5 2d20h
pod/spark-operatorsparkoperator-758795c89b-vbrhf 1/1 Running 2 2d20h
pod/spartakus-volunteer-69f5b89c96-njknm 1/1 Running 2 2d20h
pod/tf-job-operator-644f847f5c-2844p 1/1 Running 9 2d20h
pod/workflow-controller-dd8985f4d-qxh8m 1/1 Running 2 2d20h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/admission-webhook-service ClusterIP 10.106.24.104 <none> 443/TCP 2d20h
service/application-controller-service ClusterIP 10.97.13.119 <none> 443/TCP 2d20h
service/argo-ui NodePort 10.110.193.72 <none> 80:32065/TCP 2d20h
service/cache-server ClusterIP 10.98.200.153 <none> 443/TCP 2d20h
service/centraldashboard ClusterIP 10.110.2.44 <none> 80/TCP 2d20h
service/jupyter-web-app-service ClusterIP 10.97.229.226 <none> 80/TCP 2d20h
service/katib-controller ClusterIP 10.106.123.248 <none> 443/TCP,8080/TCP 2d20h
service/katib-db-manager ClusterIP 10.107.254.8 <none> 6789/TCP 2d20h
service/katib-mysql ClusterIP 10.108.229.228 <none> 3306/TCP 2d20h
service/katib-ui ClusterIP 10.108.163.144 <none> 80/TCP 2d20h
service/kfserving-controller-manager-metrics-service ClusterIP 10.107.37.169 <none> 8443/TCP 2d20h
service/kfserving-controller-manager-service ClusterIP 10.98.195.250 <none> 443/TCP 2d20h
service/kfserving-webhook-server-service ClusterIP 10.106.79.84 <none> 443/TCP 2d20h
service/kubeflow-pipelines-profile-controller ClusterIP 10.109.47.5 <none> 80/TCP 2d20h
service/metadata-db ClusterIP 10.99.251.151 <none> 3306/TCP 2d20h
service/metadata-envoy-service ClusterIP 10.100.48.115 <none> 9090/TCP 2d20h
service/metadata-grpc-service ClusterIP 10.100.33.121 <none> 8080/TCP 2d20h
service/metadata-service ClusterIP 10.97.165.97 <none> 8080/TCP 2d20h
service/metadata-ui ClusterIP 10.97.253.2 <none> 80/TCP 2d20h
service/minio-service ClusterIP 10.110.118.90 <none> 9000/TCP 2d20h
service/ml-pipeline ClusterIP 10.96.66.86 <none> 8888/TCP,8887/TCP 2d20h
service/ml-pipeline-ui ClusterIP 10.103.33.58 <none> 80/TCP 2d20h
service/ml-pipeline-visualizationserver ClusterIP 10.98.43.116 <none> 8888/TCP 2d20h
service/mysql ClusterIP 10.97.209.58 <none> 3306/TCP 2d20h
service/notebook-controller-service ClusterIP 10.110.5.82 <none> 443/TCP 2d20h
service/profiles-kfam ClusterIP 10.106.127.68 <none> 8081/TCP 2d20h
service/pytorch-operator ClusterIP 10.105.224.245 <none> 8443/TCP 2d20h
service/seldon-webhook-service ClusterIP 10.100.237.108 <none> 443/TCP 2d20h
service/tf-job-operator ClusterIP 10.108.121.94 <none> 8443/TCP 2d20h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/admission-webhook-deployment 1/1 1 1 2d20h
deployment.apps/argo-ui 1/1 1 1 2d20h
deployment.apps/cache-deployer-deployment 1/1 1 1 2d20h
deployment.apps/cache-server 1/1 1 1 2d20h
deployment.apps/centraldashboard 1/1 1 1 2d20h
deployment.apps/jupyter-web-app-deployment 1/1 1 1 2d20h
deployment.apps/katib-controller 1/1 1 1 2d20h
deployment.apps/katib-db-manager 1/1 1 1 2d20h
deployment.apps/katib-mysql 1/1 1 1 2d20h
deployment.apps/katib-ui 1/1 1 1 2d20h
deployment.apps/kubeflow-pipelines-profile-controller 1/1 1 1 2d20h
deployment.apps/metadata-db 1/1 1 1 2d20h
deployment.apps/metadata-deployment 1/1 1 1 2d20h
deployment.apps/metadata-envoy-deployment 1/1 1 1 2d20h
deployment.apps/metadata-grpc-deployment 1/1 1 1 2d20h
deployment.apps/metadata-ui 1/1 1 1 2d20h
deployment.apps/metadata-writer 1/1 1 1 2d20h
deployment.apps/minio 1/1 1 1 2d20h
deployment.apps/ml-pipeline 1/1 1 1 2d20h
deployment.apps/ml-pipeline-persistenceagent 1/1 1 1 2d20h
deployment.apps/ml-pipeline-scheduledworkflow 1/1 1 1 2d20h
deployment.apps/ml-pipeline-ui 1/1 1 1 2d20h
deployment.apps/ml-pipeline-viewer-crd 1/1 1 1 2d20h
deployment.apps/ml-pipeline-visualizationserver 1/1 1 1 2d20h
deployment.apps/mpi-operator 1/1 1 1 2d20h
deployment.apps/mxnet-operator 1/1 1 1 2d20h
deployment.apps/mysql 1/1 1 1 2d20h
deployment.apps/notebook-controller-deployment 1/1 1 1 2d20h
deployment.apps/profiles-deployment 1/1 1 1 2d20h
deployment.apps/pytorch-operator 1/1 1 1 2d20h
deployment.apps/seldon-controller-manager 1/1 1 1 2d20h
deployment.apps/spark-operatorsparkoperator 1/1 1 1 2d20h
deployment.apps/spartakus-volunteer 1/1 1 1 2d20h
deployment.apps/tf-job-operator 1/1 1 1 2d20h
deployment.apps/workflow-controller 1/1 1 1 2d20h
NAME DESIRED CURRENT READY AGE
replicaset.apps/admission-webhook-deployment-795bb748 1 1 1 2d20h
replicaset.apps/argo-ui-657d964995 1 1 1 2d20h
replicaset.apps/cache-deployer-deployment-867cf86c64 1 1 1 2d20h
replicaset.apps/cache-server-65596854d 1 1 1 2d20h
replicaset.apps/centraldashboard-54c547bd7f 1 1 1 2d20h
replicaset.apps/jupyter-web-app-deployment-56dc859fdd 1 1 1 2d20h
replicaset.apps/katib-controller-6fc96fddf8 1 1 1 2d20h
replicaset.apps/katib-db-manager-78d458db46 1 1 1 2d20h
replicaset.apps/katib-mysql-7f9cfccb98 1 1 1 2d20h
replicaset.apps/katib-ui-74768457d5 1 1 1 2d20h
replicaset.apps/kubeflow-pipelines-profile-controller-588884d9bb 1 1 1 2d20h
replicaset.apps/metadata-db-7fc598bbb5 1 1 1 2d20h
replicaset.apps/metadata-deployment-7578c6bc46 1 1 1 2d20h
replicaset.apps/metadata-envoy-deployment-75df6688bb 1 1 1 2d20h
replicaset.apps/metadata-grpc-deployment-76d44cfd88 1 1 1 2d20h
replicaset.apps/metadata-ui-794f6dcc5b 1 1 1 2d20h
replicaset.apps/metadata-writer-694c48ccdc 1 1 1 2d20h
replicaset.apps/minio-655ddb4d95 1 1 1 2d20h
replicaset.apps/ml-pipeline-5df444d46d 1 1 1 2d20h
replicaset.apps/ml-pipeline-persistenceagent-9f5c875d 1 1 1 2d20h
replicaset.apps/ml-pipeline-scheduledworkflow-768c4d65d4 1 1 1 2d20h
replicaset.apps/ml-pipeline-ui-8589d58598 1 1 1 2d20h
replicaset.apps/ml-pipeline-viewer-crd-5dd6cc5f56 1 1 1 2d20h
replicaset.apps/ml-pipeline-visualizationserver-9b67b8b68 1 1 1 2d20h
replicaset.apps/mpi-operator-55457d5f54 1 1 1 2d20h
replicaset.apps/mxnet-operator-68bf5b4fbc 1 1 1 2d20h
replicaset.apps/mysql-56f64cfcc 1 1 1 2d20h
replicaset.apps/notebook-controller-deployment-6f789d748 1 1 1 2d20h
replicaset.apps/profiles-deployment-6fffd9c9 1 1 1 2d20h
replicaset.apps/pytorch-operator-d449c769b 1 1 1 2d20h
replicaset.apps/seldon-controller-manager-68f9f7bff6 1 1 1 2d20h
replicaset.apps/spark-operatorsparkoperator-758795c89b 1 1 1 2d20h
replicaset.apps/spartakus-volunteer-69f5b89c96 1 1 1 2d20h
replicaset.apps/tf-job-operator-644f847f5c 1 1 1 2d20h
replicaset.apps/workflow-controller-dd8985f4d 1 1 1 2d20h
NAME READY AGE
statefulset.apps/admission-webhook-bootstrap-stateful-set 1/1 2d20h
statefulset.apps/application-controller-stateful-set 1/1 2d20h
statefulset.apps/kfserving-controller-manager 1/1 2d20h
statefulset.apps/metacontroller 1/1 2d20h
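Every pod in the listing above is Running, but a few have restarted hundreds of times (katib-db-manager 257, metadata-deployment 497, metadata-grpc-deployment 222). A quick way to surface such pods is to filter on the RESTARTS column. This is a sketch; the helper name `high_restarts` and the default threshold of 100 are arbitrary choices for this example.

```shell
# Read `kubectl get pods` output on stdin and print the name and
# restart count of every pod whose RESTARTS value exceeds a threshold.
# Assumptions: the helper name and the default threshold are hypothetical.
high_restarts() {
  threshold="${1:-100}"
  # Column 4 is RESTARTS; the numeric check skips header rows.
  awk -v t="$threshold" '$4 ~ /^[0-9]+$/ && $4 + 0 > t + 0 { print $1, $4 }'
}

# Usage (needs a reachable cluster):
#   kubectl get pods -n kubeflow --no-headers | high_restarts 100
```

For the pods it reports, `kubectl logs --previous <pod>` shows the output of the last crashed container.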

Bonus: detailed versions

$ kubectl get pods -n kubeflow -o=custom-columns='NAME:.metadata.name,DATA:spec.containers[*].image'
NAME DATA
admission-webhook-bootstrap-stateful-set-0 gcr.io/kubeflow-images-public/ingress-setup:latest
admission-webhook-deployment-795bb748-pxcwx gcr.io/kubeflow-images-public/admission-webhook:vmaster-gaf96e4e3
application-controller-stateful-set-0 gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta
argo-ui-657d964995-8t4vg argoproj/argoui:v2.3.0
cache-deployer-deployment-867cf86c64-cjnxv gcr.io/ml-pipeline/cache-deployer:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
cache-server-65596854d-wb42d gcr.io/ml-pipeline/cache-server:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
centraldashboard-54c547bd7f-d2c42 gcr.io/kubeflow-images-public/centraldashboard:vmaster-gf39279c0
jupyter-web-app-deployment-56dc859fdd-l2gqn gcr.io/kubeflow-images-public/jupyter-web-app:vmaster-gd9be4b9e
katib-controller-6fc96fddf8-xxpxv gcr.io/kubeflow-images-public/katib/v1alpha3/katib-controller:917164a
katib-db-manager-78d458db46-gpnqc gcr.io/kubeflow-images-public/katib/v1alpha3/katib-db-manager:917164a
katib-mysql-7f9cfccb98-45zxr mysql:8
katib-ui-74768457d5-8cvx5 gcr.io/kubeflow-images-public/katib/v1alpha3/katib-ui:917164a
kfserving-controller-manager-0 gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0,gcr.io/kfserving/kfserving-controller:v0.3.0
kubeflow-pipelines-profile-controller-588884d9bb-dk8jz python:3.7
metacontroller-0 metacontroller/metacontroller:v0.3.0
metadata-db-7fc598bbb5-kfr7b mysql:8.0.3
metadata-deployment-7578c6bc46-4wzbs gcr.io/kubeflow-images-public/metadata:v0.1.11
metadata-envoy-deployment-75df6688bb-vx9w8 gcr.io/ml-pipeline/envoy:metadata-grpc
metadata-grpc-deployment-76d44cfd88-czl2c gcr.io/tfx-oss-public/ml_metadata_store_server:v0.21.1
metadata-ui-794f6dcc5b-7nw5b gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8
metadata-writer-694c48ccdc-qmvc5 gcr.io/ml-pipeline/metadata-writer:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
minio-655ddb4d95-ccqsx gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
ml-pipeline-5df444d46d-65rgq gcr.io/ml-pipeline/api-server:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
ml-pipeline-persistenceagent-9f5c875d-dxvpp gcr.io/ml-pipeline/persistenceagent:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
ml-pipeline-scheduledworkflow-768c4d65d4-gltdl gcr.io/ml-pipeline/scheduledworkflow:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
ml-pipeline-ui-8589d58598-tcffh gcr.io/ml-pipeline/frontend:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
ml-pipeline-viewer-crd-5dd6cc5f56-wsj78 gcr.io/ml-pipeline/viewer-crd-controller:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
ml-pipeline-visualizationserver-9b67b8b68-6cq76 gcr.io/ml-pipeline/visualization-server:1.0.0,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
mpi-operator-55457d5f54-5f74v mpioperator/mpi-operator:latest
mxnet-operator-68bf5b4fbc-gdnc2 kubeflow/mxnet-operator:v1.0.0-20200625
mysql-56f64cfcc-z2kgq gcr.io/ml-pipeline/mysql:5.6,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
notebook-controller-deployment-6f789d748-5wbcv gcr.io/kubeflow-images-public/notebook-controller:vmaster-gf39279c0
profiles-deployment-6fffd9c9-fwbt8 gcr.io/kubeflow-images-public/profile-controller:vmaster-g34aa47c2,gcr.io/kubeflow-images-public/kfam:v1.1.0-g9f3bfd00
pytorch-operator-d449c769b-hqm55 gcr.io/kubeflow-images-public/pytorch-operator:vmaster-gd596e904
seldon-controller-manager-68f9f7bff6-jkb57 docker.io/seldonio/seldon-core-operator:1.2.1
spark-operatorsparkoperator-758795c89b-vbrhf gcr.io/spark-operator/spark-operator:v1beta2-1.1.0-2.4.5
spartakus-volunteer-69f5b89c96-njknm gcr.io/google_containers/spartakus-amd64:v1.1.0
tf-job-operator-644f847f5c-2844p gcr.io/kubeflow-images-public/tf_operator:vmaster-ga2ae7bff
workflow-controller-dd8985f4d-qxh8m argoproj/workflow-controller:v2.3.0
$ kubectl get pods -n istio-system -o=custom-columns='NAME:.metadata.name,DATA:spec.containers[*].image'
NAME DATA
cluster-local-gateway-f4967d447-57txx docker.io/istio/proxyv2:1.3.1
istio-citadel-79b5b568b-g6lnc gcr.io/istio-release/citadel:release-1.3-latest-daily
istio-galley-756f5f45c4-lhlsf gcr.io/istio-release/galley:release-1.3-latest-daily
istio-ingressgateway-77f74c944c-b2xxt gcr.io/istio-release/proxyv2:release-1.3-latest-daily
istio-nodeagent-f4bkx gcr.io/istio-release/node-agent-k8s:release-1.3-latest-daily
istio-pilot-55f7f6f6df-jdxcg gcr.io/istio-release/pilot:release-1.3-latest-daily,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
istio-policy-76dbd68445-kcftf gcr.io/istio-release/mixer:release-1.3-latest-daily,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
istio-security-post-install-release-1.3-latest-daily-wkzlz gcr.io/istio-release/kubectl:release-1.3-latest-daily
istio-sidecar-injector-5d9f474dcb-8v2vs gcr.io/istio-release/sidecar_injector:release-1.3-latest-daily
istio-telemetry-697c8fd794-d66xt gcr.io/istio-release/mixer:release-1.3-latest-daily,gcr.io/istio-release/proxyv2:release-1.3-latest-daily
prometheus-b845cc6fc-zcdqb docker.io/prom/prometheus:v2.8.0

Work for Hewlett Packard Enterprise as Solution Architect / Write on IT / Infrastructure / Cloud Native / Kubernetes / OpenShift / Japanese|English
