
feat: Updated arm64 support patch #2491

Merged · 3 commits · Mar 24, 2020

Conversation

@MrXinWang (Contributor) commented Mar 21, 2020

Resubmitting the arm64 support patch. I think everything should work perfectly now.

Fixes: #2080

Here are the test process and test logs in full detail:
(I have also uploaded the patch as a single file for reproducing: 0001-Updated-arm64-support-patch.zip)

Test process:

# Setup code base on both worker and master nodes.
$ git clone https://github.com/argoproj/argo
$ cd argo 
# Apply this patch
$ git apply 0001-Updated-arm64-support-patch.patch

# Build and deploy in k8s cluster
# On worker node
$ make build

Logs from the build process:

# GIT_TAG=bfaf1c21, GIT_BRANCH=master, GIT_TREE_STATE=dirty, VERSION=latest, DEV_IMAGE=true
...
docker build -t argoproj/argocli:latest --target argocli -f Dockerfile.dev .
[+] Building 171.6s (18/18) FINISHED
...
docker build -t argoproj/argoexec:latest --target argoexec -f Dockerfile.dev .
[+] Building 14.8s (16/16) FINISHED
...
docker build -t argoproj/workflow-controller:latest --target workflow-controller -f Dockerfile.dev .
[+] Building 9.3s (12/12) FINISHED
...
echo latest > dist/MANIFESTS_VERSION
kustomize build manifests/cluster-install | sed "s/:latest/:latest/" | ./hack/auto-gen-msg.sh > manifests/install.yaml
kustomize build manifests/namespace-install | sed "s/:latest/:latest/" | ./hack/auto-gen-msg.sh > manifests/namespace-install.yaml
kustomize build manifests/quick-start/postgres | sed "s/:latest/:latest/" | ./hack/auto-gen-msg.sh > manifests/quick-start-postgres.yaml
kustomize build manifests/quick-start/mysql | sed "s/:latest/:latest/" | ./hack/auto-gen-msg.sh > manifests/quick-start-mysql.yaml

Then I changed the imagePullPolicy at Line#426 and Line#512 of test/e2e/manifests/postgres.yaml on the master node to IfNotPresent, to avoid the Container image "argoproj/argocli:latest" is not present with pull policy of Never error during deployment, and also to avoid pulling images from Docker Hub. A sketch of the edit follows.
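For reference, this is the standard Kubernetes imagePullPolicy field on a container spec; a minimal sketch of the edited stanza (the surrounding fields are assumed for illustration, only the imagePullPolicy value is the actual change):

containers:
  - name: argo-server                 # container name assumed for illustration
    image: argoproj/argocli:latest
    imagePullPolicy: IfNotPresent     # was Never; use the locally built image, pull only if absent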

Then running:

# On master node
$ make build && make start

The output should be similar to:

root@entos-x86-desktop:~/argo# make start
# GIT_TAG=bfaf1c21, GIT_BRANCH=master, GIT_TREE_STATE=dirty, VERSION=latest, DEV_IMAGE=true
# Create Postgres e2e manifests
cat test/e2e/manifests/postgres.yaml | sed 's/:latest/:latest/' | sed 's/pns/pns/' > dist/postgres.yaml
# Install Postgres quick-start
kubectl get ns argo || kubectl create ns argo
NAME   STATUS   AGE
argo   Active   47m
kubectl -n argo apply -f dist/postgres.yaml
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io unchanged
serviceaccount/argo unchanged
serviceaccount/argo-server unchanged
role.rbac.authorization.k8s.io/argo-role unchanged
role.rbac.authorization.k8s.io/argo-server-role unchanged
role.rbac.authorization.k8s.io/workflow-role unchanged
rolebinding.rbac.authorization.k8s.io/argo-binding unchanged
rolebinding.rbac.authorization.k8s.io/argo-server-binding unchanged
rolebinding.rbac.authorization.k8s.io/workflow-default-binding unchanged
configmap/workflow-controller-configmap unchanged
secret/argo-postgres-config configured
secret/my-minio-cred configured
service/argo-server unchanged
service/minio unchanged
service/postgres unchanged
service/workflow-controller-metrics unchanged
deployment.apps/argo-server configured
deployment.apps/postgres unchanged
deployment.apps/workflow-controller configured
pod/minio unchanged
# Scale down
kubectl -n argo scale deployment/argo-server --replicas 0
deployment.extensions/argo-server scaled
kubectl -n argo scale deployment/workflow-controller --replicas 0
deployment.extensions/workflow-controller scaled
# Wait for pods to go away, so we don't wait for them to be ready later.
[ "`kubectl -n argo get pod -l app=argo-server -o name`" = "" ] || kubectl -n argo wait --for=delete pod -l app=argo-server  --timeout 30s
pod/argo-server-7b8494f8b5-p2gb6 condition met
pod/argo-server-945dcfc4d-btgcm condition met
[ "`kubectl -n argo get pod -l app=workflow-controller -o name`" = "" ] || kubectl -n argo wait --for=delete pod -l app=workflow-controller  --timeout 2m
pod/workflow-controller-5b45959dbf-zjlvs condition met
# Scale up
kubectl -n argo scale deployment/workflow-controller --replicas 1
deployment.extensions/workflow-controller scaled
kubectl -n argo scale deployment/argo-server --replicas 1
deployment.extensions/argo-server scaled
# Wait for pods to be ready
kubectl -n argo wait --for=condition=Ready pod --all -l app --timeout 2m
pod/minio condition met
pod/postgres-5999d74fc9-qj2xb condition met
pod/workflow-controller-5b45959dbf-vqqxq condition met
export ARGO_SERVER=localhost:2746
export ARGO_TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJhcmdvIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImFyZ28tc2VydmVyLXRva2VuLWw3OXB4Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFyZ28tc2VydmVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDllY2NmMDEtNmIxYS0xMWVhLWJjYzYtYTBkM2MxMjk3MWViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmFyZ286YXJnby1zZXJ2ZXIifQ.TQDwRdc4611zQz4l1BEP4Ra1SIWJmtiW57VLZ_UpJo2FEi4EwKfnuUfgdZUzraeTSn4VZIUTy98bfKZ6fIW7Sf1daHL7ckQ0XpvO7nF8dDD6SR3mFo9SLj5hd-NPIMV_B0RUN35n-JyDU_Yhfz8FgT7xti8vxaesFq17SIiv_mWGX_hpVaKAgk2gx8KPzkdy86dghybOVL5pAHTaZHqKNxB_rQFLb-PPfWRcoUOa5H-6bdb9hDYv2NawmQalstRCLuOHU9NeyED0o7dlk2gncSXsrPAB-B3QoY3kwQteek778TEC1SdF1MHEGz3zn9NYKiOqrRvXOOy4NGb1bLQEqw
# Switch to "argo" ns.
kubectl config set-context --current --namespace=argo
Context "kubernetes-admin@kubernetes" modified.

and the Kubernetes cluster with Argo is set up correctly:

root@entos-x86-desktop:~/argo# kubectl -n argo get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/argo-server-945dcfc4d-8xfqm            1/1     Running   0          76s
pod/minio                                  1/1     Running   0          19m
pod/postgres-5999d74fc9-qj2xb              1/1     Running   0          19m
pod/workflow-controller-5b45959dbf-vqqxq   1/1     Running   0          77s

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/argo-server                   ClusterIP   10.111.120.2    <none>        2746/TCP   19m
service/minio                         ClusterIP   10.109.125.44   <none>        9000/TCP   19m
service/postgres                      ClusterIP   10.99.72.36     <none>        5432/TCP   19m
service/workflow-controller-metrics   ClusterIP   10.99.171.132   <none>        9090/TCP   19m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/argo-server           1/1     1            1           19m
deployment.apps/postgres              1/1     1            1           19m
deployment.apps/workflow-controller   1/1     1            1           19m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/argo-server-7b8494f8b5           0         0         0       19m
replicaset.apps/argo-server-945dcfc4d            1         1         1       84s
replicaset.apps/postgres-5999d74fc9              1         1         1       19m
replicaset.apps/workflow-controller-569f77574    0         0         0       19m
replicaset.apps/workflow-controller-5b45959dbf   1         1         1       84s

Signed-off-by: Henry Wang <[email protected]>

@MrXinWang (Contributor, Author) commented Mar 21, 2020

Hi @alexec! I think this version of the patch should work well. Please take another look and review it again. You can also apply the patch on your machine to verify it before merging.

An important thing I noticed: when you set everything up, you have to make sure that argoproj/argocli:latest and the other images used on the worker node of the k8s cluster are the ones you newly built, not the ones pulled from Docker Hub (the Docker Hub images are x86_64 only). That is probably the reason your image could not run last time. To support multi-arch images, we need to publish a docker manifest list covering both architectures; a sketch follows.
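As a sketch of what publishing such a manifest list could look like (the per-arch tags are illustrative, not tags this PR actually pushes; docker manifest was an experimental CLI feature at the time and may require DOCKER_CLI_EXPERIMENTAL=enabled):

# Push the per-architecture images first (hypothetical tags)
$ docker push argoproj/argocli:latest-amd64
$ docker push argoproj/argocli:latest-arm64
# Create a manifest list referencing both, annotate the arm64 entry, and push it
$ docker manifest create argoproj/argocli:latest argoproj/argocli:latest-amd64 argoproj/argocli:latest-arm64
$ docker manifest annotate argoproj/argocli:latest argoproj/argocli:latest-arm64 --os linux --arch arm64
$ docker manifest push argoproj/argocli:latest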

An example (I used the tag arm_support for clarity):

$ kubectl -n argo describe pods argo-server-6f9f788bbc-fjqlr

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  65s   default-scheduler  Successfully assigned argo/argo-server-6f9f788bbc-fjqlr to argo
  Normal  Pulled     57s   kubelet, argo      Container image "argoproj/argocli:arm_support" already present on machine
  Normal  Created    54s   kubelet, argo      Created container argo-server
  Normal  Started    53s   kubelet, argo      Started container argo-server

If any problems come up, please let me know how you reproduced them and I will update this PR based on your comments.

Thanks very much!

@codecov (bot) commented Mar 21, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@559cb00).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master    #2491   +/-   ##
=========================================
  Coverage          ?   11.36%           
=========================================
  Files             ?       74           
  Lines             ?    31114           
  Branches          ?        0           
=========================================
  Hits              ?     3537           
  Misses            ?    27102           
  Partials          ?      475

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 559cb00...4ffa619.

@whynowy (Member) left a comment

Still seeing the following error:

standard_init_linux.go:211: exec user process caused "exec format error"

How to reproduce:

Run make controller-image cli-image and deploy the resulting images.

@MrXinWang (Contributor, Author) commented

Hi @whynowy, does this error happen on x86_64 or on arm64?

@whynowy (Member) commented Mar 23, 2020

Hi @whynowy, does this error happen on x86_64 or on arm64?

x86_64

@MrXinWang (Contributor, Author) commented Mar 23, 2020

x86_64

This is very strange, because that error means the image architecture does not match the machine architecture, yet make controller-image cli-image built these images on that very machine... I just repeated the steps on my worker node (where the images are actually deployed) and did not see this error. I will try another machine to check.
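For what it is worth, a quick way to check which architecture an image or binary was actually built for (standard docker and file tooling; the tag and path are the ones used earlier in this thread):

# Architecture recorded in the image config
$ docker image inspect --format '{{.Os}}/{{.Architecture}}' argoproj/argocli:arm_support
# Architecture of the compiled binary itself
$ file dist/workflow-controller-linux-amd64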

After running make controller-image cli-image on the worker node:

root@argo:~/argo# make clear && make controller-image cli-image
...
root@argo:~/argo# docker images
REPOSITORY                     TAG                            IMAGE ID            CREATED              SIZE
argoproj/argocli               arm_support                    9dcd16d8f22d        About a minute ago   65.8MB
argoproj/workflow-controller   arm_support                    4356f245f314        About a minute ago   49.8MB

Re-deploying with kubectl apply -f postgres.yaml on the master node:

root@entos-x86-desktop:~/argo/dist# kubectl -n argo delete deployments argo-server workflow-controller
...
root@entos-x86-desktop:~/argo/dist# kubectl apply -f postgres.yaml
...
root@entos-x86-desktop:~/argo/dist# kubectl -n argo get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/argo-server-6f9f788bbc-74h88           1/1     Running   0          32s
pod/minio                                  1/1     Running   0          47h
pod/postgres-5999d74fc9-qj2xb              1/1     Running   0          47h
pod/workflow-controller-69785cfb65-mdk52   1/1     Running   0          32s

Containers:
  argo-server:
    Container ID:  docker:https://4a4efdb213b7bfe003af3525889725a25f8567c2759049acd3ccc86dc80e750d
    Image:         argoproj/argocli:arm_support
    Image ID:      docker:https://sha256:9dcd16d8f22d8015c17ea367802df266911e6dbd7c8e071a82f790ada42e2adf


Containers:
  workflow-controller:
    Container ID:  docker:https://ba3add2458430c1c7d74fe40ee22f904f2e8b09850b0205eb1bddb3d202519a4
    Image:         argoproj/workflow-controller:arm_support

@MrXinWang (Contributor, Author) commented Mar 23, 2020

Hi @whynowy, I tried another x86_64 machine with the same method described above and still did not see the error. The only thing that might be useful for our discussion now is my cluster setup:

OS: Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-72-generic x86_64)

k8s: v1.14.9 on both master and worker; the container networking CNI is flannel.

Docker: I tried both v18.09.7 and v19.03.2, no difference.

Go: 1.14.1.

Makefile (outdated diff)

.PHONY: controller-image
controller-image: dist/controller-image

-dist/controller-image: dist/workflow-controller-linux-amd64
+dist/controller-image: dist/workflow-controller-$(GOOS)-$(GOARCH)
Member

Why change this?

Contributor Author

I think this change enables multi-arch images without disturbing the existing code structure. Reasons:

On amd64 (x86_64) machines, building the target dist/controller-image will still give us x86_64 Docker images.

On arm64 (aarch64) machines, this change, together with the changes in Dockerfile and Dockerfile.dev, lets us build arm64 container images.

With those two images in hand, adding a docker manifest makes our Docker Hub image multi-arch. (This works because $(GOOS) and $(GOARCH) default to the host platform; see the check after this list.)
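For context on why the same rule yields different images on different hosts: when GOOS and GOARCH are unset, the Go toolchain defaults them to the host platform, which a quick check shows (standard go tooling; output shown here for an x86_64 Linux host):

$ go env GOOS GOARCH
linux
amd64

On an aarch64 host the same command prints linux and arm64, so dist/workflow-controller-$(GOOS)-$(GOARCH) always resolves to a native build.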

@whynowy (Member) commented Mar 23, 2020

Hi @whynowy, I tried another x86_64 machine with the same method described above and still did not see the error. [...]

We don't use an x86_64 OS to build the images for workflow-controller and argo-server.

@MrXinWang (Contributor, Author) commented Mar 23, 2020

We don't use an x86_64 OS to build the images for workflow-controller and argo-server.

Ah, that is the reason! Now I understand. Based on my comments above, I should add a conditional so the Makefile produces the images we want, probably a new variable that accepts a command-line value so that we can produce Docker images for different architectures, or I could just drop my architecture changes in the Dockerfile. WDYT? Which one do you prefer?

@whynowy (Member) commented Mar 23, 2020

We don't use an x86_64 OS to build the images for workflow-controller and argo-server.

Ah, that is the reason! Now I understand. I should add a conditional so the Makefile produces the images we want, probably a new variable that accepts a command-line value. WDYT?

Well, this changes the release strategy: providing multi-arch images for workflow-controller and argo-server. I'd leave it to @alexec.

@MrXinWang (Contributor, Author) commented Mar 23, 2020

Well, this changes the release strategy: providing multi-arch images for workflow-controller and argo-server. I'd leave it to @alexec.

Thanks for your answer! I think whether these multi-arch images are really needed should first be discussed by the community. Personally I would prefer to have them: many Kubernetes and edge-based projects, such as Kubeflow, use Argo, and a multi-arch image release would let this project run on more kinds of machines (Kubernetes itself supports ARM, and it would be cool to have Argo run on ARM too :P).

Anyway, if this is not needed, I will remove my changes related to multi-arch image support and keep just the arm64 binary build.

Change-Id: Ibdd92dc3a9105d01b6642cd246672b14ee1cfacd
Signed-off-by: Henry Wang <[email protected]>
@MrXinWang (Contributor, Author) commented Mar 23, 2020

Updated the code again to:

  • Revert the changes in Dockerfile and Dockerfile.dev so they fit your build machine; the images built from them should now be fine. I also kept the release strategy consistent, so nothing is affected: commands such as make build or make start still release x86_64 images only.

  • Add multi-arch image support: the Makefile can now generate images for any OS/arch combination supported by Go.

  • Allow the OS/arch to be declared explicitly on the make command line, for example make build OUTPUT_IMAGE_OS=linux OUTPUT_IMAGE_ARCH=arm64 to build binaries and images for arm64. The default OS/arch is linux/amd64 (see the sketch after this list).
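A minimal sketch of the pattern described above, assuming the variable names from the bullet list (the actual recipes in the PR may differ in detail; the go build flags and paths here are illustrative):

# Default to linux/amd64, overridable from the make command line
OUTPUT_IMAGE_OS   ?= linux
OUTPUT_IMAGE_ARCH ?= amd64

# Cross-compile the controller binary for the requested platform
dist/workflow-controller-$(OUTPUT_IMAGE_OS)-$(OUTPUT_IMAGE_ARCH):
	CGO_ENABLED=0 GOOS=$(OUTPUT_IMAGE_OS) GOARCH=$(OUTPUT_IMAGE_ARCH) \
		go build -o $@ ./cmd/workflow-controller

# Build the image from the platform-specific binary
controller-image: dist/workflow-controller-$(OUTPUT_IMAGE_OS)-$(OUTPUT_IMAGE_ARCH)
	docker build -t argoproj/workflow-controller:latest --target workflow-controller .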

@whynowy @alexec Please review again, thanks!

@MrXinWang requested a review from whynowy on March 23, 2020 at 15:41
@whynowy (Member) left a comment

This LGTM.

Thanks!

Development

Successfully merging this pull request may close these issues.

Argo Workflow for ARM?