Installation on GCP stuck on "writing google specific secrets to vault secret store" #2199

Open
1 task done
PGimenez opened this issue Jun 13, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@PGimenez

PGimenez commented Jun 13, 2024

Which version of kubefirst are you using?

v2.4.10

Which cloud provider?

Google Cloud

Which DNS?

Cloudflare

Which installation type?

UI (Console app)

Which distributed Git provider?

GitHub

Did you use a fork of gitops-template?

No

Which Operating System?

macOS

What is the issue?

The installation gets stuck for over two hours at 93%, on the step "writing google specific secrets to vault secret store".
[Screenshot: CleanShot 2024-06-14 at 00 10 45]

Only thing I can see is the vault-1 pod in unhealthy status and showing this message in the logs:
2024-06-13T22:14:01.017Z [INFO] core.autoseal: seal configuration missing, not initialized: seal_type=recovery

Update: if I kill the vault-1 pod, it runs fine afterwards. However, if I reload the page to restart the install process, it gets stuck at the same point.
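
The workaround from the update above can be sketched as a short script. This is only a sketch: the namespace `vault` and the pod name `vault-1` are taken from this thread, while the `run` helper and the `DRY_RUN` switch are illustrative additions that are not part of kubefirst. By default it only prints the commands, and as noted above it does not fix a restarted install.

```shell
#!/usr/bin/env bash
# Sketch: delete the unhealthy Vault pod so that it gets recreated.
set -euo pipefail

NAMESPACE="${NAMESPACE:-vault}"   # namespace seen in the logs in this thread
POD="${POD:-vault-1}"             # the pod reported as unhealthy
DRY_RUN="${DRY_RUN:-1}"           # hypothetical switch: set to 0 to really run

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restart_vault_pod() {
  # Look at the last log lines before deleting anything.
  run kubectl -n "$NAMESPACE" logs "$POD" --tail=20
  # Delete the pod; it should be recreated automatically.
  run kubectl -n "$NAMESPACE" delete pod "$POD"
  # Check the state of the Vault pods afterwards.
  run kubectl -n "$NAMESPACE" get pods
}

restart_vault_pod
```

Running it with `DRY_RUN=0` executes the three `kubectl` commands against the current kubeconfig context.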

Code of Conduct

  • I agree to follow this project's Code of Conduct
@PGimenez PGimenez added the bug Something isn't working label Jun 13, 2024
@Eric-TPS

I'm receiving these issues on DigitalOcean as well.

@PGimenez
Author

If I extract the vault token as explained here and set the env var VAULT_TOKEN="hvs.----", I can then finish the install process manually by going into ~/.k1/clustername/gitops/terraform/users and doing terraform apply.

Some things still don't work, like the ingress for argocd, but otherwise all pods are healthy and I can connect to them.
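
The manual finish described in this comment could look roughly like the sketch below. The token value `hvs.----` and the cluster name `clustername` are the placeholders used in the comment, not real values; the `run` helper and `DRY_RUN` switch are illustrative additions. The Terraform configuration may need further environment variables that are not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: finish the install by applying the "users" Terraform manually.
set -euo pipefail

CLUSTER_NAME="${CLUSTER_NAME:-clustername}"          # placeholder cluster name
USERS_DIR="$HOME/.k1/$CLUSTER_NAME/gitops/terraform/users"
export VAULT_TOKEN="${VAULT_TOKEN:-hvs.----}"        # placeholder; use the extracted token
DRY_RUN="${DRY_RUN:-1}"                              # hypothetical switch: set to 0 to really run

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

finish_users_terraform() {
  # Change into the users Terraform directory created by kubefirst.
  run cd "$USERS_DIR"
  # Apply interactively so the plan can be reviewed before confirming.
  run terraform apply
}

finish_users_terraform
```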

@mrsimonemms mrsimonemms self-assigned this Jun 17, 2024
@jarededwards
Member

@PGimenez to get the argocd ingress working, you can add this file to the argocd components in your gitops repository. You will also need to reference this file in the kustomization.yaml in the same directory.

@jarededwards
Member

> I'm receiving these issues on DigitalOcean as well.

@Eric-TPS can you confirm what version of kubefirst you're using? I just ran the latest, v2.4.10, with GitHub on DigitalOcean and everything worked well for me. Can you let me know how you installed so I can try to reproduce? Feel free to find me in our Slack if it's easier for communication. Thanks in advance!

@PGimenez
Author

Just to provide some more info: besides using the UI, I've also tried the CLI with this command:

kubefirst beta google create \
	--alerts-email [email protected] \
	--github-org orgname \
	--domain-name domain \
	--google-project project-426512 \
	--cluster-name cluster \
	--force-destroy true \
	--cloud-region europe-central2 \
	--node-count 1 \
	--dns-provider cloudflare

It stopped at writing the google secrets as before. I executed terraform apply manually in each folder, added the secrets in Vault, and Argo finished syncing everything.

Still, the installation seemed incomplete: the kubefirst UI wouldn't let me create namespaces or clusters, and 2 of the 3 pods were unhealthy due to a missing license.

@mrsimonemms

mrsimonemms commented Jun 19, 2024

@PGimenez have you always used a single node in your cluster? I used 3 nodes when I did my tests yesterday which were successful. I've just tried again with a single node (--node-count 1) and that failed to deploy Vault (I don't think this is the error you're reporting though)

{"level":"info","time":"2024-06-19T10:08:53Z","message":"updated Secret kubefirst-cluster-sje-trygitops in Namespace kubefirst\n"}
{"level":"error","time":"2024-06-19T10:08:53Z","message":"the StatefulSet was not created within the timeout period"}

EDIT. I've just successfully installed using your command (with creds changed).

@Eric-TPS

> > I'm receiving these issues on DigitalOcean as well.
>
> @Eric-TPS can you confirm what version of kubefirst you're using? I just ran the latest, v2.4.10, with GitHub on DigitalOcean and everything worked well for me. Can you let me know how you installed so I can try to reproduce? Feel free to find me in our Slack if it's easier for communication. Thanks in advance!

@jarededwards - I attempted to deploy using version 2.3.7 from the DO marketplace. The Kubefirst cluster deployed fine, but the environment it tried to deploy would not complete at the vault stage. I attempted a redeployment multiple times without success.

https://marketplace.digitalocean.com/apps/kubefirst

@mrsimonemms

@PGimenez Can you try rerunning with --node-count set as 2 (or greater) please? I don't think this is the root cause, but my instance running with just one node was incredibly slow and then started crashing pods in a random order due to running out of memory.

@PGimenez
Author

PGimenez commented Jun 19, 2024

I tried with 2 nodes per zone (6 nodes total, isn't this overkill?) but now the install gets stuck creating the keyrings with this error:

{"level":"debug","time":"2024-06-19T21:26:44Z","message":"OUT: \u001b[0m\u001b[0m\u001b[1mmodule.vault_keys.google_kms_key_ring.key_ring: Creating...\u001b[0m\u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m╷\u001b[0m\u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m│\u001b[0m \u001b[0m\u001b[1m\u001b[31mError: \u001b[0m\u001b[0m\u001b[1mError creating KeyRing: googleapi: Error 409: KeyRing projects/kubefirst-426920/locations/global/keyRings/vault-kubefirst-r1kv1 already exists.\u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m│\u001b[0m \u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m│\u001b[0m \u001b[0m\u001b[0m  with module.vault_keys.google_kms_key_ring.key_ring,"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m│\u001b[0m \u001b[0m  on modules/kms/main.tf line 6, in resource \"google_kms_key_ring\" \"key_ring\":"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m│\u001b[0m \u001b[0m   6: resource \"google_kms_key_ring\" \"key_ring\" \u001b[4m{\u001b[0m\u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m│\u001b[0m \u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:49Z","message":"ERR: \u001b[31m╵\u001b[0m\u001b[0m"}
{"level":"debug","time":"2024-06-19T21:26:50Z","message":"command \"/root/.k1/kubefirst/tools/terraform\" failed"}
{"level":"debug","time":"2024-06-19T21:26:50Z","message":"error: terraform apply -auto-approve for /root/.k1/kubefirst/gitops/terraform/google failed exit status 1"}
{"level":"error","time":"2024-06-19T21:26:51Z","message":"error creating google resources with terraform /root/.k1/kubefirst/gitops/terraform/google: exit status 1"}
{"level":"info","time":"2024-06-19T21:26:51Z","message":"updated Secret kubefirst-cluster-kubefirst in Namespace kubefirst\n"}
{"level":"info","time":"2024-06-19T21:26:51Z","message":"updated Secret kubefirst-cluster-kubefirst in Namespace kubefirst\n"}
{"level":"error","time":"2024-06-19T21:26:51Z","message":"error creating google resources with terraform /root/.k1/kubefirst/gitops/terraform/google: exit status 1"}

I've tried creating new projects, disabling/enabling the kms api to delete all keyrings, but this error keeps appearing :/

@mrsimonemms

mrsimonemms commented Jun 20, 2024

> I tried with 2 nodes per zone (6 nodes total, isn't this overkill?)

On the face of it, yes, but there's a lot going on in the cluster. It's definitely something to look at post-beta.

The google_kms_key_ring error is understandable: KMS key rings can't be deleted, which is annoying.

Can you try importing it into your TF state?
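
A minimal sketch of that import, assuming the resource address and key ring ID shown in the error log above and a local copy of the Terraform directory under `$HOME/.k1` (the log shows `/root/.k1`, so the path is an assumption). The `run` helper and `DRY_RUN` switch are illustrative additions, and the configuration may need the same variables and credentials that the installer normally provides.

```shell
#!/usr/bin/env bash
# Sketch: import the existing KMS key ring into the Terraform state.
set -euo pipefail

CLUSTER_NAME="${CLUSTER_NAME:-kubefirst}"   # cluster name as it appears in the log paths
TF_DIR="$HOME/.k1/$CLUSTER_NAME/gitops/terraform/google"
# Resource address and key ring ID copied from the error message above.
RESOURCE_ADDR="module.vault_keys.google_kms_key_ring.key_ring"
KEY_RING_ID="projects/kubefirst-426920/locations/global/keyRings/vault-kubefirst-r1kv1"
DRY_RUN="${DRY_RUN:-1}"                     # hypothetical switch: set to 0 to really run

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

import_key_ring() {
  run cd "$TF_DIR"
  # Tell Terraform that the key ring already exists and should be tracked.
  run terraform import "$RESOURCE_ADDR" "$KEY_RING_ID"
  # Afterwards, the plan should no longer try to create the key ring.
  run terraform plan
}

import_key_ring
```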

@PGimenez
Author

I tried again with 2 nodes per zone in a new project, and I'm stuck at the same place as in my previous post. I finished the TF apply manually and added the secrets manually as well, but 2 of the 3 kubefirst-api pods are not ready, with this error:

{"level":"info","time":"2024-06-22T19:02:24Z","message":"error loading .env file, using local environment variables"}
{"level":"info","time":"2024-06-22T19:02:24Z","message":"checking for cluster import secret for management cluster"}
{"level":"info","time":"2024-06-22T19:02:24Z","message":"reading secret kubefirst-initial-state to determine if import is needed"}
{"level":"error","time":"2024-06-22T19:02:24Z","message":"error getting secret: secrets \"kubefirst-initial-state\" not found\n"}
{"level":"info","time":"2024-06-22T19:02:24Z","message":"error reading secret kubefirst-initial-state. secrets \"kubefirst-initial-state\" no
{"level":"fatal","time":"2024-06-22T19:02:24Z","message":"secrets \"kubefirst-initial-state\" not found"}

The remaining pod works, but I cannot create anything in the UI

[Screenshot]

If I try to continue the installation with the kubefirst cli, I get the 500 error although I've manually added the secrets

{"level":"info","time":"2024-06-22T18:58:43Z","message":"pod \"vault-0\" at namespace \"vault\" has port-forward accepting local connections at port 8200\n"}
{"level":"info","time":"2024-06-22T18:58:46Z","message":"writing google specific secrets to vault secret store"}
{"level":"info","time":"2024-06-22T20:58:50+02:00","message":"unable to get cluster 500 Internal Server Error, continuing"}

I'm going to give the k3d install a try.

@mrsimonemms

@PGimenez I've messaged you in our Slack to do a pairing session on this as it's not making an awful lot of sense as to why I can't recreate this.

@mrsimonemms

@PGimenez apropos of nothing, I've just had a similar issue with a GCP cluster. I "solved" the problem by deleting the bad Vault pod. It restarted in a healthy condition.

It's not a solution, but it is a potential workaround for you
