[Bug]: There was an issue upgrading after importing a new module #20489

Open
1 task done
zhi-lu opened this issue May 30, 2024 · 2 comments

zhi-lu commented May 30, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Fellow Cosmos developers, I've run into a serious problem. My colleagues and I built a new module on top of Cosmos SDK v0.50.6. I'll call it the Foo module here (that is not its real name; we need to keep the real one confidential, sorry). It works much like 'auth' or 'bank'. We currently run a Cosmos SDK v0.50.6 blockchain without Foo, and I want to use Cosmovisor to upgrade the chain and introduce the Foo module. We followed the whole process described at https://docs.cosmos.network/main/build/tooling/cosmovisor#installation.
We replicated the migration plan used to upgrade simapp from v0.47 to v0.50 exactly. Unfortunately, our upgrade failed. The upgrade code in upgrades.go looks like this:
[Screenshot 2024-05-30 20 20 21]

Meanwhile, the app.go code looks like this:
[Screenshot 2024-05-30 20 22 38]
[Screenshot 2024-05-30 20 22 59]
[Screenshot 2024-05-30 20 23 10]

We referred to the migration and upgrade code for v0.50: https://github.com/cosmos/cosmos-sdk/blob/v0.50.0/simapp/upgrades.go
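For readers without access to the screenshots, here is a minimal stdlib-only model of the pattern in the linked simapp/upgrades.go, as I understand it. The real code uses SDK types and calls; everything below (`StoreUpgrades`, `upgradeInfo`, `storeUpgradesFor`, `appliesAt`, the plan name `add-foo`, and the store key `foo`) is a local stand-in or a hypothetical name, not the SDK API. It sketches two guards: store changes are prepared only when the plan on disk matches this upgrade and its height is not skipped, and they run only on the first start after the halt.

```go
package main

import "fmt"

// StoreUpgrades is a local stand-in for the SDK's store upgrade description.
// Only the Added field is modelled here.
type StoreUpgrades struct {
	Added []string
}

// upgradeInfo is a stand-in for the plan the old binary leaves on disk
// when it halts at the upgrade height.
type upgradeInfo struct {
	Name   string
	Height int64
}

const upgradeName = "add-foo" // hypothetical plan name

// storeUpgradesFor models the first guard: store changes are prepared only
// when the plan on disk is ours and its height is not in the skip list.
func storeUpgradesFor(info upgradeInfo, skip map[int64]bool) *StoreUpgrades {
	if info.Name != upgradeName || skip[info.Height] {
		return nil
	}
	// "foo" stands for the store key of the newly added module.
	return &StoreUpgrades{Added: []string{"foo"}}
}

// appliesAt models the second guard: the changes run only when the last
// committed version is exactly upgradeHeight-1, i.e. on the first start of
// the new binary. Later restarts fall back to a plain load.
func appliesAt(up *StoreUpgrades, upgradeHeight, lastCommitted int64) bool {
	return up != nil && len(up.Added) > 0 && upgradeHeight == lastCommitted+1
}

func main() {
	info := upgradeInfo{Name: upgradeName, Height: 350}
	up := storeUpgradesFor(info, nil)

	fmt.Println(appliesAt(up, info.Height, 349)) // first start of the new binary: true
	fmt.Println(appliesAt(up, info.Height, 353)) // restart at height 353: false
}
```

If a new module's store key is missing from the added list, the new store is never set up at the upgrade height, so this list is worth checking first when a module is introduced by an upgrade.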

Next, the failure itself. The chain has two validator nodes, A and B, with equal voting power. After the upgrade proposal passed its vote, the upgrade was scheduled for height 350, and it went through: blocks 351, 352, and 353 were produced after the upgrade, so consensus was evidently working at that point. But when I stopped node B and restarted it with 'cosmovisor run start', it logged 'ERR prevote step: consensus deems this block invalid; prevoting nil err="wrong Block.Header.AppHash"'. I don't understand why stopping and restarting a node would break consensus. After this, the chain stops producing blocks.
These are the relevant screenshots.
WechatIMG337
WechatIMG338

Both nodes ran the same old binary before the upgrade and the same new binary after it. Did I miss some code or an operational step? I'm baffled.
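On why the whole chain stalls rather than just node B: consensus needs votes from more than two thirds of the total voting power, and with two equally weighted validators a single nil prevote drops the chain below that threshold. A minimal arithmetic sketch (`hasQuorum` is a hypothetical helper, not a library function):

```go
package main

import "fmt"

// hasQuorum reports whether votes strictly exceed two thirds of the total
// voting power. Integer arithmetic avoids any rounding issues.
func hasQuorum(votes, total int64) bool {
	return 3*votes > 2*total
}

func main() {
	// Two validators with equal weight, as in the report.
	fmt.Println(hasQuorum(2, 2)) // A and B both prevote the block: true
	fmt.Println(hasQuorum(1, 2)) // B prevotes nil, only A remains: false
}
```

With only two equal validators, either one rejecting a block is enough to halt block production, which matches the behaviour described above.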

Cosmos SDK Version

v0.50.6

How to reproduce?

No response

@zhi-lu zhi-lu added the T:Bug label May 30, 2024
SpicyLemon (Collaborator) commented Jul 25, 2024

We (Provenance Blockchain) also encountered this when we upgraded our testnet chain to a version based on SDK v0.50.6. After the upgrade (cutting blocks again), if we stopped a node, then restarted it, we got that apphash error.

We fixed it by bumping the iavl library with a replace line:

replace github.com/cosmos/iavl => github.com/cosmos/iavl v1.2.0

That's replacing v1.1.2 which is being pulled in as an indirect dependency.

To fix our testnet, we then had to do a coordinated (empty) upgrade so that all the nodes stopped at the same block and got the same apphash (that would have been wrong if it were just one node stopping). When it started again with the updated version, all the nodes agreed on the apphash and it started cutting blocks again. After that 2nd upgrade, we were able to stop and restart nodes just fine.

We tried to identify what was actually different in the app store, but could only narrow it down to the staking module's store.

I'm not sure if it's related, but I see that, in the staking module's end block, there's a couple places where store writes are happening while iterating. One of the places is fixed in main, but the other is not, and both are currently in the release/v0.50.x branch.
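For context on why writes during iteration are a concern, here is a generic collect-then-write sketch. It is not the staking module's code: a plain Go map stands in for a KV store, and `matureKeys`, `processMature`, and the sample keys are hypothetical. The idea is to gather the keys to change first and apply the writes only after the iteration has finished, so the iteration never observes its own writes.

```go
package main

import (
	"fmt"
	"sort"
)

// matureKeys collects the keys whose completion value has been reached,
// without modifying the store during the iteration.
func matureKeys(store map[string]int64, now int64) []string {
	var keys []string
	for k, completion := range store {
		if completion <= now {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // Go map order is random; sort for a deterministic result
	return keys
}

// processMature applies the writes (here: deletes) only after the iteration
// in matureKeys has completed, and returns the keys it processed.
func processMature(store map[string]int64, now int64) []string {
	keys := matureKeys(store, now)
	for _, k := range keys {
		delete(store, k)
	}
	return keys
}

func main() {
	queue := map[string]int64{"val1": 10, "val2": 20, "val3": 5}
	fmt.Println(processMature(queue, 10)) // [val1 val3]
	fmt.Println(len(queue))               // 1
}
```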

We never figured out a way to trigger it locally, with a fresh mostly-empty chain, though.

Senna46 commented Sep 9, 2024

I hit this problem on a chain running SDK v0.50.3 with iavl v1.0.0.
Executing an upgrade that only calls RunMigrations() resolved the issue.

There is likely a problem with the processing of the store added in the upgrade.
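As a rough illustration of what an upgrade "with only RunMigrations()" does to module versions, here is a stdlib-only model of the version-map comparison, as I understand it. `VersionMap` and `planMigrations` are local stand-ins, the version numbers are illustrative, and the real SDK uses its own module ordering rather than the alphabetical order used here. Modules missing from the on-chain map get initialised, modules with a higher version in the binary get migrated, and everything else is left alone.

```go
package main

import (
	"fmt"
	"sort"
)

// VersionMap maps a module name to its consensus version.
type VersionMap map[string]uint64

// planMigrations lists what a RunMigrations-style pass would do for each
// module in the new binary, given the versions recorded on chain (fromVM).
// It returns the updated version map and the list of actions.
func planMigrations(fromVM, binary VersionMap) (VersionMap, []string) {
	names := make([]string, 0, len(binary))
	for n := range binary {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic order for this sketch only

	updated := VersionMap{}
	var actions []string
	for _, n := range names {
		to := binary[n]
		from, known := fromVM[n]
		switch {
		case !known:
			// New module: no on-chain version yet, so it is initialised.
			actions = append(actions, "init-genesis "+n)
		case to > from:
			actions = append(actions, fmt.Sprintf("migrate %s %d->%d", n, from, to))
		}
		updated[n] = to
	}
	return updated, actions
}

func main() {
	onChain := VersionMap{"auth": 5, "bank": 4} // illustrative versions
	withFoo := VersionMap{"auth": 5, "bank": 4, "foo": 1}

	_, first := planMigrations(onChain, withFoo)
	fmt.Println(first) // [init-genesis foo]

	// A second, "empty" upgrade: nothing is new and nothing is migrated.
	_, empty := planMigrations(withFoo, withFoo)
	fmt.Println(len(empty)) // 0
}
```

In this model, an empty upgrade changes no module state by itself; its practical effect is that every node halts and restarts at the same height.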
