Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snap-bootstrap: experimental: handling install/factory reset steps within snap-bootstrap #12382

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

kubiko
Copy link
Contributor

@kubiko kubiko commented Dec 1, 2022

This is a reference PR implementing install and factory-reset modes within snap-bootstrap
It should server also as tracking for the work done so far.
This activity is split across multiple PRs and not all might be mentioned here, especially if they have already landed.
As such, this PR can be regularly rebased and force-pushed to update tracking.

Current state: install mode support had landed on the master branch

Related PRs:
Open:
#12641

Landed:
#12846

Keeping original descriptions around for reference:
Original description:

Experimental attempt to handle install step within snap-bootstrap, effectively compressing initial install into a single boot.

Since the install step relies on additional tools to manage partitions, create file systems, set up encryption, this change has to be also accompanied by additional tools in initramfs.
As a reference of initramfs size increases:

  • additional tools and extra features in snap-bootstrap (sec tools) resulted in 2.3MB size increase for UC22 (arm64, zstd compression)

TODOs:
Improve seed meta loading. We load seed meta 3 times(assuming preseeded image)

  • initial essential snaps for install, hashes: kernel, base, gadget
  • handling preseed, hashes: all
  • post-install snapd handling, hashes: snapd
    Since we support parallelism we should ideally:
  • load all the essentials (kernel, core, snapd, gadget) most systems are either single and 4+ cores
  • reuse the hash of essential snaps and only hash other snaps

@pedronis pedronis self-requested a review December 1, 2022 08:51
@kubiko kubiko marked this pull request as draft December 1, 2022 11:42
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks promising, we need to look into:

  • when do we do this? always? can we? not always but based on what?
    • does that mean that install handlers go away?
  • what to do about hooks, calling fde-reveal-key instead of fde-setup is a change of contract that doesn't make sense, also it means we cannot really do this always
  • we need to properly split the parts of devicestate we are reusing to a package (to avoid bringing in most of overlord in snap-bootstrap), and also try to have even more higher-level helpers, there is still quite a bit of stuff that feels repeated from install code and could go out of sync
  • we need to see what do about places that look at the kernel command line directly, I could spot:
  • what to do about install-device hook

@@ -149,7 +151,7 @@ func generateInitramfsMounts() (err error) {
case "recover":
err = generateMountsModeRecover(mst)
case "install":
err = generateMountsModeInstall(mst)
err = installNewSystemGenerateMounts(mst)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the idea to always do this? should we do this only if preseed is setup?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems we can't do this in general because of the different hook contract ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a rather radical opinion here :)
I thin we should completely eliminate "install boot" for all the scenarios unless proven otherwise that this is needed.
initramfs is exactly what we want as an ephemeral system to run install in.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we discussed we have to:

  • start smaller and decide how to trigger this (which probably relates to the question about the hook contract)
  • we still have use cases for the full install mode (dedicated installers for example)

Comment on lines 1804 to 1978
// 1.1 load seed
deviceSeed, err := loadDeviceSeed(systemLabel)
if err != nil {
return err
}
// 1.2 load seed assertions
assertDb, err := loadSeedAssertions(deviceSeed)
if err != nil {
return err
}

// 1.3 load sead essential meta
if err = loadSeedEssentialMeta(deviceSeed, systemLabel); err != nil {
return err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably can be one helper?

}

// 2.2 ensure next boot is to run mode
if err := boot.EnsureNextBootToRunMode(systemLabel); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a bit more than this happening in doRestartSystemToRunMode that we might want/need to take care of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this should be removed from here and system should be marked as bootable later once it actually boots.
If we mark system  runable here and it fails later, the install would not be re-run. Perhaps more fitting place for this would be:
https://github.com/snapcore/snapd/blob/master//boot/boot.go#L355


func generateSnapMountsModeRun(mst *initramfsMountsState, isClassic, hasSeedPart bool, rootfsDir string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this has a bit too many bool args

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I'd extend initramfsMountsState struct to do more, so we do not pass this many things.
But fair point


// to set sysconfig.ApplyFilesystemOnlyDefaultsImpl
_ "github.com/snapcore/snapd/overlord/configstate/configcore"
"github.com/snapcore/snapd/overlord/devicestate"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this brings in most overlord which we don't want, we will have to split out the bits we need of devicestate to their own package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this very concern when I was prototyping with a separate go binary.
In case of snap-bootstrap we are unfortunately already pulling it in:
https://github.com/snapcore/snapd/blob/master/cmd/snap-bootstrap/cmd_initramfs_mounts.go#L46
So defo something to improve. And yes I can confirm it does increate size noticeably.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configcore is different, is designed not to pull in most of overlord if things are compiled with -tags nomanagers which we do for snap-boostrap or should

// encryptionType, err := m.checkEncryption(st, deviceCtx)
// if err != nil {
// return err
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we cannot assume we will do encryption

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's potentially next TODO to fix this

return nil
}

func doRunInstall(deviceSeed seed.Seed, recoverySystemLabel string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to find a way to share even more of this with install handler code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I think we should move the install handler to bootstrap.

Comment on lines 330 to 364
preseedAs, err := devicestate.InstallHandlerDoReadPreseedAssertion(tmpDb, model, boot.InitramfsUbuntuSeedDir, systemLabel)
if err != nil {
return fmt.Errorf("failed to read preseed assertion: %v", err)
}
writableDir := boot.InstallHostWritableDir(model)
if err != nil {
return err
}
// TODO: improve handling of the preseed, we do double hashing for already loaded snaps
// at the same time, we need to process them to generarate dir structures e.g. /sna/<name>/<rev>
if err = devicestate.InstallHandlerDoMaybeApplyPreseededData(deviceSeed.Model(),
preseedAs, preseedArtifact, boot.InitramfsUbuntuSeedDir, systemLabel, writableDir); err != nil {
return fmt.Errorf("failed to apply preseed data: %v", err)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again it seems this should be made into one helper

Comment on lines 364 to 390
mountInfos, err := osutil.LoadMountInfo()
if err != nil {
return fmt.Errorf("failed to load mount info: %v", err)
}
ubuntuDataDevice := ""
for _, mnt := range mountInfos {
if mnt.MountDir == InstallUbuntuDataDir {
ubuntuDataDevice = mnt.MountSource
break
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a bit odd, there should be a more direct to know this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love a simpler way here, but I could not find a go helper to get single-mount info.

Ultimately I do not think we will need this.
We divided /run/mnt/data and /run/mnt/ubuntu-data because we boot into install mode, where /run/mnt/data is tmpfs but this is no more concern. So my preference would be to run the install directly to /run/mnt/data and this extra dance is no more needed.


// helper to clean up as much as we can at fail
// TODO: do we even need this?
func cleanupAtFail() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not called anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was calling it in my early PoC, but then I notice snap-bootstrap is not doing any cleanup so I'm not calling it now. but at least keep the code in case we decide to call it.
For debugging it was also helpful to see the state of the system where it bailed out.

It's a bit philosophical if we need to clean, or if we even should:

  • in secure mode, there is non-interactive shell anyway, and we anyway wiped the whole thing.
  • in debug mode, it's probably good to give a change to see what is where. If we unmount enc partitions and lock fde-hook, extend TPM, there is no way to mount them again....

@github-actions github-actions bot added the Needs Documentation -auto- Label automatically added which indicates the change needs documentation label Jan 18, 2023
@kubiko kubiko force-pushed the poc-single-boot-install branch 3 times, most recently from 53aefed to 111225b Compare March 13, 2023 12:25
@github-actions github-actions bot removed the Needs Documentation -auto- Label automatically added which indicates the change needs documentation label May 24, 2023
@kubiko kubiko force-pushed the poc-single-boot-install branch 4 times, most recently from 8378a13 to b9b10f0 Compare May 31, 2023 14:28
@kubiko kubiko force-pushed the poc-single-boot-install branch 2 times, most recently from bed97dc to de00bdc Compare June 9, 2023 15:51
@kubiko kubiko changed the title snap-bootstrap: experimental: handling install step within snap-bootstrap snap-bootstrap: experimental: handling install/factory reset steps within snap-bootstrap Jun 14, 2023
@kubiko kubiko force-pushed the poc-single-boot-install branch 5 times, most recently from e969c1f to c5362d4 Compare June 21, 2023 23:02
@kubiko kubiko force-pushed the poc-single-boot-install branch 2 times, most recently from 7133e7e to cd61bb2 Compare July 3, 2023 11:19
@kubiko kubiko force-pushed the poc-single-boot-install branch 2 times, most recently from 972ee39 to 420dd05 Compare November 13, 2023 12:52
@ernestl
Copy link
Collaborator

ernestl commented Feb 27, 2024

@kubiko is this still useful/relevant?

@ernestl
Copy link
Collaborator

ernestl commented Feb 28, 2024

Discussed with @kubiko , this is still relevant and should not be removed.

@kubiko kubiko force-pushed the poc-single-boot-install branch 3 times, most recently from c000138 to 9d98b26 Compare April 15, 2024 12:42
kubiko added 11 commits May 29, 2024 17:39
Test show that we can use more cores before we hit I/O limit
Do not artificially limit to 2 jobs, modern systems have 4+ cores
and high disk I/O

Signed-off-by: Ondrej Kubik <[email protected]>
During normal runtime fdekeymanager executable is snap-fde-keymgr (part of the snapd snap)
When running within initramfs, effort is made to avoid multiple go binaries,
so fdekeymanager functionality is provided by snap-bootstrap
Provide option to set fdekeymanager executable name to support snap-bootstrap run environment.

Signed-off-by: Ondrej Kubik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants