Add option to control checkpoint rate #247

Open
Wazzzzuzp opened this issue Jun 1, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

@Wazzzzuzp

Hi Devs,

Any chance of getting a GPU checkpoint option added to the cores, either to turn checkpointing off or to extend the interval to a user-defined number of frames?

With my 4090, the time per frame (TPF) is normally in the 9-30 second range. Having the core stop to perform a checkpoint that takes as long as a frame to complete seems wasteful.

Thanks
image

@muziqaz
Contributor

muziqaz commented Jun 1, 2024

This is set by project owners, not the client. Users with a 4090 or other ultra-high-end cards are an extreme minority, and it is not very helpful for the projects to cater to that minority :)
Reducing the frequency of checkpoints would hinder scientific progress, because those with slower cards would lose a lot of progress if they paused or switched off folding just before a checkpoint is written.
Remember, your 9-second TPF corresponds to minutes or tens of minutes on cards in other performance tiers. Checkpoints are also not time-based, so 5% is 45 s for you, while for others it might be half an hour. If they start losing half an hour of science at a time, we are going backwards.
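
To make the arithmetic above concrete, here is a minimal sketch. It assumes one frame equals 1% of a work unit, so a checkpoint every 5% falls every 5 frames. The function name and the 6-minute TPF for the slower card are hypothetical, chosen only so that the result matches the "half an hour" figure.

```python
def max_lost_seconds(tpf_seconds: float, checkpoint_percent: float = 5.0) -> float:
    """Worst-case work lost if folding stops just before the next checkpoint.

    Assumption: one frame is 1% of a work unit, so the number of frames
    between checkpoints equals checkpoint_percent.
    """
    return tpf_seconds * checkpoint_percent


# Illustrative TPF values only; the 360 s figure is a made-up slower card.
for label, tpf in [("fast GPU, 9 s TPF", 9.0), ("slower GPU, 6 min TPF", 360.0)]:
    lost = max_lost_seconds(tpf)
    print(f"{label}: up to {lost / 60:.2f} min of work lost")
```

Because the interval is a percentage of the work unit rather than a wall-clock time, the same setting costs the fast card under a minute and the slow card half an hour.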

@Wazzzzuzp
Author

Wazzzzuzp commented Jun 1, 2024

So it's no different than folding on a CPU; however, for CPU units you can set how often they checkpoint. I was after the same for the GPUs.

@muziqaz
Contributor

muziqaz commented Jun 1, 2024

CPU checkpointing is different and very fine-grained. Regardless of what you set for yourself, a CPU work unit will always restart very close to where you paused or stopped. OpenMM (GPU), I believe, does not have such functionality. Whatever the project owner (not the FAH devs) sets in their project is universal and cannot be changed on a per-client basis, at least with current client functionality. I believe that if this were possible, it would have been implemented in v7 a long time ago.

@Wazzzzuzp
Author

Would the better course of action be to file a feature request with OpenMM?

@muziqaz
Contributor

muziqaz commented Jun 1, 2024

No, the better course of action is to be patient ;) It will be interesting to hear what Joe has to say, and I have also put the question to the GPU FahCore devs. We'll see what comes out of it ;)

@jcoffland
Member

I don't think there's a problem here. The checkpointing is taking less than one second. This is apparent from the log.
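
As a rough illustration of the cost involved, the sketch below estimates the share of wall-clock time spent checkpointing. It assumes the 5% interval mentioned earlier in the thread, one frame per 1% of a work unit, and a 9-second TPF. The function name and the two checkpoint durations are illustrative assumptions, not measurements.

```python
def checkpoint_overhead(tpf_seconds: float,
                        checkpoint_seconds: float,
                        frames_per_checkpoint: int = 5) -> float:
    """Fraction of wall-clock time spent writing checkpoints.

    Assumption: the core computes frames_per_checkpoint frames, then pauses
    for checkpoint_seconds to write a checkpoint, and repeats.
    """
    compute = tpf_seconds * frames_per_checkpoint
    return checkpoint_seconds / (compute + checkpoint_seconds)


# Two scenarios at a 9 s TPF: a one-second checkpoint versus one that
# takes as long as a whole frame.
for label, ckpt in [("1 s checkpoint", 1.0), ("9 s checkpoint", 9.0)]:
    print(f"{label}: {checkpoint_overhead(9.0, ckpt):.1%} of run time")
```

Under these assumptions a sub-second checkpoint costs a few percent of run time at most, while a frame-long checkpoint would cost considerably more, which is why the measured checkpoint duration matters here.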

@muziqaz
Contributor

muziqaz commented Jun 5, 2024

We had a little chat internally: the OpenMM checkpoint frequency is set by the researcher at the beginning of the project and is written into the core.xml file. I'm not sure it would be ideal to give users an option to alter those settings.

@jcoffland
Member

The benefit to cost, in terms of performance gained vs development effort, is not good.

@jcoffland changed the title from "Folding Core: GPU Checkpoint Option" to "Add option to control checkpoint rate" on Jun 5, 2024
@jcoffland added the "enhancement" (New feature or request) label on Jun 5, 2024

@Wazzzzuzp
Author

Wazzzzuzp commented Jun 6, 2024

> I don't think there's a problem here. The checkpointing is taking less than one second. This is apparent from the log.

Not the best example as this machine is running FAHClient from RAM.

Thanks, all, for entertaining the idea.

@muziqaz
Contributor

muziqaz commented Jun 6, 2024

That would still need to be exposed within fahclient for you to make the change.

@Wazzzzuzp
Author

Wazzzzuzp commented Jun 6, 2024

> That would still need to be exposed within fahclient for you to do the change

Sorry, I don't follow.

@muziqaz
Contributor

muziqaz commented Jun 6, 2024

Fahclient is the UI for users to adjust the settings. If fahclient does not have an actual option to change checkpoint frequency, you won't be able to change it ;)
