From eb75d6d6e2f3a3454ef1302756ef2c8a8688359b Mon Sep 17 00:00:00 2001
From: Qinlong Wang
Date: Thu, 4 Jan 2024 10:01:50 +0800
Subject: [PATCH] Fix by comments.

---
 README.md                      | 8 ++++----
 docs/blogs/flash_checkpoint.md | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 0c153aeb2..758cbf793 100644
--- a/README.md
+++ b/README.md
@@ -74,10 +74,10 @@ from the latest checkpoint when a failure happens. The actions of flash checkpoi

 The Performance of DLRover Flash Checkpoint to Save/Load GPT2-1.5B.

-The figure illustrates that the I/O time overhead to read checkpoint files
-when resuming by restarting training processes. With DLRover Flash Checkpoint,
-recovery directly from shared memory takes essentially
-on the order of seconds wich is much faster than SSD and NAS.
+The figure illustrates that the I/O time to read checkpoint files
+when resuming training processes. With DLRover Flash Checkpoint,
+recovery could be completed in the order of seconds by loading checkpoints directly from shared memory,
+which is much faster compared to loading checkpoints from SSD and NAS.

 #### Fault Tolerance Improves the Stability of TensorFlow PS Training

diff --git a/docs/blogs/flash_checkpoint.md b/docs/blogs/flash_checkpoint.md
index b22d463eb..794aabc9e 100644
--- a/docs/blogs/flash_checkpoint.md
+++ b/docs/blogs/flash_checkpoint.md
@@ -307,10 +307,10 @@ Compared to NAS remote file systems, FCP reduces the blocking time by nearly a h

 Figure 4: The Paused Training Time to Save Checkpoint.

-The figure illustrates that the I/O time overhead to read checkpoint files
-when resuming by restarting training processes. With DLRover Flash Checkpoint,
-recovery directly from shared memory takes essentially
-on the order of seconds wich is much faster than SSD and NAS.
+The figure illustrates that the I/O time to read checkpoint files
+when resuming training processes. With DLRover Flash Checkpoint,
+recovery could be completed in the order of seconds by loading checkpoints directly from shared memory,
+which is much faster compared to loading checkpoints from SSD and NAS.