
Optimization of Sync Records: Implementing Pagination and Temporary Table #6585

Merged

Conversation

skrashevich (Contributor)

This pull request introduces significant changes to optimize the process of syncing records, speeding it up by a factor of up to 400 (from roughly 3046 seconds down to 8 seconds in the comparison below). A temporary table is created for record deletion, and pagination is used to process recordings in chunks. This strategy makes deleting recordings with missing files efficient and fast.
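As a rough illustration of the approach, here is a minimal sketch using Python's built-in sqlite3 module. This is not the PR's actual code (Frigate uses peewee), and the table and column names (`recordings`, `id`, `path`) as well as the chunk size are assumptions made for the example:

```python
import sqlite3

CHUNK_SIZE = 1000  # hypothetical page size


def delete_missing_recordings(conn: sqlite3.Connection, files_on_disk: set[str]) -> None:
    # Stage the IDs of recordings whose files are gone in a temporary
    # table, then delete them with one set-based statement instead of
    # issuing a DELETE per row.
    conn.execute("CREATE TEMP TABLE missing (id TEXT PRIMARY KEY)")

    offset = 0
    while True:
        # Pagination: fetch recordings in chunks to bound memory use.
        rows = conn.execute(
            "SELECT id, path FROM recordings ORDER BY id LIMIT ? OFFSET ?",
            (CHUNK_SIZE, offset),
        ).fetchall()
        if not rows:
            break
        missing = [(rid,) for rid, path in rows if path not in files_on_disk]
        conn.executemany("INSERT INTO missing (id) VALUES (?)", missing)
        offset += CHUNK_SIZE

    # One set-based delete via the temporary table.
    conn.execute("DELETE FROM recordings WHERE id IN (SELECT id FROM missing)")
    conn.execute("DROP TABLE missing")
    conn.commit()
```

Staging the IDs in a temporary table turns tens of thousands of per-row DELETE statements into a single set-based DELETE, which is where most of the speedup comes from.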

Speed Comparison:

Current:

```
Start sync recordings.
Deleting 32124 recordings with missing files
End sync recordings.
Old - Processed for 3046.11 seconds.
```

This PR:

```
Start sync recordings.
Deleting 31536 recordings with missing files
End sync recordings.
New - Processed for 7.99 seconds.
```

netlify bot commented May 23, 2023

Deploy Preview for frigate-docs ready!

Latest commit: bc5a57c
Latest deploy log: https://app.netlify.com/sites/frigate-docs/deploys/6485c1d0315f680007e76062
Deploy Preview: https://deploy-preview-6585--frigate-docs.netlify.app

Review thread on frigate/record/cleanup.py (outdated, resolved)
skrashevich force-pushed the 230523-optimize-sync-records branch 2 times, most recently from 8f61470 to a474eec on May 23, 2023 at 17:06
skrashevich force-pushed the 230523-optimize-sync-records branch from a474eec to 094cbe9 on May 23, 2023 at 17:17
blakeblackshear (Owner)

What is the performance of os.walk when there are tens of thousands of files in the directory? I think this was one of the problems we ran into.

skrashevich (Contributor, Author) commented May 25, 2023

> What is the performance of os.walk when there are tens of thousands of files in the directory? I think this was one of the problems we ran into.

Building a set of the files on disk gives us constant-time (O(1)) membership checks:

```
% python3 test.py 5 10000
Folders and files created successfully.
Total - Processed 50001 files for 0.27 seconds.

% python3 test.py 5 100000
Folders and files created successfully.
Total - Processed 500001 files for 2.09 seconds.
```

```python
import datetime
import os
import sys


def create_folders_with_files(num_folders, num_files):
    # Create the specified number of folders
    for i in range(1, num_folders + 1):
        folder_name = f"Folder_{i}"
        os.makedirs(folder_name, exist_ok=True)

        # Create the specified number of files within each folder
        for j in range(1, num_files + 1):
            file_name = f"{folder_name}/File_{j}.txt"
            with open(file_name, "w") as file:
                file.write("This is file content.")

    print("Folders and files created successfully.")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: test.py <num_folders> <num_files>")
        sys.exit(1)

    num_folders = int(sys.argv[1])
    num_files = int(sys.argv[2])
    create_folders_with_files(num_folders, num_files)

    start_time = datetime.datetime.now().timestamp()
    # Get all recordings files on disk and put them in a set
    files_on_disk = {
        os.path.join(root, file) for root, _, files in os.walk(".") for file in files
    }
    duration = datetime.datetime.now().timestamp() - start_time
    print(f"Total - Processed {len(files_on_disk)} files for {duration:.2f} seconds.")
```

blakeblackshear (Owner)

We should think carefully about when this runs. If a user temporarily mapped their recordings incorrectly or their NAS disconnected, this would drop all their recordings on restart.

skrashevich (Contributor, Author)

> We should think carefully about when this runs. If a user temporarily mapped their recordings incorrectly or their NAS disconnected, this would drop all their recordings on restart.

Makes sense. I'll look into marking events in the database as unreachable instead of deleting them.

blakeblackshear (Owner)

I think it would be best to use a simple heuristic instead. What if we detect when more than 50% of the recordings would be deleted and stop the automatic sync? We could also add a button on the Storage page that triggers a sync in "force" mode, which can override the failsafe.

We could also consider creating a hidden file in the recordings directory that we could check for on startup. If it's not there and we have recordings in the database, then we can assume there is an issue with the mounted storage.
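A minimal sketch of both safeguards; every name here (`SENTINEL`, `safe_to_sync`, the parameters) is hypothetical and not Frigate's actual API:

```python
import os

SENTINEL = ".frigate_recordings"  # hypothetical marker file name


def safe_to_sync(
    recordings_dir: str, db_count: int, missing_count: int, force: bool = False
) -> bool:
    # Safeguard 1: if the sentinel file is absent but the database still
    # has recordings, the storage is probably not mounted correctly.
    if db_count > 0 and not os.path.exists(os.path.join(recordings_dir, SENTINEL)):
        return force

    # Safeguard 2: refuse to delete more than half of all recordings
    # unless the user explicitly forces the sync.
    if db_count > 0 and missing_count / db_count > 0.5:
        return force

    return True
```

The `force` flag models the proposed Storage-page button: it bypasses both checks when the user explicitly requests a sync.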

skrashevich requested a review from NickM-27 on June 8, 2023 at 22:22
skrashevich force-pushed the 230523-optimize-sync-records branch 3 times, most recently from e1b1f15 to 32dc664 on June 8, 2023 at 22:33
skrashevich force-pushed the 230523-optimize-sync-records branch from 32dc664 to 1ec8bc0 on June 8, 2023 at 22:34
Review thread on frigate/record/cleanup.py (outdated, resolved)
Review thread on frigate/record/cleanup.py (resolved)
blakeblackshear (Owner)

Looks like there are a few conflicts here, but this is good to merge when those are resolved.

blakeblackshear merged commit 5c27cb7 into blakeblackshear:dev on June 11, 2023
11 checks passed
skrashevich deleted the 230523-optimize-sync-records branch on June 11, 2023 at 13:02