
[BUG] Cancelled jobs do not clean-up temp files #500

Open
scotty6435 opened this issue Apr 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@scotty6435

Describe the bug

When a job is cancelled (in our case because a new commit was added to the PR, triggering another build), the temp files generated by the job are not being cleaned up, causing our agent to fail.

Because Checkmarx zips up the .git folders, the zip generated for one of our codebases is very large (~3GB), so our /tmp/ space has to be enormous to accommodate it. As we run inside containers, provisioning persistent volumes or large temporary volumes is very inefficient for the cluster. Checkmarx is also widely used across the company, so this problem scales out very quickly.

Expected behavior

When a job terminates for any reason (success, failure, cancellation, etc.), the temp files generated by the task should always be cleaned up.
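The task's own code isn't part of this issue, and the temp files are produced by the Checkmarx tooling, so the following is only a minimal Python sketch of the expected behaviour, not the task's implementation. It assumes the pipeline agent delivers SIGINT or SIGTERM to the task process when a job is cancelled, before killing it; `run_with_temp_dir` and the `scan-` prefix are hypothetical names.

```python
import shutil
import signal
import tempfile
from typing import Callable


def run_with_temp_dir(work: Callable[[str], None]) -> None:
    """Run `work` with a private temp directory that is removed however it ends.

    Covers normal completion, exceptions, and SIGINT/SIGTERM (assumed to be
    what a cancelled pipeline job receives). SIGKILL cannot be caught, so it
    is NOT covered.
    """
    tmp_dir = tempfile.mkdtemp(prefix="scan-")  # hypothetical prefix

    def on_signal(signum, frame):
        # Turn the signal into SystemExit so the `finally` block below runs.
        raise SystemExit(128 + signum)

    # signal.signal() must be called from the main thread.
    previous = {
        sig: signal.signal(sig, on_signal)
        for sig in (signal.SIGINT, signal.SIGTERM)
    }
    try:
        work(tmp_dir)  # e.g. build the source zip and upload it
    finally:
        # Always delete the directory, whatever ended `work`.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        for sig, handler in previous.items():
            # None means the old handler was not installed from Python,
            # and it cannot be passed back to signal.signal().
            if handler is not None:
                signal.signal(sig, handler)
```

Because SIGKILL and agent crashes bypass any handler, in-process cleanup like this is best-effort only.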

Actual behavior

The files remain in place, causing our self-hosted ADO agents to run out of disk space and fail.
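Until the task cleans up after cancellation, one possible agent-side workaround is a pre-job step that deletes stale scan artifacts. This is a hypothetical sketch: the screenshot of the leftover files isn't available here, so the `*.zip` pattern, the `/tmp` location and any age threshold are assumptions to adjust. The age check exists so that files belonging to a scan still running on the same host are left alone.

```python
import fnmatch
import os
import shutil
import time
from typing import List, Optional


def sweep_stale(tmp_root: str, pattern: str, max_age_seconds: float,
                now: Optional[float] = None) -> List[str]:
    """Delete entries in `tmp_root` whose names match `pattern` and whose
    modification time is at least `max_age_seconds` old.

    Returns the paths that were removed, in sorted name order.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(tmp_root)):
        if not fnmatch.fnmatch(name, pattern):
            continue
        path = os.path.join(tmp_root, name)
        try:
            age = now - os.lstat(path).st_mtime
            if age < max_age_seconds:
                continue  # may still be in use by a running scan
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        except (FileNotFoundError, PermissionError):
            # Already removed by another process, or owned by another user.
            continue
        removed.append(path)
    return removed
```

For example, `sweep_stale("/tmp", "*.zip", max_age_seconds=3600)` would remove matching zips untouched for an hour; a threshold longer than your longest scan avoids deleting an in-progress upload.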

Steps to reproduce

  1. Create/select a very large SCM repo
  2. Run a Checkmarx scan on an agent with a small /tmp reserve
  3. After the job creates the zip file, cancel the job
  4. Repeat as necessary until the agent fails.

Environment

  • Checkmarx AST@2
  • Ubuntu 20.04 LTS


Logs

Screenshot of the files left in the temp folder (image not available). The top listing is from a cancelled job, the second from a job that succeeded.

2024/04/17 11:14:34 Scan status:  Running
##[error]The Operation will be canceled. The next steps may not contain expected logs.
##[error]The operation was canceled.
Finishing: CheckmarxAST
##[error]Failed to create CoreCLR, HRESULT: 0x80004005
##[error]We stopped hearing from agent ci-ado-agent-dev-43cee. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610
scotty6435 added the bug label on Apr 17, 2024

Internal Jira issue: AST-40077
