Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create new pruning script for tables aow_processed, aow_processed_aow_actions #3328

Open
ghost opened this issue Mar 28, 2017 · 23 comments
Open
Labels
Priority:Important Issues & PRs that are important; broken functions, errors - there are workarounds Type: Bug Bugs within the core SuiteCRM codebase

Comments

@ghost
Copy link

ghost commented Mar 28, 2017

Issue

When using recurring workflows triggered by scheduled task (cron), the tables aow_processed and aow_processed_aow_actions can quickly become huge and bog down the system.

Expected Behavior

Log tables like aow_processed and aow_processed_aow_actions should be pruned automatically before they become too large.

Actual Behavior

If you set up workflows to be fired by scheduled task these tables can quickly become very large.

Possible Fix

I'd suggest creating a new pruning script that can be setup via Admin>Scheduled Tasks to:

see e.g. my pruning query here:
https://suitecrm.com/forum/developer-help/11262-very-large-aow-processed-and-aow-processed-aow-actions-tables#56405

Steps to Reproduce

  1. Setup workflow to run from scheduler over a module with a few hundred records
  2. Setup scheduler task for workflows to run regularly
  3. watch these tables grow and eventually cause problems, especially when loading any page where the workflow logs are shown.

Context

Your Environment

  • SuiteCRM Version used: 7.7.4
  • Browser name and version (e.g. Chrome Version 51.0.2704.63 (64-bit)):
  • Environment name and version (e.g. MySQL, PHP 7):
  • Operating System and version (e.g Ubuntu 16.04):
@pgorod
Copy link
Contributor

pgorod commented Mar 29, 2017

Agreed, this is necessary. But why not make it part of the existing Prune job?

@Dillon-Brown Dillon-Brown added the Type:Suggestion Issue containing a suggestion of functionality, process or UI. Associated PRs are called enhancement label Mar 29, 2017
@ghost
Copy link
Author

ghost commented Mar 30, 2017

  1. deleting logs that belong to workflows that no longer exist (issue aow_actions & aow_processed & aow_conditions records not deleted if workflow is deleted #2826): This could and should be included in the existing prune job (because the existing prune job will delete the workflows that are marked with "deleted" anyway)
  2. deleting old entries in the the aow_processed tables however might require an admin settings panel allowing the admin to choose something like the number of rows to retain, or how long of a log history to retain, something like that. Then the pruning script would delete old rows according to this setting. Depending on use case and server resources we may not all like a one-size-fits-all here.

adding a simple Admin panel in the Advanced OpenAdmin section wouldn't be a big deal I hope?

I'd certainly vote for having this pruning script on by default with some default settings like deleting the log older than 1 month or something like that.

@pgorod
Copy link
Contributor

pgorod commented Mar 30, 2017

This got me thinking. What is the use of all this information? On the forum posts, some people mentioned having 14 million records!

Maybe we should start thinking about this a little earlier: do all these records really need to be generated? By default? Or shouldn't it be some extreme-debugging-logging-option, off by default?

@mattlorimer
Copy link
Member

Its is required for workflows that don't use repeated runs, especially one that run on the scheduler where it is set to run on all records, as this prevents it from running more than once on a record. The common problem is when "repeated runs" is used without understanding how it works, this can cause the tables to fill up rapidly. A Solution for this, is to just update the the logs with a count as opposed to a new entry each time, this would help reduce the load. We can easily add to the pruning job to remove rows for deleted records but doing so for old record may have unexpected results.

@samus-aran
Copy link
Contributor

@samus-aran
Copy link
Contributor

We have now set up a new home for suggestions at Trello. All github issues that were labeled 'suggestion' have been moved and will be closed. Certain ones will be progressed within the new Suggestion Box and may be re-opened.

Announcement of moving Suggestions:
https://suitecrm.com/forum/suggestion-box/13691-moving-suggestions-from-github-to

New SuiteCRM Suggestion Box
https://trello.com/b/Ht7LbMqw/suitecrm-suggestion-box

@pgorod
Copy link
Contributor

pgorod commented Sep 15, 2017

@Dillon-Brown I would re-open this and label it Bug. I regularly (though not very frequently) see people in the forums whose database has Gigabytes of these rows. It slows the system down, and it's easy to run out of disk space (especially inside a VM), with critical consequences.

@ghost
Copy link
Author

ghost commented Sep 7, 2018

Like @pgorod, I also feel this should be labeled as a bug and added to a standard scheduled prune job in SuiteCRM.

Maybe you can build upon the sql queries I've been using to keep my aow_processed tables cleaned up:

USE crm;
CREATE TABLE aow_processed_new LIKE aow_processed; 
# retain all aow_processed rows that belong to WF that run only once per record
INSERT aow_processed_new SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0;
# prune aow_processed rows that belong to WF that run multiple times, based on date
INSERT aow_processed_new SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1 AND wp.date_entered >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_new; 

CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
# retain all aow_processed_aow_actions rows that belong to WF that run only once per record
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0) as wp ON wpa.aow_processed_id = wp.id;
# prune aow_processed_aow_actions rows that belong to WF that run multiple times, based on date
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1) as wp ON wpa.aow_processed_id = wp.id AND wpa.date_modified >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_aow_actions_new;

ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;
ALTER TABLE aow_processed_aow_actions RENAME aow_processed_aow_actions_old;
ALTER TABLE aow_processed_aow_actions_new RENAME aow_processed_aow_actions;

DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;

Naturally the cutoff dates would need to be set dynamically as in "more than a year old" or something like that.

nice-to-have
admin setting to choose the criteria for pruning these tables (with a sensible default value).

@pgorod
Copy link
Contributor

pgorod commented Sep 8, 2018

Thanks for that John, very useful.

Another "nice-to-have" which isn't too much work and would be quite practical is a button, inside the workflow, next to the Repeated Runs check-box, saying Reset runs history which would give a warning message box explaining the consequences, and then would delete all the aow_processed information for that specific workflow. Sometimes people are ok with just deleting everything.

@samva259
Copy link

Dear authors,

Thank you very much for the posted information, it is about to save us a lot of trouble.

Regarding the technique for dealing with the aow_processed_aow_actions table, should not the query that inserts into the new table take into account whether the parent row in the aow_processed table was copied or not into its own new table ?

I'm not familiar with the internal application logic to determine whether it would allow the existence of a child aow_processed_aow_actions row that would have a date_modified value newer than the date_entered value of its parent row. If this is possible then it seems that the insert query for the new aow_processed_aow_actions_new table should be as follows (using your same datetime cutoff as an example):

INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1 AND wp.date_entered >= '2017-10-15 00:00:00') as wp ON wpa.aow_processed_id = wp.id AND wpa.date_modified >= '2017-10-15 00:00:00');

Thank you again for your help.

@samva259
Copy link

...or...something like this might be a way to select only those aow_processed rows which they themselves are older than the cutoff datetime and only those all of whose child rows in the aow_processed_aow_actions table are also all older than the same cutoff time, which may provide some extra consistency, as it may age out of the database only entire execution sessions of the workflow process.

SELECT wp.id
FROM aow_processed wp
WHERE wp.date_entered < '2017-10-15 00:00:00'
AND wp.id IN (
SELECT wpa.aow_processed_id
FROM aow_processed_aow_actions wpa
GROUP BY wpa.aow_processed_id
HAVING MAX(wpa.date_modified) < '2017-10-15 00:00:00'
);

Then the aow_processed_aow_actions and related aow_processed rows can be deleted as an atomic transaction including all related rows

@holdusback
Copy link

Is this now fixed ? I mean do the Prune Database workflow delete aow_processed and aow_processed_aow_actions ?

@pgorod
Copy link
Contributor

pgorod commented Sep 4, 2019

I don't think there were any changes to the code.

As I wrote in 2017, I believe this Issue could be reopened and labeled a bug because of the nasty consequences it has for some people. With some care the existing solutions could be made generic and applied in core.

@holdusback
Copy link

I said that because If its not fixed I think I will make a CRON or a view in SQL to prune thoses two table. I know that aow_processed only store logs, IDK about aow_processed_aow_actions, but It grow rly quick in size too.

@pgorod
Copy link
Contributor

pgorod commented Sep 5, 2019

Remember that those "logs" are relevant to the Repeated runs option. So messing with those tables alters the behavior of Workflows.

That's why only a case-by-case evaluation is possible.

@pgorod
Copy link
Contributor

pgorod commented Oct 1, 2019

Just for my reference, this issue #3744 is also about database clean-ups.

@Dillon-Brown can we consider a re-opening of this issue here, as I asked above?

@Dillon-Brown Dillon-Brown reopened this Oct 1, 2019
@Dillon-Brown Dillon-Brown added Type: Bug Bugs within the core SuiteCRM codebase Priority:Important Issues & PRs that are important; broken functions, errors - there are workarounds and removed Type:Suggestion Issue containing a suggestion of functionality, process or UI. Associated PRs are called enhancement labels Oct 1, 2019
@huangzl888
Copy link

huge aow table is definitly a problem if you run many workflows daily.

if you delete the rows in it, the work flow may run for the record again, then you end up having many new entries been created in your crm again and again.

@pgorod
Copy link
Contributor

pgorod commented Dec 16, 2020

To some extent, this is unavoidable, if SuiteCRM needs to track each record. If the problem hits you very hard I think you should consider moving the information into the record. What I mean is to add fields (or custom fields) so that the record itself stores its own state, in a manner that is relevant to your business. Then your workflows can use more conditions based on that data, in order to be more selective.

I hope this helps.

@DaftBrit
Copy link

I would argue that even if you are running your workflows as lean and optimally as possible you will at some point in time still need to clear this logging information down. At some point a cleaning / pruning method will be required to stop database bloat. The scripts provided here by ghost are very useful but adding options to have this cleaned by suite would be far better, I dont like the idea of running an automated script that alters tables when the system itself should be capable of intelligently cleaning its own logging.

@holdusback
Copy link

I would argue that even if you are running your workflows as lean and optimally as possible you will at some point in time still need to clear this logging information down. At some point a cleaning / pruning method will be required to stop database bloat. The scripts provided here by ghost are very useful but adding options to have this cleaned by suite would be far better, I dont like the idea of running an automated script that alters tables when the system itself should be capable of intelligently cleaning its own logging.

Thanks for the reply here, forgot this issue. I've checked those two tables, and it looks like they are not pruned. Got some logs from 2019,2020 etc... Theyre are not that big (1730 entry) but idk if its revelent to have them stored.

@turtlemenace
Copy link

I don't know if you can "upvote" on here but the lack of this feature has certainly become a problem for me.

@SuiteBot
Copy link

This issue has been mentioned on SuiteCRM. There might be relevant details there:

https://community.suitecrm.com/t/aow-processed-aow-processed-aow-actions-huge-tables/93080/2

@shubham-pawar
Copy link

@turtlemenace You need to thumps-up on the post to upvote it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority:Important Issues & PRs that are important; broken functions, errors - there are workarounds Type: Bug Bugs within the core SuiteCRM codebase
Projects
None yet
Development

No branches or pull requests