Historical data over 47 GB – any way to enforce a retention policy? #84

Closed
zkvvoob opened this issue Sep 25, 2023 · 4 comments

Comments

zkvvoob commented Sep 25, 2023

Hello all,

I'm running Plausible in a Docker container and I just noticed that it's storing 47GB of data, the majority of it in the ./store directory:

8.0K	./495
16K	./108
16K	./1b3
16K	./1eb
16K	./2b1
16K	./8db
16K	./bd3
16K	./c2f
16K	./c58
16K	./f69
20K	./403
40K	./58b
56K	./8fb
840K	./9d2
848K	./c99
36M	./fe5
88M	./38d
3.3G	./964
19G	./7a2
25G	./ee7
47G	.

I'm not sure what any of these directory names signify, so I'm hesitant to delete any of them. Nevertheless, I'd like to know if there's a way to limit the amount of historical data stored? 47GB is a bit excessive for a handful of small sites, especially given the fact that I have no intention of reviewing statistics from years ago.

Thanks!

Contributor

ruslandoga commented Sep 25, 2023

👋 @zkvvoob

These look like ClickHouse parts. Depending on your config, some of them may belong to log tables, which can be safely dropped (in SQL terms). You can disable log storage by mounting this config: https://github.com/plausible/hosting/tree/master/clickhouse
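
For illustration, a minimal sketch of what such a config override could look like, written out from the shell. The file name `disable-logs.xml` and the `./clickhouse` directory are hypothetical, and the list of sections is an assumption about which system log tables are enabled in your setup; the file in the linked repository is the reference. It relies on ClickHouse's `remove` attribute, which deletes a section inherited from the default server config.

```shell
#!/bin/sh
# Hypothetical location; adjust to wherever your compose file expects it.
CONFIG_DIR="${CONFIG_DIR:-./clickhouse}"
mkdir -p "$CONFIG_DIR"

# Each remove="remove" entry drops the matching section from the default
# server config, so the corresponding system.*_log table is no longer written.
# Older ClickHouse releases use <yandex> as the root element instead.
cat > "$CONFIG_DIR/disable-logs.xml" <<'EOF'
<clickhouse>
    <query_log remove="remove"/>
    <query_thread_log remove="remove"/>
    <trace_log remove="remove"/>
    <metric_log remove="remove"/>
    <asynchronous_metric_log remove="remove"/>
    <part_log remove="remove"/>
</clickhouse>
EOF
echo "wrote $CONFIG_DIR/disable-logs.xml"
```

The file would then be mounted into the plausible_events_db service under /etc/clickhouse-server/config.d/ and the container restarted. Log tables that were already written stay on disk until they are truncated or dropped.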

Contributor

More info: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/

Contributor

ruslandoga commented Sep 25, 2023

You can check whether it's indeed query_log with something like this:

$ cd hosting
$ docker compose exec plausible_events_db clickhouse-client
:) select count(*) from system.query_log;
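
The same check can be run without an interactive session. The sketch below assumes the service name plausible_events_db from the commands above; the function names `ch_query`, `query_log_rows`, and `truncate_query_log` are made up for this example. It also shows one way to free the space if query_log does turn out to be the culprit.

```shell
# Run one query non-interactively against the ClickHouse container.
# -T disables TTY allocation so the output can be piped or captured.
ch_query() {
    docker compose exec -T plausible_events_db clickhouse-client --query "$1"
}

# Number of rows currently held in the query log.
query_log_rows() {
    ch_query "SELECT count(*) FROM system.query_log"
}

# Empties the query log. This discards log rows only, not analytics events.
truncate_query_log() {
    ch_query "TRUNCATE TABLE IF EXISTS system.query_log"
}
```

Run `query_log_rows` from the hosting directory first, and call `truncate_query_log` only once you are sure you do not need the logged queries.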

Contributor

ruslandoga commented Sep 25, 2023

And I just found this query to list the biggest tables:

SELECT table, formatReadableSize(sum(bytes)) AS size FROM system.parts GROUP BY table ORDER BY sum(bytes) DESC;

You can use it to check if the data on disk is actually the app data or if it's from the system tables.

source: https://gist.github.com/sanchezzzhak/511fd140e8809857f8f1d84ddb937015
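
To make the split between app data and system tables easier to read, a variant of that query can also group by database and count only active parts. This is a sketch, not taken from the gist: the function name `table_sizes` is hypothetical, and the service name is assumed to match the earlier commands.

```shell
# Per-table on-disk size, largest first, with the database name so that
# system.* tables are easy to tell apart from application data.
table_sizes() {
    docker compose exec -T plausible_events_db clickhouse-client --query "
        SELECT database, table, formatReadableSize(sum(bytes_on_disk)) AS size
        FROM system.parts
        WHERE active
        GROUP BY database, table
        ORDER BY sum(bytes_on_disk) DESC"
}
```

Rows whose database is `system` are ClickHouse's own tables, including the logs; anything else is application data.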

zkvvoob closed this as completed Sep 25, 2023