Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate site_imports from sites.imported_data in CE #4155

Merged
merged 7 commits into from
May 29, 2024

Conversation

ruslandoga
Copy link
Contributor

Changes

This PR makes site_imports be automatically populated from sites.imported_data on startup for CE builds. Right now the self-hosters need to do it manually. See #4125 (reply in thread)

Tests

  • This PR does not require tests

Changelog

  • This PR does not make a user-facing change

Documentation

  • This change does not need a documentation update

Dark mode

  • This PR does not change the UI

@ruslandoga ruslandoga requested a review from zoldar May 28, 2024 11:57
@zoldar zoldar force-pushed the auto-migrate-site-imports branch from 54558a7 to 3eb0a39 Compare May 29, 2024 09:29
@zoldar
Copy link
Contributor

zoldar commented May 29, 2024

Unfortunately, this might not work as ClickhouseRepo is most likely not running when this migration is executed. The migration script uses both, Plausible.Repo and Plausible.ClickhouseRepo.

 % MIX_ENV=ce_dev m ecto.migrate

11:21:09.474 [info] == Running 20240528115149 Plausible.Repo.Migrations.MigrateSiteImports.up/0 forward
Backfilling legacy site import across 0 sites (DRY RUN: false)...
Finished backfilling sites.

11:21:09.499 [debug] QUERY OK source="sites" db=0.2ms
SELECT s0."id", s0."domain", s0."timezone", s0."public", s0."locked", s0."stats_start_date", s0."native_stats_start_at", s0."allowed_event_props", s0."conversions_enabled", s0."props_enabled", s0."funnels_enabled", s0."ingest_rate_limit_scale_seconds", s0."ingest_rate_limit_threshold", s0."domain_changed_from", s0."domain_changed_at", s0."imported_data", s0."inserted_at", s0."updated_at", s0."id" FROM "sites" AS s0 WHERE (s0."id" = ANY($1)) [[2, 1]]
Adjusting end dates of 2 site imports (DRY RUN: false)...
Adjusting end date for site import 2 (1/2) (site ID 1, start date: 2020-05-31, end date: 2022-06-01)
** (RuntimeError) could not lookup Ecto repo Plausible.ClickhouseRepo because it was not started or it does not exist
    (ecto 3.11.2) lib/ecto/repo/registry.ex:22: Ecto.Repo.Registry.lookup/1
    (ecto 3.11.2) lib/ecto/repo/supervisor.ex:160: Ecto.Repo.Supervisor.tuplet/2
    (plausible 0.0.1) lib/plausible/clickhouse_repo.ex:2: Plausible.ClickhouseRepo.all/2
    (plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:154: Plausible.DataMigration.SiteImports.imported_stats_end_date/2
    (plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:92: anonymous fn/4 in Plausible.DataMigration.SiteImports.adjust_site_import_end_dates/2
    (elixir 1.16.0) lib/enum.ex:2528: Enum."-reduce/3-lists^foldl/2-0-"/3
    (plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:80: Plausible.DataMigration.SiteImports.adjust_site_import_end_dates/2
    (plausible 0.0.1) lib/plausible/data_migration/site_imports.ex:45: Plausible.DataMigration.SiteImports.run/1

I have used https://hexdocs.pm/ecto_sql/Ecto.Migrator.html#with_repo/3 wrapper function to ensure ClickhouseRepo is running when data migration is executed:

% MIX_ENV=ce_dev m ecto.migrate

11:28:21.138 [info] == Running 20240528115149 Plausible.Repo.Migrations.MigrateSiteImports.up/0 forward
Backfilling legacy site import across 0 sites (DRY RUN: false)...
Finished backfilling sites.

11:28:21.180 [debug] QUERY OK source="sites" db=0.2ms
SELECT s0."id", s0."domain", s0."timezone", s0."public", s0."locked", s0."stats_start_date", s0."native_stats_start_at", s0."allowed_event_props", s0."conversions_enabled", s0."props_enabled", s0."funnels_enabled", s0."ingest_rate_limit_scale_seconds", s0."ingest_rate_limit_threshold", s0."domain_changed_from", s0."domain_changed_at", s0."imported_data", s0."inserted_at", s0."updated_at", s0."id" FROM "sites" AS s0 WHERE (s0."id" = ANY($1)) [[2, 1]]
Adjusting end dates of 2 site imports (DRY RUN: false)...
Adjusting end date for site import 2 (1/2) (site ID 1, start date: 2020-05-31, end date: 2022-06-01)
End date of site import 2 (site ID 1) is left unadjusted.
Adjusting end date for site import 3 (2/2) (site ID 2, start date: 2020-05-31, end date: 2024-05-13)
End date of site import 3 (site ID 2) is left unadjusted.
Finished adjusting end dates of site imports.
Finished

11:28:21.306 [info] == Migrated 20240528115149 in 0.1s

11:28:21.326 [info] Migrations already up

@ruslandoga
Copy link
Contributor Author

Can the script switch to using Plausible.IngestRepo?

zoldar
zoldar previously approved these changes May 29, 2024
@zoldar
Copy link
Contributor

zoldar commented May 29, 2024

@ruslandoga Wouldn't Plausible.Repo be missing from running apps then?

@zoldar zoldar added the deploy-to-staging Special label that triggers a deploy to a staging environment label May 29, 2024
@ruslandoga
Copy link
Contributor Author

Why would it be missing?

@zoldar
Copy link
Contributor

zoldar commented May 29, 2024

@ruslandoga That's why it's failing here in the first place, to my understanding. Looking at https://github.com/elixir-ecto/ecto_sql/blob/v3.11.2/lib/mix/tasks/ecto.migrate.ex#L151-L158 - the migration task only starts a particular repo for the duration of executing the migrations, not the whole app (or other repos, in parallel). I think the same holds for migrator in releases.

@zoldar
Copy link
Contributor

zoldar commented May 29, 2024

This fails with migrator run via release because ce?() expands Mix.env() which is in turn used in runtime context here:

$ ./migrate.sh 
Loading plausible..
Starting dependencies..
Starting repos..
Running migrations for Elixir.Plausible.Repo
** (UndefinedFunctionError) function Mix.env/0 is undefined (module Mix is not available)
    Mix.env()
    (plausible 0.0.1) expanding macro: Plausible.ce?/0
    lib/plausible-0.0.1/priv/repo/migrations/20240528115149_migrate_site_imports.exs:6: Plausible.Repo.Migrations.MigrateSiteImports.up/0
    (elixir 1.16.0) expanding macro: Kernel.if/2
    nofile:1: (file)

I think it's better to add an explicit migration step to upgrade instructions instead, like:

docker compose exec plausible bin/plausible rpc "Plausible.DataMigration.SiteImports.run(dry_run: false)"

@zoldar zoldar dismissed their stale review May 29, 2024 10:02

Failed when tested in staging env

@ruslandoga
Copy link
Contributor Author

ruslandoga commented May 29, 2024

I think it's easier to work around this than to push it down to self-hosters to perform another manual step.

ce? can be made into a function similar to Plausible.product_name/0, while on_ce and others would stay macros.

PR: #4158

@ruslandoga
Copy link
Contributor Author

ruslandoga commented May 29, 2024

the migration task only starts a particular repo for the duration of executing the migrations, not the whole app (or other repos, in parallel)

We are not using mix ecto.migrate: https://github.com/plausible/analytics/blob/master/rel/overlays/migrate.sh#L6

eval starts the whole app with all the repos.

@zoldar zoldar force-pushed the auto-migrate-site-imports branch from 3eb0a39 to 14fbfc0 Compare May 29, 2024 12:56
@zoldar
Copy link
Contributor

zoldar commented May 29, 2024

We are not using mix ecto.migrate:

Anyone trying to setup the application locally (for instance, with MIX_ENV=ce|ce_test mix ecto.setup) will have this migration fail on them without both repos up.

P.S. The updated ce?() check works when used at runtime now, confirmed with a test run on staging.

@ruslandoga
Copy link
Contributor Author

ruslandoga commented May 29, 2024

Anyone trying to setup the application locally (for instance, with MIX_ENV=ce|ce_test mix ecto.setup) will have this migration fail on them without both repos up.

MIX_ENV=ce|ce_test mix ecto.setup fails already since seeds use funnels :) And it doesn't seem like ClickhouseRepo is called if there are no sites in the DB (the setup use-case). Something feels off to me about this migrator wrapper.

@zoldar
Copy link
Contributor

zoldar commented May 29, 2024

  • Funnels are no longer created in seeds when running the app in CE envs
  • Warnings were put at the top of data migration scripts which are currently run in migrations
  • Migrator repo wrapper is dropped as ClickhouseRepo shouldn't be called on freshly setup database anyway and the repo problem surfaces only when running in local dev env in practice, as release migrations use eval which brings the whole application up beforehand

@zoldar zoldar changed the title populate site_imports from sites.imported_data in CE Populate site_imports from sites.imported_data in CE May 29, 2024
@zoldar zoldar merged commit dc7243f into master May 29, 2024
10 of 11 checks passed
@zoldar zoldar deleted the auto-migrate-site-imports branch May 29, 2024 15:37
@ruslandoga
Copy link
Contributor Author

ruslandoga commented Jun 6, 2024

eval starts the whole app with all the repos.

I was wrong, only the selected repos and apps are started right now.

Enum.each(@start_apps, &Application.ensure_all_started/1)
# Start the Repo(s) for myapp
IO.puts("Starting repos..")
Enum.each(repos(), & &1.start_link(pool_size: 2))

[
  {:ecto_sql, ~c"SQL-based adapters for Ecto and database migrations", ~c"3.11.1"},
  {:ch, ~c"HTTP ClickHouse driver for Elixir", ~c"0.2.5"},
  {:mint, ~c"Small and composable HTTP client.", ~c"1.6.0"},
  {:hpax, ~c"Implementation of the HPACK protocol (RFC 7541) for Elixir", ~c"0.2.0"},
  {:castore, ~c"Up-to-date CA certificate store.", ~c"1.0.7"},
  {:ecto, ~c"A toolkit for data mapping and language integrated query for Elixir", ~c"3.11.2"},
  {:eex, ~c"eex", ~c"1.16.0"},
  {:postgrex, ~c"PostgreSQL driver for Elixir", ~c"0.17.5"},
  {:jason, ~c"A blazing fast JSON parser and generator in pure Elixir.\n", ~c"1.4.1"},
  {:decimal, ~c"Arbitrary precision decimal arithmetic.", ~c"2.1.1"},
  {:db_connection, ~c"Database connection behaviour for database transactions and connection pooling\n", ~c"2.6.0"},
  {:telemetry, ~c"Dynamic dispatching library for metrics and instrumentations", ~c"1.2.1"},
  {:ssl, ~c"Erlang/OTP SSL application", ~c"11.1"},
  {:public_key, ~c"Public key infrastructure", ~c"1.15"},
  {:asn1, ~c"The Erlang ASN1 compiler version 5.2.1", ~c"5.2.1"},
  {:crypto, ~c"CRYPTO", ~c"5.4"},
  {:logger, ~c"logger", ~c"1.16.0"},
  {:elixir, ~c"elixir", ~c"1.16.0"},
  {:compiler, ~c"ERTS  CXC 138 10", ~c"8.4.1"},
  {:stdlib, ~c"ERTS  CXC 138 10", ~c"5.2"},
  {:kernel, ~c"ERTS  CXC 138 10", ~c"9.2"}
]

So this migration still needs a workaround to either:

  • use the already started repo (Plausible.IngestRepo instead of Plausible.ClickhouseRepo)

  • pre-start Plausible.ClickhouseRepo in prepare (this could help automatically avoid similar problem in the future)

defp prepare do
  # ...
  Enum.each([Plausible.ClickhouseRepo | repos()], & &1.start_link(pool_size: 2))
end

I've opened a PR with the latter: #4155

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deploy-to-staging Special label that triggers a deploy to a staging environment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants