Scripts table should be created for each database/schema #665

sunng87 · 2022-11-30T06:09:00Z

What type of enhancement is this?

Refactor

What does the enhancement do?

The scripts table, where we stored python scripts, should be created for each database/schema. This can either be done when creating the database/schema, or lazily created on first script register request.

Also we need a special naming pattern for these built-in tables, for example, to be prefixed with _ or gt_.

Implementation challenges

No response

The text was updated successfully, but these errors were encountered:

ShenJunkun · 2023-01-01T12:41:27Z

I want to try this task, please assign this task to me if you don't mind.

MichaelScofield · 2023-01-03T02:15:39Z

@ShenJunkun assigned

evenyag · 2023-01-03T02:56:12Z

Also we need a special naming pattern for these built-in tables, for example, to be prefixed with _ or gt_

We could reserve a schema for all built-in tables, such as a schema named system or greptime.

ShenJunkun · 2023-01-03T16:52:02Z

This is my plan:

Reserve a schema named system in greptime catalog for all scripts table.
The name of scripts table will be catalogName_schemaName_scripts.
Currently, the codes in standalone mode will generate scripts table in public schema, I need to move it to my new schema.
GreptimeDB has two modes,i.e. standalone mode and distributed mode, I need to consider both scenarios.

I also have some questions.

When I launch greptimedb in distributed mode, I can's insert data into table, the log shows that the logic hasn't implement. So, I want to know whether there is something wrong with me.
For this task, compared to standalone mode, need I do some special in distributed mode? May be I need to write some codes to update catalog and schema info in MetaSrv.

evenyag · 2023-01-04T03:39:19Z

There are several approaches

Reserve one scripts table in the system schema

We could reserve a schema for all built-in tables, such as a schema named system or greptime.

Drawbacks

users need to access their script table via the fully qualified name in their SQL
scripts from different schemas are mixed together

select script from catalog.system.scripts where name='xxx';

select script from catalog.system.scripts where name='xxx' and schema='xxx';

But this could be improved by putting the scripts table or the system catalog in the search path like PostgreSQL

In addition to public and user-created schemas, each database contains a pg_catalog schema, which contains the system tables and all the built-in data types, functions, and operators. pg_catalog is always effectively part of the search path. If it is not named explicitly in the path then it is implicitly searched before searching the path's schemas. This ensures that built-in names will always be findable.

Reserve scripts table for each schema in the system schema

Reserve a schema named system in greptime catalog for all scripts table.
The name of scripts table will be catalogName_schemaName_scripts.

Drawbacks

We need to create lots of scripts table
The table name would be too long

select script from catalogName_schemaName_scripts where name = 'xxx';

Maybe we need to use a fully qualified name as we are in a different catalog/schema

select script from greptime.system.catalogName_schemaName_scripts where name = 'xxx';

Actually, we could reserve a system schema for each catalog, so the catalogName isn't necessary. We could also add an additional column to represent the schema of each script. Finally, this becomes the first approach.

Reserve scripts table for each schema

Also we need a special naming pattern for these built-in tables, for example, to be prefixed with _ or gt_.

We need to create this table for each schema, then we could access it by typing its name directly

select script from gt_scripts where name = 'xxx';

Drawback

We need to create the table for each schema, which also means that there are lots of scripts table

Conclusion

As for me, I'd prefer the first solution: putting all system tables, including the scripts table in a specific schema. Generally, we don't recommend users operate the system tables directly. We should provide SQL commands or API to do that

Normally, one should not change the system catalogs by hand, there are normally SQL commands to do that. (For example, CREATE DATABASE inserts a row into the pg_database catalog — and actually creates the database on disk.)

I'd be happy to hear your thoughts @sunng87 @killme2008 @ShenJunkun

evenyag · 2023-01-04T03:53:29Z

For this task, compared to standalone mode, need I do some special in distributed mode

I think we could implement this in standalone mode. I'm not sure whether scripts table is supported in distributed mode.

When I launch greptimedb in distributed mode, I can's insert data into table, the log shows that the logic hasn't implement. So, I want to know whether there is something wrong with me.

You might need to create a distributed table, as mentioned in this

ShenJunkun · 2023-01-04T06:08:15Z

As for me, I'd prefer the first solution: putting all system tables, including the scripts table in a specific schema. Generally, we don't recommend users operate the system tables directly. We should provide SQL commands or API to do that

Sounds reasonable! By the way, Is there a difference between a schema and a database in GreptimeDB? In MySQL, a schema is synonymous with a database, but it's not in PostgreSQL!

evenyag · 2023-01-04T06:25:55Z

You might need to create a distributed table, as mentioned in this

Oops, @MichaelScofield is refactoring our gPRC protocol, so I'm afraid that the distributed mode may not work until #381 is done

By the way, Is there a difference between a schema and a database in GreptimeDB

Catalog and schema are different in GreptimeDB. But there are still some works to be done if we'd like to support creating a catalog/schema via SQL manually.

killme2008 · 2023-01-04T06:32:18Z

I prefer the first solution too. Let's do it.

sunng87 · 2023-01-04T06:44:30Z

By the way, Is there a difference between a schema and a database in GreptimeDB? In MySQL, a schema is synonymous with a database, but it's not in PostgreSQL!

Yes, this is the key issue I have been concerning about. In MySQL schema == database, while in Postgres and some other databases, catalog == database, schema is namespace of a few tables. In GreptimeDB, we follow MySQL convention to treat schema as database: by running create database you created a schema.

From my perspective, I want to keep database self-contained, to be the minimum isolation unit in shared environment, which means:

all data/configuration stored within a database, there is no dependency between databases
for operation, database can be moved between clusters without caring about any other things
we can limit qps or other finer access control unit to a single database without affecting others

So that's the reason I'm +1 for storing scripts as built-in table for each schema(database). But I agree this approach has its cons for example the table is visible and modifiable to user.

ShenJunkun · 2023-01-04T07:02:21Z

So that's the reason I'm +1 for storing scripts as built-in table for each schema(database). But I agree this approach has its cons for example the table is visible and modifiable to user.

That's a trade-off! We can discuss about it. @killme2008 @sunng87 @evenyag

Oops, @MichaelScofield is refactoring our gPRC protocol, so I'm afraid that the distributed mode may not work until #381 is done

Never mind!

evenyag · 2023-01-04T08:59:16Z

@sunng87 @ShenJunkun After taking a brief look at the codes, I find out that adding a schema column to the scripts table is much easier to implement. The ScriptsTable registers itself to the catalog manager and finds the table when it needs to access a script

greptimedb/src/script/src/table.rs

Lines 68 to 72 in 7762873

 catalog_manager 

 .register_system_table(RegisterSystemTableRequest { 

 create_table_request: request, 

 open_hook: None, 

 })

To support schema isolation, we just need to specify the schema column in insert() and find_script_by_name()

greptimedb/src/script/src/table.rs

Line 87 in 7762873

pub async fn insert(&self, name: &str, script: &str) -> Result<()> {

greptimedb/src/script/src/table.rs

Lines 142 to 146 in 7762873

 pub async fn find_script_by_name(&self, name: &str) -> Result<String> { 

 // FIXME(dennis): SQL injection 

 // TODO(dennis): we use sql to find the script, the better way is use a function 

 // such as `find_record_by_primary_key` in table_engine. 

 let sql = format!("select script from {} where name='{}'", self.name(), name);

But the catalog manager itself doesn't support creating the system tables for each new schema now.

I think we can fix this issue by simply adding a schema column to the existing scripts table. But the script table is still using the default catalog and default schema.

greptimedb/src/script/src/table.rs

Lines 54 to 58 in 7762873

 let request = CreateTableRequest { 

 id: SCRIPTS_TABLE_ID, 

 catalog_name: DEFAULT_CATALOG_NAME.to_string(), 

 schema_name: DEFAULT_SCHEMA_NAME.to_string(), 

 table_name: SCRIPTS_TABLE_NAME.to_string(),

Maybe we need to move the scripts table to the system catalog like this system table

greptimedb/src/catalog/src/system.rs

Lines 104 to 108 in 7762873

 let request = CreateTableRequest { 

 id: SYSTEM_CATALOG_TABLE_ID, 

 catalog_name: SYSTEM_CATALOG_NAME.to_string(), 

 schema_name: INFORMATION_SCHEMA_NAME.to_string(), 

 table_name: SYSTEM_CATALOG_TABLE_NAME.to_string(),

For distributed mode, we need some other ways to implement system tables, which is out of the topic.

So that's the reason I'm +1 for storing scripts as built-in table for each schema(database). But I agree this approach has its cons for example the table is visible and modifiable to user.

Actually, the script operation is only available under the HTTP API, so it's meaningless for users to access the table directly. So it should be fine that we only have one script table now.

sunng87 · 2023-01-04T12:28:51Z

Agreed. Let's use system schema to store scripts and use a schema column to specify execution context. We can consider special handling for system schema in cluster setup to resolve isolation issue.

ShenJunkun · 2023-01-05T06:35:14Z

Ok, let's do it!

ShenJunkun · 2023-01-05T07:06:11Z

Another question:
How can I get data from system catalog?

When I try to use sql to access system catalog like this, select * from system.information_schema.system_catalog;

the log show that:

2023-01-05T06:59:59.491339Z ERROR common_telemetry::panic_hook: panicked at 'System catalog table does not support scan!', src/catalog/src/system.rs:68:9 backtrace=   0: common_telemetry::panic_hook::set_panic_hook::{{closure}}
             at src/common/telemetry/src/panic_hook.rs:30:25
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/alloc/src/boxed.rs:2032:9
      std::panicking::rust_panic_with_hook
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/std/src/panicking.rs:577:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/core/src/panicking.rs:64:14
   6: <catalog::system::SystemCatalogTable as table::table::Table>::scan::{{closure}}
             at src/catalog/src/system.rs:68:9
   7: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/core/src/future/future.rs:124:9
   8: <table::table::adapter::DfTableProviderAdapter as datafusion::datasource::datasource::TableProvider>::scan::{{closure}}
             at src/table/src/table/adapter.rs:77:65
   9: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/935dc07218b4bf6e20231e44eb9263b612fd649b/library/core/src/future/future.rs:124:9
  10: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
             at /home/jkshen/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/4917235/datafusion/core/src/physical_plan/planner.rs:488:88

evenyag · 2023-01-05T09:40:47Z

What data do you want to get from this table? Now we don't support reading the contents in this table via the SQL interface, so you might need to implement scan() for it if you'd like to select it. I think we should create an issue for this.

I think we should hide system_catalog or some other system tables until they support scan @v0y4g3r .

ShenJunkun · 2023-01-08T09:21:49Z

What data do you want to get from this table? Now we don't support reading the contents in this table via the SQL interface, so you might need to implement scan()

If I move scripts table to system catalog like this, Should I implement the logic for user to read scripts table via the SQL interface?

Maybe we need to move the scripts table to the system catalog like this system table

sunng87 · 2023-01-08T10:23:32Z

It's recommended. We need a way to list stored scripts for a database. It can either be done via a Http REST API, or a dedicated SQL statement like SHOW SCRIPTS

evenyag · 2023-01-09T02:00:41Z

We need a way to list stored scripts for a database. It can either be done via a Http REST API, or a dedicated SQL statement like SHOW SCRIPTS

I think we could create another issue for this feature and do it later

sunng87 · 2023-02-08T07:50:27Z

There are two tasks to be finished:

move scripts table to system scehama
provide show script api or statement

apdong2022 · 2023-02-18T04:18:13Z

@ShenJunkun Hi Junkun, this is April from Greptime Devrel team, Thanks a lot for your contribution in our community!! Do you mind joining our Slack https://greptime.com/slack or leaving your email address? so we can contact you and send you important notices. Thanks a lot!

ShenJunkun · 2023-03-06T14:41:03Z

@ShenJunkun Hi Junkun, this is April from Greptime Devrel team, Thanks a lot for your contribution in our community!! Do you mind joining our Slack https://greptime.com/slack or leaving your email address? so we can contact you and send you important notices. Thanks a lot!

I have already joined GreptimeDB slack. My email address is [email protected].

sunng87 added the C-enhancement Category Enhancements label Nov 30, 2022

MichaelScofield assigned ShenJunkun Jan 3, 2023

ShenJunkun mentioned this issue Jan 11, 2023

refactor: add schema column to the scripts table #868

Merged

2 tasks

MichaelScofield closed this as completed Feb 7, 2023

sunng87 reopened this Feb 8, 2023

sunng87 mentioned this issue Feb 16, 2023

refactor(api): change api from schema to db GreptimeTeam/dashboard#127

Merged

shuiyisong mentioned this issue Apr 20, 2023

feat: catalog level scripts table #1432

Closed

MichaelScofield mentioned this issue Sep 18, 2023

feat: make scripts table work again #2420

Merged

2 tasks

github-actions bot unassigned ShenJunkun Mar 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scripts table should be created for each database/schema #665

Scripts table should be created for each database/schema #665

sunng87 commented Nov 30, 2022

ShenJunkun commented Jan 1, 2023

MichaelScofield commented Jan 3, 2023

evenyag commented Jan 3, 2023

ShenJunkun commented Jan 3, 2023

evenyag commented Jan 4, 2023 •

edited

Loading

evenyag commented Jan 4, 2023

ShenJunkun commented Jan 4, 2023

evenyag commented Jan 4, 2023 •

edited

Loading

killme2008 commented Jan 4, 2023

sunng87 commented Jan 4, 2023 •

edited

Loading

ShenJunkun commented Jan 4, 2023

evenyag commented Jan 4, 2023

sunng87 commented Jan 4, 2023

ShenJunkun commented Jan 5, 2023

ShenJunkun commented Jan 5, 2023

evenyag commented Jan 5, 2023

ShenJunkun commented Jan 8, 2023

sunng87 commented Jan 8, 2023

evenyag commented Jan 9, 2023

sunng87 commented Feb 8, 2023

apdong2022 commented Feb 18, 2023

ShenJunkun commented Mar 6, 2023

Scripts table should be created for each database/schema #665

Scripts table should be created for each database/schema #665

Comments

sunng87 commented Nov 30, 2022

What type of enhancement is this?

What does the enhancement do?

Implementation challenges

ShenJunkun commented Jan 1, 2023

MichaelScofield commented Jan 3, 2023

evenyag commented Jan 3, 2023

ShenJunkun commented Jan 3, 2023

evenyag commented Jan 4, 2023 • edited Loading

Reserve one scripts table in the system schema

Drawbacks

Reserve scripts table for each schema in the system schema

Drawbacks

Reserve scripts table for each schema

Drawback

Conclusion

evenyag commented Jan 4, 2023

ShenJunkun commented Jan 4, 2023

evenyag commented Jan 4, 2023 • edited Loading

killme2008 commented Jan 4, 2023

sunng87 commented Jan 4, 2023 • edited Loading

ShenJunkun commented Jan 4, 2023

evenyag commented Jan 4, 2023

sunng87 commented Jan 4, 2023

ShenJunkun commented Jan 5, 2023

ShenJunkun commented Jan 5, 2023

evenyag commented Jan 5, 2023

ShenJunkun commented Jan 8, 2023

sunng87 commented Jan 8, 2023

evenyag commented Jan 9, 2023

sunng87 commented Feb 8, 2023

apdong2022 commented Feb 18, 2023

ShenJunkun commented Mar 6, 2023

evenyag commented Jan 4, 2023 •

edited

Loading

evenyag commented Jan 4, 2023 •

edited

Loading

sunng87 commented Jan 4, 2023 •

edited

Loading