Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature-Request: ResultSetMetaData does not provide catalog, schema and table name for a column #22306

Open
prrvchr opened this issue Jun 6, 2024 · 19 comments

Comments

@prrvchr
Copy link
Member

prrvchr commented Jun 6, 2024

Hi all,

I am trying to continue integrating the Trino driver into jdbcDriverOOo which is a JDBC driver for LibreOffice Base.

I can create tables in Base if their name is given in lowercase characters.

I can display the contents of a table or view, but am unable to add, modify or delete any record since I do not have access to the name of the catalog, the schema and the table in the ResultSetMetaData.

Would it be possible to have access to such data?

@mosabua
Copy link
Member

mosabua commented Jun 6, 2024

Question in general since I am maybe not understanding enough. Are you basically trying to use LibreOfficebase as a user interface for querying Trino? If so .. does it not support a plain JDBC driver directly? Why do you need a separate additional driver?

@prrvchr
Copy link
Member Author

prrvchr commented Jun 6, 2024

Hi mosabua,

Why do you need a separate additional driver?

Because LibreOffice's native JDBC driver is not working properly, it is unable to edit (create, modify, delete) a forward only ResultSet. It is written in C which makes it very slow (it seems that the C/Java conversion used requires many calls) and it is completely abandoned by the developers of LibreOffice.
jdbcDriverOOo is a type 4 driver written only in Java.

On the other hand if jdbcDriverOOo has access to the name of the catalog, the schema and the table then any Trino ResultSet will become editable in Base.

@mosabua
Copy link
Member

mosabua commented Jun 6, 2024

But a ResultSet is just that .. it is the result of a query that ran in Trino (or part of it and the query is still running). That result could come from one or more tables, from one or more schemas, and one of more catalogs.

It is not some sort of handle to the underlying table or whatever.

If the JDBC drivers together allow you to send queries to Trino .. those queries can be insert, update, delete and so on. Any data you can see is also the result of queries .. just straight select queries from the tables or views.

On a side note .. happy to chat on Trino slack or even on a call some time as well. If you get this working it might be interesting to add to the website and maybe even do a Trino Community Broadcast interview and demo with you.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 6, 2024

I'm talking about methods:

I only get empty strings with the Trino driver while any JDBC driver provides the names of the catalog, schema, and table of the column selected by the index provided to the previous methods. This information may be missing for calculated columns (ie: count, sum...)

If I obtain this information then it is possible for LibreOffice Base thanks to jdbcDriverOOo to edit (insert, update, delete) tables and even views or queries spread over several tables...

@mosabua
Copy link
Member

mosabua commented Jun 6, 2024

I kind think this architecturally not possible, but might be wrong. My reasoning is that the Trino JDBC driver is a wrapper around the Trino Client REST API. Once the result of a query is gone out to the client there is no hook back to the underlying data beyond it being a resultset of some query. So its essentially a virtual/read-only result set.

If a client tool wants to edit data, for example in a grid in a user interface, the changes in the UI would have to create a query that performs the relevant operating somehow. And if that works or not will depend on the features of the connector in Trino. Some are read only, some allow insert, some insert and update, and so on.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 6, 2024

I kind think this architecturally not possible, but might be wrong.

In fact it already works perfectly with all the other drivers. It is me who constructs the SQL queries (INSERT..., UPDATE..., DELETE...) but if I do not know the name of the table, the schema and the catalog to which the column is attached I am unable to construct the appropriate query.

Normally this information is provided by the underlying database driver?

@mosabua
Copy link
Member

mosabua commented Jun 6, 2024

And that is the case even for a simple SELECT * FROM catalog.schema.table; kind of setup I assume.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 6, 2024

And that is the case even for a simple SELECT * FROM catalog.schema.table; kind of setup I assume.

This is exactly the query that Base sends when you are in table editing mode (ie: table open).

@prrvchr
Copy link
Member Author

prrvchr commented Jun 6, 2024

Why does Trino not fill the ResultSetMetaData from the underlying database driver, because normally the database provides this data (catalog, schema and table)?

@mosabua
Copy link
Member

mosabua commented Jun 7, 2024

Why does Trino not fill the ResultSetMetaData from the underlying database driver, because normally the database provides this data (catalog, schema and table)?

I dont know .. I am assuming in many cases it would be difficult to determine but in general it might be possible in some cases. However I am not sure that even if that were the case it would solve your issue. I asked others on slack and hope somebody with more knowledge of the JDBC driver like @electrum can chime in.

@mosabua
Copy link
Member

mosabua commented Jun 7, 2024

Also note .. catalog, schema, and table are concepts in Trino itself .. and those are the ones that would be returned and the ones you would attempt to update.

@electrum
Copy link
Member

electrum commented Jun 7, 2024

This would be the code in

.setCatalogName("") // TODO
.setSchemaName("") // TODO
.setTableName("") // TODO

In order to implement this, we would need to extend the client protocol to include this information, likely in the columns object of the response. This would further require changing the query analyzer and planner to record this information, so that it can be made available to the client protocol. The io.trino.server.protocol.Query#setQueryOutputInfo method is a good place in the middle between the query engine and the client protocol.

@TonBits
Copy link

TonBits commented Jun 7, 2024

Why does Trino not fill the ResultSetMetaData from the underlying database driver, because normally the database provides this data (catalog, schema and table)?

Catalog and Schema is not always guaranteed for ResultSetMetaData. Theoretically it is possible to return the catalog and schema for a query where the actual catalog and schema is known, but it is not always the case, And the overhead might be significant for identifying the catalog and schema and could add additional running time for all query that will be executed in Trino.

In your case a SELECT * FROM catalog.schema.table might look straight forward to identify the catalog and schema but for all other queries like, expressions, aggregation, nested queries, joins, and more complex ones,

e.g.

SELECT SUM(t.column1) + SUM(t.column2) AS mycolumn 
FROM catalog.schema.table t

The returned column mycolumn does not actually come from a single catalog but a result of an aggregation and expression.

The examples below are expressions and does not have catalog or schema associated to them

SELECT CURRENT_TIMESTAMP
SELECT 1+1
SELECT 'ABC'

So now, the question becomes, Is it worth filling up the Catalog and Schema in ResultSetMetaData and add overhead for the rest of the queries just for a particular use case?
Because if you have to fill the catalog and schema for

SELECT * 
FROM catalog.schema.table;

Then you have to handle the rest of the other use case where catalog and schema are identifiable like

WITH temp AS (
  SELECT *
  from catalog.schema.table
)
SELECT *
FROM temp;

That is just an example. There are other more complex structure, sub queries, nested with statements, union, nested sub queries where catalog and schema are identifiable. The implementation must trace back from the inner most query to the outer most to identify the catalog and schema.

In your case, I would just go with DatabaseMetaData and fetch the catalog and schema for the particular table you are interested in. Because the DatabaseMetaData represents the physical tables (not an arbitrary expressions like query), you will most likely get the catalog and schema.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 7, 2024

I already have a lot of specific code that makes Base more efficient with ResultSet:

If the ResultSet is CONCUR_UPDATABLE then Base uses the bookmarks (the UNO XRowLocate interface) to perform positioned updates on the ResultSet (ie: insertRow(), updateRow() and deleteRow()).

If the ResultSet cannot be updated then these classes will emulate a CONCUR_UPDATABLE ResultSet and will use the RowSetWriter.class to construct the necessary SQL queries (INSERT..., UPDATE..., DELETE...) and it is precisely at this moment that I need the name of the catalog, the schema and the table.

In order to implement this, we would need to extend the client protocol to include this information, likely in the columns object of the response. This would further require changing the query analyzer and planner to record this information, so that it can be made available to the client protocol. The io.trino.server.protocol.Query#setQueryOutputInfo method is a good place in the middle between the query engine and the client protocol.

I think that if we want to be able to edit ResultSet this is the right way to use it already works (outside of Trino) with ResultSet coming from a view which are generally read-only ResultSet.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 7, 2024

To be more precise, if we want perfect editing of the ResultSet in Base, it is necessary to have access to additional information:

The isAutoIncrement information allows Base to make the column non-editable during insertion.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 7, 2024

SELECT SUM(t.column1) + SUM(t.column2) AS mycolumn 
FROM catalog.schema.table t

The returned column mycolumn does not actually come from a single catalog but a result of an aggregation and expression.

The examples below are expressions and does not have catalog or schema associated to them

There is no point in making such a query editable, and it is possible to make only the columns attached to a table editable and therefore to exclude the calculated columns...
For the moment no filtering on the ResultSet columns is done but this can be added...

So now, the question becomes, Is it worth filling up the Catalog and Schema in ResultSetMetaData and add overhead for the rest of the queries just for a particular use case? Because if you have to fill the catalog and schema for...

Maybe this can only be enabled if we try to run the ResultSet in a mode other than TYPE_FORWARD_ONLY?

Then you have to handle the rest of the other use case where catalog and schema are identifiable like...

For the moment with the database drivers supporting it (ie: ouside Trino) I can edit views. Nothing prevents us from taking into account more complicated scenarios.

@prrvchr
Copy link
Member Author

prrvchr commented Jun 7, 2024

This would be the code in

.setColumnLabel(column.getName())
.setColumnName(column.getName()) // TODO

I see that java.sql.ResultSetMetaData.getColumnName(int) and java.sql.ResultSetMetaData.getColumnLabel(int) return the same value, to be able to edit ResultSet you need access to the name of the column and not to its label...

@prrvchr
Copy link
Member Author

prrvchr commented Jun 17, 2024

Well I have version 1.4.0 of jdbcDriverOOo which can insert into the resultset (I will look for the missing information from the ResultSetMetadata in DatabaseMetaData.getColumns()).

On the other hand, I don't understand why I have errors coming from the driver (Error code 13) when I try to delete or update a record with an SQL command (UPDATE... or DELETE...)?

Error code 13

I don't really understand what's stopping me from doing SQL queries?

Thanks for your help.

@prrvchr prrvchr changed the title ResultSetMetaData does not provide catalog, schema and table name for a column Feature-Request: ResultSetMetaData does not provide catalog, schema and table name for a column Jun 19, 2024
@prrvchr
Copy link
Member Author

prrvchr commented Jun 19, 2024

@electrum

The io.trino.server.protocol.Query#setQueryOutputInfo method is a good place in the middle between the query engine and the client protocol.

If I look at the code it seems that only the column name and type is supported at the moment.
Can you confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants