Skip to content
This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

export results support #55

Open
asw-dev opened this issue Aug 21, 2018 · 19 comments
Open

export results support #55

asw-dev opened this issue Aug 21, 2018 · 19 comments
Assignees

Comments

@asw-dev
Copy link
Collaborator

asw-dev commented Aug 21, 2018

The button currently exists but is not hooked up to anything

@asw-dev
Copy link
Collaborator Author

asw-dev commented Aug 21, 2018

This is implemented in my dev branch, but conflicts with the chunked loading. Not sure what the interaction should be (could load all chunks on export, but it would be memory expensive).

asw-dev@da83962

@mispp
Copy link
Owner

mispp commented Aug 21, 2018

Thanks for the implementation, i'll merge it manually somehow.

Chunked load fills the model with 100 rows at a time. Each time new chunk is extracted from results, rows get appended to the model. For extract, instead of appending, these 100 rows could be replaced. Memory in this case would only need to be as big as those 100 rows. Two consideration for this case:

  • there will have to be a separate Query(Manager) used since the same query cannot be touched due to forward-only (probably otherwise as well)
  • this further means that proper extract should not be read from the model, but from its own query.

If we want to be more adventurous, extracts should not be limited to one extract per tab, but more like a job queue. This means that another tab next to results grid and messages should be added and so on. For this we'd need a bit more work since closing connection or tab should cancel subsequent operations... My implementation is not the best in this regard.

@mispp
Copy link
Owner

mispp commented Aug 23, 2018

Just looking at your implementation of Query in threading, since query on Query::run() adds database, executes query and then removes database, how would this work with forward only queries which only on QSqlQuery::next() returns a row, initiated by the GUI thread?
This looks like it would work when whole set is moved to model (not fwd-only), but would it work with forward-only?

Anyhow, export needs to be done in it's own thread so Csv can initiate its own Query and Database connection.

@mispp
Copy link
Owner

mispp commented Aug 23, 2018

In addition, i was thinking about bigger use case, for later, so just want to drop ideas here.

Use case:

  • When we run a query, results will be displayed in the results grid (100 rows).
  • User clicks export, which then copies connection and query information from whatever was used to display results and submits a job
  • New query in new thread with new (cloned?) database connection for export job is initiated, where it starts looping row-by-row calling QSqlQuery::next() until all rows have been written out.
  • If we assume that this output is done for a large amount of data or that it's done over slow network (so user will not wait for this to finish), user could execute more queries in the GUI and click export for another job.
  • Another copied connection information, new database connection is established etc.

An idea how descibed use case would be achieved:

  • QueryTab has a QueryManager
  • QueryManager will handle queries and db-connections for gui, but every export job will have it's own database connection, so it should initiate it since this is the only reason this connection will be used
  • One query will be for the gui, all others will be ExportJobs. We'd need several new classes, something like this:
    • ExportJobManager
    • ExportJob
    • CsvOutputFormat
    • XmlOutputFormat
    • JSONOutputFormat
    • ...

Reason why this management of established connections is mentioned so many time is because database connection can be removed only by the thread it created it.
I'm making an assumption here that since from gui we need add/remove databases, we need to create those connections in that thread. we can pass them around as objects, but in a subseqent thread calling QSqlDatabase::database("mydatabase") won't work

Thoughts? Would this be too complicated?

@asw-dev
Copy link
Collaborator Author

asw-dev commented Aug 23, 2018

... how would this work with forward only queries ... ?

It actually is forward-only right now, but it lacks a few things that I haven't implemented yet.

  1. Render the data when the first row is received. You can add a new signal modelAdded() that is emitted each time the thread adds a model (with only the first row so far) to the list. The tab can listen for this and call pop_front & setModel at that time. I have not looked at how to tell the GUI there are more rows added to the model, that may or may not be baked in already.

  2. Do not hold all the result data in memory. For this you can subclass QAbstractItemModel. When setData() is called it will append to a tmp file and when data() is called it is read from the file. You can use mmap() or a cache to hold your 100/1000 "rows" of data that is currently being rendered to the ui. You would probably want to be smart about cleaning up old files from killed processes (locked pid files maybe). You probably also want 2 files per query (index-to-offset, and data) to do fast lookups. This is conceptually similar to being non-forward only which use a database cursor to view a section at a time, the difference being the data is now client side and the connection can be closed.

  3. I was also planning to have another subclass QAbstractItemModel decorator to returned the BackgroundRole data instead of having it stored with the model (it is always a function of the column/value).

This setup lets you have forward only, ui-responsive, quick closing, single run, memory lite queries.

The one draw back to this approach is there is no good way to pause the data-to-disk flow (other than killing the query), so if you don't have a disk large enough to hold the result data you are out of luck.

Thoughts?

The only gotcha I can think of with the ExportJobs is when queries are not idempotent.

--example
create table times
(
	ts_when timestamp
);

insert into times (ts_when) values (current_timestamp) returning ts_when;

having export repeat the query would cause a change to the database which is not what the user expects to happen. So long as it's more "query to file" instead of "export grid to file" it should be fine.

@mispp
Copy link
Owner

mispp commented Aug 24, 2018

It actually is forward-only

I think i missed one fact, please correct me if i'm wrong: query opens connection, it pulls whole rowset into the model, then closes connection.

...
The one draw back to this approach is there is no good way to pause the data-to-disk flow (other than killing the query), so if you don't have a disk large enough to hold the result data you are out of luck.

If we do this, i'd very much like to have it configurable. Reason is - i'm testing this on a huge table, so whenever i run something, it'll pull 3+ GB from db to the client which i don't want.
My preference in behavior is that it takes as little as needed to be shown, basically like it works now. What i'm unsure is what to do with threading in this case, should it stay with QtConcurrent::run or move it somehow to QThread.

So, several options here:

  • fwd only
    • in memory, but pull as little as needed (like currently in master)
    • open connection, pull all rows at once then close connection (like now in your branch)
      • cash on client hdd (a bit more complicated, but provides low memory requirement)
      • do not cash on the client hdd, just dump all rows into the model (simple, but high memory requirement)
  • non-fwd only

This applies to how we get the results, but i'm also thinking how to decide/design QtConcurrent::run vs QThread implementation.

having export repeat the query would cause a change to the database which is not what the user expects to happen

this is true, but i think whoever does this, must be aware that anything but select will have an impact on the db state. so different buttons or behavior defined through settings?

I was also planning to have another subclass QAbstractItemModel decorator to returned the BackgroundRole data instead of having it stored with the model (it is always a function of the column/value).

it makes sense to be defined like that. what i did was a quick win, didn't yet use delegates, so i'd have to spend a lot of time to get it working.
as note, i'd add also like to add an icon to header and color the text in the textgrid based on datatype. i guess delegate supports at coloring of text, but can it handle icons in header?
when it comes to datatypes, i tried to extract it from QSqlField, but it returns not so precise information it seems. Sometimes, if the number is too big, it converts it to QString.

@mispp
Copy link
Owner

mispp commented Aug 24, 2018

Anyhow, i'll try to patch something up to make your Csv class work in simplest way, so at least some export functionality is working.
Additional things described we can work on at a later time.

@asw-dev
Copy link
Collaborator Author

asw-dev commented Aug 26, 2018

FYI I've added export-to-clipboard support to my branch asw-dev@137f175

The change has 2 parts, capturing ctrl+c events on the grid and a "full" export on button click. In each case I'm assuming that headers should be included (but that could be an option of a tool button).

As you are currently working on export code I figured it would be best to let you handle this merge.

@mispp
Copy link
Owner

mispp commented Aug 27, 2018

Sure, no prob.

The only thing is - this might take a while. That rewrite from QtConcurrent::run to QThread is giving me issues since i want to keep that 100rows by 100rows extract (so model gets only filled as needed). I yesterday decided to reset --hard and start from scratch for exporting branch

I guess all can be taken in from your commit except the change in .pro file since there is no Query class anymore in master. The idea was to have a class which would manage all stuff related to queries since we have 4 happening for every tab:

  1. normal query for execution of user statements
  2. cancel user query
  3. export query
  4. cancel export query

all of these should be running over their own connection/thread.

Just having issues in designing this since this damn QSqlDatabase::database() won't work if database is created in another thread and it needs to stay open since 100-by-100 rows is taken chunk by chunk.

@mispp
Copy link
Owner

mispp commented Aug 27, 2018

Point is, if you want to merge it into master, i have no issues with that. I can rebase my branch since i did reset all the work i did, so no worries.
Only thing i'd like to keep is this behavior where not all 3+ gb gets pulled to client when i do select * from b and b is a big table created just for this kind of testing. This can maybe be controlled by a checkbox in QueryTab next to combobox weather to pull all, or just chunks.

@mispp
Copy link
Owner

mispp commented Sep 1, 2018

I tried to make clasd Query and move it to its thread (to standardize export and ui queries), but it didn't work for some reason with triggering different query on the same object. Tried different things and nothing worked.

I will have to probably reset again.

@mispp
Copy link
Owner

mispp commented Sep 2, 2018

export now works in exporting branch (commit 64dcf51) since i copy/pasted your referenced commit into it (probably should have cherrypicked).

bad:

  • now the connection combobox doesnt work (tried to do it properly by emmiting events on connect/disconnect, wasnt finished with it since removing items seemd a bit difficult when using Connection object as data in QStandardItem)
  • couldn't make QThread work with having connection open in the thread. signal for pulling next 100 rows just wouldnt trigger, so i resorted back to QtConcurrent::run
  • exporting to file i tried to embed within the Query class itself (that's why QueryExport is there) so it would be running in its own thread when using QThread; didnt want to make QSqlQuery member of the Query class public...
  • Query classes (multiple) is complication for no reason, just doesn't make sense, there are 3 of them which is mostly c/p. i guess abstract class would fix some things
  • if QThread is used, then Connection object needs to be passed around. Touching a database connection must be done in the thread it is created in, so this i tried, but it didnt work as i wanted (comment above with signal). So it's either QThread+Connection or QtConcurrent::run+QString

good:

  • QueryTab now doesnt have any logic handling database connections

as a learning process, this is somehow ok, but result of what i did is PoS. needs to be simplified and streamlined a lot. exporting branch is now here just for reference. it really needs to be improved.

with this we come back to refactoring

headache starting

@mispp
Copy link
Owner

mispp commented Sep 4, 2018

last post was addressed in #47 , but a bit more work is needed before this can be closed.

when refactoring of Query part is done, i'd like to do a dialog here and then push parameters into Csv. Parameters would be:

  • delimiter
  • quoting (enabled/disabled, symbol)
  • number format
  • date format
  • include/exclude header

if possible, this dialog should display how would this exactly look like (it would have a text part below the settings). since c/p takes data from QStandardItemModel, not sure if we can distinguish which datatype is in.

@mispp
Copy link
Owner

mispp commented Sep 8, 2018

this was done in refactoring2 branch, but still pending:

  • number format
  • date format

@mispp
Copy link
Owner

mispp commented Sep 10, 2018

in locale branch, now different locales are supported, also custom date/time formats.
unfortunatelly, c/p from results grid is now broken (all of it is quoted), so this further needs to be worked upon.

so, todos before this issue can be closed:

  • fix c/p
    • parameters: tab delimiter, always quote string columns, include header, iso format (2018-12-31 date, 23:59:59 time, 2018-12-13T23:59:59), system locale
    • optional: add checkbox on tab for pulling all rows from database when using c/p
  • every time on export dialog button 'ok' is pressed, save settings which are reloaded on next export dialog shown (custom formats, selected locale, state of checkboxes); Settings class is needed (static methods i guess)
  • locales should be limited to only one per country (iso is missing?)
    • even better, there should be a '+' sign to add locales which are relevant for a person to be shown in the list on the export dialog (so another child dialog of export dialog?)
    • system locale should always be available
  • on uncheck of custom format, set combobox to format of selected locale

@mispp
Copy link
Owner

mispp commented Sep 15, 2018

latest commit solves 2 mentioned bulltes. still remains:

  • save custom formats
  • custom selected locales

@mispp
Copy link
Owner

mispp commented Sep 15, 2018

this was more or less done. results somehow cover what i had in mind, so the question is now - do we need to do more before this can be closed?

@asw-dev
Copy link
Collaborator Author

asw-dev commented Sep 15, 2018

I say it's good as is. Can always create a new case for anything more.

@mispp
Copy link
Owner

mispp commented Sep 16, 2018

to improve it, i think the following should be done (together with that data formater):

  • when c/p from results grid, by default do not get headers. this is extremly annoying when c/p single cell or when c/p subset. it should be valid maybe on whole column(s) selection.
  • add right-click menu on the grid with two options: copy with and copy without header
  • clicking corner button should copy all of it with header (not sure if selection here is relevant)

all in all, this c/p thing needs to be updated/cleaned/refactored i think

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants