stats command writes output file even when --output is not set #1794
Comments
Hi @mhkeller, As you inferred, it's used by other commands to do metadata and schema inferencing, among other things. It's also used by In the field I work in, where we deal primarily with large, historical CSVs, this is helpful as the files are typically static once exported from transaction systems, as Anyway, I'll add an option to suppress generating these cache files. I'll also add some logic to skip caching when the potential savings are too small (say, less than 5 seconds) to be worth it.
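The threshold idea described above (skip the cache when a run is fast enough that caching saves little) can be sketched roughly as follows. This is only an illustration of the general technique, not qsv's actual implementation: the function name `compute_with_cache`, the JSON cache format, and the `threshold_secs` parameter are all hypothetical.

```python
import json
import time
from pathlib import Path


def compute_with_cache(compute, cache_path, threshold_secs=5.0):
    """Return compute()'s result, caching it only when the run was slow.

    `compute`, `cache_path`, and the JSON cache format are hypothetical
    stand-ins; they do not reflect qsv's internals.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A previous slow run left a cache file: reuse it.
        return json.loads(cache_path.read_text())

    start = time.perf_counter()
    result = compute()
    elapsed = time.perf_counter() - start

    # Write a cache file only when recomputing would cost at least the threshold.
    if elapsed >= threshold_secs:
        cache_path.write_text(json.dumps(result))
    return result
```

With a design like this, small files never leave a cache file behind, while large static files still benefit from the cache on later runs.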
Thanks for the quick and thorough reply! I figured it had to do with some internal usage. That makes a lot of sense. An option to skip writing out the files would be great for my use case. With the Thanks in general for your work on this library. I have a more general question that I'll post over in Discussions.
No worries... big fan of the data journalism work you and your team are doing at NYTimes BTW... 💯 FYI, during the first few days of the pandemic, I wrote a Selenium scraper to retrieve data from NYTimes and petitioned to have it released as open data instead, to which the team responded quickly. 😄 Anyways, as for the new
Ah thank you – that's so nice of you to say. And I'm glad you were able to get the data you were after – that tracking was a huge effort. (I was not involved but was a great admirer.) That option for the flag makes sense and I'm looking forward to trying it out! I was looking for a fast, portable way to check csv types so qsv is perfect.
Thanks for merging this so quickly!
Heh, this is perfect timing for me. I'm working with a directory of csv files that sort of works like an auto-load folder. If a CSV file is dropped into the directory, a process picks it up and tries to load it. I was hoping to find a way to disable the cache files as I only need to run
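For an auto-load folder like the one described above, one possible workaround that does not depend on any particular qsv option is to run the command on a scratch copy, so that any files written next to the input never appear in the watched directory. The sketch below is a generic illustration. The helper name `run_on_scratch_copy` and the `make_command` callback are hypothetical, and it assumes the side files are written beside the input file, as reported in this issue.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_on_scratch_copy(csv_path, make_command):
    """Copy `csv_path` to a temp dir, run a command on the copy, return its stdout.

    `make_command` is a hypothetical callback that receives the copy's path and
    returns the argv list to run. Any files the command writes next to its
    input land in the temp dir and are deleted along with it.
    """
    csv_path = Path(csv_path)
    with tempfile.TemporaryDirectory() as scratch:
        copy = Path(scratch) / csv_path.name
        shutil.copy2(csv_path, copy)
        done = subprocess.run(
            make_command(copy), check=True, capture_output=True, text=True
        )
        return done.stdout


# Example (assumes qsv is on PATH; the input path is illustrative):
# types = run_on_scratch_copy(
#     "incoming/data.csv",
#     lambda p: ["qsv", "stats", "--typesonly", str(p)],
# )
```

The trade-off is an extra copy of each file, which may be too costly for very large CSVs.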
That's good to know @chadbaldwin! You may be interested to know that I added a new negative setting to Lines 143 to 153 in 15d0072
For example, if you set Further, after the stats run, it will auto-delete the index and the stats cache files as the
Describe the bug
When running qsv stats --typesonly my_file.csv, I get the stats in stdout but it also writes two files next to the file I am reading in. I would prefer to not write any files.
To Reproduce
Steps to reproduce the behavior:
qsv stats iris.csv --typesonly
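To confirm the behavior independently of the file manager, the reproduction step can be wrapped in a small before/after directory check. This is a rough sketch, and the helper name `files_created_by` is hypothetical. It runs whatever command it is given and reports files that newly appear in a directory. The commented-out usage assumes qsv is on PATH and iris.csv is in the current directory.

```python
import subprocess
from pathlib import Path


def files_created_by(command, directory):
    """Run `command` and return the names of files that newly appear in `directory`."""
    directory = Path(directory)
    before = {p.name for p in directory.iterdir()}
    # stdout/stderr are captured so the command's own output does not clutter the report.
    subprocess.run(command, check=True, capture_output=True)
    after = {p.name for p in directory.iterdir()}
    return sorted(after - before)


# Example (assumes qsv is installed and iris.csv is in the current directory):
# print(files_created_by(["qsv", "stats", "iris.csv", "--typesonly"], "."))
```

An empty list would mean the command wrote nothing next to the input. Any names returned are the unexpected side files.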
Expected behavior
I'm not sure if this is a bug but the behavior is surprising and it would be great if there were an option to not write out any files.
The docs describe an --output flag to write output. I would expect this function to only create output if set via a flag.
If these files are necessary for other qsv commands, it would be helpful to include a flag to optionally not write them.
Desktop (please complete the following information):
qsv 0.127.0-mimalloc-apply;fetch;foreach;geocode;Luau 0.622;python-3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)];to;polars-0.39.2;self_update-10-10;12.80 GiB-1016.88 MiB-4.13 GiB-16.00 GiB (aarch64-apple-darwin compiled with Rust 1.78) compiled