new_rde_data_line_re.search() stalls #5

Open
interplanetarychris opened this issue Oct 12, 2019 · 2 comments
Labels
bug Something isn't working

@interplanetarychris
Collaborator

This was previously TruSat/trusat-backend#26 as follows:

Following unit test updates, both SeeSat archive import scripts appear to "pause" partway through the import, much as a terminal does after Ctrl-S. Pressing Ctrl-C in the terminal lets them continue, but it is unclear what the underlying cause is, or whether the Ctrl-C causes data loss.

Debugger sessions have not yet narrowed down the location of the problem.
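Whether Ctrl-C loses data depends on how the import loop handles the resulting `KeyboardInterrupt`, which the issue does not describe. If, as an assumption, the loop catches the exception and moves on to the next message, whatever the interrupted message would have produced is dropped. The hypothetical sketch below (names are illustrative, not from the trusat scripts) makes such losses visible by recording the labels of interrupted messages:

```python
from typing import Callable, Iterable, List, Tuple


def import_messages(
    messages: Iterable[Tuple[str, str]],
    parse: Callable[[str], object],
) -> Tuple[List[object], List[str]]:
    """Parse each (label, body) pair, recording any message whose parse
    was cut short by Ctrl-C instead of silently moving on.

    Diagnostic aid only: catching KeyboardInterrupt means Ctrl-C no longer
    stops the loop, so remove this once the stall is understood.
    """
    parsed: List[object] = []
    interrupted: List[str] = []
    for label, body in messages:
        try:
            parsed.append(parse(body))
        except KeyboardInterrupt:
            # The result for this message is lost; keep its label so the
            # message can be inspected or re-imported later.
            interrupted.append(label)
    return parsed, interrupted
```

A non-empty `interrupted` list after a run would confirm that observations were skipped, and identify exactly which messages to re-import.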

From terminal feedback with the -V flag:
https://github.com/consensys-space/trusat-backend/blob/257bf4606147dbd927f3c3b9acf453b475c22656/database_tools/read_seesat_mbox.py#L346

...the script appears to consistently "pause" after the following lines:

Found  67 IOD obs in msg: 2015-03-19 23:23:00+01:00 LB Obs 2015 Mar 19
Found 162 IOD obs in msg: 2018-04-21 10:05:40+02:00 LB Obs 2018 Apr 20-21 night
Found   2 IOD obs in msg: 2019-03-27 22:56:50+01:00 Obs 2019 Mar 27 pm
Found   3 IOD obs in msg: 2019-09-20 07:28:25-04:00 slow moving unid seen on Sept 19
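The "Found ... IOD obs" lines appear to be printed once a message's IOD parsing has finished, so the stall may be in a later step for that message or in the next one. A minimal sketch for pinning this down, assuming the messages are available as `(label, body)` pairs (the function and parameter names are hypothetical, not taken from `read_seesat_mbox.py`): print each label *before* running the search, flushing stdout, and report searches that exceed a time threshold.

```python
import re
import time
from typing import Iterable, List, Tuple


def find_slow_searches(
    pattern: re.Pattern,
    messages: Iterable[Tuple[str, str]],
    threshold_s: float = 1.0,
    verbose: bool = False,
) -> List[Tuple[str, float]]:
    """Run pattern.search() over each (label, body) pair.

    Returns (label, seconds) for every search slower than threshold_s.
    With verbose=True, each label is printed before its search, so if a
    search never returns, the last printed label names the culprit.
    """
    slow = []
    for label, body in messages:
        if verbose:
            print(f"searching: {label}", flush=True)
        start = time.perf_counter()
        pattern.search(body)
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            slow.append((label, elapsed))
    return slow
```

For example, `find_slow_searches(re.compile(r"\d{5}"), [("msg 1", "12345"), ("msg 2", "no digits")], verbose=True)` prints both labels and returns an empty list, since neither search is slow.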

and has been traced to the following regex in iod.py:

https://github.com/consensys-space/trusat-orbit/blob/ea82c90af2645183318f7fc716a901960e0d5c65/iod.py#L760-L784

Rewriting the regex without extraneous whitespace (so that the OR'ed re.MULTILINE and re.VERBOSE flags could be dropped) did not solve the problem.
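That result is consistent with how the flags work: `re.VERBOSE` only changes how the pattern string is parsed (unescaped whitespace outside character classes, and `#` comments, are ignored), so a compact rewrite should compile to the same matching program and would not be expected to change performance. `re.MULTILINE`, by contrast, changes what `^` and `$` match, so dropping it can change results. The date pattern below is hypothetical, not the actual RDE regex; it shows both forms matching identically:

```python
import re

# Hypothetical pattern for illustration; not new_rde_data_line_re from iod.py.
VERBOSE_FORM = re.compile(
    r"""
    ^(\d{4})     # year
    \s+(\d{2})   # month
    \s+(\d{2})   # day
    """,
    re.MULTILINE | re.VERBOSE,
)

# Same pattern with whitespace and comments removed, so re.VERBOSE is no
# longer needed. re.MULTILINE is kept because it changes what ^ matches.
COMPACT_FORM = re.compile(r"^(\d{4})\s+(\d{2})\s+(\d{2})", re.MULTILINE)

sample = "header line\n2019 03 27 obs\n2015 03 19 obs\n"
print(VERBOSE_FORM.findall(sample))  # [('2019', '03', '27'), ('2015', '03', '19')]
print(COMPACT_FORM.findall(sample))  # [('2019', '03', '27'), ('2015', '03', '19')]

# Without re.MULTILINE, ^ only matches at the start of the whole string.
print(re.findall(r"^(\d{4})\s+(\d{2})\s+(\d{2})", sample))  # []
```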

It could be caused by the regex engine having to try an exponential number of possible matches (catastrophic backtracking), as referenced in https://bugs.python.org/issue29977
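Catastrophic backtracking usually comes from ambiguous sub-patterns, such as a quantified group that itself contains a quantifier, which give a backtracking engine like Python's `re` exponentially many ways to split the same input before it can report a failed match. The toy patterns below are hypothetical, not taken from `iod.py`: both match the same strings, but on a run of digits with no trailing `X`, the nested form's search time should grow roughly exponentially with the run length, while the flat form stays small.

```python
import re
import time

# Toy patterns for illustration only; NOT new_rde_data_line_re from iod.py.
NESTED = re.compile(r"(\d+)+X")  # nested quantifiers: many ways to split a digit run
FLAT = re.compile(r"\d+X")       # same matches, but only one way to consume the digits


def time_search(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds taken by pattern.search(text)."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (12, 15, 18):
        text = "1" * n  # digits with no trailing "X", so every attempt must fail
        print(f"n={n:2d}  nested={time_search(NESTED, text):.4f}s  "
              f"flat={time_search(FLAT, text):.6f}s")
```

The usual remedy is to rewrite the ambiguous part so each character can be consumed in only one way, as `FLAT` does. On Python 3.11 and later, possessive quantifiers (for example `\d++`) and atomic groups (`(?>...)`) can also rule out this kind of backtracking.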

An interim "solution" is to use the previous RDE-block matching regexp (rde_format_re), which is not as comprehensive.

@interplanetarychris
Collaborator Author

For reference, importing the Hypermail SeeSat archive with the older version of the regexp results in:
Processed 402373 observations in 81021 files in 25 directories.
(277864) IOD records (69.06 %)
(42221) UK records (10.49 %)
(82288) RDE records (20.45 %)

@interplanetarychris
Collaborator Author

For reference, MBOX processing results in:
Processed 201211 observations in 12773 messages in 21.705 seconds.
(176197) IOD records (87.6 %)
( 3589) UK records ( 1.8 %)
( 21425) RDE records (10.6 %)

Last messageID imported from: [email protected]
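The percentages in both comments follow directly from the raw counts, so they can be recomputed as a quick consistency check, for example when comparing import runs that use the old and new RDE regexes. A minimal sketch (the helper name is hypothetical):

```python
from typing import Dict


def breakdown(counts: Dict[str, int], ndigits: int = 2) -> Dict[str, float]:
    """Return each record type's share of all observations, in percent."""
    total = sum(counts.values())
    return {kind: round(100.0 * n / total, ndigits) for kind, n in counts.items()}


# Counts copied from the two comments above.
hypermail = {"IOD": 277864, "UK": 42221, "RDE": 82288}
mbox = {"IOD": 176197, "UK": 3589, "RDE": 21425}

print(sum(hypermail.values()), breakdown(hypermail))
# 402373 {'IOD': 69.06, 'UK': 10.49, 'RDE': 20.45}
print(sum(mbox.values()), breakdown(mbox, ndigits=1))
# 201211 {'IOD': 87.6, 'UK': 1.8, 'RDE': 10.6}
```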

@interplanetarychris added the bug (Something isn't working) label on Oct 20, 2019