Handle ghost detections (at DB & pkg) #68

Open
peterdesmet opened this issue Feb 22, 2018 · 29 comments
Labels
database (Related to ETN database), question (Further information is requested)

Comments

@peterdesmet
Member

Here's a quick overview of the data that will be included in the dataset we will publish. @PieterjanVerhelst @IPauwels can you have a look if this makes sense? Let me know if you need more info.

| animal_project_name | animals.scientific_name | detections | individuals | stations | start_date | end_date |
| --- | --- | --- | --- | --- | --- | --- |
| 2011 Rivierprik | Lampetra fluviatilis | 114605 | 29 | 29 | 2011-12-14 | 2012-07-03 |
| 2012 Leopoldkanaal | Anguilla anguilla | 2215829 | 92 | 60 | 2012-07-04 | 2017-03-12 |
| 2014 Demer | Petromyzon marinus | 42 | 1 | 1 | 2015-05-06 | 2015-05-12 |
| 2014 Demer | Rutilus rutilus | 11030 | 2 | 9 | 2014-04-19 | 2014-06-28 |
| 2014 Demer | Silurus glanis | 86023 | 9 | 46 | 2014-04-25 | 2018-01-31 |
| 2014 Demer | Squalius cephalus | 139013 | 2 | 10 | 2014-04-30 | 2015-02-12 |
| 2015 Dijle | Anguilla anguilla | 41798 | 1 | 7 | 2015-05-01 | 2015-10-15 |
| 2015 Dijle | Cyprinus carpio | 4944 | 2 | 9 | 2015-04-23 | 2015-11-06 |
| 2015 Dijle | Platichthys flesus | 101488 | 8 | 28 | 2015-04-29 | 2016-04-08 |
| 2015 Dijle | Rutilus rutilus | 7870 | 4 | 9 | 2015-04-23 | 2015-09-14 |
| 2015 Dijle | Silurus glanis | 78331 | 11 | 25 | 2015-04-22 | 2017-09-16 |
@peterdesmet changed the title from "Info included in data dump" to "Info regarding the data included in the dataset" on Feb 22, 2018
@PieterjanVerhelst
Collaborator

The project '2012 Leopoldkanaal' ended in January 2016 (i.e. the last receiver of that project was removed on 18/01/2016). Could you check which eel was detected till 2017? Was it 29920?

@peterdesmet
Member Author

peterdesmet commented Feb 22, 2018

@PieterjanVerhelst very few detections after January 1, 2016:

| datetime | transmitter | deployment_station_name |
| --- | --- | --- |
| 2016-05-05T21:55:57Z | A69-1601-29954 | S07 |
| 2017-02-18T02:10:39Z | A69-1601-29961 | s-Wetteren |
| 2017-02-20T14:38:25Z | A69-1601-29961 | s-Wetteren |
| 2017-03-12T00:43:14Z | A69-1601-29961 | s-Wetteren |
| 2017-03-12T02:31:25Z | A69-1601-29961 | s-Wetteren |

@PieterjanVerhelst
Collaborator

Could you give the station name for those detections?

@peterdesmet
Member Author

Updated (plus, the original overview now lists start, end rather than end, start).

@PieterjanVerhelst
Collaborator

I just checked the battery end dates of those two tags and they dropped dead in February 2015. These 5 detections are likely ghost detections and can be removed from the dataset.
However, this shows the need for a quality check on tag detections after the battery end date.

@peterdesmet changed the title from "Info regarding the data included in the dataset" to "Check dataset for ghost detections post battery end date" on Feb 22, 2018
@peterdesmet
Member Author

Indeed. I've updated this issue title. Can you check this and remove those from the database itself? I think that's better than me removing them from my data dump.

Once done and checked, let me know if there were many: if not, I'll remove them from my dump. If yes, I'll ask for a new dump.

@PieterjanVerhelst
Collaborator

I finally found time to get to this issue :-). I discussed this with the VLIZ team and they prefer that such data removals don't occur at the database level. @peterdesmet what do you think?

@peterdesmet
Member Author

Can they be flagged by the user as ghost detections, so these can be filtered upon?

@PieterjanVerhelst
Collaborator

That should be possible. I'll check with them.

@PieterjanVerhelst
Collaborator

It should be possible to add a column with a boolean (TRUE/FALSE). @jreubens @bwydoogh can this be implemented? If so, we only need a rule for deciding what counts as a ghost detection.

@bwydoogh

> Can they be flagged by the user as ghost detections, so these can be filtered upon?

Yes, why not (if Jan agrees; I hope he is also watching this GitHub repo).
What are the exact rules to set that flag to TRUE?

@PieterjanVerhelst
Collaborator

We could consider a detection a ghost detection when its timestamp > battery end date and there is no recapture date.
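
A minimal sketch of this rule in Python, assuming each detection has a timestamp and each tag has a battery end date and an optional recapture date. The names `Tag`, `battery_end`, `recapture_date` and `is_possible_ghost` are hypothetical and are not ETN database fields; the dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Tag:
    # Hypothetical record; field names are illustrative, not ETN columns.
    battery_end: Optional[datetime]
    recapture_date: Optional[datetime] = None


def is_possible_ghost(detected_at: datetime, tag: Tag) -> bool:
    """Flag a detection made after the battery end date of a tag
    that has no recapture date."""
    if tag.battery_end is None:
        # Without a battery end date the rule cannot be evaluated.
        return False
    return detected_at > tag.battery_end and tag.recapture_date is None


# Illustrative tag whose battery ended on 2015-02-28 (assumed date).
tag = Tag(battery_end=datetime(2015, 2, 28))
print(is_possible_ghost(datetime(2017, 3, 12, 0, 43, 14), tag))  # True
print(is_possible_ghost(datetime(2014, 6, 1), tag))              # False
```

Returning `False` when the battery end date is missing is a design choice of this sketch: such detections are left unflagged rather than guessed at.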

@jreubens
Collaborator

It seems logical to me that this should be flagged as 'possible ghost detection'.
However, it should be clear that it is the responsibility of the user/data owner to use or not use these detections. It is just a flag.
Having said this, we should have clear rules for what we consider ghost detections.
The rule mentioned by Pieterjan can be a start (it is a simple example). However, there should be more rules, and some might be quite complicated.
We also need to take into account possible tag ID duplication, since several brands use the same set of ID codes.

I suggest that we start by implementing Pieterjan's rule, but we should also brainstorm on other possible rules.

@PieterjanVerhelst
Collaborator

Since the exact hour of tagging, and therefore the battery end date, is often unknown, I would add a buffer of at least one day, or even a month. So a detection is considered a ghost detection if it occurred > 1 month after the battery end date. Detections < 1 month after the battery end date should be checked by the researcher, in case a wrong tagging timestamp was given.
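
The buffer idea can be sketched as a three-way classification. This is only an illustration: the function name `classify`, the labels, and the approximation of "one month" as 30 days are assumptions, not an agreed definition.

```python
from datetime import datetime, timedelta

# Assumption: "one month" is approximated as 30 days.
BUFFER = timedelta(days=30)


def classify(detected_at: datetime, battery_end: datetime,
             buffer: timedelta = BUFFER) -> str:
    """Return 'ok', 'check' or 'ghost' for a single detection."""
    if detected_at <= battery_end:
        return "ok"
    if detected_at <= battery_end + buffer:
        # Within the buffer: the researcher should verify the tagging timestamp.
        return "check"
    return "ghost"


battery_end = datetime(2015, 2, 28)  # illustrative date
print(classify(datetime(2015, 2, 1), battery_end))   # ok
print(classify(datetime(2015, 3, 1), battery_end))   # check
print(classify(datetime(2017, 3, 12), battery_end))  # ghost
```

Keeping the buffer as a parameter makes it easy to compare a one-day and a one-month buffer on the same data.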

@stijnvanhoey
Contributor

@jreubens
Collaborator

jreubens commented Apr 27, 2018 via email

@bwydoogh

bwydoogh commented Oct 9, 2018

@jreubens @PieterjanVerhelst @stijnvanhoey
How do we proceed with this topic? Will you use an R package to filter ghost detections, or do we / I add a field to the detections table in the ETN DB?
Note that out of a total of (currently) 40 million detection records, we have 2.921.976 detection records where detections.datetime > deployments.drop_dead_date (and 2.189.176 where detections.datetime > deployments.drop_dead_date + INTERVAL '1 month').

Other things to pay attention to:

  • 868 deployments with no deployments.drop_dead_date (out of a total of 2.143)
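
A rough sketch of how such counts could be reproduced outside the database, assuming each detection has already been joined to the drop dead date of its deployment (or `None` when that date is missing). The function `summarise`, the bucket names, and the 30-day stand-in for `INTERVAL '1 month'` are assumptions. Unlike the figures above, the buckets here are mutually exclusive.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# (detection time, drop dead date of the deployment or None)
Row = Tuple[datetime, Optional[datetime]]


def summarise(rows: Iterable[Row],
              buffer: timedelta = timedelta(days=30)) -> Counter:
    """Count detections per bucket relative to the drop dead date."""
    counts: Counter = Counter()
    for detected_at, drop_dead in rows:
        if drop_dead is None:
            # The rule cannot be evaluated without a drop dead date.
            counts["no_drop_dead_date"] += 1
        elif detected_at > drop_dead + buffer:
            counts["after_buffer"] += 1
        elif detected_at > drop_dead:
            counts["within_buffer"] += 1
        else:
            counts["before_drop_dead"] += 1
    return counts


rows = [
    (datetime(2014, 6, 1), datetime(2015, 2, 28)),
    (datetime(2015, 3, 1), datetime(2015, 2, 28)),
    (datetime(2017, 3, 12), datetime(2015, 2, 28)),
    (datetime(2016, 1, 1), None),
]
print(summarise(rows))
```

Reporting the missing-date bucket separately keeps those detections visible instead of silently treating them as valid.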

@stijnvanhoey
Contributor

My proposal:

  • agree on a clear definition of ghost detections, without arbitrary concepts like + 1 month - @jreubens and @PieterjanVerhelst
  • document the definition in this package (documentation website) and other relevant ETN user sources - @jreubens
  • add the column with TRUE/FALSE for 'possible ghost detection' - @bwydoogh

@PieterjanVerhelst
Collaborator

I would suggest defining a ghost detection as a detection whose time < activation time or time > drop dead date.
Since tags have a programmed drop dead date according to the manufacturer (Vemco), I don't understand how it is possible we have so many detections with detection time > drop dead date.
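
The proposed definition can be sketched as a check against the tag's active window, reading the two conditions as alternatives (a detection cannot be both before activation and after the drop dead date). The function name `outside_tag_life` and its parameters are hypothetical.

```python
from datetime import datetime


def outside_tag_life(detected_at: datetime, activated_at: datetime,
                     drop_dead: datetime) -> bool:
    """True when the detection falls before activation or after the
    programmed drop dead date of the tag."""
    return detected_at < activated_at or detected_at > drop_dead


activated = datetime(2012, 7, 4)   # illustrative dates
drop_dead = datetime(2015, 2, 28)
print(outside_tag_life(datetime(2012, 1, 1), activated, drop_dead))   # True
print(outside_tag_life(datetime(2013, 5, 1), activated, drop_dead))   # False
print(outside_tag_life(datetime(2017, 3, 12), activated, drop_dead))  # True
```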

@jreubens
Collaborator

IMOS has written a nice piece of code on QCs
see 'https://github.com/aodn/aatams/tree/master/scripts/R/QC'
I suggest we currently keep this on hold. We should discuss with IMOS first

The reason we have so many detections after deployment recovery is that we have receivers with a built-in tag that keeps pinging until you disconnect the battery (which is not always done immediately).
Anyway, we have to look at this in more detail....

@peterdesmet transferred this issue from inbo/etn on Jan 16, 2019
@PieterjanVerhelst
Collaborator

Is there still a need for a conversation and brainstorm to implement rules related to ghost detections?

@peterdesmet transferred this issue from inbo/etn-occurrences on Jan 17, 2019
@peterdesmet changed the title from "Check dataset for ghost detections post battery end date" to "Handle ghost detections (at DB & pkg)" on Jan 17, 2019
@peterdesmet
Member Author

@PieterjanVerhelst I don't know. But I would like you to have a look at the original question, which is now at inbo/etn-occurrences#7 😄

@peterdesmet
Member Author

@PieterjanVerhelst @jreubens @IPauwels this issue has been dormant. No need to read it all, the basic question is: do you want an automatic assessment (by the database (ideally) or package) to assess which detections are likely to be ghost detections?

@peterdesmet added the "question" label (Further information is requested) on Mar 20, 2020
@PieterjanVerhelst
Collaborator

I would say yes, to some degree. That is, I would consider all 'detections' occurring after the battery of the transmitter dropped dead to be ghost detections. For other forms of uncertainty, it is up to the researcher to decide what to do (include them in the analysis or not). However, I think @jreubens may have some additional thoughts on this one ;-).

@IPauwels
Collaborator

I will leave it up to Jan as well, but had this small thought: you are never sure about the actual battery end date of the transmitter, are you? So perhaps detections after the expected end date are still real detections. Or did I misunderstand your suggestion, PJ?

@PieterjanVerhelst
Collaborator

Transmitters have an end date and on that particular date, the transmitter runs dead.

@jreubens
Collaborator

Yes, this is needed! However, in my opinion this should be tackled at DB level.

@peterdesmet added the "database" label (Related to ETN database) on Mar 25, 2020
@peterdesmet
Member Author

@jreubens should I leave this issue open then? Or are you following this up on your side?

@PietrH
Member

PietrH commented Sep 9, 2024

@jreubens , @peterdesmet Can this be closed?

7 participants