Migrate minimal event data and minimal session data over from cdptools #10

Closed
evamaxfield opened this issue Oct 30, 2020 · 4 comments · Fixed by #26
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@evamaxfield
Member

Use Case

Please provide a use case to help us understand your request in context

Standardize and version the data format that pipelines accept alongside the pipeline code itself, rather than on the scraper side.

Solution

Please describe your ideal solution

Migrate the work done by @isaacna on cdptools to this repo.

Potentially change the name from MinimalEventData and MinimalSessionData to just EventData and SessionData as I think with the Optional keys, they are the full event data spec.

We should also make objects for VoteData and such, which are a part of the Optional[List[MinutesItems]]. Basically, the entire structure of all minimal and optional data should be documented as the data structure we accept for pipelines.

Notes

This unfortunately will require a bit of copying the database ORM definitions unless we can think of a way to programmatically construct all of these models, i.e. run through the DB ORM definitions and create NamedTuples of every single model with the appropriate Optional tags. I think it's possible.
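A rough sketch of the programmatic approach, assuming each ORM model can be reduced to a mapping of field name to (type, required). `FieldSpec`, `PERSON_FIELDS`, and `build_ingestion_tuple` are hypothetical names made up for illustration, and how to read that mapping out of the real ORM definitions is left open:

```python
from typing import Dict, NamedTuple, Optional


class FieldSpec(NamedTuple):
    """Hypothetical stand-in for what we would read off an ORM field."""

    type_: type
    required: bool


# Hypothetical description of a Person model; not the real schema.
PERSON_FIELDS: Dict[str, FieldSpec] = {
    "name": FieldSpec(str, required=True),
    "email": FieldSpec(str, required=False),
}


def build_ingestion_tuple(name: str, fields: Dict[str, FieldSpec]) -> type:
    """Create a NamedTuple class, wrapping non-required fields in Optional."""
    annotated = [
        (field, spec.type_ if spec.required else Optional[spec.type_])
        for field, spec in fields.items()
    ]
    # Functional form of typing.NamedTuple: a name plus (field, type) pairs.
    return NamedTuple(name, annotated)


Person = build_ingestion_tuple("Person", PERSON_FIELDS)
```

The generated class behaves like a hand-written NamedTuple, so `Person(name="Jane Doe", email=None)` works as expected. The functional form does not set default values, so optional fields still have to be passed explicitly in this sketch.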

@evamaxfield evamaxfield added enhancement New feature or request help wanted Extra attention is needed labels Oct 30, 2020
@evamaxfield evamaxfield added the good first issue Good for newcomers label Oct 30, 2020
@isaacna
Collaborator

isaacna commented Nov 12, 2020

Note to self, will also be adding an Optional[List[EventMinutesItem]] field to the EventData class

@evamaxfield
Member Author

Adding more examples, I am not sure this is 100% accurate but I think you get the idea:

from typing import List, NamedTuple, Optional

# Would live in the constants module
class VoteDecisions:
    Approved = "approved"
    Rejected = "rejected"
    # etc

class Vote(NamedTuple):
    person: str  # this could even be further to a person named tuple but idk
    decision: str  # one of the VoteDecisions values
    ...

class EventMinutesItem(NamedTuple):
    votes: Optional[List[Vote]]
    matter: Optional["Matter"]  # Matter is not defined in this snippet
    ...

class EventData(NamedTuple):
    ...
    event_minutes_items: Optional[List[EventMinutesItem]]

I think you see what I am getting at. This is basically just taking the database models all together and stacking them in an order / structure that is relevant to a single event. In which case it may be best not to make this a NamedTuple but rather just use the fireo.Model.

The end goal, however, is to have a single EventData object to return from some user-defined get_events() function.
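To make the end goal concrete, here is a minimal sketch of what a user-defined get_events() could hand back. Only the fields named above are used; `Matter.name`, the default values, and the sample data are assumptions for illustration, and the required event fields are left out on purpose:

```python
from typing import List, NamedTuple, Optional


class Matter(NamedTuple):
    name: str  # hypothetical field, for illustration only


class Vote(NamedTuple):
    person: str
    decision: str


class EventMinutesItem(NamedTuple):
    votes: Optional[List[Vote]] = None
    matter: Optional[Matter] = None


class EventData(NamedTuple):
    # Required event fields are omitted here; only the optional part is shown.
    event_minutes_items: Optional[List[EventMinutesItem]] = None


def get_events() -> List[EventData]:
    """User-defined scraper entry point; the values below are made up."""
    item = EventMinutesItem(
        votes=[Vote(person="Jane Doe", decision="approved")],
        matter=Matter(name="Example matter"),
    )
    return [EventData(event_minutes_items=[item])]


events = get_events()
```

Everything about one event hangs off a single object, so the pipeline only needs to walk `events` from the top down.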

@isaacna
Collaborator

isaacna commented Nov 25, 2020

Now that I think about it, using the existing fireo models might be difficult because some models have a bottom up reference, for example Session.event_ref.

But if we want to output a single EventData object, we'll need to store sessions as part of the EventData object as a top down reference. Unless there's a workaround I'm missing, this would be a circular dependency.

We could do something like typing Event.sessions as List[str] for document id references, and update them after the session documents are created? But at that point all the EventData wouldn't really be in a single object.

Given this, would it make more sense to create separate ingestion objects apart from the fireo models? There would have to be some logic that handles uploading the nested object data to the database in the proper order so as not to break references.
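A minimal sketch of that ordering logic, assuming separate ingestion objects and using a plain dict in place of the real database. `upload_event`, the `body` and `video_uri` fields, and the id scheme are all hypothetical:

```python
from typing import Dict, List, NamedTuple


class SessionData(NamedTuple):
    video_uri: str  # hypothetical field


class EventData(NamedTuple):
    body: str  # hypothetical field
    sessions: List[SessionData]  # top-down: the event owns its sessions


def upload_event(event: EventData, db: Dict[str, dict]) -> str:
    """Write the parent first, then the children that point back at it."""
    event_id = f"event-{len(db)}"
    db[event_id] = {"body": event.body}
    for index, session in enumerate(event.sessions):
        # Bottom-up reference, mirroring Session.event_ref in the database.
        db[f"{event_id}-session-{index}"] = {
            "video_uri": session.video_uri,
            "event_ref": event_id,
        }
    return event_id


fake_db: Dict[str, dict] = {}
event_id = upload_event(
    EventData(
        body="Example Body",
        sessions=[SessionData(video_uri="https://example.com/video")],
    ),
    fake_db,
)
```

The ingestion side stays purely top-down, and the bottom-up reference only appears at upload time, so neither structure needs to know about the other's direction.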

@evamaxfield
Member Author

evamaxfield commented Nov 25, 2020

Yea that's a good catch. I think it just makes sense to define a bunch of NamedTuple objects. Especially for documentation purposes having the ingestion structure really well documented would be nice.

Also note we can add / check / validate a bunch of the ingestion data based off of just primary keys, whereas the fireo models are "full objects". For example, a Person model has name, email, etc. In the ingestion, the dev could add all of those, but they could also just add the person's name and we would take care of the rest because of the primary key. (This is just one example of where there is more value in separating them, because they serve different purposes.)
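As a sketch of the primary key idea, assuming the person's name is the key and using a dict as a stand-in for the database. `PersonData` and `get_or_create_person` are hypothetical names, not existing functions:

```python
from typing import Dict, NamedTuple, Optional


class PersonData(NamedTuple):
    name: str  # treated as the primary key in this sketch
    email: Optional[str] = None


def get_or_create_person(person: PersonData, store: Dict[str, dict]) -> dict:
    """Return the stored record for this primary key, creating it if absent."""
    record = store.get(person.name)
    if record is None:
        record = {"name": person.name, "email": person.email}
        store[person.name] = record
    elif person.email is not None:
        # Fill in details the scraper supplied this time around.
        record["email"] = person.email
    return record


store: Dict[str, dict] = {}
first = get_or_create_person(PersonData(name="Jane Doe"), store)
second = get_or_create_person(
    PersonData(name="Jane Doe", email="jane@example.com"), store
)
```

Both calls resolve to the same stored record, so a scraper that only knows the name and one that knows the full details end up describing the same person.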
