
Census Repository data model and CSV import behavior #69

Merged · 3 commits · Nov 22, 2016

Conversation

danguita (Contributor) commented on Nov 22, 2016

Connects to #55.

What does this PR do?

This PR implements a starting data model and back-end for the upcoming Census-related features. The first of these consists of ingesting data from CSV files, which is already implemented as part of this PR.

Regarding that Census data model and back-end, there are a few concepts that come into play (a minimal sketch of how they fit together follows the list). Let me introduce them:

  • CensusItem: Represents a Census record in the system. Note that it doesn't have any constraints, validations or integrity rules, because it is intended to behave as system-agnostic storage.
  • Admin::CensusImport: Represents a Census import process run through the Admin namespace. A new record of this class is created every time an Admin user performs a CSV (for now) import, and it stores the import's output to be used by upcoming imports. It works as a rudimentary versioning system (every CensusItem belongs to an Admin::CensusImport process), and it also summarizes some import details in the corresponding view.
  • CensusRepository: An abstraction that provides a common interface for any Census-related operations. It takes care of adapting parameters and attributes, for example to deal with the hashed document_number field at the CensusItem level. It also provides validations, instead of doing so at the AR model level, so the actual back-end or storage behind it remains transparent to us.
  • SecretAttribute: A simple library to apply a hashing algorithm to any of our model attributes. Not much else to say.
  • Admin::CensusImportForm: Holds all the logic behind the CSV import process, including parsing, item building (via CensusRepository), import tracking (via Admin::CensusImport) and deletion of old records (from previous Admin::CensusImport processes).
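
To make the above more concrete, here is a minimal, self-contained Ruby sketch of the SecretAttribute and CensusRepository ideas. It is not the actual Gobierto implementation: the class names match the ones above, but the hashing algorithm, method signatures and the in-memory storage are illustrative assumptions.

```ruby
# Illustrative sketch only: the real classes live in the Gobierto code base
# and may differ. Here the "storage" is a plain array standing in for the
# CensusItem-backed persistence layer.
require "digest"

module SecretAttribute
  # Hash an attribute value so the raw document_number is never stored.
  def self.digest(value)
    Digest::SHA256.hexdigest(value.to_s.strip.upcase)
  end
end

class CensusRepository
  def initialize(storage: [])
    @storage = storage
  end

  # Adapt the plain document_number into its hashed counterpart and perform
  # validation here, so the storage layer itself stays constraint-free.
  def create(document_number:, census_import_id:)
    return false if document_number.to_s.strip.empty?

    @storage << {
      document_number_digest: SecretAttribute.digest(document_number),
      census_import_id: census_import_id
    }
    true
  end

  def exists?(document_number:)
    digest = SecretAttribute.digest(document_number)
    @storage.any? { |item| item[:document_number_digest] == digest }
  end
end

repository = CensusRepository.new
repository.create(document_number: "12345678Z", census_import_id: 1)
repository.exists?(document_number: "12345678Z") # => true
```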

How should this be manually tested?

The importing form can be found at https://gobierto.dev/admin/census/imports/new, which is also reachable from the Users management view (https://gobierto.dev/admin/users).

Regarding the UI, there are two scenarios to check:

  • No import operations have been performed yet (the @latest_import object is not present).
  • An import has already been performed, so its details can be summarized.

As seen in the sandbox files, these two scenarios have slight differences.

Regarding the import behavior, it is expected that any CSV file with valid rows will create those records in the Repository. The operation is transactional with respect to database issues (nothing is committed to the database if one occurs), but not with respect to CSV format issues. This means that a CSV file with a few invalid rows can still be partially imported into the database.
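
As an illustration of that behavior, here is a rough sketch (again, not the actual code) of an import loop that skips invalid rows instead of aborting, reusing the hypothetical CensusRepository from the sketch above. In the real application the repository writes would additionally run inside a database transaction, so database-level failures roll everything back, while CSV-level issues only skip the offending rows.

```ruby
# Illustrative only: parse the CSV, skip rows without a usable
# document_number, and count how many items were actually built.
require "csv"

def import_census(csv_data, repository:, census_import_id:)
  imported = 0

  CSV.parse(csv_data) do |row|
    document_number = row.first.to_s.strip
    next if document_number.empty? # invalid row: skip it, don't abort

    imported += 1 if repository.create(
      document_number: document_number,
      census_import_id: census_import_id
    )
  end

  imported
end

csv_data = "12345678Z\n\n87654321X\n"
puts import_census(csv_data, repository: CensusRepository.new, census_import_id: 1)
# => 2 (the blank row is skipped, the valid rows are imported)
```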

Final notes

There are a few improvements that can still be applied once we start doing larger imports, such as:

  • Perform the importing process in background and notify the requester (Admin) via email when it finishes.
  • Perform Census item building in batches to avoid potential memory issues (see the sketch after this list).
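
For the batching idea, a hypothetical sketch could look like the following. The helper name, batch size and row layout are assumptions for illustration, not the planned implementation.

```ruby
# Hypothetical batching sketch: stream the CSV from disk and build census
# items in fixed-size slices instead of loading every row into memory.
require "csv"

BATCH_SIZE = 1_000

def import_in_batches(path, repository:, census_import_id:)
  CSV.foreach(path).each_slice(BATCH_SIZE) do |rows|
    rows.each do |row|
      document_number = row.first.to_s.strip
      next if document_number.empty?

      repository.create(document_number: document_number,
                        census_import_id: census_import_id)
    end
  end
end
```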

These have been added to the corresponding issue's description (#55); let's leave them there so we can get back to them in upcoming iterations.

- CensusItem
- Admin::CensusImport
- Implement document_number attribute hashing via the SecretAttribute lib.
- Filter document_number parameter logging.
ferblape (Member)

Impressive work!

I think we can move the improvements to a separate issue so we can close this one; for the time being they are not urgent.

ferblape merged commit 989d114 into master on Nov 22, 2016
danguita (Contributor, Author)

Yep, agreed. Thanks man!

danguita deleted the 55-census-data-ingestion branch on November 22, 2016, 12:12