
Census Repository data model and CSV import behavior #69

Merged · 3 commits · Nov 22, 2016

Conversation

danguita (Contributor) commented on Nov 22, 2016

Connects to #55.

What does this PR do?

This PR implements a starting data model and back-end for the upcoming Census-related features. The first of these consists of ingesting data from CSV files, which is already implemented as part of this PR.

Regarding that Census data model and back-end, there are a few concepts that come into play (a minimal sketch of how they fit together follows the list). Let me introduce them:

  • CensusItem: Represents a Census record in the system. Note that it doesn't have any constraints, validations or integrity rules, because it is intended to behave as system-agnostic storage.
  • Admin::CensusImport: Represents a Census import process run through the Admin namespace. A new record of this class is created every time an Admin user performs a CSV (for now) import, and it stores the import's output to be used by upcoming imports. It works as a rudimentary versioning system (every CensusItem belongs to an Admin::CensusImport process), and it also summarizes some import details in the corresponding view.
  • CensusRepository: An abstraction that provides a common interface for any Census-related operations. It takes care of adapting parameters and attributes, for example to deal with the hashed document_number field at the CensusItem level. It also provides validations, instead of doing so at the AR model level, so the actual back-end or storage behind it remains transparent to us.
  • SecretAttribute: A simple library to apply a hashing algorithm to any of our model attributes. Not much else to say.
  • Admin::CensusImportForm: Holds all the logic behind the CSV import process, including parsing, item building (via CensusRepository), import tracking (via Admin::CensusImport) and deletion of old records (from previous Admin::CensusImport processes).
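
To make the above more concrete, here is a minimal, self-contained Ruby sketch of the SecretAttribute and CensusRepository ideas. It is not the actual Gobierto implementation: the class names match the ones above, but the hashing algorithm, method signatures and the in-memory storage are illustrative assumptions.

```ruby
# Illustrative sketch only: the real classes live in the Gobierto code base
# and may differ. Here the "storage" is a plain array standing in for the
# CensusItem-backed persistence layer.
require "digest"

module SecretAttribute
  # Hash an attribute value so the raw document_number is never stored.
  def self.digest(value)
    Digest::SHA256.hexdigest(value.to_s.strip.upcase)
  end
end

class CensusRepository
  def initialize(storage: [])
    @storage = storage
  end

  # Adapt the plain document_number into its hashed counterpart and perform
  # validation here, so the storage layer itself stays constraint-free.
  def create(document_number:, census_import_id:)
    return false if document_number.to_s.strip.empty?

    @storage << {
      document_number_digest: SecretAttribute.digest(document_number),
      census_import_id: census_import_id
    }
    true
  end

  def exists?(document_number:)
    digest = SecretAttribute.digest(document_number)
    @storage.any? { |item| item[:document_number_digest] == digest }
  end
end

repository = CensusRepository.new
repository.create(document_number: "12345678Z", census_import_id: 1)
repository.exists?(document_number: "12345678Z") # => true
```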

How should this be manually tested?

The importing form can be found at https://gobierto.dev/admin/census/imports/new, which is also reachable from the Users management view (https://gobierto.dev/admin/users).

Regarding the UI, there are two scenarios to check:

  • No import operations have been performed yet (the @latest_import object is not present).
  • An import has already been performed, so its details can be summarized.

As seen in the sandbox files, these two scenarios have slight differences.

Regarding the import behavior, it is expected that any CSV file with valid rows will create those records in the Repository. The operation is transactional with respect to database issues (nothing is committed to the database if one occurs), but not with respect to CSV format issues. This means that a CSV file with a few invalid rows can still be partially imported into the database.
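
As an illustration of that behavior, here is a rough sketch (again, not the actual code) of an import loop that skips invalid rows instead of aborting, reusing the hypothetical CensusRepository from the sketch above. In the real application the repository writes would additionally run inside a database transaction, so database-level failures roll everything back, while CSV-level issues only skip the offending rows.

```ruby
# Illustrative only: parse the CSV, skip rows without a usable
# document_number, and count how many items were actually built.
require "csv"

def import_census(csv_data, repository:, census_import_id:)
  imported = 0

  CSV.parse(csv_data) do |row|
    document_number = row.first.to_s.strip
    next if document_number.empty? # invalid row: skip it, don't abort

    imported += 1 if repository.create(
      document_number: document_number,
      census_import_id: census_import_id
    )
  end

  imported
end

csv_data = "12345678Z\n\n87654321X\n"
puts import_census(csv_data, repository: CensusRepository.new, census_import_id: 1)
# => 2 (the blank row is skipped, the valid rows are imported)
```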

Final notes

There are a few improvements that can still be applied once we start doing larger imports, such as:

  • Perform the importing process in background and notify the requester (Admin) via email when it finishes.
  • Perform Census item building in batches to avoid potential memory issues (see the sketch after this list).
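
For the batching idea, a hypothetical sketch could look like the following. The helper name, batch size and row layout are assumptions for illustration, not the planned implementation.

```ruby
# Hypothetical batching sketch: stream the CSV from disk and build census
# items in fixed-size slices instead of loading every row into memory.
require "csv"

BATCH_SIZE = 1_000

def import_in_batches(path, repository:, census_import_id:)
  CSV.foreach(path).each_slice(BATCH_SIZE) do |rows|
    rows.each do |row|
      document_number = row.first.to_s.strip
      next if document_number.empty?

      repository.create(document_number: document_number,
                        census_import_id: census_import_id)
    end
  end
end
```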

These have been added to the corresponding issue's description (#55); let's leave them there so we can get back to them in upcoming iterations.

- CensusItem
- Admin::CensusImport
- Implement document_number attribute hashing via the SecretAttribute lib.
- Filter document_number parameter logging.
ferblape (Member)

Impressive work!

I think we can move the improvements to a separate issue so we can close this one; for the time being they are not urgent.

ferblape merged commit 989d114 into master on Nov 22, 2016
danguita (Contributor, Author)

Yep, agreed. Thanks man!

danguita deleted the 55-census-data-ingestion branch on November 22, 2016, 12:12