Name | Matric |
---|---|
ADAM WAFII BIN AZUAR | A20EC0003 |
HONG PEI GEOK | A20EC0044 |
MIKHEL ADAM BIN MUHAMMAD EZRIN | A20EC0237 |
QAISARA BINTI ROHZAN | A20EC0133 |

Datatable is the premier package for manipulating large tabular datasets. It aggregates large datasets quickly, adds, updates, and removes columns with low latency, performs fast ordered joins, and reads files quickly.
The datatable features we want to make use of include:
- Fast data reading from CSV and other formats.
- Efficient algorithms for sorting/grouping/joining.
- Minimal amount of data copying, copy-on-write semantics for shared data.
- Use "rowindex" views in filtering/sorting/grouping/joining operators to avoid unnecessary data copying.
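
The "rowindex" idea in the last bullet can be illustrated with a minimal sketch in plain Python. This is a conceptual model only, not datatable's internal implementation: the `ColumnView` class and its methods are hypothetical names made up for this example. Filtering and sorting produce a new list of row positions while the underlying values stay shared and are never copied.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ColumnView:
    """Hypothetical view over a column: shared data plus the visible row positions."""
    data: Sequence[float]  # shared storage, never copied by filter/sort
    rowindex: List[int]    # positions in `data` that this view exposes

    def filter(self, keep: Callable[[float], bool]) -> "ColumnView":
        # Only the index list is new; `data` is shared with the parent view.
        return ColumnView(self.data, [i for i in self.rowindex if keep(self.data[i])])

    def sort(self) -> "ColumnView":
        # Sorting permutes the indices instead of moving the values.
        return ColumnView(self.data, sorted(self.rowindex, key=lambda i: self.data[i]))

    def materialize(self) -> List[float]:
        # Copy the values out only when they are actually needed.
        return [self.data[i] for i in self.rowindex]


rates = [35.0, 12.5, 48.0, 20.0]  # made-up values
full = ColumnView(rates, list(range(len(rates))))
cheap_sorted = full.filter(lambda r: r < 40.0).sort()

print(cheap_sorted.rowindex)       # [1, 3, 0]
print(cheap_sorted.materialize())  # [12.5, 20.0, 35.0]
print(cheap_sorted.data is rates)  # True: the values were never copied
```

Chaining `filter` and `sort` here only builds small index lists. The values are copied once, at `materialize()`.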

The dataset, Rate.csv, can be downloaded from Kaggle.

Column | Description |
---|---|
BusinessYear | The year for which the rate information applies. |
StateCode | The two-letter code for the state in which the health insurance plan is offered. |
IssuerId | A unique identifier for the insurer offering the health insurance plan. |
SourceName | The source of the rate information (e.g. the insurer, the state insurance department). |
VersionNum | A version number for the rate information. |
ImportDate | The date on which the rate information was imported into the Marketplace database. |
IssuerId2 | A unique identifier for the insurer offering the health insurance plan. |
FederalTIN | The federal Taxpayer Identification Number (TIN) of the issuer. |
RateEffectiveDate | The date for which the rate information is effective. |
RateExpirationDate | The date on which the rate expires. |
PlanId | A unique identifier for the health insurance plan. |
RatingAreaId | The identifier of the geographic rating area within the state to which the rate information applies. |
Tobacco | Whether the rate information applies to tobacco users or non-tobacco users. |
Age | The age of the insured person for which the rate information applies. |
IndividualRate | The monthly premium (cost) for the health insurance plan for an individual. |
IndividualTobaccoRate | The monthly premium for the health insurance plan for an individual tobacco user. |
Couple | The monthly premium for the health insurance plan for a couple. |
PrimarySubscriberAndOneDependent | The monthly premium for the health insurance plan for a primary subscriber and one dependent. |
PrimarySubscriberAndTwoDependents | The monthly premium for the health insurance plan for a primary subscriber and two dependents. |
CoupleAndOneDependent | The monthly premium for the health insurance plan for a couple and one dependent. |
CoupleAndTwoDependents | The monthly premium for the health insurance plan for a couple and two dependents. |
CoupleAndThreeOrMoreDependents | The monthly premium for the health insurance plan for a couple and three or more dependents. |
RowNumber | The row number of rate information. |

In conclusion, Datatable supports a wide variety of operations and is well suited to large datasets because it offers multi-threaded data processing, support for out-of-memory datasets, and configurable APIs. As a result, less time is needed to process the data.