This script reads a matrix-styled data file containing missing values and infers those values by finding the k-nearest neighbors.

ninashenker/data-analysis

data analysis

This project analyzes a specific data set and answers various questions about it. The data file is a flat-file database, constructed in the year 2000, that contains various information about people.

  1. Are the age and gender distributions in the database "normal"? A yes/no answer is not good enough.
  2. At what age do men become fathers for the first time (maximum, minimum, and average age)?
  3. Is the distribution of first-time fatherhood age "normal"? A yes/no answer is not good enough.
  4. At what age do women become mothers for the first time (maximum, minimum, and average age)?
  5. Is the distribution of first-time motherhood age "normal"? A yes/no answer is not good enough.
  6. What percentage of men and what percentage of women do not have children?
  7. What is the average age difference between parents who have a child in common?
  8. What percentage of people have at least one grandparent who is still alive? A person counts as alive if he or she is in the database.
  9. For those who have cousins, what is the average number of cousins?
  10. Is the firstborn more likely to be male or female?
  11. What percentage of men have children with more than one woman, and what percentage of women have children with more than one man?
  12. Do tall people marry each other (or at least have children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal, and short, and whether they are the same for men and women.
  13. Do tall parents have tall children?
  14. Do fat people marry each other (or at least have children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal, and slim. Calculate the BMI and use it as the fatness indicator.
  15. Using knowledge of blood group inheritance, are there any children in the database for whom you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report, you must discuss how you determine that the parent(s) of the child are not the "true" parents.
  16. Make a list of fathers who can donate blood to their sons. The list must identify the father, the son(s), and their blood types. You must state the length of the list in the report.
  17. Make a list of persons who can donate blood to their grandparents. The list must identify the person, the grandparent(s), and their blood types. You must state the length of the list in the report.
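
Questions 15 to 17 depend on two rules: which blood types a child can inherit from a given pair of parents, and which donor types a recipient can accept. The sketch below shows one possible way to encode both in Python. It assumes blood types are stored as strings such as `"A+"` or `"O-"`, which the data file may or may not do. All function names are hypothetical, and the model is the simplified textbook one (ABO alleles with O recessive, Rh positive dominant, red cell donation), so it ignores rare exceptions such as mutations or unusual phenotypes.

```python
from itertools import product

# ABO phenotype -> genotypes (allele pairs) that can produce it.
ABO_GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def abo_phenotype(allele1, allele2):
    """Return the ABO phenotype for a pair of alleles (O is recessive)."""
    alleles = {allele1, allele2}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def possible_child_abo(parent1, parent2):
    """Return the set of ABO phenotypes a child of these parents can have."""
    children = set()
    for g1, g2 in product(ABO_GENOTYPES[parent1], ABO_GENOTYPES[parent2]):
        # The child receives one allele from each parent.
        for a1, a2 in product(g1, g2):
            children.add(abo_phenotype(a1, a2))
    return children


def possible_child_rh(parent1, parent2):
    """Rh+ is dominant: two Rh- parents can only have Rh- children."""
    if parent1 == "-" and parent2 == "-":
        return {"-"}
    return {"+", "-"}


def split_blood_type(blood_type):
    """Split an assumed 'A+' / 'AB-' style string into (ABO group, Rh sign)."""
    return blood_type[:-1], blood_type[-1]


def parentage_is_consistent(child, father, mother):
    """True if the child's blood type is possible for the listed parents."""
    c_abo, c_rh = split_blood_type(child)
    f_abo, f_rh = split_blood_type(father)
    m_abo, m_rh = split_blood_type(mother)
    return (c_abo in possible_child_abo(f_abo, m_abo)
            and c_rh in possible_child_rh(f_rh, m_rh))


def can_donate(donor, recipient):
    """True if the donor's red cells are compatible with the recipient."""
    d_abo, d_rh = split_blood_type(donor)
    r_abo, r_rh = split_blood_type(recipient)
    abo_ok = d_abo == "O" or d_abo == r_abo or r_abo == "AB"
    rh_ok = d_rh == "-" or r_rh == "+"
    return abo_ok and rh_ok
```

With these helpers, a child whose record fails `parentage_is_consistent` would be a candidate for the list in question 15, and `can_donate(father, son)` or `can_donate(person, grandparent)` would filter the pairs for questions 16 and 17. Note that an inconsistent result only shows that at least one listed parent cannot be the biological parent; it does not by itself say which one.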
