root
|-- recid: string (nullable = true)
|-- givename: string (nullable = true)
|-- surname: string (nullable = true)
|-- suburb: string (nullable = true)
|-- postcode: string (nullable = true)
recId
Records with the same recId refer to the same entity.
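The role of recId can be sketched by grouping on it and counting how many records share each value. This is only an illustration: the local SparkSession, the three sample rows, and the names `perEntity` and `duplicated` are assumptions made for the example and are not part of the dataset.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

// Assumption: a local session for illustration; in a notebook `spark` usually already exists.
val spark = SparkSession.builder()
  .appName("recid-groups")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows that follow the schema shown above.
val people = Seq(
  ("rec-1", "anna", "smith", "richmond", "3121"),
  ("rec-1", "ana",  "smith", "richmond", "3121"), // same entity, slightly different spelling
  ("rec-2", "john", "brown", "carlton",  "3053")
).toDF("recid", "givename", "surname", "suburb", "postcode")

// Count the records per recid; a count above 1 means several records describe one entity.
val perEntity = people.groupBy("recid").agg(count("*").as("records"))
val duplicated = perEntity.filter(col("records") > 1)

duplicated.show()
```

Grouping on recid gives the ground truth to compare a matching result against; it is not a de-duplication method in itself.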
- Copy the unzipped files into the data directory.
- For more information about the research, see "Evaluation of entity resolution approaches on real-world match problems".
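import org.apache.spark.sql.SaveMode

// Drop exact duplicate rows, then write the result as gzip-compressed CSV in 4 part files.
people.distinct()
  .repartition(4)
  .write
  .option("compression", "gzip")
  .format("csv")
  .mode(SaveMode.Overwrite) // replace any previous output in the target directory
  .save("file:/home/jovyan/work/data/de-duplicated/")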
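One way to check the written output is to read it back and compare row counts. The sketch below makes several assumptions: it uses a local SparkSession, a few made-up rows, and a temporary directory (`outDir`, hypothetical) in place of the notebook path. Because the write above sets no header option, the column names are reapplied after loading.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Assumption: a local session for illustration; in a notebook `spark` usually already exists.
val spark = SparkSession.builder()
  .appName("dedup-roundtrip")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows; the second row is an exact copy of the first.
val people = Seq(
  ("rec-1", "anna", "smith", "richmond", "3121"),
  ("rec-1", "anna", "smith", "richmond", "3121"),
  ("rec-2", "john", "brown", "carlton",  "3053")
).toDF("recid", "givename", "surname", "suburb", "postcode")

// Assumption: a temporary directory stands in for the real output location.
val outDir = Files.createTempDirectory("de-duplicated").toUri.toString

people.distinct()
  .repartition(4)
  .write
  .option("compression", "gzip")
  .format("csv")
  .mode(SaveMode.Overwrite)
  .save(outDir)

// Spark decompresses the .gz part files on read; with no header row, names are set by hand.
val restored = spark.read.format("csv").load(outDir).toDF(people.columns: _*)

println(s"rows before: ${people.count()}, rows after round trip: ${restored.count()}")
```

Note that distinct() only removes rows that are identical in every column. Records that share a recId but differ in spelling remain in the output.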