Accurate and efficient data integration

Project description

This project explores the frontiers of data integration (also known as record linkage), which seeks to match records representing the same entity across multiple heterogeneous, noisy datasets. A highly practical problem first studied for national census data, data integration is now used across numerous sectors, including technology, medicine, finance, and government. Working in the database community (publishing in, for example, VLDB), we bring a machine learning and mathematical statistics view to the area, seeking scalable algorithms with guarantees on data efficiency. This project connects with our differential privacy project: together with security colleagues in the school, we also consider data privacy.
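To make the matching problem concrete, the following is a minimal illustrative sketch in Python, not the project's own method. It assumes two tiny hypothetical datasets with `name` and `city` fields, scores every cross-dataset pair with the standard library's `difflib.SequenceMatcher`, and links pairs whose average field similarity reaches a hypothetical threshold of 0.8. The function names `normalise`, `similarity`, and `link_records` are invented for illustration.

```python
from difflib import SequenceMatcher


def normalise(value: str) -> str:
    """Lower-case, drop simple punctuation, and collapse whitespace."""
    cleaned = value.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: dict, b: dict, fields: tuple) -> float:
    """Average string similarity (0 to 1) over the chosen fields."""
    scores = [
        SequenceMatcher(None, normalise(a[f]), normalise(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def link_records(left, right, fields=("name", "city"), threshold=0.8):
    """Return (left_index, right_index, score) for pairs judged to match.

    Assumption: an all-pairs comparison with a fixed threshold. This is
    quadratic in the dataset sizes and is shown only to illustrate the task.
    """
    links = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            score = similarity(a, b, fields)
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


# Hypothetical noisy records describing overlapping sets of people.
left = [
    {"name": "Jonathan Smith", "city": "Melbourne"},
    {"name": "Mary O'Brien", "city": "Sydney"},
]
right = [
    {"name": "Jonathon Smith", "city": "Melbourne"},
    {"name": "Maria Obrien", "city": "Perth"},
]

links = link_records(left, right)
for i, j, score in links:
    print(left[i]["name"], "<->", right[j]["name"], score)
```

The all-pairs loop is the simplest possible design and does not scale to large datasets, which is one reason scalable algorithms with data-efficiency guarantees are of interest here.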

Project team

Leader: Ben Rubinstein

Staff: Shen Wang

Students: Neil Marchant

Sponsors: Australian Research Council, Australian Bureau of Statistics


Computing and Information Systems


Networks and data in society


database systems; machine learning