EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data Publishing

  • T. Qamar Department of Computer Science and Software Engineering, Jinnah University for Women, Pakistan
  • N. Z. Bawany Department of Computer Science and Software Engineering, Jinnah University for Women, Pakistan
  • N. A. Khan Department of Computer Science & Information Technology, NED University of Engineering & Technology, Pakistan
Keywords: data anonymization, privacy-preserving data publishing, k-anonymity, l-diversity, t-closeness

Abstract

The evolution of the internet into the Internet of Things (IoT) has driven an exponential rise in data collection. This drastic increase in the collection of personal information poses a serious threat to individuals' privacy. Privacy-Preserving Data Publishing (PPDP) is an area that provides ways of sharing data in anonymized form, i.e. without disclosing the identity of the individuals concerned. Various anonymization models are available in PPDP to guard privacy against numerous attacks. However, selecting the optimum model, one that balances utility and privacy, is a challenging process. This study proposes the Efficient Data Anonymization Model Selector (EDAMS) for PPDP, which generates an anonymized dataset optimized in terms of both privacy and utility. EDAMS takes a dataset and the required parameters as input and produces an anonymized version by applying PPDP techniques while balancing utility and privacy. EDAMS currently incorporates three PPDP techniques: k-anonymity, l-diversity, and t-closeness. It was tested on different variations of three datasets, and the results were validated by testing each variation explicitly with each of these techniques. The results show that EDAMS is effective, selecting the optimum model with minimal effort.
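To make the three privacy models concrete, the following minimal Python sketch checks whether an already-generalized table satisfies k-anonymity (every equivalence class of quasi-identifier values has at least k records), distinct l-diversity (every class has at least l distinct sensitive values), and t-closeness (every class's sensitive-value distribution lies within distance t of the whole table's). It is not the EDAMS implementation. The selection rule (keep the candidate tables that pass all three checks, then pick the one with the lowest discernibility metric), the equal-ground-distance EMD used for a categorical sensitive attribute, the function names, and the toy records are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Illustrative sketch only (not EDAMS itself): the selection rule, the utility
# metric, and the toy records below are assumptions chosen for demonstration.


def equivalence_classes(rows, qis):
    """Group records that share identical quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in qis)].append(row)
    return list(groups.values())


def is_k_anonymous(rows, qis, k):
    """k-anonymity: every equivalence class contains at least k records."""
    return all(len(g) >= k for g in equivalence_classes(rows, qis))


def is_l_diverse(rows, qis, sensitive, l):
    """Distinct l-diversity: every class holds at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in g}) >= l
               for g in equivalence_classes(rows, qis))


def max_class_distance(rows, qis, sensitive):
    """Largest distance between any class's sensitive-value distribution and
    the whole table's. For a categorical attribute with equal ground distance,
    the Earth Mover's Distance reduces to half the L1 (total variation) distance."""
    n = len(rows)
    overall = Counter(r[sensitive] for r in rows)
    worst = 0.0
    for g in equivalence_classes(rows, qis):
        local = Counter(r[sensitive] for r in g)  # missing values count as 0
        d = 0.5 * sum(abs(local[v] / len(g) - overall[v] / n) for v in overall)
        worst = max(worst, d)
    return worst


def is_t_close(rows, qis, sensitive, t):
    """t-closeness: no class is farther than t from the overall distribution."""
    return max_class_distance(rows, qis, sensitive) <= t


def discernibility(rows, qis):
    """Utility loss as the sum of squared class sizes (lower keeps more utility)."""
    return sum(len(g) ** 2 for g in equivalence_classes(rows, qis))


def select_candidate(candidates, qis, sensitive, k, l, t):
    """Hypothetical selector: among candidate anonymized tables that satisfy all
    three models, return the one with the lowest discernibility, else None."""
    valid = [c for c in candidates
             if is_k_anonymous(c, qis, k)
             and is_l_diverse(c, qis, sensitive, l)
             and is_t_close(c, qis, sensitive, t)]
    return min(valid, key=lambda c: discernibility(c, qis), default=None)


# Invented toy records: two generalizations of the same six patients.
QIS = ["age", "zip"]
FINE = (
    [{"age": "20-29", "zip": "130**", "disease": d} for d in ("flu", "cancer", "flu")]
    + [{"age": "30-39", "zip": "148**", "disease": d} for d in ("cancer", "flu", "gastritis")]
)
COARSE = [{"age": "20-39", "zip": "1****", "disease": r["disease"]} for r in FINE]

if __name__ == "__main__":
    best = select_candidate([FINE, COARSE], QIS, "disease", k=3, l=2, t=0.2)
    print(best is FINE)  # True
```

Both toy tables meet k = 3, l = 2 and t = 0.2, but the coarser one merges all six records into a single class, which makes t-closeness trivial (distance 0) at the cost of a higher discernibility score (36 versus 18). A utility-aware selector therefore prefers the finer table whenever both satisfy the privacy thresholds, which is the kind of privacy/utility trade-off the abstract describes.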
