EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data Publishing

  • T. Qamar, Department of Computer Science and Software Engineering, Jinnah University for Women, Pakistan
  • N. Z. Bawany, Department of Computer Science and Software Engineering, Jinnah University for Women, Pakistan
  • N. A. Khan, Department of Computer Science & Information Technology, NED University of Engineering & Technology, Pakistan


The evolution of the internet into the Internet of Things (IoT) has driven exponential growth in data collection. This sharp increase in the collection of private personal information poses a serious threat to individual privacy. Privacy-Preserving Data Publishing (PPDP) provides ways of sharing data in anonymized form, i.e. without disclosing the identity of the individuals the data describe. Various PPDP anonymization models guard privacy against numerous attacks. However, selecting the model that best balances utility and privacy is challenging. This study proposes the Efficient Data Anonymization Model Selector (EDAMS) for PPDP, which generates an anonymized dataset optimized in terms of privacy and utility. EDAMS takes a dataset and the required parameters as input and produces an anonymized version by applying PPDP techniques while balancing utility and privacy. EDAMS currently incorporates three PPDP techniques: k-anonymity, l-diversity, and t-closeness. It was tested on different variations of three datasets, and the results were validated by testing each variation explicitly with each of the stated techniques. The results show that EDAMS selects the optimum model with minimal effort.

Keywords: data anonymization, privacy-preserving data publishing, k-anonymity, l-diversity, t-closeness
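
The three privacy models named in the abstract can be illustrated with a minimal Python sketch that measures the k, l, and t levels of a table that has already been generalized. This is not the EDAMS implementation, whose selection procedure is not described here. The function names, the toy records, and the column names (`age`, `zip`, `disease`) are hypothetical. For t-closeness the sketch uses the variational distance, which equals the Earth Mover's Distance for a categorical sensitive attribute when all values are treated as equally far apart.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return list(groups.values())


def k_value(rows, quasi_ids):
    """k-anonymity level: size of the smallest equivalence class."""
    return min(len(g) for g in equivalence_classes(rows, quasi_ids))


def l_value(rows, quasi_ids, sensitive):
    """Distinct l-diversity level: fewest distinct sensitive values in any class."""
    return min(
        len({r[sensitive] for r in g})
        for g in equivalence_classes(rows, quasi_ids)
    )


def t_value(rows, quasi_ids, sensitive):
    """t-closeness level: largest distance between a class's sensitive-value
    distribution and the distribution over the whole table.
    Assumption: variational distance (equal ground distance between values)."""
    overall = Counter(r[sensitive] for r in rows)
    n = len(rows)
    worst = 0.0
    for g in equivalence_classes(rows, quasi_ids):
        local = Counter(r[sensitive] for r in g)
        dist = 0.5 * sum(abs(local[v] / len(g) - overall[v] / n) for v in overall)
        worst = max(worst, dist)
    return worst


# Hypothetical, already-generalized toy table (not one of the paper's datasets).
toy_rows = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
toy_quasi_ids = ["age", "zip"]

print("k =", k_value(toy_rows, toy_quasi_ids))
print("l =", l_value(toy_rows, toy_quasi_ids, "disease"))
print("t =", t_value(toy_rows, toy_quasi_ids, "disease"))
```

A larger k or l and a smaller t indicate stronger privacy, usually at the cost of more generalization and therefore lower utility, which is the trade-off a model selector has to balance.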





