Big Data Quality: Factors, Frameworks, and Challenges
Abstract
Big Data applications are widely used in many fields, including artificial intelligence, marketing, commerce, and health care, as the role of Big Data in the COVID-19 pandemic has shown. To ensure that Big Data applications are produced and used at a quality acceptable to their consumers, two things are needed: quality factors that these applications should satisfy, and quality frameworks that apply and test those factors. However, the quality measurement process faces challenges that limit its applicability and trustworthiness. In this research, we list the quality factors, dimensions, and frameworks commonly used to measure Big Data quality. We also list the challenges that researchers and data scientists frequently face during the Big Data quality measurement process.