Optimization Of Big Data Processing Using Distributed Computing In Cloud Environments
DOI:
https://doi.org/10.62951/ijcts.v1i2.58
Keywords:
Big data, Distributed computing, Cloud computing, Apache Hadoop, Apache Spark
Abstract
The rapid growth of big data has significantly increased the demand for efficient and scalable data processing methods, particularly within cloud computing environments. This study aims to evaluate the effectiveness of distributed computing frameworks, specifically Apache Hadoop and Apache Spark, in optimizing big data processing. A qualitative approach using a Systematic Literature Review (SLR) method is employed to analyze existing studies related to distributed systems, cloud computing architectures, and performance optimization techniques. The analysis focuses on key performance indicators, including processing speed, resource utilization, and scalability, as well as the suitability of each framework for different data processing scenarios. The findings indicate that Apache Hadoop is highly effective for batch processing and storage-intensive tasks due to its disk-based architecture, while Apache Spark demonstrates superior performance in real-time and iterative processing through its in-memory computing capabilities. Additionally, system configuration factors such as cluster size, memory allocation, and network bandwidth are identified as critical elements influencing overall performance. The study also highlights emerging trends, including the adoption of hybrid cloud environments, the integration of artificial intelligence and machine learning, and the utilization of edge computing to enhance real-time data processing. In conclusion, distributed computing frameworks play a vital role in improving the efficiency and scalability of big data processing in cloud environments. The selection of an appropriate framework, combined with optimized system configuration, can significantly enhance operational performance and support data-driven decision-making.
Copyright (c) 2024 International Journal of Computer Technology and Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


