Optimization Of Big Data Processing Using Distributed Computing In Cloud Environments

Authors

  • Rahul Dev Singh, Vellore Institute of Technology (VIT)
  • Vikram Kumar Gupta, Vellore Institute of Technology (VIT)
  • Priya Anjali Patel, Vellore Institute of Technology (VIT)

DOI:

https://doi.org/10.62951/ijcts.v1i2.58

Keywords:

Big data, Distributed computing, Cloud computing, Apache Hadoop, Apache Spark, Data processing optimization

Abstract

The growth of big data has driven the need for efficient data processing methods, especially in cloud computing environments. This study evaluates two distributed computing frameworks, Apache Hadoop and Apache Spark, for optimizing big data processing. By analyzing different configurations, we demonstrate how distributed systems can significantly reduce processing time and improve resource utilization, making them well suited to handling complex datasets in cloud environments.

Published

2024-04-30

How to Cite

Rahul Dev Singh, Vikram Kumar Gupta, & Priya Anjali Patel. (2024). Optimization Of Big Data Processing Using Distributed Computing In Cloud Environments. International Journal of Computer Technology and Science, 1(2), 01–07. https://doi.org/10.62951/ijcts.v1i2.58
