JISAR

Journal of Information Systems Applied Research

Volume 15

V15 N3 Pages 24-34

Oct 2022


A Scalable Amazon Review Collection System


Jamie Woodall
University of North Carolina Wilmington
Wilmington, NC USA

Douglas Kline
University of North Carolina Wilmington
Wilmington, NC USA

Ron Vetter
University of North Carolina Wilmington
Wilmington, NC USA

Minoo Modaresnezhad
University of North Carolina Wilmington
Wilmington, NC USA

Abstract: Amazon product reviews can provide a rich source of data for natural language processing research. However, the available data sets have become dated and lack review metadata that has been added more recently. To support a related research project, we built a custom system for obtaining Amazon product reviews. We also used this project to explore modern cloud-based services and practices. The system used a variety of distributed cloud-based services, such as Azure Data Factory, Azure Functions, and Azure Data Lake Storage, along with a third-party web scraping service. The system was used to obtain 17,962 product reviews and to produce data sets in several formats. This paper fully describes the system and offers lessons learned from the experience.



Recommended Citation: Woodall, J., Kline, D., Vetter, R., & Modaresnezhad, M. (2022). A Scalable Amazon Review Collection System. Journal of Information Systems Applied Research, 15(3), pp. 24-34. http://JISAR.org/2022-3/ ISSN: 1946-1836. A preliminary version appears in The Proceedings of CONISAR 2021.