The journey of creating a serverless service to scrape contents from a web page

Spencer Feng
5 min read · Apr 30, 2022

In this article, I share my experience of using the Serverless Framework to build a service that extracts the HTML content of a web page and stores it in an S3 bucket. The article covers the tech stack, unit tests and CI/CD.

The service itself is very simple. I used it as an opportunity to experiment with some ideas that I would like to share with everyone.

Serverless Framework

The reasons why I chose the Serverless Framework are:

Infrastructure as code

The Serverless Framework lets us declare all the AWS services we use in a file called serverless.yml. The syntax is straightforward and also very flexible:

  • Following the separation of concerns principle, we can put related configuration in separate files and reference those files from the main configuration file. For example, I keep the IAM configuration for the S3 bucket and the S3 bucket definition in two separate files.
  • We can use variables in serverless.yml to fill in configuration values dynamically. For example, I use variables to reference other resources defined either in the main configuration file or in the separate configuration files.
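
To make the two points above concrete, here is a minimal sketch of what such a serverless.yml might look like. It is not the actual configuration of this service: the service name, the file paths under resources/, the keys inside those files and the bucket name are all hypothetical and chosen only for illustration. The `${file(...)}`, `${self:...}` and `${sls:stage}` forms are the Serverless Framework's variable syntax (`${sls:stage}` assumes Framework v3).

```yaml
# serverless.yml (illustrative sketch; all names and paths are assumptions)
service: web-page-scraper            # hypothetical service name

custom:
  # Defined once here, reused elsewhere through ${self:custom.bucketName}
  bucketName: ${self:service}-${sls:stage}-html

provider:
  name: aws
  environment:
    # Expose the bucket name to the Lambda functions
    BUCKET_NAME: ${self:custom.bucketName}
  iam:
    role:
      statements:
        # IAM statement for the bucket, kept in its own file
        - ${file(resources/s3-bucket-iam.yml):ScrapedContentBucketIAM}

resources:
  Resources:
    # Bucket definition, kept in its own file
    ScrapedContentBucket: ${file(resources/s3-bucket.yml):ScrapedContentBucket}
```

With a layout like this, the main file stays short, and each referenced file can in turn use `${self:custom.bucketName}` so that the bucket name is defined in a single place.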
