Introduction
The JAMstack architecture was introduced around five years ago, and over the last two years it has picked up a lot of steam. We at Axelerant are equally enthusiastic about it. Throughout this year, we have been experimenting with JAMstack by building a variety of products that address different challenges and deliver a pleasing experience.
The "A" in JAMstack stands for APIs, a crucial part of the architecture if you plan to move away from tightly coupled systems. These APIs are typically built with Lambda functions, which are easier to maintain and cheaper to run than virtual machines. However, it isn't always possible to architect complex event-driven systems with just a bunch of Lambda functions. Even though the prominent JAMstack cloud providers have significantly improved the developer experience, many of the pieces required to build such systems are still missing.
Architecture of the application using various AWS services
AWS, by contrast, offers a range of serverless services that work well together with JAMstack applications. To build a deeper understanding of JAMstack applications powered by a more extensive serverless infrastructure, I started experimenting beyond the common patterns.
In this post, we discuss how we used AWS serverless services to go from an idea to an implementation built on deep learning and an event-driven architecture.
Backstory
With the stay-at-home situation caused by COVID-19, the number of podcasts I listen to has grown significantly. Of them all, Soft Skills Engineering is my favorite. Besides the wealth of insight the hosts share, I really like the funny opening in most of their episodes.
In that vein, I chose to combine my interest in JAMstack and serverless with my love for podcasts and build an application that would help me explore and understand the serverless ecosystem better.
Lifecycle
The application reads the podcast's RSS feed and converts the funny bit in each episode's introduction to text. On the surface, the application might seem trivial; however, the internals are a lot more involved, as it uses a bunch of serverless services and a bit of deep learning magic.
It takes several steps to go from a single podcast RSS feed file to a list of funny sentences. Let's explore the role each service plays in this system.
Fetching bites
- A new episode comes out every week, and based on that cadence, I set up a CloudWatch event that triggers once every Tuesday. The event invokes the GetFeed Lambda function, which fetches the podcast feed, converts it from XML to JSON, and pulls out the array of episodes. The only data we need for each episode is its title and the URL of its audio file. We loop over this list and trigger one instance of the TrimMedia Lambda per episode.
A Lambda triggering multiple instances of another Lambda
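The GetFeed step mostly reshapes the parsed feed. Below is a minimal sketch, assuming the XML has already been converted to JSON with a hypothetical shape where each `<item>` carries a `title` and an `enclosure` with a `url` (RSS keeps the audio file's URL on the `<enclosure>` element); real XML-to-JSON libraries name these fields differently. The schedule expression for the Tuesday rule uses an assumed time of day.

```typescript
// Assumed JSON shape after XML-to-JSON conversion; the real shape depends on the converter.
interface FeedItem {
  title: string;
  enclosure: { url: string };
}

interface ParsedFeed {
  rss: { channel: { item: FeedItem | FeedItem[] } };
}

interface Episode {
  title: string;
  url: string;
}

// Weekly rule for Tuesdays; the 12:00 UTC time of day is an assumption.
const WEEKLY_SCHEDULE = "cron(0 12 ? * TUE *)";

// Keep only the two fields the pipeline needs from each <item>.
function extractEpisodes(feed: ParsedFeed): Episode[] {
  const items = feed.rss.channel.item;
  // Converters often turn a lone <item> into an object rather than a one-element array.
  const list = Array.isArray(items) ? items : [items];
  return list
    .filter((entry) => Boolean(entry.enclosure && entry.enclosure.url))
    .map((entry) => ({ title: entry.title, url: entry.enclosure.url }));
}

// One asynchronous ("Event") invocation request per episode, so the TrimMedia
// instances run in parallel and GetFeed does not wait for them to finish.
function buildTrimInvocations(functionName: string, episodes: Episode[]) {
  return episodes.map((episode) => ({
    FunctionName: functionName,
    InvocationType: "Event" as const,
    // AWS SDK v3's InvokeCommand takes Payload as bytes; encode this string before sending.
    Payload: JSON.stringify(episode),
  }));
}
```

Each request would then be sent with the AWS SDK's `InvokeCommand`. With the `Event` invocation type, Lambda queues the invocation and returns immediately, which is what lets one GetFeed run fan out into many parallel TrimMedia runs.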
- Each TrimMedia function receives a title and a URL to an MP3 file. The funny bit is always within the first 30 seconds of an episode, so I use FFmpeg to trim the audio file to that length.
- To use FFmpeg, we either need to compile it from source or download a binary, and both tasks are slow, especially inside a Lambda function. To avoid this, I used a Lambda layer pre-populated with the FFmpeg binary. Once the episode is trimmed to 30 seconds, we put the clip into the Clips S3 bucket.
- Storing the trimmed clip in the Clips bucket emits an s3:ObjectCreated event, which invokes the TranslateClip function to convert the audio clip into text. Because the parallel TrimMedia invocations emit many of these events at once, this step quickly becomes a bottleneck for our system.
- The TranslateClip Lambda does little besides starting a transcription job with AWS Transcribe. However, Transcribe has a quota that limits how many jobs can run in parallel. To address this, I added an SQS queue in front of the Lambda that starts the Transcribe jobs. The queue defers execution, which keeps us from hitting the quota limits.
SQS controlling the back pressure and gradually triggering Lambda
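TrimMedia and TranslateClip are thin wrappers around FFmpeg and Transcribe, so most of their logic is building inputs. The sketch below covers only those pure parts, assuming a hypothetical key-naming scheme: the FFmpeg arguments that keep the first 30 seconds, unpacking the S3 notifications that SQS delivers, and the input object for Transcribe's `StartTranscriptionJob`. The language code and output bucket name are assumptions, and the actual calls (spawning the layer's binary, `PutObject`, `StartTranscriptionJobCommand`) are left out.

```typescript
// Sketch of the pure logic behind TrimMedia and TranslateClip (names and shapes assumed).

type ClipLocation = { bucket: string; key: string };

// The funny bit always sits in the first 30 seconds of an episode.
const CLIP_SECONDS = 30;

// Arguments for the FFmpeg binary from the Lambda layer (layers are extracted
// under /opt; the binary's exact path depends on the layer).
function buildTrimArgs(sourceUrl: string, outputPath: string): string[] {
  return [
    "-i", sourceUrl,            // read the MP3 from its URL (assumes an HTTPS-enabled build)
    "-t", String(CLIP_SECONDS), // stop writing output after 30 seconds
    "-c", "copy",               // copy the audio stream instead of re-encoding it
    "-y", outputPath,           // overwrite any existing file; /tmp is Lambda's writable path
  ];
}

// Hypothetical naming scheme: derive an S3 key for the clip from the episode title.
function clipKeyFor(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${slug}.mp3`;
}

interface S3NotificationRecord {
  s3: { bucket: { name: string }; object: { key: string } };
}

// With S3 -> SQS -> Lambda, each SQS message body is an S3 event document.
function clipsFromSqs(records: { body: string }[]): ClipLocation[] {
  const clips: ClipLocation[] = [];
  for (const message of records) {
    const s3Event = JSON.parse(message.body) as { Records?: S3NotificationRecord[] };
    // S3's s3:TestEvent message has no Records, so it contributes nothing.
    for (const record of s3Event.Records || []) {
      clips.push({
        bucket: record.s3.bucket.name,
        // Keys are URL-encoded, with spaces sent as "+".
        key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
      });
    }
  }
  return clips;
}

// Input for Transcribe's StartTranscriptionJob; language and output bucket are assumptions.
function toTranscriptionJobInput(clip: ClipLocation, outputBucket: string) {
  return {
    // Job names must be unique in the account and may only contain letters, digits, ".", "_" and "-".
    TranscriptionJobName: clip.key.replace(/[^0-9a-zA-Z._-]/g, "-"),
    LanguageCode: "en-US",
    MediaFormat: "mp3",
    Media: { MediaFileUri: `s3://${clip.bucket}/${clip.key}` },
    OutputBucketName: outputBucket,
  };
}
```

The queue alone only buffers the events; the back pressure comes from limiting how fast TranslateClip drains it, for example with reserved concurrency on the function or a maximum concurrency on the SQS event source, tuned to your account's Transcribe quota.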
- AWS Transcribe is where the deep learning happens: it converts the audio clip into text. Transcribe doesn't invoke a Lambda or any other service directly; once a clip is transcribed, the result can only be stored in an S3 bucket. In our case, it is stored in the Translations bucket, which triggers the StoreOnDynamo Lambda function.
- When StoreOnDynamo is triggered, the event argument it receives contains the details of the transcript object in the Translations bucket, and I use those details to load the file. Once we have the file's contents, we massage them to pull out the specific sentence we need and store it in DynamoDB.
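StoreOnDynamo has two jobs: locate the transcript from the S3 event, then reduce it to one sentence. Here is a sketch, assuming the standard Transcribe output JSON (text under `results.transcripts`) and a hypothetical table with `title` and `bite` string attributes. The post doesn't describe how the funny sentence is isolated, so `extractBite` uses a placeholder rule that keeps the first sentence.

```typescript
// Minimal shapes for an S3 notification record and a Transcribe output file.
interface TranscriptRecord {
  s3: { bucket: { name: string }; object: { key: string } };
}

interface TranscribeOutput {
  jobName: string;
  results: { transcripts: { transcript: string }[] };
}

// Bucket and key of each transcript that landed in the Translations bucket.
function transcriptLocations(records: TranscriptRecord[]): { bucket: string; key: string }[] {
  return records
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // S3 URL-encodes keys in notifications and sends spaces as "+".
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }))
    // Transcribe names its output <jobName>.json; ignore any other object in the bucket.
    .filter((loc) => /\.json$/.test(loc.key));
}

// Placeholder rule (hypothetical): keep the first sentence of the transcript.
function extractBite(output: TranscribeOutput): string {
  const text = output.results.transcripts.map((t) => t.transcript).join(" ").trim();
  const firstSentence = text.match(/^.*?[.!?](?=\s|$)/);
  return firstSentence ? firstSentence[0] : text;
}

// Item for DynamoDB PutItem in the low-level AttributeValue format;
// the attribute names are hypothetical.
function toDynamoItem(output: TranscribeOutput) {
  return {
    title: { S: output.jobName },
    bite: { S: extractBite(output) },
  };
}
```

The returned item could then be written with a `PutItem` call. Keeping `extractBite` separate makes it easy to swap in a better rule once you know what the transcripts actually look like.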
Accessing bites
To build the list of all the bites, I pull the data from the DynamoDB table and render it on an HTML page. I didn't shy away from over-engineering this part of the system either, even though most of the pieces are warranted.
- The HTML page, which is pretty bare-bones apart from the script that fetches all the bites, is stored in the Website S3 bucket.
Final deployed application
- To read the bites from the DynamoDB table, I set up a GetBites Lambda function, which returns an array of all the bites. Since the page can't invoke the Lambda function directly, we added an API Gateway REST endpoint in front of it.
- Finally, to improve performance and security, I added CloudFront, which provides HTTPS support, response compression, and caching.
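GetBites only needs to read the table and answer in the format API Gateway's Lambda proxy integration expects: a `statusCode`, optional `headers`, and a string `body`. The sketch below assumes the items come from a low-level DynamoDB `Scan` (each attribute wrapped as `{ S: ... }`) and flattens them for the page's script. The `Cache-Control` value is an assumption, included as one way to let CloudFront and browsers reuse the response.

```typescript
// A low-level DynamoDB item: each attribute is wrapped by type, e.g. { S: "text" }.
type DynamoStringItem = { [attribute: string]: { S?: string } };
type Bite = { [attribute: string]: string };

// Unwrap the string attributes so the browser receives plain JSON objects.
function toPlainBites(items: DynamoStringItem[]): Bite[] {
  return items.map((dynamoItem) => {
    const bite: Bite = {};
    for (const attribute of Object.keys(dynamoItem)) {
      const wrapped = dynamoItem[attribute];
      // Only string (S) attributes are unwrapped in this sketch; others are skipped.
      if (wrapped && wrapped.S !== undefined) {
        bite[attribute] = wrapped.S;
      }
    }
    return bite;
  });
}

// Response shape for an API Gateway Lambda proxy integration.
function getBitesResponse(items: DynamoStringItem[]) {
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/json",
      // Assumed caching window so CloudFront and browsers can reuse the list.
      "Cache-Control": "public, max-age=3600",
    },
    body: JSON.stringify(toPlainBites(items)),
  };
}
```

A single `Scan` call returns at most 1 MB of data, so a larger table would need to follow `LastEvaluatedKey` across pages before building the response.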
Scaffolding Infrastructure
While building this application, I ended up provisioning a lot of services and reconfiguring them frequently as I found my way around. This would have been tiresome had I gone the AWS console route; instead, I used CDK with TypeScript. I have previously used the Serverless Framework and written some CloudFormation templates, but it was always difficult to remember the various conventions and available plugins. I even considered Terraform, but learning it seemed like a hurdle to making quick progress. CDK, on the other hand, is intuitive because it's just code, and TypeScript made passing values like ARNs and discovering configuration options a breeze.
Unlike the Serverless Framework or Terraform, CDK is AWS-specific. However, if you plan to stick with AWS, there's little reason to look beyond it unless CDK doesn't support a service you want to use. Regardless of which tool you use to provision the infrastructure, you can still run into problems if you lack experience and understanding of architecting systems.
Members of the AWS community maintain CDK Patterns, which has been an essential learning resource for me and an eye-opener in terms of what is possible with CDK and AWS services.
Conclusion
As the title says, this is an over-engineered system that could be made a lot simpler. However, it still checks many boxes, especially lower cost, scalability, and faster delivery. I was able to build an event-driven system that includes deep learning in a matter of a few hours, and so far I haven't been charged more than $1, even after testing it several times.
All the code for this project can be found on GitHub, and you can access the site here.
Bassam Ismail, Director of Digital Engineering
Away from work, he likes cooking with his wife, reading comic strips, or playing around with programming languages for fun.