Librosa and AWS Lambda
You’ve built an awesome machine learning model
You want to do some audio processing using Librosa
You’ve decided you want to use AWS Lambda functions for your backend.
Banging your head against the wall in frustration ← you are here
If you’re reading this, you probably find yourself in a similar situation to me: you’re trying to leverage the magic of Librosa in an AWS Lambda function, and you just can’t get everything to play nicely together. If so, you have come to the right place. I am going to take you through all the steps I took to get this working. There is, however, one caveat:
I had to downgrade my Python runtime to Python 3.7
The reason for this is that the Python 3.8 runtime runs on a pretty stripped-back operating system, Amazon Linux 2, which doesn’t have some of the libraries you need to get the job done. A little more on this later.
If you need the latest and greatest version of Python and it’s a non-negotiable, you may have to find another solution. Or let me know how you managed to get around this! You could try the link just above, but the libraries eat up valuable storage space.
The limitations
Before we jump into the solution, I want to cover some of the limitations I came across to give some context to the issue.
As you may be aware, AWS Lambda functions, by design, have a very limited file storage capacity. The size of the unzipped code and all the dependencies has to be less than 250MB. This is a big issue if you’re trying to install Librosa, since the dependencies alone are over 300MB.
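To see how close a set of dependencies is to that limit before uploading anything, a quick size check can help. This is only a minimal sketch: the function names are my own, and the 250MB figure is taken as 250 * 1024 * 1024 bytes, which is an assumption about how the limit is counted.

```python
import os

# Assumption: the 250MB unzipped limit is treated as 250 * 1024 * 1024 bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the directory contents are within the given limit."""
    return directory_size_bytes(root) <= limit
```

Pointing `fits_in_lambda` at the folder you pip-installed Librosa into gives a quick yes or no before you spend time zipping and uploading.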
I tried various hacks to strip the dependencies back to the bare minimum. This ultimately didn’t work, and even when I did temporarily get it running, it was painful to make a small change, re-upload a large zip and try everything again. The AWS Elastic File System (EFS) was an absolute game-changer for this. EFS is a serverless file storage system, built and managed by AWS, for sharing data across the AWS ecosystem. This means that we can provision an elastic file system to host our dependencies and then mount the file system to the Lambdas that need them.
First things first
Let’s get a brief overview of our plan of attack. What we need to do:
- Set up an elastic file system, with an access point
- Install our Python packages to the EFS via an EC2 instance
- Create Lambda layers for our libraries
- Create our Lambda function
- Mount our EFS to our Lambda
Prerequisites
Head over to the AWS Management Console. You can either use the default VPC or create a new one. For simplicity, I used the default VPC.
Next we need to create a security group. You can name the security group whatever you like, but make sure that you select the VPC you created before (or the default if you use that instead). The most important thing is to allow inbound TCP traffic on port 2049, since this is the NFS port that AWS Elastic File System uses (see documentation).
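If you would rather script the inbound rule than click through the console, something along these lines should work with boto3. Treat it as a sketch under assumptions: the helper names are hypothetical, the CIDR range you pass in is up to you (the article does not prescribe a source), and the actual call needs boto3 installed and AWS credentials configured.

```python
NFS_PORT = 2049  # the port EFS listens on for NFS traffic


def nfs_ingress_rule(cidr_ip):
    """Build an inbound TCP rule for port 2049 from the given CIDR range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        "IpRanges": [{"CidrIp": cidr_ip, "Description": "NFS for EFS"}],
    }


def allow_nfs(group_id, cidr_ip):
    """Add the rule to an existing security group (hypothetical helper)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[nfs_ingress_rule(cidr_ip)],
    )
```

Keeping the rule as a plain dictionary makes it easy to inspect before anything is sent to AWS.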
Creating the Elastic File System
In the AWS Management Console, the easiest way of accessing EFS is through the search bar at the top of the screen. When you’re there, click on Create File System. Give your file system a name, choose the same VPC as before and choose your availability zone.
Once the file system is created, select it from the list. We need to add our security group to the subnets in the EFS. To do this, click on the Network tab > Manage and then add the security group to each subnet.
Next, we need to create an access point for our file system. Give the access point a name and select a root directory path; I have chosen /access. For the POSIX user, set the User ID and Group ID both to 1000. Set the owner user ID and group ID to 1000 too. For the permissions, I used 0777.
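For reference, the same access point settings can be expressed as a boto3 request. This is a sketch rather than what I actually ran: the helper names are hypothetical, the file system ID is a placeholder you would replace with your own, and the call itself requires boto3 and AWS credentials.

```python
def access_point_request(file_system_id, path="/access", uid=1000, gid=1000,
                         permissions="0777"):
    """Build the parameters for an EFS access point with the values used above."""
    return {
        "FileSystemId": file_system_id,
        # POSIX identity applied to every request made through the access point
        "PosixUser": {"Uid": uid, "Gid": gid},
        "RootDirectory": {
            "Path": path,
            # Used by EFS to create the root directory if it does not exist yet
            "CreationInfo": {
                "OwnerUid": uid,
                "OwnerGid": gid,
                "Permissions": permissions,
            },
        },
    }


def create_access_point(file_system_id):
    """Create the access point (hypothetical helper, needs AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3

    efs = boto3.client("efs")
    return efs.create_access_point(**access_point_request(file_system_id))
```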
IAM Roles
We now need to create a role for our Lambda so that it is able to access our file system. To do this, navigate to IAM in the AWS Management Console and click on Roles in the left-hand sidebar. Click on Create role, choose Lambda as the use case, click on Next and then select the following three permissions:
Then Next. Tags are optional. Click on Review and give your IAM Role a name and then create the role.
Dependencies
Now that we have the EFS set up, we need to install all our Python dependencies. We do this by mounting the file system to an EC2 instance that is running the same version of Linux as our Lambda. We also need to make sure that we install dependencies that are compiled for the correct version of Python. We get into major headache territory if we install NumPy built for Python 3.8 and then try to use it in our Lambda with the Python 3.7 runtime.
To get started, create an EC2 instance and choose the Amazon Linux option. For the instance type, I would strongly recommend choosing t2.medium or larger for more memory; you can always stop the instance as soon as you’re done with it. In the configure instance menu, make sure to use the same VPC as before. Click next and next again until you get to the screen for security groups. We need two inbound rules: one so we can SSH into the instance and one for port 2049 for the file system:
Review and launch. Create a new key pair or use an existing key pair.
Once the instance is running, SSH into it and update with:
sudo yum update
Mounting EFS
Firstly, install the mount helper utility:
sudo yum -y install amazon-efs-utils
Create a new directory with any name (I chose efs):
mkdir <dir-name>
Next, navigate to the EFS dashboard, click on your EFS, navigate to your access points and click on the access point you created. In the overview, click on Attach in the top right. A modal dialogue will appear with commands on how to mount the EFS. Copy the command for the EFS mount helper and then enter that into the terminal (make sure the directory at the end of the command matches the directory in your EC2 instance).
sudo mount -t efs -o tls,accesspoint=<your access point> <your efs>:/ <dir-name>
Here, <your access point> is the ID of the access point you created, <your efs> is the ID of the elastic file system you created and <dir-name> is the directory name you chose in the previous step.
Your EFS should now be mounted to your EC2 instance. You can verify this by running the following command:
mount
Check Python versions
Next, we need to check our Python and pip versions:
python3 --version
And
pip3 --version
You will most likely need to update pip.
pip3 install -U pip
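Since a mismatch between the interpreter on the EC2 instance and the Lambda runtime is what causes the compiled-package headaches mentioned earlier, it can be worth checking from Python itself before installing anything. A minimal sketch, with a hypothetical helper name and the Python 3.7 target taken from this article:

```python
import sys

# Target runtime used in this article; change it if your Lambda differs.
LAMBDA_RUNTIME = (3, 7)


def matches_lambda_runtime(version_info=None, runtime=LAMBDA_RUNTIME):
    """Return True if the major.minor version matches the Lambda runtime."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(runtime)


if __name__ == "__main__":
    if matches_lambda_runtime():
        print("Interpreter matches the Lambda runtime")
    else:
        print("Warning: compiled packages may not load in the Lambda runtime")
```

Run it with the same python3 binary that pip3 belongs to, otherwise the check tells you nothing useful.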
Installing our packages
So now we are ready to start installing our dependencies. Change directory into the folder you created previously. Create a new folder; I’ll call mine pkgs (this just helps with organisation).
mkdir pkgs
And then we can run the following command to install Librosa and its peer dependencies into the newly created directory:
pip3 install -t pkgs librosa
As a side note, I also leveraged the EC2 instance and port-forwarding to install a package from one of my private repositories (for more information).
Not quite there yet
The C library libsndfile, which is required by Librosa, is not installed by default on either Amazon Linux or Amazon Linux 2. This was probably the biggest blocker for using Librosa in an AWS Lambda function.
The obvious solution would be to install libsndfile and attach it to our Lambda functions as a layer. However, the Lambda Python 3.8 runtime runs on Amazon Linux 2 (see this thread), which is a more stripped-back version of Linux compared to Amazon Linux. Even after installing the library, Amazon Linux 2 is missing other libraries that are required to locate our libsndfile library.
Why don’t we just install the missing libraries then? The issue is that these libraries add to the size of the code and, when bundled with Ffmpeg, cause the bundle to exceed the combined storage permitted for a Lambda function and its layers. A far simpler solution was to downgrade the Python runtime to Python 3.7.
Layers
In addition to the libsndfile library, we also need Ffmpeg to process our audio. To provide both, we will leverage layers in AWS Lambda. Lambda layers allow you to share common code across multiple Lambda functions and to host your dependencies. There is still an overall size limitation, so, unfortunately, layers can’t be used to contain Librosa and its peer dependencies.
We will go ahead and set up two separate layers for our Ffmpeg and libsndfile libraries so that they can be managed and updated independently. I also created the layers outside of my application logic or business Lambdas, since we don’t want to have to re-upload the layers every time we want to make changes to our code. For the sake of simplicity, I have created two GitHub repos (Ffmpeg and libsndfile) that can be used to get you started quickly. You will need to make sure you have configured the Serverless CLI first (getting started with serverless), and have Docker installed.
Clone both repos:
git clone https://github.com/kingsleyzissou/lambda-libsndfile.git
cd lambda-libsndfile
./build.sh
serverless deploy
Repeat the above steps for the Ffmpeg repository.
We’re now ready to create our Lambda function and attach these layers and our Elastic File System.
Creating the Lambda
For a larger project, I would recommend using a tool like Serverless for better and more efficient management of the code used for your project. It allows you to develop locally and then push the code later. Again, for the sake of brevity, I will create a simple Lambda function in the AWS Management Console as an example of how to attach the layers and the EFS file system. For more information on this, you can read here and here.
In the AWS Management Console, go to the Lambda dashboard and create a new Lambda. Give the Lambda a name and make sure to select Python 3.7 as the runtime.
Under advanced settings, add your VPC, subnets and the security group that were created in step 1. Click on Create Function.
Scroll down to the Layers section and click Add Layer
Choose custom layer and select your layer and version from the dropdown
Repeat the steps as above to add both of our custom layers to the Lambda function.
Attach the EFS
Next click on the Configuration tab in the Lambda function dashboard. Click on File Systems in the sidebar on the left and click on Add file system
For the next section, select your file system and access point and set the local mount path to /mnt/access/
We also need to update the memory allocation and the timeout for the Lambda. To do this, click on the Configuration tab again, click on General configuration in the left-hand sidebar and then click on Edit:
I have updated the memory to 1024MB and the timeout to 6 seconds.
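The same settings (file system, memory and timeout) could also be applied from code with boto3. The following is a sketch under assumptions: the helper names, the function name and the access point ARN are placeholders, and the mount path is written without a trailing slash because, as far as I can tell, that is the form the API expects.

```python
def lambda_configuration(function_name, access_point_arn,
                         mount_path="/mnt/access", memory_mb=1024,
                         timeout_seconds=6):
    """Build the configuration update matching the console settings above."""
    return {
        "FunctionName": function_name,
        "MemorySize": memory_mb,
        "Timeout": timeout_seconds,
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": mount_path}
        ],
    }


def apply_configuration(function_name, access_point_arn):
    """Send the update to AWS (hypothetical helper, needs AWS credentials)."""
    import boto3  # imported here so the builder works without boto3

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **lambda_configuration(function_name, access_point_arn)
    )
```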
Now, we’re not quite there yet: we can’t just import Librosa and our other packages yet. We first have to add the mount path to Python’s module search path (sys.path). Here is a gist of the code (gist: https://gist.github.com/kingsleyzissou/eac047d99c4100f7b83a08cde306b007):
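The gist itself is not reproduced here, but the idea can be sketched roughly as follows. This is not necessarily identical to the gist: it assumes the packages were installed into a pkgs folder at the root of the access point, so that they show up under /mnt/access/pkgs inside the Lambda, and the handler and helper names are placeholders.

```python
import sys

# Assumption: packages live in a "pkgs" folder on the EFS access point,
# which the Lambda mounts at /mnt/access.
EFS_PACKAGES_PATH = "/mnt/access/pkgs"


def add_efs_packages_to_path(path=EFS_PACKAGES_PATH):
    """Append the EFS package directory to sys.path if it is not there yet."""
    if path not in sys.path:
        sys.path.append(path)
    return path


def handler(event, context):
    """Example handler: extend the path first, then import Librosa."""
    add_efs_packages_to_path()
    import librosa  # must come after the path has been extended

    return {"statusCode": 200, "body": "librosa " + librosa.__version__}
```

The import sits inside the handler purely to make the ordering obvious. Calling `add_efs_packages_to_path()` at module level and importing Librosa right after it works the same way and only pays the import cost on a cold start.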
Create and execute the test for the Lambda and you should get a similar response:
Conclusion
In conclusion, AWS and the Serverless framework are amazing tools that allow you to build cost-effective microservices to run as and when you need them. The idea of using AWS Lambda functions to handle and pre-process audio, with the help of Librosa, is incredibly appealing for minimising costs. However, it does come with its drawbacks and setting it up can be a bit tricky. Hopefully, this article helped walk you through the steps you need to take to get everything playing nicely together.