EP 06: Understanding Serverless with Yan Cui

About This Episode

Host David McKenney engages in a thought-provoking conversation with Yan Cui, an expert in AWS Lambda and serverless architecture. The journey kicks off with an insightful introduction to Yan Cui and his expertise in navigating the evolving landscape of AWS Lambda. Dive into the technical advancements and the convergence of containers with serverless architectures, exploring the benefits and challenges faced by developers. Here’s what this episode covers:

Critical considerations: Switching costs, portability, and potential lock-in in serverless environments.
Complexities and opportunities in pursuing standards and portability in the serverless realm.
Yan Cui shares expertise on cost estimation and optimization in serverless, explaining the FinOps approach for developers.
Estimating costs, optimizing functions, and gaining a comprehensive understanding of the financial aspects of serverless architectures.
The decision-making process for repatriation.
Insights into when it becomes cost-effective to transition workloads, considering factors like scale, consistency, and overall organizational expertise.
Yan’s guidance on navigating the serverless learning curve for developers.
The current state of education, exposure, and learning experiences in the rapidly advancing world of cloud technologies.

Don’t miss this engaging episode packed with valuable insights, industry trends, and expert advice, offering a comprehensive guide for developers navigating the complexities of serverless architectures.

Know the Guests

Yan Cui

Yan Cui is a distinguished figure in the cloud, leveraging his expertise to empower companies in accelerating processes and optimizing costs through effective serverless technology adoption. With a history of building on AWS since 2010, Yan emerged as a pioneer in running serverless in production from 2016 onwards, earning recognition as one of the inaugural AWS Serverless Heroes in 2018. As a co-author of "Serverless Architectures on AWS, 2nd Edition" by Manning, he imparts comprehensive insights into serverless solutions. Yan is deeply committed to educating clients, sharing valuable advice, and preventing costly errors. With over 150 articles on AWS and serverless, he has become an influential resource for developers globally. Yan extends his impact through online courses and workshops, reaching thousands of learners, and has been a featured speaker at over 150 conferences worldwide, solidifying his position as a leading authority in the cloud technology community. For those seeking to enhance their serverless skills, Yan Cui's courses and workshops present an invaluable opportunity to learn from a true expert in the field. Read more on Yan’s website.

Know Your Host

David McKenney

Vice President of Public Cloud Products at TierPoint

David McKenney is the Vice President of Public Cloud Products at TierPoint. TierPoint is a leading provider of secure, connected IT platform solutions that power the digital transformation of thousands of clients, from the public to private sectors, from small businesses to Fortune 500 enterprises.

Transcript Table of Contents

00:00 - Intro to Yan Cui
05:26 - Navigating AWS Lambda and the Technical Advancements
10:58 - Understanding Containers and Convergence
18:12 - Navigating the Switching Costs, Portability and Lock-In in Serverless Environments
33:36 - Exploring the Potential for Standards in Serverless and the Search for Portability
38:14 - Cost Estimation and Optimization in Serverless: Breaking Down the FinOps Approach
48:00 - When to Repatriate: Navigating the Decision to Move from Serverless to Containers
53:14 - Outro: Navigating the Serverless Learning Curve for Developers

Transcript

00:00 - Intro to Yan Cui

David McKenney: Hello everyone, and welcome to this episode of Cloud Currents. I'm Dave McKenney, and with me today I've got Yan Cui. How you doing today, Yan?

Yan Cui: Hey, David. Thank you for having me here.

David McKenney: Great. So I'm excited because the topic today is about serverless and it's an area that I just quite frankly, don't work with a lot as of yet, and so I know I'm going to learn a lot.

Hopefully, I'll sound like I know a lot, but this is why you're here. So before we get into the trenches about serverless, let's talk a bit about how you got here. One of the few things I am aware of is that serverless really hasn't been around, mainstream-wise, for all that long. So I'm assuming you probably didn't get started right away with it.

Maybe you didn't study it outright. But tell us a little bit about your career leading up, maybe the early days leading up to serverless and how you got into it.

Yan Cui: Yeah, sure. My first professional job was working in investment banking. I was working for Credit Suisse, back when the reputation was still intact.

And so I was working with on-premises stuff. Me and my team, we used to plan for months ahead of time for when we were going to get a new server into our rack, and what wonderful things we were going to install on that server.

Yan Cui: And then in 2009, I got a job working in gaming, back when everyone was playing games on Facebook. So I joined this games company, building Facebook games, and that's when I first got into the cloud with AWS. So I went from spending months planning for a new server to just clicking a button, setting up auto scaling, and five minutes later I've got a new server running with whatever stuff I've got deployed. So that was a big game changer for me. And then gradually, around 2014, containers came out, and so I was playing around with containers for a bit. That made things even easier in terms of having a consistent configuration for the execution environment for your code. So deployment and a lot of other things became a lot easier, more manageable and, I guess, more repeatable.

And then 2014, that's when they announced Lambda. When it was first announced, it was fairly limited. It was just, okay, you drop a file into S3 and trigger this Lambda function thing to do some processing. There's not a lot you could do back then.

And then they added support for Kinesis with Lambda as well, so you could do some real-time data streaming and processing. It was really at re:Invent 2015 that they opened the floodgates with the integration between API Gateway and Lambda. So you can actually build an entire REST API using API Gateway and Lambda, both of which are considered serverless components.
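To make the two integrations described here concrete, the following is a minimal Python sketch of what such Lambda handlers can look like. The function names, bucket name and object key are hypothetical and chosen only for illustration. The event fields it reads (Records, s3.bucket.name, s3.object.key, queryStringParameters) and the statusCode/body response shape follow the S3 notification and API Gateway proxy formats. It is an illustration under those assumptions, not code discussed in the episode.

```python
import json


def handle_s3_upload(event, context):
    """Runs when objects land in S3; returns the (bucket, key) pairs it saw."""
    seen = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real processing (parsing, resizing, indexing, ...) would go here.
        seen.append((bucket, key))
    return seen


def handle_api_request(event, context):
    """Runs behind API Gateway (proxy integration); returns an HTTP-style response."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }


if __name__ == "__main__":
    # Hypothetical sample events, trimmed to the fields the handlers read.
    s3_event = {
        "Records": [
            {"s3": {"bucket": {"name": "demo-bucket"}, "object": {"key": "report.csv"}}}
        ]
    }
    print(handle_s3_upload(s3_event, None))
    print(handle_api_request({"queryStringParameters": {"name": "Yan"}}, None))
```

Neither handler contains any server, port or scaling configuration. The platform decides when to invoke the function and how many copies to run, which is the shift in responsibility being described.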

And I guess the phrase serverless, well, it was still quite early in those days. People were just starting to come up with a name for this new paradigm where you don't think about managing servers, configuring them, configuring scaling groups and things like that.

So nowadays, when we talk about serverless, we are really talking about any number of technologies where you don't have to manage the underlying infrastructure, and ideally you only pay for them when you actually use them. So you're not paying for resources that are sitting there and not doing anything.

So with API Gateway and Lambda and DynamoDB and lots of services like that, including S3, we can call them all serverless because we only pay for requests, for when we actually use those resources to do something, as opposed to paying for uptime by the hour, even if nothing is actually running.

So in terms of how long it's been around, Lambda has been around about as long as containers have. It probably just hasn't had the same mainstream adoption as containers. But having said that, there have been a lot of companies who've been on the serverless scene for a long time.

People like iRobot and Bustle and a few other companies have been pretty serverless-first from very early on. And myself, I got into serverless around 2016. Ever since then, I've been pretty much focused on the serverless space because, for me, even when I went to the cloud from on-premises stuff, I was still spending about 60, 70 percent of my time just setting up and dealing with infrastructure. There are a lot of things you've got to configure: the networking, patching the machine images, configuring and updating scaling group thresholds, and installing agents on the boxes.

And that's still a lot of things that you've got to do. So when Lambda became something more feasible, where I could use it to build entire applications and focus more on what my code should do and what my customers want, as opposed to thinking about infrastructure, it became a no-brainer. Nowadays I can build entire applications on my own, in a fraction of the time and cost compared to what it used to take a whole team to do just a few years ago. So serverless has been a massive game changer for me, both personally and professionally.

05:26 - Navigating AWS Lambda and the Technical Advancements

David McKenney: Yeah. So I'm not going to look past the fact that, and I want to ask you a bunch of questions about, how you made the transition from the financial industry to gaming. That's fantastic. Probably a much different level of stress. But it sounds like you started shortly after Lambda hit the street.

So, 2016. Were there problems that you noticed that things like Lambda could help solve, or did the technology just kind of speak to you? You mentioned how you were waiting for servers to arrive, then you got into the cloud and could just start spinning things up at will as you needed them, with a lot less friction. And seemingly serverless only adds to that frictionless deployment: being able to just work on your code and have it executed in some backend compute.

So was it the technology that really grabbed you, or were there problems that you had where you said, all right, this is the answer, I'm going to go work on this now?

Yan Cui: Yeah, I guess once you've got more integrations within AWS for Lambda, which was around 2016, that's when it really happened.

I played around with Lambda back in 2015 as well, but back then a lot of the tooling just wasn't quite there. In terms of deployment frameworks, there wasn't any one that worked particularly well out of the box. But also in terms of use cases, the power of this serverless compute really comes from what information you can send to it and how easily you can integrate with other things within your AWS environment.

So with containers and with EC2, you still have this problem of, okay, yes, it's a lot easier compared to dealing with your own data centers, and it's a lot faster and more scalable, but there are still a lot of things you've got to do. If you just think about the networking side of things alone, there's so much to networking that it almost takes a degree just to understand all the different networking controls you have to deal with within AWS.

And then there's also trying to have any kind of reasonable security posture. There are a lot of things you've got to do there as well, from the virtual machines themselves to your code and your dependencies. And when it comes to containers, there are the images that you base yours on, and you still have to patch the machines that you run your container images on, depending on whether or not you're running ECS or Fargate.

Obviously Fargate is a fairly new thing, well, relatively. For a lot of people running containers, at least when I was doing a lot of the container stuff, you're building on top of one level of infrastructure, and then you build on top of that, but you still have to manage the EC2 instances under the hood, which means there are still a lot of things you've got to worry about and deal with.

And when you are working in a relatively small company where you can't just make it somebody else's problem, that means as a feature developer, I have to work on everything that the business wants, and then everything else that we need in terms of actually having something to run our code, which nobody really cares about until it goes wrong.

It's one of those things that me and my buddies always talk about: as a backend developer, no one ever pats you on the back and says "job well done" when things are working. You only ever hear from people when something's not working.

The infrastructure dilemma. Yeah, infrastructure is kind of like that, where you just expect it to always work, except when it doesn't. So that's the kind of problem that things like Lambda really solve: taking care of all of that scaling, and having good redundancy and resiliency out of the box. When I was working with EC2, another thing you often had to do before you launched a new game or product was capacity planning. You work out, really roughly, how many requests per second we can handle, and then you have to work out how many AZs we need to have a presence in, how many machines we need, what the scaling threshold is, and how much headroom we need to give ourselves, so that when there's a spike in traffic we're not going to see a huge degradation in performance because we are CPU saturated and whatnot.
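The capacity-planning exercise described here comes down to a small amount of arithmetic. The sketch below is one hedged way to write it out in Python. The function name, its parameters and the sizing rule (keep enough instances that the fleet still carries peak load plus headroom after losing one availability zone) are assumptions made for illustration, not a method described in the conversation.

```python
import math


def instances_needed(peak_rps, rps_per_instance, headroom, zones):
    """Estimate a fleet size that survives the loss of one availability zone.

    headroom is the fraction of spare capacity to keep, e.g. 0.3 for 30%.
    All inputs are rough planning estimates, not measured values.
    """
    if zones < 2:
        raise ValueError("need at least two zones to tolerate losing one")
    # Load the fleet must be able to serve at peak, including spare headroom.
    required_rps = peak_rps * (1 + headroom)
    # Instances needed to serve that load.
    serving = math.ceil(required_rps / rps_per_instance)
    # Size each zone so the remaining (zones - 1) zones can carry the full load.
    per_zone = math.ceil(serving / (zones - 1))
    return per_zone * zones


if __name__ == "__main__":
    # Hypothetical launch estimate: 2,000 requests/second at peak,
    # 150 requests/second per instance, 30% headroom, three zones.
    print(instances_needed(2000, 150, 0.3, 3))
```

With these made-up inputs the estimate is 27 instances, all of which are paid for by the hour whether or not the expected traffic shows up.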

So there are still a lot of things you really have to think about, plan and pay for, even when the users don't come as you planned. With serverless, the pay-per-use pricing model and the built-in scalability, you just get a lot of that out of the box. There are just a lot fewer things you have to be responsible for and think about.

And for small applications that don't have, I don't know, millions of users, you'll probably also find that your application is going to be a lot cheaper to operate and run, because you only pay for requests when they happen, as opposed to paying for uptime and having to pay for multiple servers in every single availability zone just so that if one AZ goes down, you don't lose the entire application.
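The difference between paying per request and paying for uptime can be sketched as a back-of-the-envelope calculation. In the Python below, the function names are hypothetical, and the default prices are illustrative placeholders in the style of published per-request and per-GB-second rates. They should be checked against the current pricing page for your region, and the sketch ignores free tiers, data transfer and every other service involved.

```python
def lambda_monthly_cost(requests, avg_ms, memory_gb,
                        price_per_million=0.20,
                        price_per_gb_second=0.0000166667):
    """Pay-per-use: cost scales with request count and execution time."""
    request_cost = requests / 1_000_000 * price_per_million
    compute_cost = requests * (avg_ms / 1000) * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def fleet_monthly_cost(instances, hourly_price, hours=730):
    """Pay-for-uptime: the bill is the same whether or not traffic arrives."""
    return instances * hourly_price * hours


def break_even_requests(fleet_cost, avg_ms, memory_gb,
                        price_per_million=0.20,
                        price_per_gb_second=0.0000166667):
    """Monthly request volume at which pay-per-use matches the fleet bill."""
    per_request = (price_per_million / 1_000_000
                   + (avg_ms / 1000) * memory_gb * price_per_gb_second)
    return fleet_cost / per_request


if __name__ == "__main__":
    # Hypothetical small application: 3 million requests a month,
    # 100 ms average duration, 512 MB of memory per function.
    print(lambda_monthly_cost(3_000_000, 100, 0.5))
    # Hypothetical always-on alternative: 6 small instances at 0.05 per hour.
    fleet = fleet_monthly_cost(6, 0.05)
    print(fleet)
    print(break_even_requests(fleet, 100, 0.5))
```

Under these made-up inputs the pay-per-use bill is about 3.10 a month against about 219 for the always-on fleet. The break-even function shows the other side of the argument: past a certain sustained volume, the per-request model stops being the cheaper one.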

And security is going to be a lot easier when you don't have to be responsible for the security of the operating system, which is a whole class of attack vectors that's taken off your plate. In fact, I think security is one of the reasons why a lot of financial institutions like Capital One, FINRA and many others have gone really big into serverless: they see it as a much easier way to meet all the security requirements they have in the regulated environments they are in than to try to meet everything themselves and build and manage all of that infrastructure themselves. So yeah, there are just so many reasons why I think serverless makes our lives a lot easier and makes you more productive as a developer and as a business.

10:58 - Understanding Containers and Convergence

David McKenney: Yeah, you've already answered another question that I had in the back of my head there. It plays into security, the whole attack surface comment, and whether things are online sitting idle or only enacted when they're ready to be executed.

That's an interesting paradigm to wrap our heads around. So you've mentioned a couple of times now, we've been talking about Lambda and AWS.

We know there are others like Microsoft and Google, but you're right, 2014 was when they announced it, and I remember being at re:Invent when it was announced. I was sitting there in the audience as they talked about Lambda, trying to re-explain it to myself in my head. It was a great internal monologue.

Trust me, it was fantastic. And the way I sort of dumbed it down, and maybe this is the way they even talked about it, was: look, you're using containers. We're effectively allowing you to just take the code that you've written, and you no longer even have to manage the container or the back-end compute.

If you use these particular types of languages or frameworks, you can simply submit your code to have it executed. And I remember thinking, man, I'm still trying to get containers all wrapped up. At that time, we hadn't even really decided that Kubernetes had kind of won. So anyway, it was interesting to see that we'd already made that leap.

I guess, as a matter of timelines here, we talk about how virtual machines were the natural progression of bare metal, and then maybe containers have been the natural progression of virtual machines. Is it right to say that serverless is a natural progression of containers, or were they more of a fork in the road that happened at the same time?

Because it seems like a container can do what serverless can do, but serverless is, as you've pointed out, removing and abstracting a lot of that back-end compute. Do you think of one as a progression of the other, or do you think of them as two separate technologies altogether?

Yan Cui: I consider them as two different parallel tracks. But interestingly, they're also kind of converging in a way, because if you look at things like Google Cloud Run, it allows you to have events trigger some compute tasks to be run in your container cluster.

And equally, nowadays you have things like Fargate, which they call serverless containers, which is probably not my favorite way of describing it. On the other hand, you now have Lambda functions that allow you to deploy your code as a container image instead of uploading it as a zip file.

So Lambda is able to load a container image from ECR, and you can have something that's up to 10 gigabytes instead of 250 megabytes in the zip file. And you could potentially have a Lambda function that is more long-running, as opposed to ephemeral and only running when there's a request. So I think eventually the two technologies might just converge in terms of what you can do: you can have more ephemeral container environments, and you can have a more long-running service for Lambda functions. Right now you're limited to 15 minutes of execution time, but maybe they will relax that.

And in terms of the concurrency model, right now with containers your application basically handles the concurrency itself, whereas with Lambda the concurrency is handled at the platform level. Lambda creates all of these execution environments, and every single one of them handles one request at a time.

So essentially, in your code there's no concurrency; you're not handling concurrent requests. But there are also other FaaS, or function-as-a-service, platforms based on containers that do allow you to have concurrency in the process, so that your application can handle multiple requests at the same time. That way you get more efficiency in terms of cost, and you also make use of those idle CPU cycles when you're making an I/O call to some database or third-party API and waiting for a response.
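The cost implication of the two concurrency models can be estimated with Little's law: the average number of requests in flight equals the arrival rate multiplied by the average duration. The Python sketch below applies that rule. The function name and the per-environment concurrency parameter are assumptions for illustration, and the result is a steady-state average that ignores bursts and cold starts.

```python
import math


def environments_needed(rps, avg_duration_s, per_env_concurrency=1):
    """Estimate how many execution environments a steady load keeps busy.

    per_env_concurrency=1 models one request per environment at a time;
    larger values model a runtime that handles several requests in-process.
    """
    if per_env_concurrency < 1:
        raise ValueError("per_env_concurrency must be at least 1")
    # Little's law: average requests in flight = arrival rate x duration.
    in_flight = rps * avg_duration_s
    return math.ceil(in_flight / per_env_concurrency)


if __name__ == "__main__":
    # Hypothetical I/O-bound workload: 500 requests/second, 250 ms each,
    # most of which is spent waiting on a database or third-party API.
    print(environments_needed(500, 0.25))       # one request per environment
    print(environments_needed(500, 0.25, 10))   # ten requests per environment
```

For this made-up workload the one-request-per-environment model keeps about 125 environments busy, while a runtime that overlaps ten I/O-bound requests per process needs about 13. That gap is the efficiency argument made above for in-process concurrency.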

So Lambda may do something like that in the future as well, for customers who have a more high-throughput environment, where they want to make more use of the Lambda execution environment, handle concurrency themselves and potentially pay less for their Lambda functions. Then the two may become closer and closer in terms of the use cases you have, and also what your development experience looks like.

So I do think they are two separate developments, or different paradigms, but certainly the technologies are converging in terms of, I guess, characteristics.

David McKenney: Yeah, that's interesting about the concurrency model, because I'm thinking that if I was to deploy something in serverless, I'm probably deploying something that's really just expecting an input and an output, like a state, if you will, where it's got a job to do, and it's that one job, if you're going to do it well.

Otherwise you'll just be back to the days of creating some very monolithic functions, which is probably what I would do. I would probably do it all wrong and put everything in one function. But we've kind of danced around defining serverless as we talk about containers and what each one does that the other doesn't.

But if you had to summarize what we've talked about so far, and maybe anything additional, what would be your way of describing serverless to somebody who's new to it, even using containers as examples?

Yan Cui: So if we think of containers as a way to abstract away the machine, then serverless is a way for you to do away with the machine altogether and just think about your actual application.

So instead of still having something that wraps around your application, which you deploy to run on top of some infrastructure, you just think about the application, and that's it. You upload it to the cloud and tell the cloud, okay, run this when something happens.

So that's kind of the model I use to describe it to a developer. But in terms of describing the technology itself and how to classify something as serverless, I tend to classify it as technologies where you don't have to worry about provisioning and managing servers, you don't have to worry about how the scaling is going to happen, and you only pay for the resources when they run. So you focus on just your application and let the cloud deal with everything else, including the security, the provisioning of the underlying infrastructure, and the scaling of those infrastructure pieces.

18:12 - Navigating the Switching Costs, Portability and Lock-In in Serverless Environments

David McKenney: So do you find that most serverless platforms, from a compute perspective, are backed by container runtimes, or are they using any sort of bare metal or virtualization technology?

Yan Cui: Yeah, so when we talk about Lambda functions or any other functions-as-a-service type of thing, they all run on servers, in the sense that ultimately we still need to have a machine that runs our code.

That's no secret. And I think every time I say serverless, someone always reminds me of that. Which is true, but it's also kind of missing the point, because so is Wi-Fi.

Wi-Fi has got wires. I mean, no doubt there's a cable going to a router somewhere. It's just that when you're using those Wi-Fi connections, you don't think about those wires because they're not your problem, right? And when it comes to serverless, or specifically functions as a service, I think Microsoft Azure Functions are running on containers.

They actually expose the container to you, and I think you might be able to connect to the container instance itself. I'm pretty sure Google Cloud Functions runs on containers as well. But Lambda doesn't. Lambda runs on a microVM technology called Firecracker, which AWS has open sourced.

It's the same kind of virtualization technology that also underpins Fargate. So it doesn't run on containers, which allows them to do some optimizations that are otherwise just not possible. And it's also why, when you think about Lambda functions and cold start performance, there have been a lot of benchmarks in the past, and you will find that Lambda cold starts tend to be a lot better compared to Azure Functions and Google Cloud Functions, because the Lambda service is able to do a lot of optimization that is just not possible if you're tied to open standards like Docker and have to rely on the existing container infrastructure. But ultimately those Firecracker microVM execution environments are still created on bare metal instances.

David McKenney: Yeah, it does make sense, and I think you've put it perfectly. I gather that it doesn't really require a container, but it can use one. As far as the compute side, you can do it with really any compute that makes sense, but the container is well suited for it. It's certainly lightweight, especially since you're submitting code, and the container might be out there running, but I assume it would have to be assigned to whatever code you're.

Yan Cui: Yeah, there are also other things. Cloudflare has got these Workers. I don't know if they're still running on V8, because there were some security issues, I think, that were raised by some researchers.

I forgot the exact details, but I think it was something to do with the fact that they were using cgroups, and someone was able to find a way to basically escape the cgroup boundary, and so potentially access memory or data that belongs to somebody else's functions.

Yeah, exactly. And that's one of the reasons why, I guess not Lambda, but the AWS teams created Firecracker: so that they have stronger isolation for that execution environment than just using cgroups. That's one of the security reasons why they created the Firecracker technology.

But even then, as far as I know, they still don't do multi-tenancy. So if a bare metal instance is running your Lambda functions, then they won't allocate another customer's Lambda functions to the same bare metal instance. I'm guessing it's just a precaution, even if they have confidence in the security isolation that you get with...

David McKenney: Right, it's the unknowns and the peace of mind.

Yan Cui: Yeah, exactly. Just peace of mind, so that nothing can happen where customers can potentially break out and access somebody else's data.

David McKenney: That's interesting. I need to look more into Firecracker for sure. I was aware of it, but I have not done much research with it.

So as you're talking here, it's dawned on me a bit, and I feel like we're going to compare serverless to containers throughout the rest of the discussion, so it is what it is if it helps me, that a big allure of containers over virtual machines is that promised portability that we never quite got the way we dreamed of, right?

The whole dream of dragging and dropping a VM from one cloud to another, right? That's not really there in practice compared to what a container can do, but a container is wrapped with libraries, a runtime, all the things needed to execute in that environment. And as you abstract this away with serverless, it makes me think: do you lose some of that portability?

And, I hate using the term, but are you a little bit more in a lock-in state with the platform? Because you are really reliant on the platform, whether it's Amazon, Google or Microsoft. You're kind of coding to what they choose and how they run it, right, because you're not bringing your environment.

You're only bringing your code. So is there a bit of lock-in? And how does portability work with serverless, per se?

Yan Cui: Yeah, so Gregor Hohpe, who actually works for AWS, did a really good talk at re:Invent this year. I forgot the session number, but I think it's called something like "Do modern cloud applications lock you in?"

It was specifically around lock-in, and I 100 percent agree with what he said. So basically, there are two things. Firstly, you're never truly locked in, because you can always rewrite your application. So what we're really talking about is the cost of moving or switching.

So rather than lock-in being a binary yes or no, you can think in terms of degrees of how expensive the switching cost really is. And when you think about technologies in terms of switching cost, it's not just going to be about the vendor itself, there's also the product.

In the same way, you have a switching cost with containers. You're using Docker, you're not using rkt. So what if Docker goes away tomorrow, or they do something drastic in terms of changing the open source licensing, or maybe they start charging you, like Terraform or something else?

Does that mean that you now have to re-architect your container images so that you use rkt instead of Docker? The same goes for other tools that you may be using within your containers environment. So there's a switching cost involved in the decision to use containers, and there are potentially also switching costs involved in working with certain libraries, frameworks or programming languages. I've worked with companies in the past that had to spend years trying to move away from a .NET web framework that had been abandoned, because they had written their code in such a way that all of their business logic was in the request handlers. They spent a lot of time rewriting a lot of the code base so that they could move to a different web framework. So there are switching costs involved with that as well. And not to forget, a lot of enterprises go through this cycle every three years.

Let's rewrite everything in a different language. New CTO comes in now, let's do everything in Java. Three years later, CTO goes. new guy comes in, okay, now NET is a new thing. So that's also that.

David McKenney: Well, that's good. So on that topic specifically, are there certain languages that are a little more agreeable to serverless?

You've got the new kids on the block like Python and Ruby, but you've got the old-school guys, Java and C#, and I'm sure there are perks to either, whether it's less code, more code, or quicker execution time. But those just seem kind of trivial unless you're building something at scale. Are there certain ones that have an upper hand when it comes to learning serverless?

Yan Cui: Yeah, can I just finish the last question, as I was getting to the punchline? So that's the first thing you should consider: instead of lock-in being a binary thing, think about switching cost instead. But on the other side, that's a risk that everyone should be thinking about: how likely are we to make a switching decision later that's going to incur that cost, and how much effort is it going to cost us to build those abstractions up front so that we can guard against some of the costs that are going to come later?

But the other side of risk is reward, because there's no reward in life without taking some risk. Every time we put food in our mouths, there's a chance that we may get food poisoning, but that doesn't mean we stop eating.

Right. And so when you're buying into a cloud, working with your cloud partner, and trying to get the most value from that relationship by using AWS for all the things it can do for you, then you're also going to get maximum utility and value from that partnership.

And with serverless technologies, you're really going deep into that relationship, trusting the cloud and using the cloud for everything it has to offer, which means you're going to get a maximum amount of value in return: you can go to market a lot faster, and everything's going to be a lot easier, potentially more scalable, cheaper, and more secure as a result. But of course, then you have to worry about, okay, what if we make a decision later to switch to a different cloud provider, or to switch our code from Lambda to containers? What are the different things we can do to help us mitigate some of that cost later?

And there are patterns you can use. Hexagonal architecture is a really good coding pattern for writing your code in such a way that your domain logic is encapsulated away in domain modules, so that you can easily migrate your code by just changing the adapter. Instead of handling Lambda's event and context directly, you translate them into domain objects, and then use those to call your domain modules and libraries, so that you have that translation layer.

And when you need to switch your code from running in Lambda to running in a container, you just change the adapter. So instead of working with Lambda's event and context, now you're working with, I don't know, Express.js request and response objects, and you have that translation layer.

The adapter layer translates things from the web framework you're using to the domain objects that you're working with in your domain logic. So there are lots of different things you can do to help mitigate that switching cost. But of course, all of those mean that you have to do more work upfront to help mitigate some of that cost.
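To make the adapter idea concrete, here is a minimal sketch in Python. It is not code from the episode: the voting domain, `VoteRequest`, `cast_vote`, and `http_handler` are hypothetical names invented for illustration, and the Lambda adapter assumes an API Gateway proxy-style event that carries a JSON string in `body`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteRequest:
    """Domain object: carries no trace of Lambda or any web framework."""
    user_id: str
    choice: str


@dataclass(frozen=True)
class VoteResult:
    accepted: bool
    message: str


def cast_vote(request: VoteRequest) -> VoteResult:
    """Domain logic (hypothetical example). Only sees domain objects."""
    if request.choice not in {"yes", "no"}:
        return VoteResult(False, f"unknown choice: {request.choice}")
    return VoteResult(True, f"vote recorded for {request.user_id}")


def lambda_handler(event, context):
    """Adapter 1: translates a Lambda event into a domain object and back.

    Assumes an API Gateway proxy-style event with a JSON string in "body".
    """
    body = json.loads(event.get("body") or "{}")
    result = cast_vote(VoteRequest(body.get("user_id", ""), body.get("choice", "")))
    return {
        "statusCode": 200 if result.accepted else 400,
        "body": json.dumps({"message": result.message}),
    }


def http_handler(payload: dict) -> tuple[int, dict]:
    """Adapter 2: what a container web framework route might call.

    Hypothetical shape: takes already-parsed JSON, returns (status, body).
    """
    result = cast_vote(VoteRequest(payload.get("user_id", ""), payload.get("choice", "")))
    return (200 if result.accepted else 400, {"message": result.message})
```

Moving from Lambda to a container in this sketch means replacing `lambda_handler` with something like `http_handler`; `cast_vote` and the domain objects stay untouched, which is the upfront work that buys the lower switching cost.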

So it becomes a question of, okay, how likely are we to do that? And how much effort should we put in upfront to help mitigate that cost later? It's something that you should be thinking about, but you should be thinking about it with the right framing, as opposed to just, oh, lock-in is bad, lock-in is expensive, so we should never have used the native services.

So, to your other question about which languages are going to work better with Lambda: it really depends on what you're trying to do. The main impact the language is going to have on your Lambda function's performance is cold start. With Java and .NET, you've got this virtual machine that needs to be bootstrapped and initialized at runtime, and that takes a lot more time. And with Lambda, you've got lots of really small, independent execution environments for your application, which means that the initialization has to happen every single time a new environment is created.

That can be very frequent, so you are more likely to see cold starts on the order of several seconds, compared to maybe a few hundred milliseconds for, say, a Node.js function, or for a function written in Rust or Go, which compile to native code. AWS has done a lot of work in recent years to help improve things.

For one, you have provisioned concurrency, which is a mechanism for you to say: you know what, instead of just having execution environments created on demand when a user request comes in, I'm happy to pay a certain amount of money so that they keep a certain number of those execution environments around all the time.

So for my baseline traffic, essentially, I'll never see cold starts, and my users' experience is preserved until there's a sudden, unexpected spike in traffic. So you can do that. But for Java specifically, there's also a relatively new feature, announced at re:Invent 2022, called SnapStart.

It only works for Java. What AWS does in that case is that when you deploy a code change to your Lambda function, as part of the deployment process it creates a new execution environment and boots up the JVM. The class loader runs and loads a lot of your classes, except for anything you're doing with reflection and things like that at runtime.

Then it takes a snapshot of the memory and the disk state and caches it. That means that when a request actually comes in and Lambda needs to create a new execution environment on demand, instead of booting up the JVM from scratch, it loads the memory and disk snapshot, so you've got something that can boot up a lot faster, because most of the work has been done ahead of time and saved into this snapshot.

Certain things won't work as-is. You have to do special handling for random numbers and, I guess, a lot of the cryptographic libraries that require a seed value that gets initialized when the class loader runs. So you have to pay attention to things like that.

But otherwise, you can see people report the cold start time for Java functions going from a few seconds to a few hundred milliseconds. So it can make a big difference to the performance of Java functions, which is important when you have user-facing Lambda functions, typically APIs that are used by the front end.

If you have a cold start that's a few seconds, the user is going to notice. Even if it's just 1 percent of the time, that means 1 percent of the requests a user makes are going to take a few seconds to respond, which is not going to be very good for user experience. But if you're able to make your cold starts fast enough that even when they do happen, it's not a big impact on your user experience, then happy days.

There's also the ability to use GraalVM for Java or native compilation for .NET applications. Those are other solutions that are perhaps more heavyweight, because there's a lot more work the development team has to do rather than just switching on a setting.

You have to compile your application in a certain way to compile it to native. But again, if you're doing user-facing API stuff, that's something you really have to think about when you're using Java and .NET. If you're just doing background data processing, who cares if there's a five-second cold start occasionally, because that happens in the background anyway.

No user is actually waiting for it. So those are some of the things you have to think about in terms of choosing the language. It's mostly around cold start, and potentially memory use, but I think the most important thing to think about is the cold start performance of your language.

33:36 - Exploring the Potential for Standards in Serverless and the Search for Portability

David McKenney: So outside of public cloud environments, are there opportunities to sort of test and run containers in, say, an on-premises cloud or software solution set?

Yan Cui: So there are a few frameworks out there, things like IBM OpenWhisk and Kubeless and a few others, that allow you to give your developers the same event-based programming model that Lambda and other functions-as-a-service offerings give you, by running them on top of your existing Kubernetes cluster.

So for an on-premises environment where you've got Kubernetes going, you've got this containers environment, and you've got developers who really like the programming model that Lambda gives you and the ability to trigger your code to run without having to run containers 24/7, you can use things like OpenWhisk and Kubeless, and there are quite a few others like them, to bring a similar developer experience to your own on-premises setup.

But that is mostly about mimicking the developer experience and the deployment experience. Somebody in your organization is still going to be responsible for the runtime behavior of the system, in terms of being able to create this container environment to run your functions for every request. And you're still running on machines, so you have to think about how to efficiently allocate those containers to the right underlying machines.

And when you run out of space on your physical machines, what do you do at that point? You don't have the same, well, I don't want to say limitless, but huge amount of compute resources that a public cloud can offer you. There's also what happens when something goes wrong and machines need to be replaced, or there are networking issues and things like that. Also security.

Someone in your organization is still going to be responsible for maintaining, securing, and managing the underlying infrastructure for your on-premises environment. So yes, you can kind of mimic that developer experience on premises, but you are still responsible for the runtime behavior, security, and everything else.

David McKenney: Yeah, I mean, the parallel to containers is interesting. Maybe this isn't a good example, but look at Amazon and Microsoft. Both of them had their own container services, ECS on Amazon and Microsoft's ACI, Azure Container Instances. But once Kubernetes rose to a level of adoption, they both released their Kubernetes services.

Do you see a level of standards happening in serverless that maybe the big three, Google, Amazon, and Microsoft, agree to, and that will help some of the portability here? Or is it still too early to tell?

Yan Cui: Probably too early to tell. And I also don't think that is likely to happen.

Certainly, if I was someone from the Lambda team, I would think that the reason we're able to offer our customers this level of performance and developer experience is because we're able to customize the execution environment, because we can do things that we wouldn't be able to do if we were to use containers.

There may be spaces where you can form some standardization, specifically around the structure of invocation event payloads. There's been a move towards something called CloudEvents, which is trying to push towards some kind of standard around the invocation event itself, which even within AWS is all over the place.

Depending on what triggers a Lambda function, the payload shape is going to look different. But in terms of the actual platform itself, I struggle to see how or why someone like AWS would do that, when that's kind of the opposite of what their customers are asking for. The customers are asking for better performance, better scalability, faster cold starts, and everything else. So instead of trying to meet some standards and potentially giving customers a worse experience, I, at least, want AWS to just do their own thing and try to optimize their platform in any way possible.
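As a rough illustration of payload shapes differing by trigger, the sketch below pulls the message text out of three kinds of Lambda events. The field names reflect my understanding of the documented API Gateway (REST proxy), SQS, and SNS event formats and should be checked against the current AWS documentation; `extract_messages` is a hypothetical helper, not part of any AWS SDK.

```python
def extract_messages(event: dict) -> list[str]:
    """Return the raw message bodies carried by a Lambda invocation event.

    Handles three shapes (assumed from AWS's documented event formats):
      - API Gateway REST proxy: top-level "httpMethod" and "body"
      - SQS: "Records" whose items use "eventSource" == "aws:sqs" and "body"
      - SNS: "Records" whose items use "EventSource" == "aws:sns"
             and nest the text under "Sns" -> "Message"
    """
    if "Records" in event:
        messages = []
        for record in event["Records"]:
            # Note the differing capitalisation between the two services.
            source = record.get("eventSource") or record.get("EventSource")
            if source == "aws:sqs":
                messages.append(record["body"])
            elif source == "aws:sns":
                messages.append(record["Sns"]["Message"])
            else:
                raise ValueError(f"unsupported record source: {source!r}")
        return messages

    if "httpMethod" in event:
        # The body may be None for requests without a payload.
        return [event.get("body") or ""]

    raise ValueError("unrecognised event shape")
```

A shared envelope such as CloudEvents would, in principle, remove the need for this kind of per-source branching, which is the narrow area where standardization seems plausible.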

38:14 - Cost Estimation and Optimization in Serverless: Breaking Down the FinOps Approach

David McKenney: That's a great answer. It helps clear up for me why it hasn't happened yet, and it makes a lot of sense. So you mentioned cost a couple of times in these responses, and it made me want to ask how one goes about estimating the cost of a potential deployment. If it's anything like trying to spec DNS usage and how many millions of requests you're going to see, or S3 transactions and how many PUTs or GETs you're going to have, that's kind of a tough thing to estimate, and it seems like this is really how you're estimating the cost.

I'm sure there are still some perpetual things, or arguably perpetual things, running in the background. You might have a database of some sort, some storage out there. But when it comes to planning costs around serverless, what are some of the best practices for how you do that? I'm sure there are plenty of lessons learned in your experience on how you estimate cost, so what advice can you give?

Yan Cui: Sure. And I guess before I answer, can I turn the question back to you and ask, how would you estimate the cost of a containerized application?

David McKenney: Oh, I would say it's outside of the steady state. So there are the containers: I would guess that most would have at least a floor, something sitting there running.

And then there's this whole scaling mechanism. So I see where you're going: I've got my floor, just the running environment, and then here are the points at which I'm going to scale, whether it's CPU, memory, or request driven, what have you, and I'm going to put a cost on that as I scale to X number of users or requests.

Here's how much that's going to cost. And with functions, I assume a lot of that's still relevant, but you also don't really have to have much running underneath, right? Because, as you said, you pay for the code when it's executed, and really only then. So are you just really good at knowing how long every piece of code is going to take to run?

And is that really that important? Like whether it takes a second to run or five seconds to run?

Yan Cui: The execution duration is important, but probably the most important calculation, how you go from A to B, is still the same. You talk about how many containers you need to run for a given number of users.

The number of users is a proxy for the actually important thing, which is how many requests per second you're going to handle. From there, based on some testing you've done, you work out how many requests per second you can handle with one instance of your container. And that's how you work out the baseline number of containers you need to run. Then, as you scale, you look at, okay, what if the number of users at a daily peak is, say, 10,000 concurrent users?

How many requests per second do we expect from that many users, and what does that translate to in terms of the number of containers we need to run, based on our understanding of how many requests per second we can handle with a single instance of our container?

So that's how you predict your container cost. You can do the same thing with your serverless environment. In fact, it can be much more precise, because you know exactly how much a single request to API Gateway is going to cost you.

And from some testing, you can probably figure out how much time it's going to take your Lambda function to execute and handle a single user request. From there, you can work out the cost per transaction.

So imagine a really simple API function that involves API Gateway calling a Lambda function, which puts something into DynamoDB. Every single component in that user transaction is pay per request. And with some heuristics, I mean some testing, you can probably work out a reasonable average duration for the Lambda function.

And given your memory allocation, you can work out how much that single invocation is going to cost you. Then you go from the same number that you're using to estimate the cost of your container environment: how many requests per second do we expect to see at the peak, based on the number of users we expect and the level of activity we expect from those users?

So you can work out, okay, at 10,000 concurrent users, we're expecting to see maybe a 1 to 100 ratio, so about a hundred requests per second. And every single user request involves an API Gateway call, a Lambda function invocation, and the DynamoDB write, the put operation.

So that's some tiny fraction of a dollar per user transaction, multiplied by a hundred requests per second, which gives you some number. Multiply that by the number of seconds, then hours, and go from there to work out the likely, or estimated, cost of your application. Then you can start looking at your entire application.

We're not just using Lambda functions, API Gateway, and DynamoDB. From there, we also have to trigger something else and do some event processing. So there's also going to be some calculation based on flow rates. Say we are using SQS: we're writing a bunch of records into SQS.

And we are using SQS to invoke a Lambda function in batches of 10. Again, you go from there: how much are those hundred requests per second to SQS going to cost us? We know how much a single request to SQS costs, because again, it's priced per request. So based on the initial input, the number of users you expect to see and the number of requests you expect from those users, you can start to look at your architecture and, for every single hop in the architecture, put a number on that.

And there are also tools like CloudZero that allow you to turn this estimate into something more like real-time monitoring, so that you can see, for individual components, how much cost you are incurring based on the requests you're actually handling.

But in terms of just doing the estimate, it's starting from the initial number, what's the input rate, and then going from there: looking at your architecture one component at a time and putting a number on each.
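The per-hop arithmetic described above can be sketched as a few small functions. This is an illustration only: the prices passed in at the bottom are placeholder values, not real AWS prices, and the names (`UnitPrices`, `cost_per_transaction`, `sqs_hop_monthly_cost`) are invented for the example. The SQS calculation is simplified to count only the send requests plus the batched Lambda invocations.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


@dataclass(frozen=True)
class UnitPrices:
    """Per-unit prices in dollars. Fill in from the current pricing pages."""
    api_gateway_request: float
    lambda_request: float
    lambda_gb_second: float
    dynamodb_write: float
    sqs_request: float


def lambda_invocation_cost(p: UnitPrices, memory_mb: int, avg_duration_ms: float) -> float:
    """Request charge plus duration charge for one invocation."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
    return p.lambda_request + gb_seconds * p.lambda_gb_second


def cost_per_transaction(p: UnitPrices, memory_mb: int, avg_duration_ms: float) -> float:
    """One user request: API Gateway call + Lambda invocation + DynamoDB put."""
    return (
        p.api_gateway_request
        + lambda_invocation_cost(p, memory_mb, avg_duration_ms)
        + p.dynamodb_write
    )


def monthly_cost(cost_per_unit: float, units_per_second: float) -> float:
    """Scale a per-unit cost by a sustained rate over a month."""
    return cost_per_unit * units_per_second * SECONDS_PER_MONTH


def sqs_hop_monthly_cost(p: UnitPrices, messages_per_second: float, batch_size: int,
                         memory_mb: int, avg_duration_ms: float) -> float:
    """SQS sends plus the Lambda invocations consuming them in full batches.

    Simplification: ignores the receive/delete requests made by the poller.
    """
    send_cost = monthly_cost(p.sqs_request, messages_per_second)
    invocations_per_second = messages_per_second / batch_size
    consume_cost = monthly_cost(
        lambda_invocation_cost(p, memory_mb, avg_duration_ms), invocations_per_second
    )
    return send_cost + consume_cost


# Placeholder prices for illustration only (NOT real AWS prices).
example_prices = UnitPrices(1e-6, 2e-7, 1e-5, 1e-6, 4e-7)
per_txn = cost_per_transaction(example_prices, memory_mb=1024, avg_duration_ms=100)
api_monthly = monthly_cost(per_txn, units_per_second=100)  # ~100 req/s from 10,000 users
queue_monthly = sqs_hop_monthly_cost(example_prices, 100, 10, 1024, 100)
print(f"API path: ${api_monthly:,.2f}/month, SQS hop: ${queue_monthly:,.2f}/month")
```

Using the peak rate for the whole month overstates the bill; swapping in an average request rate gives a closer estimate, while the peak figure is a useful upper bound.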

David McKenney: Yeah, it's funny. As you're talking about that, I'm thinking, okay, this is a really long math problem, starting from the user request to the end of the request. But I'm like, man, you can actually put a price against an actual request.

That's pretty interesting.

Yan Cui: And that's what we actually talk about when we talk about FinDev, which is quite a popular topic within the serverless community. Because with this pay-per-use pricing model, provided that you have a good understanding of your user behavior and traffic pattern, you can very accurately estimate the cost of individual components, functions, and so on.

When it comes to optimizing, I've done many optimizations before on my EC2 applications or container applications, and oftentimes it's just, okay, I've got a theory that this may do something.

And then I just go ahead and do it and see what actually happens. Because I can't easily attribute the cost of running those containers or EC2 instances to specific user transactions and specific workloads that we are running, because everything kind of blends together into one number, which is, okay, what's the cost per hour that we're paying for this service?

Whereas with things like serverless, and the ability to look at every single component and understand the cost of that component, you can now start to put a cost on, say, supporting a feature.

Gojko Adzic, the guy who wrote the book Specification by Example, which is one of my favorite books on testing, had this thing where he'd built a startup. He had an idea for a new feature which he thought people were going to like, and his co-founder was like, no one's going to want that.

He didn't believe it, and he went ahead and built it anyway. And he was able to see, okay, actually, not many people use that. But he was able to put a number on the user transactions related to the feature he'd just implemented, so he could see the cost of that feature. Then he could work out how much revenue they were bringing in with this new feature, and he could actually work out, okay, you know what, we're actually losing money on this feature. So eventually he just decided to cut the feature altogether.

And when you can understand the cost of your system at that level of granularity, it opens up some really interesting possibilities in terms of optimization.

When I want to talk about optimization, okay, which function do I optimize? Well, if I look at a function and see that it's costing us 10 cents a month, there's no possible reason for me to waste any time optimizing it. But if I see a function that's costing us a thousand dollars a month, then okay, maybe it's worth spending a few days of engineering time to take that down to, say, half. Then the amount of time that I'm spending to optimize it is going to pay for itself within, say, the next four or five months or whatever.

So you can now start to look at every single optimization you're doing as an exercise that has a return on investment, which is very clear to see when you can precisely predict the cost and the return on the investment you're going to put into improving the cost of that component.
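The return-on-investment reasoning above reduces to a payback-period calculation. A minimal sketch follows; the $2,250 engineering cost is an assumed figure for "a few days of engineering time", chosen only so the example lands in the four-to-five-month range mentioned, and `payback_months` is a hypothetical helper.

```python
from typing import Optional


def payback_months(engineering_cost: float,
                   monthly_cost_before: float,
                   monthly_cost_after: float) -> Optional[float]:
    """Months until the monthly saving repays the one-off engineering cost.

    Returns None when the change saves nothing, so it never pays back.
    """
    monthly_saving = monthly_cost_before - monthly_cost_after
    if monthly_saving <= 0:
        return None
    return engineering_cost / monthly_saving


# A function costing $1,000/month, halved by a few days of work (assumed $2,250).
expensive = payback_months(2250, 1000, 500)

# A function costing 10 cents/month, also halved by the same effort.
trivial = payback_months(2250, 0.10, 0.05)

print(f"expensive function pays back in {expensive:.1f} months")
print(f"trivial function pays back in {trivial:,.0f} months")
```

The same effort applied to the 10-cent function would take thousands of years to repay, which is the point about choosing what to optimize once cost is visible per component.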

48:00 - When to Repatriate: Navigating the Decision to Move from Serverless to Containers

David McKenney: Let me ask you this. With serverless, if you're paying by, let's say, the second to run code, is there a point at which that luxury of being able to pay at that level is outweighed? I won't say repatriation, but does it at some point get to a level where it makes sense to actually have a persistent container running, if you were otherwise running that code for, let's say, an hour at a time? Or does it still make sense to continue writing it and running it in serverless?

I don't know if there's any sort of thing where savings plans or reservations apply to draw down on commitments. But at any point, does it make sense to move back to a container for running workloads or not?

Yan Cui: Absolutely. My rule of thumb is that once you hit anything close to about a thousand requests per second, say for an API, that workload is going to work out a lot cheaper running on containers as opposed to running with API Gateway and Lambda and whatnot.

So the approach that most of us advocate for is the serverless-first mindset, whereby you start with serverless, which gives you faster time to market and cost efficiency as you're starting out. So we've got the example of Gojko that I talked about earlier, where you're building a new product or something that you don't know if anyone's going to adopt.

So you go for something that's going to be cheap and scalable. You don't pay for it if no one's using it, but if lots of people are using it, you can still handle the scale, and hopefully your costs and your profit are going to grow with that usage as well, so that eventually things are going to pay for themselves. And once you have a more stable environment that is running at a fairly high throughput consistently, because you've found market fit, which is, congratulations, at that point you can then optimize for cost.

Because with every single request that you're paying Lambda and API Gateway for, you're getting a lot of good things in return, but you're also paying a premium for that.

And when that premium starts to accumulate over a consistently high number of requests per second, then you're looking at potentially saving a lot of money by moving that workload back to containers. Look at the article from the Prime Video team, which talked about exactly that.

They built something which they didn't know if anyone was going to use. They used Step Functions and Lambda functions to do a lot of orchestration. And then, like a year later, they were surprised themselves to find that the service had got a lot of traction internally, so they decided to optimize for cost and move their workload into containers.

And they were able to cut the cost by something like 90%. So when you've got really consistent throughput, you can potentially save a lot of money by moving stuff back into a containerized environment where you're paying for uptime. The rule of thumb that I have in AWS is that any service that charges you based on uptime is going to be, well, about 10x cheaper at scale.

The important thing there is to look at your organization and understand what skill sets you have, and whether or not you already have the skill set available to run a large, scalable containerized environment, because that may decide at which point it becomes more cost efficient for you to actually run that workload on containers. You can't just look at your AWS bill and make all of your decisions based on that.

You have to think about the total cost of ownership. Say a workload is now costing you a hundred dollars a month running on Lambda and API Gateway and all of that, and you think, I can run this for $50 a month instead, just running on two containers in two different AZs. Then you miss the fact that you may not have the right expertise in the company to work with a containerized environment.

And if you need to bring someone into the company to do that, suddenly you're looking at, I don't know what's a reasonable number for DevOps engineers in the US these days, but probably a few thousand dollars a month.

So suddenly you are down a few thousand dollars a month, because you didn't have the right skill set in the company and now you need it to operate your new containerized environment. So you've got to think about total cost of ownership.

That is going to be very different for a large company that's got a lot of internal expertise with containers versus a team that's more front-end focused and doesn't necessarily have the right expertise to help them run a large-scale container environment. So yeah, think about total cost of ownership.

But at some point, yes, it's going to be cheaper to move your workload to containers. Another thing to think about is the opportunity cost, because if your team is going to spend more time just looking after the infrastructure, that means they're going to spend less time innovating and iterating on the product.

So it's not just a simple case of, if it's cheaper on the AWS bill, then I should do this, because there are also other costs involved that are, I guess, less easily measured.
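The total-cost-of-ownership comparison above can be written down directly. In this sketch the $100 and $50 monthly bills come from the example in the conversation, while the $3,000 staffing figure is an assumed stand-in for "a few thousand dollars a month"; the function names are hypothetical, and opportunity cost is deliberately left out because it is hard to put a number on.

```python
def total_cost_of_ownership(infrastructure_bill: float, staffing_cost: float = 0.0) -> float:
    """Monthly TCO: the cloud bill plus any extra people cost it requires."""
    return infrastructure_bill + staffing_cost


def cheapest_option(options: dict[str, float]) -> str:
    """Name of the option with the lowest monthly total."""
    return min(options, key=options.get)


# Team without container expertise: the cheaper bill needs a new hire (assumed $3,000/month).
small_team = {
    "serverless": total_cost_of_ownership(100),
    "containers": total_cost_of_ownership(50, staffing_cost=3000),
}

# Team that already runs containers at scale: no extra staffing cost.
experienced_team = {
    "serverless": total_cost_of_ownership(100),
    "containers": total_cost_of_ownership(50),
}

print(cheapest_option(small_team))        # serverless
print(cheapest_option(experienced_team))  # containers
```

The same two bills lead to opposite decisions depending on the skills already in the organization, which is why the bill alone is not enough to decide.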

53:14 – Outro: Navigating the Serverless Learning Curve for Developers

David McKenney: So, last few minutes here. I was going to ask a different question, but I think the timing, with your response there, is super interesting.

The labor market is affecting everybody right now, and education is often trailing these rapid advances in technology. You're right: with containers, we're seeing more and more folks who are coming equipped to handle and administer containers. Where do you see this in the serverless market?

Are developers being exposed early on to this type of development architecture, to things like infrastructure as code and coding for serverless, where, in the AWS world, they're also using Step Functions and other services to pull this all together? Or is this still something that you're learning on the job as you enter the professional market?

Yan Cui: Are you talking about the formula like a new graduate perspective, someone who is just fresh out of university?

David McKenney: Yeah, I'm curious at what point somebody is really getting exposed to serverless. Obviously, most of us who come from a computing background have programming abilities. We learned that.

But programming to an architecture like serverless, and all the things that it takes to bring this together, strikes me as something where I wouldn't know if it was being taught or learned early on, or if it's something that you arrive at in whatever job or career you've gone down.

Yan Cui: Yeah, I don't know of any university that includes that. Actually, maybe Imperial College in the UK, because I've done some talks with the students there, and they seem to have some exposure to the cloud and serverless technologies. But the good news is that learning serverless and learning the cloud is actually not as hard as it used to be. And certainly when you're working with serverless, there are probably only about 10 services you need to know reasonably well to be able to build most applications.

And once you've learned those services, the knowledge that you get is fairly reusable. I think it's something where you can get to a competent level fairly quickly.

And I've got some training courses and workshops that help teach people how to get there faster as well. But yeah, nothing really beats learning by doing, and it doesn't have to be on a job. Pretty much all of us just learned by doing. And so when you've got something that you want to try out, the nice thing about using serverless technologies is that you only pay when someone uses it.

So when you're building a hobby application and using Lambda, chances are you're never going to pay anything, because you're never going to go over the free tier. So it's much easier to try those hobby projects, try things out for yourself, and learn the technology really well, without spending a lot of money paying for EC2 instances that you forgot to turn off and suddenly seeing a $100 bill in your AWS account.
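
To make the free-tier point concrete, here is a minimal sketch of a monthly Lambda cost estimate. It assumes the commonly published free tier of 1 million requests and 400,000 GB-seconds per month, and on-demand rates of $0.20 per million requests and $0.0000166667 per GB-second. Those figures are assumptions that vary by region and architecture and can change, so they should be checked against the current AWS pricing page. The function name and the sample workloads are hypothetical.

```python
# Assumed free-tier allowances and on-demand rates; verify before relying on them.
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000.0
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def estimate_monthly_lambda_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate the monthly Lambda charge in USD after the free tier."""
    # Compute usage is billed as duration (seconds) times configured memory (GB).
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return billable_requests * PRICE_PER_REQUEST + billable_gb_seconds * PRICE_PER_GB_SECOND


# A hypothetical hobby project: 50,000 invocations a month, 200 ms each, 128 MB.
hobby = estimate_monthly_lambda_cost(50_000, 200, 128)
# A hypothetical busier service: 3 million invocations a month, 200 ms each, 512 MB.
busier = estimate_monthly_lambda_cost(3_000_000, 200, 512)

print(f"hobby project: ${hobby:.2f}")
print(f"busier service: ${busier:.2f}")
```

Under these assumptions the hobby workload stays entirely inside the free tier, and the busier one pays only for the requests beyond the first million. The estimate covers Lambda alone and ignores other services an application might use.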

David McKenney: Yeah, no kidding. That's a nicety here, right? It's not sitting there running idle. And I'm sure we'll include some notes on some of those resources you talked about.

I know you've got a website and a lot of tutorials, which are fantastic, and they're always up to date with the latest and greatest things that are coming out. So, Yan, thank you. I'm sure we got to half of the content that we had hoped to; maybe we'll have a part two to cover all the other fun stuff. But I want to thank you for joining us for this edition of CloudCurrents, and I look forward to a follow-up conversation in the future.

Okay.

Yan Cui: Thank you, David. Thanks again for having me.

David McKenney: All right. We'll see you. Thank you, Yan.
