In the dynamic world of software development, data processing comes in many shapes and forms. You’ll often encounter terms like stateful, stateless, and (more recently) serverless or computerless. If you want to design efficient and scalable applications, you’ll have to understand the differences between these processing methods, especially when you want to combine them. In this blog post, we'll delve into each approach and explore their characteristics, their use cases, and how they relate to each other.
Let's begin with stateless processing, a straightforward approach where the processing relies solely on the input data. Examples include converting a text document into a PDF file, calculating the total price of a list of items, or analysing an image to detect its content. As long as the input is the same, the output will remain the same as well.
In other words, the outcome is entirely determined by the input. Since there are no external dependencies or side effects, these calculations are often called pure functions. Pure functions can be easily scaled horizontally, because there is no interference between instances. Finally, stateless processing is also idempotent: these functions can easily be retried. Because there is no state, there is no risk of losing consistency.
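The price calculation mentioned above can be sketched as a pure function. This is a minimal illustration: the `Item` record and the `PriceCalculator` class are hypothetical names invented for this example.

```java
import java.util.List;

// Hypothetical item type, used only for this illustration.
record Item(String name, long priceInCents, int quantity) {}

final class PriceCalculator {
    // Pure function: the result depends only on the argument.
    // Nothing outside the method is read or modified.
    static long totalInCents(List<Item> items) {
        return items.stream()
                .mapToLong(item -> item.priceInCents() * item.quantity())
                .sum();
    }
}
```

Because the method touches no shared state, any number of instances can evaluate it in parallel, and a retry with the same list yields the same total.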
On the other hand, stateful processing involves manipulating data with dependencies on external states, which are elements beyond the data that is being processed. Examples of stateful processing include a user adding items to a shopping cart in an e-commerce platform or counting the number of clicks on a website. In these examples, the external states are the shopping cart and the previous number of clicks.
Stateful processing introduces side effects, because the result is determined by the combination of the pre-existing state and the input. As you might have guessed, this means that stateful operations are generally not idempotent unless they are explicitly designed to be. Adding the same item to a shopping cart a second time leads to a different outcome, since the cart will then contain two copies of that item. Counting the same visit to a website twice will result in an incorrect total.
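The shopping cart case can be sketched as follows. The `ShoppingCart` class is a hypothetical, in-memory stand-in for whatever actually stores the cart; it only serves to show why repeating the same call changes the outcome.

```java
import java.util.HashMap;
import java.util.Map;

final class ShoppingCart {
    // The external state: it outlives any single call to addItem.
    private final Map<String, Integer> quantities = new HashMap<>();

    // Side effect: the result depends on what is already in the cart.
    // Returns the quantity of the product after the item has been added.
    int addItem(String productId) {
        return quantities.merge(productId, 1, Integer::sum);
    }

    int quantityOf(String productId) {
        return quantities.getOrDefault(productId, 0);
    }
}
```

Calling `addItem` twice with the same product identifier leaves two copies in the cart, so blindly retrying a request that may already have succeeded is not safe here.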
This is why stateful processing is often associated with consistency concerns. The state that is being manipulated must be protected from becoming corrupted; if two users rate the same restaurant simultaneously, both ratings must be recorded correctly. Most applications combine stateful and stateless processing, because the need for external data is intertwined with the processing logic.
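The restaurant rating scenario can be sketched inside a single process as follows. The `RestaurantRating` class is hypothetical, and `synchronized` is only one of several ways to protect shared state; a real application would more likely rely on the transaction or locking features of its database.

```java
final class RestaurantRating {
    private long sumOfStars;
    private long numberOfRatings;

    // synchronized turns the two updates into one atomic step, so two
    // simultaneous ratings cannot overwrite each other's changes.
    synchronized void rate(int stars) {
        if (stars < 1 || stars > 5) {
            throw new IllegalArgumentException("stars must be between 1 and 5");
        }
        sumOfStars += stars;
        numberOfRatings++;
    }

    synchronized long count() {
        return numberOfRatings;
    }

    synchronized double average() {
        return numberOfRatings == 0 ? 0.0 : (double) sumOfStars / numberOfRatings;
    }
}
```

Without the `synchronized` keyword, two threads could read the same old values and one of the ratings could be lost.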
Now, let's unravel the mystery of serverless processing. Despite its name, serverless computing does involve servers and computers, which is why some people object to the term. The difference is that, from a developer's perspective, you don't need to know which servers, or how many of them, will be used to run your application. Instead, the serverless platform automatically instantiates your functions when needed, and frameworks like Spring Cloud Function allow you to focus solely on writing code.
In a minimal example (which can be a complete application), the developer only needs to provide a function that accepts a string and returns a string. All the processing (in this case, converting the input text to upper case) happens within the context of that function. The framework takes care of all the specifics needed to make it work on platforms like AWS Lambda, Apache OpenWhisk, Azure Functions, and Project Riff.
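A minimal sketch of such a function, written with nothing but `java.util.function.Function` so that it runs without any framework. The class name `UppercaseFunction` is an assumption made for this illustration; in a Spring Cloud Function application, a function like this would typically be exposed as a bean, and the framework would handle the wiring to the target platform.

```java
import java.util.function.Function;

final class UppercaseFunction {
    // Accepts a string and returns it converted to upper case.
    // Assumption: in a Spring Cloud Function application this would be
    // registered as a bean instead of being called directly.
    static Function<String, String> uppercase() {
        return value -> value.toUpperCase();
    }
}
```

Calling `UppercaseFunction.uppercase().apply("hello")` returns `"HELLO"`. The function itself knows nothing about servers, requests, or scaling.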
When serverless functions are stateless, they are ideal for horizontal scaling. If there is more demand than one instance can handle in a timely fashion, multiple instances can be spun up and the load can be distributed evenly and automatically. This automatic scaling is called elasticity. There is no additional logic or complexity to consider, other than basic load balancing. The more hardware resources you have, the more instances you can run.
Now that we’ve explored the differences between stateless, stateful and serverless processing, let’s look at an interesting use case where they intersect. What if you want the elasticity benefits of serverless computing, but your application requires stateful processing?
There are two ways of including an external state in a serverless application. The first is to connect to the database (an external state) when the application starts. When the data starts pouring in, the processing can use the connection that has already been set up. The second is to connect to the database when the data arrives in the application. Both approaches have their pros and cons, but either way, when you spin up multiple instances of your application, you will end up with all of those instances talking to the same database.
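The two connection strategies can be sketched roughly as follows. `DbConnection`, `EagerHandler`, and `LazyHandler` are hypothetical names invented for this illustration and are not part of any serverless framework. The lazy variant assumes that the connection is opened when the first piece of data arrives and is then reused.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a database connection.
interface DbConnection {
    String lookup(String key);
}

// Strategy 1: connect when the application starts.
final class EagerHandler {
    private final DbConnection connection;

    EagerHandler(Supplier<DbConnection> connector) {
        this.connection = connector.get(); // start-up pays the connection cost
    }

    String handle(String key) {
        return connection.lookup(key);
    }
}

// Strategy 2: connect when the data arrives.
final class LazyHandler {
    private final Supplier<DbConnection> connector;
    private DbConnection connection;

    LazyHandler(Supplier<DbConnection> connector) {
        this.connector = connector;
    }

    synchronized String handle(String key) {
        if (connection == null) {
            connection = connector.get(); // the first request pays the connection cost
        }
        return connection.lookup(key);
    }
}
```

The eager variant makes start-up slower but keeps every request fast, while the lazy variant starts quickly at the expense of the first request. In both cases, every instance holds its own connection to the same database.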
While this may look fine at first, there are some challenges related to the elasticity of your applications.
Serverless computing solves part of the elasticity puzzle for computational tasks, but it does not resolve scaling challenges related to state. The database often becomes the limiting factor in achieving true elasticity for stateful applications. Having a single database is fine for some applications, but it’s important to consider the limits of such an architecture. Sharding and distributing the data of your applications across multiple database systems can help alleviate some limitations, but this in turn introduces more complexity and higher costs.
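To give an idea of what sharding involves, here is a minimal sketch of hash-based routing, which is only one of several possible sharding strategies. The `ShardRouter` class is a hypothetical name, and real systems usually need more than this (rebalancing, replication, queries that span shards).

```java
final class ShardRouter {
    private final int numberOfShards;

    ShardRouter(int numberOfShards) {
        if (numberOfShards <= 0) {
            throw new IllegalArgumentException("numberOfShards must be positive");
        }
        this.numberOfShards = numberOfShards;
    }

    // The same key always maps to the same shard.
    // floorMod keeps the result non-negative, even for negative hash codes.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), numberOfShards);
    }
}
```

Part of the added complexity is visible even in this small sketch: with modulo-based routing, changing the number of shards sends most keys to a different shard, so existing data has to be moved.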
What if your application not only has to be both elastic and stateful, but also as straightforward to develop and scale as in a serverless framework? One possible solution is to use Kalix, formerly known as Akka Serverless.
Kalix allows you to write a function similar to the one in our Spring Cloud example, but one that incorporates state. To do so, you provide a function that converts the current state of the application into its new state, based on an event.
In such a function, the current state of the application is ‘injected’ together with the event causing the change. The Kalix serverless framework takes care of everything else, while ensuring that your application will be elastic.
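The idea of turning the current state plus an event into a new state can be sketched in plain Java. This is not the Kalix API: `CounterState`, `ClickRegistered`, and `ClickCounter` are hypothetical names chosen for this illustration. In a stateful serverless framework, the runtime rather than your own code would load the state, call the function, and store the result.

```java
import java.util.function.BiFunction;

// Hypothetical state and event types for a click counter.
record CounterState(long clicks) {}
record ClickRegistered(int amount) {}

final class ClickCounter {
    // Transition function: (current state, event) -> new state.
    // The current state is passed in and a new state is returned;
    // the function itself never touches a database.
    static final BiFunction<CounterState, ClickRegistered, CounterState> APPLY_EVENT =
            (state, event) -> new CounterState(state.clicks() + event.amount());
}
```

Because the function itself is free of side effects, it stays as easy to write and test as the stateless upper case function, while the responsibility for storing and distributing the state moves to the framework.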
Obviously, there is more to it than one small function, but this illustrates what people mean when they talk about stateful serverless applications.
If you want to make an informed decision about the architecture of your applications, it’s crucial to know the differences between stateful, stateless, and serverless processing. Stateless processing thrives in serverless environments, because it enables easy horizontal scaling and optimal resource utilisation. However, if you want to work with stateful applications in elastic serverless environments, it's essential to consider the potential challenges and limitations that may arise.
Need some expert advice on which architecture would be best for your application landscape? Want some more information about Kalix and other serverless frameworks? At Optis, we’ve got plenty of experience with all kinds of processing for our custom Java and JavaScript applications. Feel free to contact us with any questions!