lambda-gateway: Building Serverless Hosting from Scratch
Before building our own serverless hosting, we first need to understand what serverless hosting (or a serverless platform) actually is: a cloud service that lets you run code, frontends, and modern web applications, as Vercel and AWS Lambda do. All of this code executes in response to specific events, such as an API request or someone accessing a SaaS frontend.
When multiple events occur simultaneously, the platform automatically creates more instances of the function to handle them (horizontal scaling). Once a function finishes executing, its environment shuts down automatically, so it consumes no resources after completing its task.
There are also platforms that support concurrency within a function and/or across events, where a single instance handles multiple simultaneous calls. This makes better use of idle time and can reduce costs by up to 50%.
Main Features
- Auto-scaling: Your code automatically scales up or down based on demand.
- Pay-per-use: You only pay for actual execution time, not for idle servers.
- No infrastructure management: Lambda manages all the infrastructure to run your code in a highly available and fault-tolerant environment, freeing you to focus on building differentiated backend services.
- Event-driven: Functions execute in response to events (HTTP requests, file uploads, database changes, etc.).
From all of this, we can define the minimum feature set for our serverless hosting:
- Event-driven activation: Execute code in response to HTTP requests.
- Event concurrency: Handle multiple simultaneous requests efficiently.
- Frontend hosting: Serve static and dynamic web applications.
- Deployment interface: A system for generating the builds needed for deployment.
To handle all the build and invocation logic, I decided to create a class called `BuildandRunLambda` that communicates with Docker internally using the Python SDK. When initializing the class, we check whether it's for build or invocation. If it's for build, we validate that a `.dockerignore` exists; if not, we create it automatically, and we get the environment variables to connect to Docker.
```python
import docker
from pathlib import Path


class BuildandRunLambda:
    def __init__(self, project_path: str | None = None):
        self.client = docker.from_env()
        self.project_path = Path(project_path) if project_path is not None else None
        self._invoke_only = project_path is None

        # Verify that a .dockerignore exists.
        # If it doesn't, we create one so builds aren't too heavy.
        if self.project_path is not None:
            self._ensure_dockerignore()

    @classmethod
    def for_invoke(cls):
        return cls(project_path=None)
```
Helper Methods
Before getting into the main methods, we need some helpers to make everything work correctly.
Stopping containers: The `stop_and_collect(...)` method handles stopping active containers and, if `remove_after` is True, removes them afterward.
```python
def stop_and_collect(self, container, timeout: int = 10, remove_after: bool = True):
    try:
        if container.status == 'running':
            container.stop(timeout=timeout)
        result = container.wait(timeout=timeout)
        logs = container.logs(stdout=True, stderr=True)
        exit_code = result.get('StatusCode', None) if isinstance(result, dict) else None
        if remove_after:
            try:
                container.remove()
            except Exception:
                pass
        return {
            "exit_code": exit_code,
            "logs": logs.decode() if isinstance(logs, bytes) else str(logs)
        }
    except Exception as e:
        try:
            container.remove(force=True)
        except Exception:
            pass
        return {"error": str(e)}
```
Handling Dockerfiles: The following methods get the correct `Dockerfile` and environment variables for the framework. If the `Dockerfile` doesn't exist, they create it automatically (we support Next.js and Vite).
```python
def _get_dockerfile(self, framework: str) -> str:
    dockerfiles = {
        "nextjs": "Dockerfile.nextjs",
        "vite": "Dockerfile.vite",
        "react": "Dockerfile.vite"
    }
    return dockerfiles.get(framework, "Dockerfile")

def _get_runtime_env(self, framework: str, env_vars: dict) -> dict:
    if framework == "nextjs":
        return {
            k: v for k, v in env_vars.items()
            if not k.startswith("NEXT_PUBLIC_")
        }
    return {}

def create_dockerfile(self, framework: str):
    if self.project_path is None:
        raise RuntimeError("You need project_path to create Dockerfiles")
    dockerfile_path = self.project_path / self._get_dockerfile(framework)
    if dockerfile_path.exists():
        print(f"{dockerfile_path.name} already exists")
        return
    content = self._get_dockerfile_content(framework)
    dockerfile_path.write_text(content)
    if framework in ["vite", "react"]:
        self._create_nginx_conf()
```
Generating .dockerignore: To keep builds from being too heavy, we create a `.dockerignore` automatically if it doesn't exist.
```python
def gen_dockerignore(self):
    # Generate a generic .dockerignore for JS and TS frameworks
    file = """
node_modules
.next
.git
.env*.local
npm-debug.log*
README.md
.dockerignore
Dockerfile
"""
    return file

def _ensure_dockerignore(self):
    # Create the .dockerignore only if it doesn't exist yet
    if self.project_path is None:
        return
    dockerignore_path = self.project_path / ".dockerignore"
    if not dockerignore_path.exists():
        dockerignore_content = self.gen_dockerignore()
        dockerignore_path.write_text(dockerignore_content)
```
I'm not going to include the code for `_get_dockerfile_content(...)` and `_create_nginx_conf(...)` here because it's long, but these methods generate the Dockerfiles and nginx configuration for each framework. You can see the complete code in the repository.
The `build(...)` method
This is where the build magic happens. We receive the `app_name`, `framework`, and `env_vars`, validate that the instance was created for builds, create the `base_path` used for internal routing, and generate the `Dockerfile`.
Then we use `client.images.build()` to build the Docker image. The important parameters are:
- `path`: where the code lives
- `dockerfile`: which Dockerfile to use
- `tag`: the image name (`{app_name}:latest`)
- `buildargs`: environment variables for the build
- `rm`: removes intermediate containers
- `pull`: pulls the latest version of the base image
While building, we display the logs in real-time to see what's happening.
```python
def build(self, app_name: str, framework: str, env_vars: dict | None = None):
    if self._invoke_only or self.project_path is None:
        raise RuntimeError("This instance does not have a 'project_path'. "
                           "Use the constructor with project_path to 'build'.")
    env_vars = env_vars or {}
    base_path = f"/app/{app_name}"
    env_vars['BASE_PATH'] = base_path

    self.create_dockerfile(framework)
    dockerfile = self._get_dockerfile(framework)

    image, logs = self.client.images.build(
        path=str(self.project_path),
        dockerfile=dockerfile,
        tag=f"{app_name}:latest",
        buildargs=env_vars,
        rm=True,
        pull=True
    )

    # Stream the build logs in real time
    try:
        for log in logs:
            if isinstance(log, dict) and 'stream' in log:
                print(log['stream'].strip())
            else:
                print(str(log))
    except Exception:
        pass

    return image
```
The `invoke_function(...)` method
This method runs containers from already-built applications. It receives the `app_name`, `framework`, `port`, and optionally `env_vars`.
First we get the runtime-specific environment variables and configure the port. Depending on the framework, the internal port changes: Next.js uses 3000, while Vite and React use nginx's port 80.
Then we run the container with limited resources (128 MB RAM and 0.5 CPU) to simulate a real serverless environment. The container runs in `detach` mode (in the background), and we add labels to identify it easily.
Finally, we wait 5 seconds for it to start, reload the container state, and print the logs to verify that everything is working.
```python
def invoke_function(self, app_name: str, framework: str, port: int,
                    env_vars: Optional[Dict] = None):
    env = self._get_runtime_env(framework, env_vars or {})
    env['PORT'] = str(port)
    env['HOSTNAME'] = '0.0.0.0'

    # We don't pass BASE_PATH at runtime - only at build time.
    # The container serves from "/" internally.
    if framework in ['vite', 'react']:
        internal_port = 80
    else:
        internal_port = port

    container = self.client.containers.run(
        image=f"{app_name}:latest",
        detach=True,
        remove=False,
        ports={f'{internal_port}/tcp': port},
        # Limited resources to simulate a serverless environment
        mem_limit="128m",
        nano_cpus=500000000,  # 0.5 CPU
        environment=env,
        labels={
            "type": "serverless",
            "invocation": str(time.time())
        }
    )

    time.sleep(5)
    container.reload()
    logs = container.logs(tail=50).decode()

    print(f"\n{'=' * 50}")
    print(f"State: {container.status}")
    print(f"Port: {port}")
    print(f"Logs:\n{logs}")
    print(f"{'=' * 50}\n")

    return container
```
With these main methods (`build` and `invoke_function`) and the helpers we saw earlier, we now have a complete class to build and run frontend applications in Docker, simulating the basic behavior of serverless hosting.
The utils module
This module contains helper functions used throughout the application to manage containers, HTTP requests and configuration.
`cleanup_idle_containers(...)`: A background task that checks every 5 seconds for idle containers. If a container hasn't received requests for more than 15 seconds (defined in `CONTAINER_IDLE_TIMEOUT`), it stops and removes it automatically. This simulates the behavior of serverless platforms, where containers shut down when not in use to save resources.
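The full version lives in the repository; as a rough sketch (assuming `running_containers` maps app names to dicts with `container` and `last_access` keys, which is how the backend below stores them), the loop could look like this:

```python
import asyncio
import time

CONTAINER_IDLE_TIMEOUT = 15  # seconds without traffic before shutdown

async def cleanup_idle_containers(running_containers: dict):
    # A minimal sketch, not the exact code from the repo
    while True:
        await asyncio.sleep(5)
        now = time.time()
        for app_name in list(running_containers.keys()):
            info = running_containers[app_name]
            if now - info['last_access'] > CONTAINER_IDLE_TIMEOUT:
                try:
                    # Docker calls block, so run them off the event loop
                    await asyncio.to_thread(info['container'].stop, timeout=3)
                    await asyncio.to_thread(info['container'].remove)
                except Exception:
                    pass
                running_containers.pop(app_name, None)
```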
`get_app_url(...)`: Builds the complete URL of an application from the app name and the HTTP request. It basically takes the server's base URL and appends `/app/{app_name}` to produce the route where the application will be available.
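A minimal sketch, matching the fallback used in the `/apps` endpoint below:

```python
from fastapi import Request

def get_app_url(app_name: str, request: Request) -> str:
    # Derive the public URL from the incoming request's base URL
    base = str(request.base_url).rstrip('/')
    return f"{base}/app/{app_name}/"
```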
`wait_for_service(...)`: Waits for a service to become available before continuing. It makes periodic HTTP requests to a URL until it responds correctly or the timeout expires. This is useful after starting a container, to make sure the application is ready before sending traffic to it.
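Sketched out (the signature matches how the proxy below calls it; the exact implementation is in the repo):

```python
import asyncio
import time

import httpx

async def wait_for_service(url: str, timeout: float = 15.0,
                           interval: float = 0.2) -> bool:
    # Poll the URL until it answers, or give up after `timeout` seconds
    deadline = time.monotonic() + timeout
    async with httpx.AsyncClient() as client:
        while time.monotonic() < deadline:
            try:
                resp = await client.get(url, timeout=2.0)
                if resp.status_code < 500:
                    return True
            except httpx.HTTPError:
                pass
            await asyncio.sleep(interval)
    return False
```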
`filter_request_headers(...)`: Filters out HTTP headers that shouldn't propagate between client and container. It removes "hop-by-hop" headers like `connection`, `host`, `transfer-encoding`, etc., which are connection-specific and shouldn't be passed along to the container.
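Something along these lines (the exact header set is an assumption; see the repo for the real list):

```python
# Connection-specific headers that must not be forwarded; host and
# content-length are dropped too, since httpx recomputes them.
HOP_BY_HOP = {
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailers', 'transfer-encoding', 'upgrade', 'host', 'content-length',
}

def filter_request_headers(headers: dict) -> dict:
    return {k: v for k, v in headers.items() if k.lower() not in HOP_BY_HOP}
```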
`get_next_available_port(...)`: Automatically assigns an available port to each new container. It starts at port 3500 and increments to avoid conflicts; each time a function is invoked, it gets the next available port.
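A minimal sketch of that counter (probing the OS for already-bound ports is an extra safety net; the real helper may or may not do it):

```python
import socket

_next_port = 3500  # starting port, as described above

def get_next_available_port() -> int:
    global _next_port
    while True:
        port = _next_port
        _next_port += 1
        # Skip ports something is already listening on
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            if s.connect_ex(('localhost', port)) != 0:
                return port
```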
The main Backend
This is where everything comes to life. This module uses FastAPI to create the API that handles builds and deployments and acts as a reverse proxy for the applications. What's interesting is how we handle concurrency to optimize resources.
Initial configuration
We define three global dictionaries to maintain state:
```python
from fastapi import FastAPI, Request, HTTPException, Response
from fastapi.middleware.cors import CORSMiddleware
from contextlib import asynccontextmanager
from buildlambda import BuildandRunLambda
from pydantic import BaseModel, Field
from typing import Dict
import asyncio
import httpx
import time

from utils import (cleanup_idle_containers, get_app_url, wait_for_service,
                   filter_request_headers, get_next_available_port)

deployed_apps: Dict[str, dict] = {}       # Apps we've already built
app_locks: Dict[str, asyncio.Lock] = {}   # Locks to avoid race conditions
running_containers: Dict[str, dict] = {}  # Active containers with their timestamp
```
Lifecycle
When the application starts, we create a global instance of `BuildandRunLambda` and launch the automatic container-cleanup task. When the app shuts down, we make sure all containers are stopped cleanly.
```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.run = BuildandRunLambda()
    cleanup_task = asyncio.create_task(cleanup_idle_containers(running_containers))
    yield
    cleanup_task.cancel()
    for app_name, info in running_containers.items():
        try:
            if info.get('container'):
                info['container'].stop()
                info['container'].remove()
        except Exception:
            pass

app = FastAPI(lifespan=lifespan)
```
Building
This endpoint receives the project, does the build and saves the app info. If you don't specify a port, we assign one automatically. We also create a lock for each app - this is key for the concurrency handling we'll see later.
```python
class JSONBuild(BaseModel):
    project_path: str
    app_name: str
    framework: str
    env_vars: dict
    port: int | None = Field(default=None)


@app.post("/build/lambda")
async def build_lambda(q: JSONBuild):
    build = BuildandRunLambda(q.project_path)
    try:
        build.build(app_name=q.app_name, framework=q.framework, env_vars=q.env_vars)
        if q.port is None:
            port = get_next_available_port()
        else:
            port = q.port
        deployed_apps[q.app_name] = {
            "framework": q.framework,
            "port": port,
            "env_vars": q.env_vars
        }
        app_locks.setdefault(q.app_name, asyncio.Lock())
        return {"success": True}
    except Exception as e:
        print(f"Error: {e}")
        return {
            "success": False,
            "error": str(e),
        }
```
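For example, with the backend running locally on port 5500 (the default at the bottom of the module), a deploy request could look like this (the paths and names here are hypothetical):

```python
import httpx

resp = httpx.post("http://127.0.0.1:5500/build/lambda", json={
    "project_path": "/home/user/my-next-app",  # hypothetical local project
    "app_name": "my-next-app",
    "framework": "nextjs",
    "env_vars": {"NEXT_PUBLIC_API_URL": "https://api.example.com"},
}, timeout=None)  # builds can take a while, so disable the client timeout
print(resp.json())  # {"success": True} if the build worked
```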
Listing apps
Simple - returns all built apps with their metadata and status (running or stopped).
@app.get("/apps") async def get_apps(request: Request): results = [] for name, info in deployed_apps.items(): port = info.get("port") framework = info.get("framework") env_vars = info.get("env_vars", {}) is_running = name in running_containers try: url = get_app_url(name, request) except Exception: url = f"{str(request.base_url).rstrip('/')}/app/{name}" results.append({ "app_name": name, "url": url, "port": port, "framework": framework, "env_vars": env_vars, "status": "running" if is_running else "stopped" }) return {"apps": results}
Basic redirect
To avoid issues with relative paths, we redirect `/app/{app_name}` to `/app/{app_name}/`.
@app.get("/app/{app_name}") async def redirect_app_root(app_name: str): return Response( status_code=307, headers={"Location": f"/app/{app_name}/"} )
The reverse proxy - where the serverless magic happens
This is the heart of the system. Here we handle concurrent invocations intelligently: if multiple requests arrive simultaneously for the same app, we start only one container, and every request is routed to it. This mirrors how real serverless platforms behave.
First we validate that the app exists and build the target URL according to the framework. For Vite/React we strip the `/app/{app_name}` prefix because nginx serves from the root, while Next.js handles the `basePath` internally.
Now comes the good part: we check if there's already a running container. If it doesn't exist, we use the app's lock (remember we created it during build) to ensure only one thread starts the container. The "double-check locking" pattern is key here - we check twice if the container exists to avoid race conditions.
Imagine this scenario: 10 simultaneous requests arrive for an app that isn't running. Without the lock, all 10 would try to start separate containers. With the lock, the first request acquires the lock, verifies there's no container, starts it and registers it. The other 9 requests wait at the lock, but when they finally acquire it, the second check detects that a container already exists and they simply reuse it. Boom - resource optimization.
Once the container is starting (or was already running), we wait up to 15 seconds for it to respond to the health check. If it doesn't respond, we stop it and return an error. If everything works, we update the last access timestamp (so the cleanup task knows it's active) and proxy the request.
```python
@app.api_route("/app/{app_name}/{path:path}",
               methods=["GET", "POST", "PUT", "DELETE", "PATCH"])
async def proxy_to_app(app_name: str, path: str, request: Request):
    if app_name not in deployed_apps:
        raise HTTPException(404, f"App '{app_name}' not found")

    app_info = deployed_apps[app_name]
    target_port = app_info['port']
    framework = app_info['framework']

    # Build the URL according to the framework
    if framework in ["vite", "react"]:
        target_url = f"http://localhost:{target_port}/{path}"
    else:
        target_url = f"http://localhost:{target_port}/app/{app_name}/{path}"
    if request.url.query:
        target_url += f"?{request.url.query}"

    lock = app_locks.setdefault(app_name, asyncio.Lock())

    # First check - is there a container?
    container_info = running_containers.get(app_name)
    if container_info is None:
        async with lock:
            # Second check for safety (double-check locking).
            # This prevents multiple concurrent requests from
            # starting the same container.
            container_info = running_containers.get(app_name)
            if container_info is None:
                try:
                    print(f"Starting container for '{app_name}'...")
                    container = await asyncio.to_thread(
                        app.state.run.invoke_function,
                        app_name,
                        app_info['framework'],
                        target_port,
                        app_info.get('env_vars', {})
                    )
                    running_containers[app_name] = {
                        'container': container,
                        'last_access': time.time()
                    }
                    container_info = running_containers[app_name]

                    # Health check
                    if framework in ["vite", "react"]:
                        health_check_url = f"http://localhost:{target_port}/"
                    else:
                        health_check_url = f"http://localhost:{target_port}/app/{app_name}/"

                    ready = await wait_for_service(health_check_url, timeout=15.0, interval=0.2)
                    if not ready:
                        try:
                            info = await asyncio.to_thread(
                                app.state.run.stop_and_collect, container, 3, True
                            )
                            running_containers.pop(app_name, None)
                        except Exception:
                            info = {"error": "The container could not be stopped after a timeout"}
                        raise HTTPException(
                            status_code=503,
                            detail=f"The service on '{app_name}' did not respond "
                                   f"in a timely manner. Info: {info}"
                        )
                except HTTPException:
                    # Let the 503 above propagate instead of wrapping it in a 500
                    raise
                except Exception as e:
                    running_containers.pop(app_name, None)
                    raise HTTPException(status_code=500, detail=f"Error starting container: {e}")

    # We already have a container (just created or already existing).
    # Update the timestamp so it doesn't shut down.
    container_info['last_access'] = time.time()

    # Proxy the request
    try:
        body = await request.body()
        headers = filter_request_headers(dict(request.headers))
        async with httpx.AsyncClient() as client:
            resp = await client.request(
                method=request.method,
                url=target_url,
                headers=headers,
                content=body,
                timeout=30.0
            )
        response_headers = {
            k: v for k, v in resp.headers.items()
            if k.lower() not in ("content-encoding", "transfer-encoding", "connection")
        }
        return Response(
            content=resp.content,
            status_code=resp.status_code,
            headers=response_headers,
            media_type=resp.headers.get("content-type")
        )
    except httpx.ConnectError:
        running_containers.pop(app_name, None)
        raise HTTPException(503, f"Could not connect to '{app_name}'")
    except httpx.TimeoutException:
        raise HTTPException(504, f"Timeout connecting with '{app_name}'")
    except Exception as e:
        raise HTTPException(500, f"Proxy error: {str(e)}")
```
CORS and static files
We enable CORS for development and handle a curious case: when frameworks request static files without specifying the app, we use the `referer` header to guess which app they belong to and redirect accordingly.
```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/{filename:path}")
async def catch_static_files(filename: str, request: Request):
    static_extensions = (
        '.svg', '.png', '.jpg', '.jpeg', '.gif', '.ico',
        '.webp', '.woff', '.woff2', '.ttf', '.eot'
    )
    if not filename.lower().endswith(static_extensions):
        raise HTTPException(404, "Not found")
    if not deployed_apps:
        raise HTTPException(404, f"File '{filename}' not found")

    # Guess the owning app from the referer header
    referer = request.headers.get('referer', '')
    target_app = None
    for app_name in deployed_apps.keys():
        if f"/app/{app_name}" in referer:
            target_app = app_name
            break
    if not target_app:
        target_app = list(deployed_apps.keys())[0]

    redirect_url = f"/app/{target_app}/{filename}"
    return Response(
        status_code=307,
        headers={"Location": redirect_url}
    )


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5500)
```
And that's it. The complete system works like this: you make a request to `/app/{app_name}/`, and the backend checks whether there's an active container. If there isn't, it starts one (and only one, no matter how many requests arrive simultaneously). Once the container is ready, it proxies your request to it. If the container receives no traffic for 15 seconds, it shuts down automatically; when the next request arrives, it starts up again. Exactly like AWS Lambda or Vercel.
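You can see this cold-start behavior with two back-to-back requests (the app name here is hypothetical; it assumes the app was already built via `/build/lambda`):

```python
import time

import httpx

url = "http://127.0.0.1:5500/app/my-next-app/"  # hypothetical app name

t0 = time.perf_counter()
r1 = httpx.get(url, timeout=30.0)  # cold start: the container boots first
t1 = time.perf_counter()
r2 = httpx.get(url, timeout=30.0)  # warm: the container is already running
t2 = time.perf_counter()

print(r1.status_code, f"cold: {t1 - t0:.2f}s")
print(r2.status_code, f"warm: {t2 - t1:.2f}s")
```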
Running lambda-gateway
To interact with this FastAPI backend I built a basic frontend with Next.js and React. Before running lambda-gateway, make sure to create a `.env.local` file in the frontend's root folder containing this:
```
NEXT_PUBLIC_BACKEND_URL="http://127.0.0.1:5500"
```
Or whatever backend URL you've configured. Also make sure you've run `npm install` beforehand to avoid errors when running it.
Once everything is configured, run lambda-gateway and you'll see an interface like this:
*Testing lambda-gateway with Next.js and Vite+React*
Conclusion
And that's how I built lambda-gateway. It started as an experiment to understand how serverless platforms work internally, and ended up being a functional implementation that simulates the basic concepts: event-driven activation, concurrency, auto-scaling (well, more like auto-shutdown) and application hosting.
The complete code is available on GitHub under Apache 2.0 license, so you can clone it, modify it, break it and improve it as you like. If you find bugs or have ideas to improve it, pull requests are welcome.
Obviously this isn't a replacement for Vercel or AWS Lambda - there are a thousand things I didn't implement (data persistence, monitoring, robust logging, secrets management, CDN, etc.). But as a learning exercise, it helped me understand much better how these platforms work internally. I hope it helps you too.
If you have questions or comments, you can find me on my portfolio.