Hi, I'm Fredy Rivera, a Full-Stack Developer and AI Developer!

Tech Stack

React
Tailwind CSS
PostgreSQL
Vercel
GitHub
Git
Python
FastAPI
Redis
OpenAI
Anthropic
Stripe
PayPal
Supabase
DigitalOcean
AWS (EC2)

My Projects

Things I've built

Web budgeting application

A budget management platform for construction companies

Python, FastAPI, Tailwind CSS, HTML, PostgreSQL, Supabase, Vercel

Aquiles-RAG: High-Performance Vector Search System

A production-ready Retrieval-Augmented Generation (RAG) solution with support for multiple vector databases

Python, FastAPI, Redis, Qdrant, PostgreSQL, RAG, Vector Search, HNSW, AI
GitHub

AtlasServer-Core: Self-Hosted Application Server

Fast deploy. No cloud. Just code. A self-hosted server platform with AI-powered deployment capabilities

Python, Flask, FastAPI, Django, CLI, Self-Hosted, AI, Ollama, DevOps, Ngrok
GitHub

LLaDA-from-scratch: Diffusion Language Model Implementation

Building LLaDA, a diffusion-based language model that learns the text distribution through progressive masking and reconstruction

Python, PyTorch, HuggingFace, Diffusion Models, NLP, Deep Learning, A100, Research
GitHub
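The masking idea behind LLaDA can be illustrated with a short, dependency-free Python sketch of the forward (noising) step and the training loss. It assumes the formulation described in the LLaDA paper: sample a noise level t, independently replace each token with a mask with probability t, and compute cross-entropy only on the masked positions, weighted by 1/t (here also averaged over sequence length). `MASK_ID`, `forward_mask`, and `masked_loss` are hypothetical names, and this is not code taken from the LLaDA-from-scratch repository.

```python
import math
import random

MASK_ID = -1  # hypothetical [MASK] token id; a real tokenizer defines its own


def forward_mask(tokens, t, rng):
    """Forward process: replace each token with MASK_ID independently with probability t."""
    noisy, is_masked = [], []
    for tok in tokens:
        masked = rng.random() < t
        noisy.append(MASK_ID if masked else tok)
        is_masked.append(masked)
    return noisy, is_masked


def masked_loss(target_probs, is_masked, t):
    """Cross-entropy on masked positions only, scaled by 1/t and averaged over length.

    target_probs[i] is the model's predicted probability of the true token at position i.
    """
    total = sum(-math.log(p) for p, m in zip(target_probs, is_masked) if m)
    return total / (t * len(target_probs))


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [101, 7, 42, 9, 15, 3]
    t = rng.uniform(1e-3, 1.0)  # noise level; the small lower bound keeps 1/t finite
    noisy, is_masked = forward_mask(tokens, t, rng)
    print(t, noisy)
    # A toy "model" that assigns probability 0.1 to every true token
    print(masked_loss([0.1] * len(tokens), is_masked, t))
```

In a real model `target_probs` would come from a softmax over the vocabulary; generation runs the process in reverse, starting from an all-masked sequence and unmasking tokens over several steps.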

Latest Articles

Recent thoughts and insights

RequestScopedPipeline: Concurrent Inference in Diffusers without Race Conditions or Memory Duplication

Diffusers pipelines weren't designed for concurrency: calling pipe() from several requests at once causes race conditions in schedulers and 'Already borrowed' errors in the Rust-based tokenizers, while copying the pipeline for each request duplicates entire models in memory. My contribution (#12328) introduces RequestScopedPipeline, which solves these issues by creating lightweight per-request views, cloning only the small mutable components, and automatically adding locks to tokenizers. The result: a server that handles multiple concurrent users without exhausting GPU memory.
diffusers, ai, server-async +3
Sep 30, 2025 · 9 min
Read more
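The core idea (share the heavy weights, clone only the small mutable pieces, and serialize tokenizer calls) can be illustrated with a minimal, library-free Python sketch. `Pipeline`, `Scheduler`, `LockedTokenizer`, and `request_view` are hypothetical stand-ins, not the Diffusers API or the actual code from #12328.

```python
import copy
import threading


class LockedTokenizer:
    """Wraps a non-thread-safe tokenizer so only one thread calls it at a time."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._tokenizer(*args, **kwargs)


class Scheduler:
    """Hypothetical scheduler whose timestep state is mutated during inference."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, n):
        self.timesteps = list(range(n - 1, -1, -1))


class Pipeline:
    """Hypothetical pipeline: large shared weights plus small per-call mutable state."""

    def __init__(self, weights, scheduler, tokenizer):
        self.weights = weights      # large and read-only at inference time: share it
        self.scheduler = scheduler  # small and mutable: clone it per request
        self.tokenizer = tokenizer  # shared, but wrapped in a lock


def request_view(base):
    """Create a lightweight per-request view of a shared pipeline."""
    view = copy.copy(base)                          # shallow copy: weights are shared, not duplicated
    view.scheduler = copy.deepcopy(base.scheduler)  # each request gets its own scheduler state
    return view


if __name__ == "__main__":
    base = Pipeline(weights=[0.0] * 1_000_000, scheduler=Scheduler(),
                    tokenizer=LockedTokenizer(str.split))

    def handle(prompt, steps, out):
        pipe = request_view(base)
        pipe.scheduler.set_timesteps(steps)
        out[prompt] = (pipe.tokenizer(prompt), len(pipe.scheduler.timesteps))

    results = {}
    threads = [threading.Thread(target=handle, args=(f"prompt {i}", 10 + i, results))
               for i in range(4)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    print(results)
```

Because the view is a shallow copy, the per-request cost scales with the cloned scheduler rather than with the model weights; the trade-off is that all requests share one tokenizer lock, so tokenization itself is serialized.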

lambda-gateway: Building a Serverless Host Demo

Have you ever wondered how platforms like Vercel and AWS Lambda work under the hood? I built lambda-gateway, a serverless hosting demo using Docker, FastAPI, and Next.js to discover how serverless architectures work through hands-on experimentation.
lambda, serverless, docker +4
15 min
Read more

Fine-tuning Asclepio-8B and Qwen2.5-VL-3B: Medical Reasoning and Screenshot-to-Code with LoRA

Training specialized models doesn't require out-of-reach GPUs or weeks of compute. Asclepio-8B learns clinical medical reasoning from 1.3M examples in 6.7 hours, reaching 76.9% accuracy. Qwen2.5-VL-3B, trained in 5.5 hours, converts UI screenshots into functional HTML/CSS with 94.6% accuracy. Both were trained with LoRA on an NVIDIA L4 (24 GB), demonstrating that specialized fine-tuning of smaller models can outperform giant general-purpose models. This post documents the configurations, data pipelines, real metrics, and why fast iteration matters more than raw parameter count.
fine-tuning, LoRA, medical-ai +6
Oct 19, 2025 · 16 min
Read more

My First Blog Post

Welcome to my blog! This is my first article, where I'll share my experiences as a full-stack developer. In this space I'll be writing about web development, artificial intelligence...
web development, next.js, react +1
Sep 28, 2025 · 1 min
Read more