RequestScopedPipeline: Concurrent Inference in Diffusers without Race Conditions or Memory Duplication
Diffusers pipelines were not designed for concurrency: calling pipe() from multiple threads at once causes race conditions in the scheduler's mutable state and 'Already borrowed' errors in Rust-backed tokenizers, while the obvious workaround of deep-copying the pipeline per request duplicates entire models in memory. My contribution (#12328) introduces RequestScopedPipeline, which solves these issues by creating a lightweight per-request view of the pipeline: heavy model weights stay shared, only small mutable components are cloned, and tokenizer access is automatically guarded with locks. The result is a server that handles multiple concurrent users without exploding GPU memory.
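The core idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the actual implementation from #12328: the class and attribute names here (view, MUTABLE_ATTRS, _LockedTokenizer) are hypothetical, and a real pipeline has more per-request state than just the scheduler. The pattern is: shallow-copy the pipeline so weights are shared, deep-copy only the small stateful parts, and serialize access to the shared tokenizer.

```python
import copy
import threading


class _LockedTokenizer:
    """Hypothetical wrapper: serializes calls so a Rust-backed tokenizer
    is never borrowed by two threads at the same time."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._tokenizer(*args, **kwargs)


class RequestScopedPipeline:
    """Sketch: share heavy weights, clone only small mutable state."""

    # Assumption: per-request mutable state lives in these attributes.
    MUTABLE_ATTRS = ("scheduler",)

    def __init__(self, pipeline):
        self._base = pipeline
        self._tokenizer = _LockedTokenizer(pipeline.tokenizer)

    def view(self):
        # Shallow copy: unet/vae/text-encoder references are shared,
        # so no model weights are duplicated in memory.
        local = copy.copy(self._base)
        for name in self.MUTABLE_ATTRS:
            # Deep-copy only the small stateful objects (e.g. the
            # scheduler, whose timestep state mutates during denoising).
            setattr(local, name, copy.deepcopy(getattr(self._base, name)))
        # The tokenizer stays shared but is lock-protected.
        local.tokenizer = self._tokenizer
        return local
```

Each request then calls view() and runs inference on its own copy: two concurrent requests see the same weights but independent scheduler state, so neither can corrupt the other's denoising loop.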