A project aimed at understanding the fundamentals of modern, scalable web applications involved designing, building, and deploying an AI-powered chat app from scratch. The approach eschewed popular frameworks and commercial platforms like ChatGPT, focusing instead on engineering, backend development, and cloud deployment. The app was built with a cloud-native architecture, incorporating several APIs, a database, a private network, a reverse proxy, and a simple user interface with session management, all running on a local computer. Key components included:
- Language Model: Utilized the quantized Qwen2.5-0.5B-Instruct model from Alibaba Cloud, which runs on CPU with modest memory requirements. The model was served with llama.cpp, which supports CPU-based inference and provides a Docker image bundling the inference engine with a simple web server.
- Database: PostgreSQL was chosen for its robustness and ease of setup via Docker. The database API was built using FastAPI, with SQLAlchemy for database interactions.
- Networking: Services communicated over a user-defined bridge Docker network, so containers could reach each other by service name while remaining isolated from the host. Nginx served as a reverse proxy, acting as the single entry point for external access to the app.
- Deployment: The entire setup was orchestrated using Docker Compose, allowing for easy management of multiple containers.
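To make the orchestration concrete, a hypothetical `docker-compose.yml` for a stack like this might look as follows. The service names, image tags, ports, and file paths here are illustrative assumptions, not the article's actual configuration:

```yaml
# Sketch of a four-service chat stack: LLM server, Postgres, API, Nginx.
# Image tags, model filename, and ports are assumptions.
services:
  llm:
    image: ghcr.io/ggml-org/llama.cpp:server        # llama.cpp server image
    command: ["-m", "/models/qwen2.5-0.5b-instruct-q4_k_m.gguf",
              "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models
    networks: [chatnet]

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example                    # use a secret in practice
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks: [chatnet]

  api:
    build: ./api                                    # the FastAPI + SQLAlchemy service
    depends_on: [db, llm]
    networks: [chatnet]

  nginx:
    image: nginx:stable
    ports:
      - "80:80"                                     # only Nginx is exposed to the host
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on: [api]
    networks: [chatnet]

networks:
  chatnet:
    driver: bridge                                  # user-defined bridge isolates the services

volumes:
  pgdata:
```

Because only the `nginx` service publishes a port, the model server and database stay reachable solely by service name on the internal network.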
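The reverse-proxy role described above could be sketched as a minimal Nginx configuration; the upstream service name (`api`) and port are assumptions for illustration:

```nginx
# Hypothetical default.conf: forward all external traffic to the API container.
server {
    listen 80;

    location / {
        proxy_pass http://api:8000;            # Docker's embedded DNS resolves the service name
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```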
This project not only provided insights into the inner workings of AI chat applications but also prepared the groundwork for cloud deployment, which will be covered in a subsequent part.
Source: towardsdatascience.com
