A small open-source LLM, Qwen2.5-7B-Instruct, can run on consumer hardware and power a multi-agentic RAG (Retrieval-Augmented Generation) system. The system, designed to answer complex queries by searching Wikipedia, consists of three agents arranged in a hierarchy: a manager agent, a Wikipedia search agent, and a page search agent. Each agent has a specific task and can call the agent directly below it in the hierarchy as a tool. The manager agent breaks the user query into sub-queries, the Wikipedia search agent retrieves relevant information, and the page search agent extracts specific details from individual Wikipedia pages. This setup allows iterative problem-solving, with each agent handling a different aspect of the query and contributing to the final answer. The architecture follows the ReAct framework, which interleaves reasoning and action steps so that the agents can reason and plan dynamically. The system has drawbacks: longer computation time, repetitive actions, and the risk that a hallucination in one agent propagates to the others. Even so, it shows that small, open-source LLMs can handle complex tasks traditionally reserved for larger, proprietary models.
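
The hierarchy and the ReAct loop described above can be illustrated with a minimal Python sketch. It is not the article's implementation: the `Agent` class, the scripted policy functions, the `"<title>|<question>"` task format, and the one-entry `PAGES` dictionary are all hypothetical stand-ins. In a real system each policy would be a call to the LLM and the tools would query Wikipedia.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A policy maps (task, observations so far) to (thought, action, action input).
# Assumption: in a real system this is an LLM call; here it is scripted.
Policy = Callable[[str, List[str]], Tuple[str, str, str]]
Tool = Callable[[str], str]


@dataclass
class Agent:
    name: str
    policy: Policy
    tools: Dict[str, Tool] = field(default_factory=dict)
    max_steps: int = 5  # step budget, so repeated actions cannot loop forever

    def run(self, task: str) -> str:
        observations: List[str] = []
        for _ in range(self.max_steps):
            # Reason, then act; the thought is unused here but could be logged.
            _thought, action, arg = self.policy(task, observations)
            if action == "final_answer":
                return arg
            tool = self.tools.get(action)
            observations.append(tool(arg) if tool else f"unknown tool: {action}")
        return "no answer within step budget"


# Toy stand-in for Wikipedia (hypothetical data source for this sketch).
PAGES = {"Ada Lovelace": "Ada Lovelace was born in 1815 in London."}


def page_policy(task, obs):
    # Hypothetical task format: "<page title>|<question>".
    if not obs:
        return ("Read the page first.", "read_page", task.split("|", 1)[0])
    return ("The passage answers the question.", "final_answer", obs[-1])


def wiki_policy(task, obs):
    if not obs:
        return ("Find a matching page.", "search_titles", task)
    if len(obs) == 1:
        return ("Ask the page agent for details.", "page_agent", f"{obs[0]}|{task}")
    return ("The page agent answered.", "final_answer", obs[-1])


def manager_policy(task, obs):
    if not obs:
        return ("Delegate the sub-query.", "wikipedia_agent", task)
    return ("Enough information gathered.", "final_answer", obs[-1])


def search_titles(query: str) -> str:
    # Naive title match: return the first page title mentioned in the query.
    return next((t for t in PAGES if t.lower() in query.lower()), "no match")


# Each agent is exposed to the agent above it as an ordinary tool.
page_agent = Agent("page", page_policy,
                   {"read_page": lambda title: PAGES.get(title, "page not found")})
wiki_agent = Agent("wikipedia", wiki_policy,
                   {"search_titles": search_titles, "page_agent": page_agent.run})
manager = Agent("manager", manager_policy, {"wikipedia_agent": wiki_agent.run})

answer = manager.run("When was Ada Lovelace born?")
print(answer)
```

Passing `agent.run` as a tool is what makes the hierarchy composable: the manager does not need to know that its tool is itself an agent running its own loop. The `max_steps` budget is one simple guard against the repetitive-action problem, though it does nothing about hallucinations, which pass upward unchanged as observations.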
Source: towardsdatascience.com
