In the world of AI chatbots, bigger models with larger context windows often outperform smaller ones on complex tasks. However, practical applications like Snapchat’s My AI show that the most suitable model for the job, not necessarily the most capable one, is what matters. For instance, while one model might excel at answering Ph.D.-level math questions, another might be better suited to simple, everyday interactions. The choice is driven by cost and latency: if most requests are greetings or chit-chat, routing everything to the largest model is wasteful. Moreover, when serving millions of requests, serving capacity becomes a bottleneck. During peak hours, shifting traffic to models with spare capacity can keep latency low, so users don’t wait long for responses, even if response quality dips slightly.
Source: towardsdatascience.com
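
The routing logic described above can be sketched roughly as follows. This is a minimal illustration, not the article’s implementation: the model names, the greeting heuristic, and the load threshold are all assumptions made for the example.

```python
# Hypothetical model identifiers; illustrative only, not from any real deployment.
SMALL_MODEL = "small-chat-model"        # cheap, low latency, fine for chit-chat
LARGE_MODEL = "large-reasoning-model"   # costly, higher quality, slower

GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}


def is_simple(message: str) -> bool:
    """Crude heuristic: common greetings or very short messages count as simple."""
    text = message.strip().lower()
    return text in GREETINGS or len(text.split()) <= 4


def pick_model(message: str, large_model_load: float, load_threshold: float = 0.8) -> str:
    """Route a request based on complexity and current serving load.

    large_model_load is the fraction of the large model's capacity in use (0.0–1.0).
    """
    if is_simple(message):
        return SMALL_MODEL   # simple chit-chat never needs the expensive model
    if large_model_load >= load_threshold:
        return SMALL_MODEL   # peak hours: trade a little quality for latency
    return LARGE_MODEL       # complex request and capacity is available


if __name__ == "__main__":
    print(pick_model("hey!", large_model_load=0.3))                    # small-chat-model
    print(pick_model("Prove that sqrt(2) is irrational.", 0.3))        # large-reasoning-model
    print(pick_model("Prove that sqrt(2) is irrational.", 0.95))       # small-chat-model (load fallback)
```

In practice the "is this simple?" check would be a lightweight classifier rather than a keyword list, but the two routing signals it uses, request complexity and current load, are the ones the passage describes.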
