80% of Meetings Don't Need AI: Simple Models Outperform Costly Alternatives

At a startup that classifies transcripts of online sales meetings, two tasks were identified: distinguishing prospects from sales representatives, and classifying meetings as internal or external. Because labeled data was available and both tasks looked simple, the suggestion was to use two tf-idf or count vectorizers combined with simple machine learning models. Team members without data science experience instead proposed training a separate Llama3 model for each task, and also considered using ChatGPT. The Llama3 approach would require significant resources, including an A100 for both training and inference, because the transcripts produce contexts of more than 10,000 tokens. In contrast, the tf-idf method could be trained locally and run on a lambda. Notably, 80% of meetings already carry true labels in their metadata, so no model is needed for those cases. A model therefore only matters for the remaining 20% of meetings: even if the tf-idf model performed 10% worse than Llama3 on them, the overall performance difference would be only 2% (10% of 20%).
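The routing and the 2% figure can be illustrated with a minimal Python sketch. It is not the poster's code: the names `classify_meeting`, `overall_gap`, and `keyword_fallback` are hypothetical, the keyword rule is only a stand-in for a trained tf-idf classifier, and the arithmetic assumes the metadata labels are correct and that "10% worse" means ten percentage points of accuracy.

```python
from typing import Callable, Optional


def classify_meeting(
    metadata_label: Optional[str],
    transcript: str,
    fallback: Callable[[str], str],
) -> str:
    """Return the metadata label when present; otherwise ask the fallback model."""
    if metadata_label is not None:
        return metadata_label
    return fallback(transcript)


def overall_gap(model_gap: float, labeled_share: float) -> float:
    """Accuracy gap over all meetings when only unlabeled ones reach a model."""
    return model_gap * (1.0 - labeled_share)


def keyword_fallback(transcript: str) -> str:
    # Hypothetical placeholder for a trained tf-idf + linear model.
    return "external" if "pricing" in transcript.lower() else "internal"


# Figures from the text: 80% of meetings are labeled, model gap of 10 points.
gap = overall_gap(model_gap=0.10, labeled_share=0.80)
print(f"Overall gap: {gap:.2%}")
```

Checking the metadata first means the choice of model only affects the unlabeled fifth of meetings, which is why a gap between two models shrinks by a factor of five when measured across all meetings.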

Source: www.reddit.com