In a recent project, a 215-row dataset extracted from PDF files was used to fine-tune a model. The dataset, focused on legal text, was split into small text chunks. Two approaches to preparing the data were tested. The first used NLP techniques and a pre-trained model to generate questions from the chunks, but the results were poor. The second used the chunks alone, yielding a dataset of shape (215, 1). After 2000 training steps, the model showed clear signs of overfitting: the training loss had dropped into the 0.00x range while the test loss remained around 3.0. These results raise questions about how data should be prepared from PDF files and whether such models are ready for real-world deployment.
Source: www.reddit.com
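For context, here is a minimal sketch of what the second approach (chunks only, single text column) might look like, assuming the Hugging Face datasets library; the placeholder chunks and split settings are illustrative, not taken from the post:

from datasets import Dataset

# Placeholder for the 215 text chunks extracted from the PDFs
# (assumption: the post does not show the extraction step itself).
chunks = [f"Example law-related passage {i} ..." for i in range(215)]

# A single text column gives the (num_rows, 1) shape described above.
ds = Dataset.from_dict({"text": chunks})
splits = ds.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
print(train_ds.shape, test_ds.shape)

# During fine-tuning, a training loss in the 0.00x range paired with a test
# loss near 3.0 on the held-out split is the overfitting signal the post describes.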

Related X Posts
Sebastian Raschka (@rasbt) · Mar 23
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling. And finally, we …
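The post is truncated, but temperature scaling and top-k sampling are standard generation tweaks; a rough, self-contained sketch follows (variable names are illustrative and not taken from the tutorial):

import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int = 50) -> torch.Tensor:
    # Temperature scaling: dividing logits by T > 1 flattens the distribution,
    # T < 1 sharpens it.
    logits = logits / temperature
    # Top-k sampling: restrict sampling to the k highest-scoring tokens.
    top_logits, top_idx = torch.topk(logits, top_k)
    probs = torch.softmax(top_logits, dim=-1)
    return top_idx[torch.multinomial(probs, num_samples=1)]

# Example with random logits over a 1000-token vocabulary.
next_id = sample_next_token(torch.randn(1000), temperature=0.8, top_k=40)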
βεsam (@Hesamation) · Feb 15
we aren’t losing our minds enough over this paper:
> test-time compute by implicitly reasoning in layers
> no special training data like CoT
> can reason beyond words
> 3.5B model with this performs like a 50B model
> more like human brain than current “reasoning” LLMs
Aayush Karan (@aakaran31) · Feb 28
Can machine learning models predict their own errors? In a new preprint w/ @Apple collaborators Aravind Gollakota, Parikshit Gopalan, Charlotte Peale, and Udi Wieder, we present a theory of loss prediction and show an equivalence with algorithmic fairness! A thread (1/n):
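The preprint's constructions are not reproduced here; as a toy illustration of the loss-prediction setup only, an auxiliary model can be fit to predict a base model's per-example loss on held-out data:

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Base model and its per-example log loss on held-out data.
base = LogisticRegression().fit(X[:500], y[:500])
p = np.clip(base.predict_proba(X[500:])[:, 1], 1e-6, 1 - 1e-6)
per_example_loss = -(y[500:] * np.log(p) + (1 - y[500:]) * np.log(1 - p))

# Loss predictor: a second model trained to predict the base model's loss.
loss_predictor = Ridge().fit(X[500:], per_example_loss)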
@gm8xx8 · Feb 10
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. This model scales test-time computation by reasoning in latent space instead of generating more tokens. It uses a recurrent block to improve performance at inference without needing specialized …
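As a rough toy sketch of the recurrent-depth idea (this is not the paper's architecture, only an illustration of reusing one block to scale inference compute in latent space):

import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, h):
        # Residual update of the latent state, applied repeatedly at inference.
        return self.norm(h + self.ff(h))

block = RecurrentBlock(dim=64)
h = torch.randn(1, 16, 64)   # latent state for a 16-token sequence
for _ in range(8):           # more iterations = more test-time compute, no extra tokens
    h = block(h)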
neptune.ai (@neptune_ai) · Apr 7
When training large-scale models, where do you look for issues? If a few layers are unstable, the overall loss might not show it. Silent failures at the layer level can cause inefficient learning, emergent biases, or even training collapse. #Foundationmodels demand more …
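A minimal illustration of layer-level monitoring, assuming plain PyTorch rather than any particular tooling: log per-layer gradient norms, since a single layer can blow up or stall while the aggregate loss curve still looks normal.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(8, 32), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

for name, param in model.named_parameters():
    if param.grad is not None:
        # A layer whose gradient norm explodes or collapses toward zero while the
        # total loss looks fine is a candidate silent failure.
        print(f"{name}: grad norm = {param.grad.norm().item():.4f}")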
Dr Satoshinakamoto (@DrSatoshiN) · 17h
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "gpt2"  # using GPT-2 small as an example
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
text = "Hello, my dog is cute"
enc = tokenizer(text, return_tensors="pt")  # completed: tokenize the prompt into tensors














