In a recent project, a 215-row dataset extracted from PDF files was used to fine-tune a model. The dataset, focused on legal text, was split into small text chunks. Two approaches to preparing the data were tested. The first used NLP techniques and a pre-trained model to generate questions from the chunks, but the results were poor. The second used the chunks alone, yielding a dataset of shape (215, 1). After 2000 training steps, the model showed clear signs of overfitting: the training loss had dropped into the 0.00x range while the test loss remained around 3.0. These results raise questions about how data should be prepared from PDF files and whether such models are ready for real-world deployment.
Source: www.reddit.com
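For context, here is a minimal sketch of what the second approach (chunks only, single text column) might look like, assuming the Hugging Face datasets library; the placeholder chunks and split settings are illustrative, not taken from the post:

from datasets import Dataset

# Placeholder for the 215 text chunks extracted from the PDFs
# (assumption: the post does not show the extraction step itself).
chunks = [f"Example law-related passage {i} ..." for i in range(215)]

# A single text column gives the (num_rows, 1) shape described above.
ds = Dataset.from_dict({"text": chunks})
splits = ds.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
print(train_ds.shape, test_ds.shape)

# During fine-tuning, a training loss in the 0.00x range paired with a test
# loss near 3.0 on the held-out split is the overfitting signal the post describes.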

Related X Posts
Sebastian Raschka (@rasbt) · Mar 23
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling. And finally, we …
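The post is truncated, but temperature scaling and top-k sampling are standard generation tweaks; a rough, self-contained sketch follows (variable names are illustrative and not taken from the tutorial):

import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int = 50) -> torch.Tensor:
    # Temperature scaling: dividing logits by T > 1 flattens the distribution,
    # T < 1 sharpens it.
    logits = logits / temperature
    # Top-k sampling: restrict sampling to the k highest-scoring tokens.
    top_logits, top_idx = torch.topk(logits, top_k)
    probs = torch.softmax(top_logits, dim=-1)
    return top_idx[torch.multinomial(probs, num_samples=1)]

# Example with random logits over a 1000-token vocabulary.
next_id = sample_next_token(torch.randn(1000), temperature=0.8, top_k=40)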
βεsam (@Hesamation) · Feb 15
we aren’t losing our minds enough over this paper:
> test-time compute by implicitly reasoning in layers
> no special training data like CoT
> can reason beyond words
> 3.5B model with this performs like a 50B model
> more like human brain than current “reasoning” LLMs
Aayush Karan (@aakaran31) · Feb 28
Can machine learning models predict their own errors? In a new preprint w/ @Apple collaborators Aravind Gollakota, Parikshit Gopalan, Charlotte Peale, and Udi Wieder, we present a theory of loss prediction and show an equivalence with algorithmic fairness! A thread (1/n):
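The preprint's constructions are not reproduced here; as a toy illustration of the loss-prediction setup only, an auxiliary model can be fit to predict a base model's per-example loss on held-out data:

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Base model and its per-example log loss on held-out data.
base = LogisticRegression().fit(X[:500], y[:500])
p = np.clip(base.predict_proba(X[500:])[:, 1], 1e-6, 1 - 1e-6)
per_example_loss = -(y[500:] * np.log(p) + (1 - y[500:]) * np.log(1 - p))

# Loss predictor: a second model trained to predict the base model's loss.
loss_predictor = Ridge().fit(X[500:], per_example_loss)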
@gm8xx8 · Feb 10
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. This model scales test-time computation by reasoning in latent space instead of generating more tokens. It uses a recurrent block to improve performance at inference without needing specialized …
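As a rough toy sketch of the recurrent-depth idea (this is not the paper's architecture, only an illustration of reusing one block to scale inference compute in latent space):

import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, h):
        # Residual update of the latent state, applied repeatedly at inference.
        return self.norm(h + self.ff(h))

block = RecurrentBlock(dim=64)
h = torch.randn(1, 16, 64)   # latent state for a 16-token sequence
for _ in range(8):           # more iterations = more test-time compute, no extra tokens
    h = block(h)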
neptune.ai (@neptune_ai) · Apr 7
When training large-scale models, where do you look for issues? If a few layers are unstable, the overall loss might not show it. Silent failures at the layer level can cause inefficient learning, emergent biases, or even training collapse. #Foundationmodels demand more …
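A minimal illustration of layer-level monitoring, assuming plain PyTorch rather than any particular tooling: log per-layer gradient norms, since a single layer can blow up or stall while the aggregate loss curve still looks normal.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(8, 32), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

for name, param in model.named_parameters():
    if param.grad is not None:
        # A layer whose gradient norm explodes or collapses toward zero while the
        # total loss looks fine is a candidate silent failure.
        print(f"{name}: grad norm = {param.grad.norm().item():.4f}")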
Dr Satoshinakamoto (@DrSatoshiN) · 17h
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "gpt2"  # using GPT-2 small as an example
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
text = "Hello, my dog is cute"
enc = tokenizer(text, return_tensors="pt")  # completed: tokenize the prompt into tensors














