

If I’m creating a corpus for an LLM to consume, I feel like I would probably create some data source quality score and drop anything that makes my model worse.
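For what it's worth, here's a minimal sketch of that idea (purely illustrative: the Document class, the SOURCE_PRIOR table, and the threshold are all made up here; a real pipeline would learn source weights from how each source moves held-out evals rather than hand-assigning them):

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str
    text: str


# Hand-waved per-source priors; in practice you'd derive these from how
# much each source helps or hurts held-out evals, not assign them by feel.
SOURCE_PRIOR = {
    "curated_books": 0.95,
    "wikipedia": 0.90,
    "random_web_scrape": 0.30,
}


def quality_score(doc: Document) -> float:
    """Blend a source-level prior with a cheap per-document heuristic."""
    prior = SOURCE_PRIOR.get(doc.source, 0.5)
    words = doc.text.split()
    if not words:
        return 0.0
    # Penalize highly repetitive text (a rough proxy for spam/boilerplate).
    uniqueness = len(set(words)) / len(words)
    return prior * min(1.0, 2 * uniqueness)


def filter_corpus(docs: list[Document], threshold: float = 0.5) -> list[Document]:
    """Drop anything that scores below the threshold before training."""
    return [d for d in docs if quality_score(d) >= threshold]
```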


The $500 million was specific to Claude Code; they're at a $5 billion annual run rate and growing.


Could this be attributed to the driver mix changing?
It's quite possible Tesla drivers are worse in 2025 than they were in 2024.


Yeah, I never understood this. Targeting legal firearms just annoys people.
I have a theory that Canada has been pressuring the US on gun control (the actual problem), and the whole 'fentanyl crossing the border' thing is projection.


Yes, but the items are targeted to inflict the least amount of pain. We don't need orange juice or bourbon, for example.


Yeah, after reading into it a bit, it seems like most of the work is up front: pre-filtering and classifying before anything hits the model. To your point, the model-training part is the expensive bit…
I think broadly, though, that the idea they're just throwing the kitchen sink into the models without any consideration of source quality isn't true.
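Something like the sketch below is roughly the kind of pre-filtering I mean (a toy example, not anyone's actual pipeline: the two training sentences and the 0.5 threshold are invented, and real setups train much larger quality classifiers over labeled crawl samples). The point is that the classifier stage is cheap and runs before the expensive training ever sees a token:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labels: 1 = training-worthy, 0 = drop. A real setup would label far
# more data, e.g. reference-quality text vs. random crawl samples.
train_texts = [
    "A clear explanation of gradient descent, with worked examples.",
    "BUY NOW!!! cheap pills click here click here click here",
]
train_labels = [1, 0]

# The cheap part: a lightweight classifier that scores the whole crawl
# long before the expensive model-training step begins.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)


def prefilter(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only documents the classifier scores as likely training-worthy."""
    keep_probs = clf.predict_proba(docs)[:, 1]
    return [doc for doc, p in zip(docs, keep_probs) if p >= threshold]
```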