Yeah, you’d spend more time filtering out nonsense than you’d save compared with just implementing some decent logic yourself.
Maybe use an AI trained on a better source to help filter the nonsense out of Reddit, and then have a human sample the output. Maybe then you’d get some okay training data, but that’s a bit like putting the cart before the horse.