Zetaphor@zemmy.cc to LocalLLaMA@sh.itjust.works · English · 10 months ago

Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes (blog.research.google)
noneabove1182@sh.itjust.works (M) · English · 10 months ago

Woah, this is pretty interesting stuff. I wonder how practical it is to do; I don't see a repo offering a script or anything, so it may be quite involved, but it looks promising. Anything that reduces size while maintaining performance is huge right now.
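For anyone curious what the training objective in the linked post roughly looks like: the paper's core idea is multi-task training, where the small student model is trained both to predict the label and to reproduce the rationale extracted from a larger teacher LLM. There's no official script linked here, so this is only an illustrative sketch; the function names and the lambda weight are placeholders, not from any released code.

```python
import math

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the target token/class."""
    return -math.log(probs[target_idx])

def distill_step_by_step_loss(label_probs, label_idx,
                              rationale_probs, rationale_idxs,
                              lam=1.0):
    """Illustrative multi-task loss: label prediction plus rationale
    generation, combined as label_loss + lam * rationale_loss.
    `lam` trades off the two tasks (a hyperparameter in the paper)."""
    label_loss = cross_entropy(label_probs, label_idx)
    # Average token-level loss over the teacher-provided rationale tokens
    rationale_loss = sum(
        cross_entropy(p, t) for p, t in zip(rationale_probs, rationale_idxs)
    ) / len(rationale_idxs)
    return label_loss + lam * rationale_loss
```

In practice both terms would be computed by one seq2seq model over different target sequences (label vs. rationale), which is why the approach needs no extra inference cost at deployment: the rationale head is simply dropped after training.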