Predicting the Order of Upcoming Tokens Improves Language Modeling

(arxiv.org)

7 points | by wavelander 4 days ago

2 comments

NitpickLawyer 4 days ago
Are any of these methods doable on pre-trained models? Like freezing the model and training only these add-ons? Having to redo the training runs with these optimisations doesn't sound too practical, in the grand scheme of things.
- impossiblefork 3 days ago
  It's obviously practical for the next model you train from scratch. The point of research is not to improve existing commercial products.