Writing
-
The Reward Horizon Problem
When we train models, we use rewards observed at short horizons. When we use models, we care about objectives realized at longer horizons.
-
StoriesLM-v2: A Family of Language Models With Time-Indexed Training Data
StoriesLM-v2 is a family of 125 encoder-only language models trained on an expanding sequence of historical language data.