@allo Interesting! I don't have much experience with training neural nets as big as GPT-2 from scratch. The coherence does seem lower than I'd expect from transfer learning off the pretrained GPT-2 weights. If you use gpt-2-simple for transfer learning, you'd be able to do your training for free (it runs on Colab's free tier), so that's always an option. It's interesting to see the results of training from scratch, though.