The neural net did spectacularly well with this fanfic if you assume it's supposed to have been written by Kylo Ren himself

@janellecshane imagining kylo ren with a large beard is not something that we should wish on anybody, machine or human, I think 😂

@cbrachyrhynchos @janellecshane

I am trying to decide whether the Victoria Sandwich Cake recipe was the same one that Darth Vader was trying to follow

I am currently training this net from scratch, i.e., without the GPT-2 base model, on Harry Potter stories only.
See @harry_botter for a neural net trained on the English books and @harry_botter_de for one trained on German Harry Potter fanfiction.


The English version, trained only on the books, produces okay-ish unconditional samples but works badly in the conditional interactive mode.

In addition, I am not sure whether the network overfitted the training text a bit. It did not learn the text by heart, but some of the earlier training samples seem better to me than the later ones.
As the input is only 6 MB, it could simply be too little data.
Or I am just expecting too much.

Have a look at the examples on the bot account and tell me if you have ideas for improving the training process.

The German version uses 395 MB of HP fanfiction as input, which may make it much better. I currently have a model with a 50,000-word vocabulary that I consider converged.
I am now training a model with a 2,000-word vocabulary and more layers, to see the difference between having almost all words in the vocabulary and having only the most common ones.

When I am satisfied with the German models (I may try another one with even fewer words), I want to train a model on 10 GB of English fanfiction.

I am not sure what resources I will need to train it, though.

Here is an unconditional sample at temperature 0.7:

The ⁇ marks are unknown words. The net's vocabulary does not contain uppercase J or X, so it can only produce words starting with these letters when the whole word is in the vocabulary.

For the newer nets I changed the sentencepiece command line to create an alphabet that covers 100% of the text.
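For reference, the relevant sentencepiece option is `--character_coverage`; an invocation along these lines (filenames and vocabulary size are placeholders, not the exact command used here) forces the alphabet to cover the full corpus:

```shell
# Train a 2000-piece vocabulary whose alphabet covers 100% of the
# characters in the corpus (the sentencepiece default is 0.9995,
# which drops the rarest characters into the unknown token).
spm_train \
  --input=fanfiction.txt \
  --model_prefix=hp_model \
  --vocab_size=2000 \
  --character_coverage=1.0
```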

For some reason it always seems to start quotes with an unknown word, like "⁇ ust" (presumably a mangled "Just", since J is missing from the alphabet).
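The sample above was drawn at temperature 0.7. Temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the most likely tokens. A minimal sketch in plain Python (the logits are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then normalize with a softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.7)
# At temperature 0.7, more probability mass lands on the top token,
# so sampling is less random but still varied.
```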

The book model is configured like the 345M model (16 heads and 24 layers), which is probably much too large.
The current German model is configured like the 117M model (12 heads and 12 layers). The new German model has 16 heads and 16 layers, as I wanted to compensate for the need to build words from the syllables in the vocabulary.
In addition, these parameters result in a model of similar size to the previous one.
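For comparison, these are the released GPT-2 configurations (n_head, n_layer, n_embd as published in the models' hparams.json files), plus a rough parameter estimate. The estimate uses the usual embeddings-plus-12·d² per-block approximation and is a sketch, not an exact count:

```python
# Published GPT-2 hyperparameters, for comparison with the custom
# configurations described above.
GPT2_CONFIGS = {
    "117M": {"n_head": 12, "n_layer": 12, "n_embd": 768},
    "345M": {"n_head": 16, "n_layer": 24, "n_embd": 1024},
}

def approx_params(cfg, n_vocab=50257, n_ctx=1024):
    """Rough parameter count: token + position embeddings,
    plus roughly 12 * n_embd^2 per transformer block."""
    d = cfg["n_embd"]
    return (n_vocab + n_ctx) * d + cfg["n_layer"] * 12 * d * d

# approx_params(GPT2_CONFIGS["117M"]) comes out near 124M,
# approx_params(GPT2_CONFIGS["345M"]) near 355M, matching the
# later "124M"/"355M" renaming of these models.
```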

Here are some of the earlier training samples. I think they make more sense than the sample in the pastebin.

@allo Interesting! I don't have much experience with training neural nets as big as GPT-2 from scratch. The coherence does seem to be less than I'd expect from transfer learning from the pretrained GPT-2. If you use gpt-2-simple for transfer learning, you'd be able to do your training for free, so that's always an option. It's interesting to see the results of training from scratch though.

@janellecshane Maybe I should fine-tune the GPT-2 models for the English texts.

I wanted to start from scratch so that Harry does not fly spaceships. On the other hand, the power of GPT-2 probably comes from its much larger text corpus.

For the German model it does not make sense to fine-tune GPT-2, because its training corpus seems to contain very little German text.

I am using the original GPT-2 training code from the rkfg fork, following instructions from various posts in GitHub issues.

Playing with such a network is a fun experience, and some texts are quite interesting and funny.

For example, I really like it when the net mixes metaphors, as in "Harrys heart flew into a table".
Such results show which associations the net is able to capture and where its understanding of context breaks down.

The downside of the large models is that I need to train on the CPU, because they do not fit on my graphics card. Otherwise I might have trained the models I consider converged a bit longer; maybe they would have reached a lower loss and improved quite a bit more.

@allo it might be worth trying the finetuning even for German. I found finetuning worked pretty well for crochet. Especially since GPU finetuning is quick and free via gpt-2-simple on colab.
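The gpt-2-simple workflow mentioned above looks roughly like this (the corpus filename and step count are placeholders, and the download plus finetuning need a Colab-style GPU session to run in reasonable time):

```python
import gpt_2_simple as gpt2

# Download the pretrained 124M model (the model formerly called "117M").
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# Finetune on a plain-text corpus; "fanfiction.txt" and steps=1000
# are placeholder values, not settings from this thread.
gpt2.finetune(sess,
              dataset="fanfiction.txt",
              model_name="124M",
              steps=1000)

# Sample from the finetuned model.
gpt2.generate(sess, temperature=0.7)
```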

@janellecshane Brush with buttercream! If you don't know what to do, just brush with buttercream! It's always the answer!

Maybe not the right answer, though.

Wandering Shop