In-context learning has been consistently shown to exceed hand-crafted neural learning algorithms across the board.
But it's limited by the length of the context. Even with neural architectures allowing context to grow to infinity, these come with high costs and scaling problems.
Is there a way to incorporate new knowledge learned in-context back into neural network weights?
Of course there is!
Let's imagine we have a lot of data, sequences of instructions and outputs where in-context learning happens.
From this data we can produce a dataset of synthetic data which presents the new knowledge learned. We can continually train the model with this dataset.
Of course this is super slow and inconvenient. But as a result we'll get a dataset with in-context learning happening, and old model weights against new model weights.
We can use this data to train a neural programmer model directly!
That model would take in the context as such, and if in-context learning has happened in those interactions, it can predict the changes to the neural network weights which would happen if the long and heavy synthetic data pipeline had been run.
Instead of the heavy pipeline, we can just use the neural programmer model to directly update the large model weights based on the in-context learning it experienced, to crystallize the learnings into its long-term memory, not unlike what hippocampus does in the human brain.