"Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify." If it is not verifiable, it has to fall out from neural net magic of generalization fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs.