And in some sense, it should be easy to fool yourself that way! After all, it *did* work; that's a correct observation of the stochastic program's behavior! But it may not be typical or expected, and it may be nothing more than a minor miracle.
Testing the performance of stochastic programs means not only fixing the input and initial state, but also being very disciplined about how data about that program is gathered. I used to do that, did it for a decade or so, and that shit is *hard*.
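To make "disciplined" a bit more concrete, here is a minimal sketch of the kind of harness I mean. It's an illustration under assumptions, not a recipe: `stochastic_program` is a hypothetical stand-in for whatever you're actually testing, and `measure`, `wilson_interval`, the 10% success rate, and the trial count are all made up for the example. The point is that the seeds are decided before anything runs, every trial gets counted, and you report a rate with an interval instead of the one run that happened to work.

```python
import math
import random


def stochastic_program(rng: random.Random) -> bool:
    """Hypothetical stand-in for the program under test.

    Assumption for illustration only: it 'works' about 10% of the time.
    """
    return rng.random() < 0.10


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def measure(n_trials: int = 1000, base_seed: int = 0) -> tuple[int, float, float]:
    """Run every trial on a seed list fixed up front and count all outcomes."""
    # The seeds are chosen before any run, so no trial can be quietly
    # re-rolled or dropped after seeing how it turned out.
    seeds = [base_seed + i for i in range(n_trials)]
    successes = sum(stochastic_program(random.Random(seed)) for seed in seeds)
    low, high = wilson_interval(successes, n_trials)
    return successes, low, high


if __name__ == "__main__":
    n = 1000
    successes, low, high = measure(n_trials=n)
    print(f"{successes}/{n} successes, interval roughly [{low:.3f}, {high:.3f}]")
```

A single "it worked!" is one draw from that distribution. The harness doesn't make the problem easy, it just makes it harder to mistake the draw for the rate.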