“In terms of interesting properties: some type of test-time sampling to improve generalization would be interesting (in combination with a contrastive loss to shape the energy function).”
“One big issue is that your method has no unique advantage: it is not better than any of the baseline methods, nor does it have any unique properties.”
“For pseudolikelihood, people have been training with it for several decades. For EBMs, you can reference the previous papers ‘Implicit Generation with Energy Based Models’ and ‘Improved Contrastive Divergence Training of Energy Based Models’. Both can be added as references, and you can also see the paper referenced.”
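Note on the test-time sampling idea from the first comment: a minimal sketch of what it could look like, assuming Langevin dynamics as the sampler (the comment does not name one). The quadratic `energy`, the target `mu`, and all function names and step sizes are hypothetical stand-ins for a learned energy function; the contrastive loss that would shape the energy during training is not shown.

```python
import math
import random


def energy(x, mu):
    """Toy quadratic energy E(x) = 0.5 * ||x - mu||^2 (stand-in for a learned model)."""
    return 0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def energy_grad(x, mu):
    """Gradient of the toy energy with respect to x."""
    return [xi - mi for xi, mi in zip(x, mu)]


def langevin_sample(x0, mu, n_steps=200, step_size=0.05, noise_scale=1.0, seed=0):
    """Run unadjusted Langevin dynamics starting from x0.

    Each step moves down the energy gradient and adds Gaussian noise:
        x <- x - step_size * grad E(x) + noise_scale * sqrt(2 * step_size) * eps
    With noise_scale=0.0 this reduces to plain gradient descent on the energy.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        grad = energy_grad(x, mu)
        x = [
            xi - step_size * gi
            + noise_scale * math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)
        ]
    return x


if __name__ == "__main__":
    mu = [1.0, 2.0]
    start = [5.0, -5.0]
    sample = langevin_sample(start, mu)
    print("energy at start: ", energy(start, mu))
    print("energy at sample:", energy(sample, mu))
```

With a learned energy, `energy_grad` would come from automatic differentiation rather than a closed form, and the number of steps and the step size would need tuning.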
TODO: