
If they don't actually help then the attention weight for that connection would tend to 0, right? Then it becomes a problem of overfitting, which we have a large arsenal of techniques to combat.


> If they don't actually help then the attention weight for that connection would tend to 0, right?

Not necessarily. That is what this paper is about. From what I understand, they also consider graph attention networks.



