For 1., the author used Ameo (the weird activation function) in the first layer and tanh for the others, and later notes:
"While playing around with this setup, I tried re-training the network with the activation function for the first layer replaced with sin(x) and it ends up working pretty much the same way. Interestingly, the weights learned in that case are fractions of π rather than 1."
By the looks of it, any activation function whose output spans both a positive and a negative range should work, though I haven't tested that myself. The 1-vs-π difference is likely down to where the functions peak: Ameo at x = 1, sine at x = π/2.
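A quick way to see the peak-alignment intuition (assuming, per the article, that Ameo peaks at x = 1): sin(x) peaks at x = π/2, so a sin-activated unit needs a weight of π/2 to put the input 1 on its peak, where an Ameo-activated unit needs a weight of 1. That factor of π/2 is why the sin version would learn weights that are fractions of π.

```python
import math

# sin(x) reaches its maximum at x = pi/2, so to peak at input x = 1
# a sin-activated unit needs weight pi/2; an activation peaking at
# x = 1 (as Ameo reportedly does) needs weight 1.
w_sin = math.pi / 2   # weight a sin unit would need
w_ameo = 1.0          # weight an Ameo-like unit would need

x = 1.0  # canonical binary-ish input
peak_value = math.sin(w_sin * x)   # hits sine's peak exactly
ratio = w_sin / w_ameo             # factor between the two weight scales

print(peak_value)  # 1.0
print(ratio)       # ~1.5708, i.e. pi/2
```

This is just illustrating the scaling argument, not the actual training dynamics from the post.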
Regardless, it's not Ameo.