I’m currently trying to implement my own minibatch neural network, and I have a question about common implementation practice.

Currently, I pass a minibatch through each layer simultaneously. So if the minibatch consists of 50 samples, I pass all 50 through layer 1, then all 50 through layer 2, then layer 3, and so on. The same goes for backprop.

However, I’m wondering if it’s better (or more common) to pass one sample at a time through the network, accumulate the average gradient, and then update. Do you have any advice you can give me?
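For what it’s worth, the two approaches produce exactly the same averaged gradient; the batched version just computes it with a handful of matrix products instead of a Python loop. A minimal NumPy sketch for a single linear layer (all names and sizes here are hypothetical, not from the question’s notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single linear layer: 50 samples, 10 inputs, 3 outputs.
X = rng.standard_normal((50, 10))        # the whole minibatch as one matrix
W = rng.standard_normal((10, 3))
upstream = rng.standard_normal((50, 3))  # gradient arriving from the loss

# Batched backprop: one matrix product sums the per-sample gradients;
# dividing by the batch size gives the averaged gradient.
grad_batched = X.T @ upstream / len(X)

# Sample-at-a-time loop: accumulate each sample's gradient, then average.
grad_looped = np.zeros_like(W)
for x, u in zip(X, upstream):
    grad_looped += np.outer(x, u)
grad_looped /= len(X)

print(np.allclose(grad_batched, grad_looped))  # True: identical gradients
```

Since the results are identical, the batched form is the common choice: it hands the whole computation to optimized BLAS routines rather than looping in Python.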

Repo: https://github.com/kaihein/nn_comments/blob/master/nn_comments.ipynb

Are “constraining the L2 norm of the weight vectors” and “adding L2 regularization to the network” the same operation? If not, what is the difference?

Thank you.
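The two are related but not the same operation: L2 regularization adds a penalty term to the loss, so every gradient step shrinks all weights a little, while a (max-)norm constraint leaves the loss untouched and instead projects each weight vector back onto an L2 ball after the update. A minimal NumPy sketch of the distinction (hyperparameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 3))     # one weight vector per unit (columns)
grad = rng.standard_normal((10, 3))  # gradient of the unregularized loss
lr, lam, max_norm = 0.1, 0.01, 2.0   # hypothetical hyperparameters

# (a) L2 regularization (weight decay): the penalty (lam/2)*||W||^2 adds
# lam*W to the gradient, so every weight shrinks a little on every step.
W_reg = W - lr * (grad + lam * W)

# (b) Norm constraint: take a plain gradient step, then rescale any
# column whose L2 norm exceeds max_norm back onto the ball of that radius.
W_con = W - lr * grad
norms = np.linalg.norm(W_con, axis=0, keepdims=True)
W_con *= np.minimum(1.0, max_norm / norms)

# The constraint guarantees a hard bound; weight decay does not.
assert np.all(np.linalg.norm(W_con, axis=0) <= max_norm + 1e-9)
```

So regularization applies a soft, continuous pressure toward small weights, whereas the constraint enforces a hard cap and does nothing at all while the weights stay inside the ball.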

http://datascience.stackexchange.com/questions/16364/simple-neural-network-regression-with-autograd

Any help would be appreciated! Thanks.

I recently ran into a problem where training becomes much slower when the vocabulary size gets extremely large. TensorFlow emits a warning: “Converting sparse IndexedSlices to a dense Tensor with 145017088 elements. This may consume a large amount of memory.” So I suspect this is caused by the gradient updates on the embedding matrix. Do you have any solutions for that?

Thanks.

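Some context on that warning: the gradient of an embedding lookup is naturally sparse (only the rows gathered in the batch are nonzero, which TensorFlow represents as `IndexedSlices`), and the warning typically means some downstream op forced that sparse gradient into a dense tensor the size of the whole vocabulary. The update itself can stay sparse. A NumPy sketch of the sparse row update (sizes and names are hypothetical, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 100_000, 16
emb = rng.standard_normal((vocab_size, dim)).astype(np.float32)
before = emb.copy()

ids = np.array([3, 17, 3, 42])  # token ids gathered in one minibatch
row_grads = rng.standard_normal((len(ids), dim)).astype(np.float32)
lr = 0.1

# Sparse SGD step: scatter-add into just the gathered rows (duplicate
# ids accumulate), never building a dense (vocab_size, dim) gradient.
np.add.at(emb, ids, -lr * row_grads)

# Every row outside `ids` is untouched.
untouched = np.setdiff1d(np.arange(vocab_size), np.unique(ids))
assert np.array_equal(emb[untouched], before[untouched])
```

The practical fix is usually to find whichever op densifies the gradient and avoid it, so the optimizer only ever touches the handful of rows each batch actually uses.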