I have a question. How do we determine the start and end of a training sample? For example, if we are feeding in a very long essay to train the model to predict the next word, what would be the start and end of each training sample?

Taking an example from this sentence:

…….Peter lives in France. He likes to eat bread. He also speaks fluent French……..

In the above example, in order to predict the word “French”, the country France has to be taken into account. How would I know that my training sample should start at “Peter” and end at “French”? Especially if we are given a very long passage, I have no idea how to go about it.

I have googled and found that a vanilla RNN cannot support very long chains, which is why people suggest using an LSTM. Does that mean an LSTM chain can be endless? That does not quite make sense to me, since we still need to do backpropagation, so the chain cannot be endless.

How do we then determine how long the chain should be?
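One common answer is that the chain length is simply a hyperparameter: the long text is cut into fixed-length windows and the gradient is only backpropagated through each window (truncated BPTT). A minimal sketch of the windowing, with a made-up token stream and a made-up window length of 20:

```python
import numpy as np

# Hypothetical toy setup: a long token stream split into fixed-length
# training samples. The window length (20 here) is a design choice you
# tune, not something derived from the text itself.
tokens = np.arange(1000)   # stand-in for a tokenized long essay
seq_len = 20               # truncation length for backpropagation

# Each sample: inputs are tokens[i : i+seq_len]; targets are the same
# window shifted by one position (next-word prediction).
starts = range(0, len(tokens) - seq_len, seq_len)
inputs = [tokens[i:i + seq_len] for i in starts]
targets = [tokens[i + 1:i + seq_len + 1] for i in starts]

print(len(inputs))          # number of training samples
print(inputs[0], targets[0])
```

With this scheme, a dependency like France→French is only learnable if both words fall inside the same window (or if the hidden state is carried across windows), which is exactly why the window length matters.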

import numpy as np

# Parameter initialization quoted from the post:
W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
b1 = np.zeros((1, nn_hdim))
W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
b2 = np.zeros((1, nn_output_dim))

Why do you divide by the square root of the dimension of the input to W1 and W2? Is that some kind of normalization? If so, what does that give us? What’s the mathematical motivation for it?
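In case a quick numerical check helps: each pre-activation is a sum of fan-in many terms, so its variance grows linearly with the input dimension unless the weights are rescaled. Dividing by the square root of the fan-in keeps the output variance near 1. A small numpy demo (sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
nn_input_dim, nn_hdim, batch = 500, 300, 10000

x = rng.standard_normal((batch, nn_input_dim))   # unit-variance inputs

W_naive = rng.standard_normal((nn_input_dim, nn_hdim))
W_scaled = rng.standard_normal((nn_input_dim, nn_hdim)) / np.sqrt(nn_input_dim)

# Without scaling, Var(x @ W) is roughly nn_input_dim; with the
# 1/sqrt(fan-in) factor it stays roughly 1.
print(np.var(x @ W_naive))    # ≈ nn_input_dim
print(np.var(x @ W_scaled))   # ≈ 1
```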

Cheers!

I see that a lot of people have posted about the large amount of time it takes to evaluate all the responses in the database. One simple solution: first encode every sentence in your response database and store its 256-dimensional encoding, whether that is 100 or 10,000 sentences. You don’t need to do that again when a new context comes in. You only need to encode that context into a vector, multiply it by the M matrix to get another 256-dimensional vector, and finally compare it with all the saved encodings to get the best response. That wouldn’t take much time.

I could be wrong, though.
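The idea above can be sketched in a few lines of numpy. Everything here is a placeholder (random vectors standing in for encoder outputs, made-up sizes); the point is only that, with the response encodings precomputed, scoring a new context is a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_responses = 256, 10000

# Precomputed once, offline: one 256-dim encoding per candidate response.
R = rng.standard_normal((n_responses, dim))
M = rng.standard_normal((dim, dim))   # learned transform from the model

# At query time: encode the new context (stand-in vector here),
# transform it with M, and score it against every stored response
# encoding in one shot.
c = rng.standard_normal(dim)
scores = R @ (M @ c)                  # shape: (n_responses,)
best = int(np.argmax(scores))
print(best)
```

For very large response sets the same precomputed matrix R also plugs straight into an approximate nearest-neighbor index instead of the brute-force product.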