*This post was written in a hurry; please feel free to contact me via the link at the bottom of the page.*

Here is a commented version of the code proposed by Andrej Karpathy in his article *Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy*.

The model is a simple RNN with one hidden layer of size `hidden_size`.

It is parameterized by three weight matrices, $W_{xh}$ (input to hidden), $W_{hh}$ (hidden to hidden) and $W_{hy}$ (hidden to output), and two bias vectors, $b_h$ and $b_y$.
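These parameters can be sketched in numpy as follows (Karpathy's script uses `hidden_size = 100`; the `vocab_size` below is an illustrative assumption, since in the real script it is deduced from the data):

```python
import numpy as np

hidden_size = 100   # number of hidden units
vocab_size = 26     # number of distinct characters (assumed here for illustration)

# Small random weights, zero biases.
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input  -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
bh = np.zeros((hidden_size, 1))   # hidden bias
by = np.zeros((vocab_size, 1))    # output bias
```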

## Reading data

Steps are:

- Reading of the text input.
- Extraction of the list of unique characters that make up the vocabulary.
- Construction of two dictionaries:
  - A dictionary encoding characters as numbers, so that the RNN works with numbers. Example: `{'a':0, 'b':1, 'c':2, ...}`
  - A dictionary decoding numbers into characters, to translate the character output of the RNN. Example: `{0:'a', 1:'b', 2:'c', ...}`
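The steps above can be sketched as follows (Karpathy's script reads a file such as `input.txt`; a short inline string stands in for it here):

```python
# Stand-in for: data = open('input.txt', 'r').read()
data = "hello world"

chars = sorted(set(data))                  # list of unique characters
data_size, vocab_size = len(data), len(chars)

char_to_ix = {ch: i for i, ch in enumerate(chars)}  # encode char -> int
ix_to_char = {i: ch for i, ch in enumerate(chars)}  # decode int -> char
```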

## Forward pass

Let $c(t)$ be the character read at time $t$ and $c(t+1)$ the next character to be predicted.

$x(t)$ encodes $c(t)$ as a one-hot vector of size `vocab_size`.

$y(t)$ encodes probabilistically the next letter to be predicted, $c(t+1)$.

Using the previous encoding dictionary, the character `'b'` (index 1) is for example encoded as $x = (0, 1, 0, \dots)^T$.

In accordance with the structure of the RNN, the equations relating $ x(t) $ and $ p(t) $ are the following:

$$h(t) = \tanh\left(W_{xh}\, x(t) + W_{hh}\, h(t-1) + b_h\right)$$

$$y(t) = W_{hy}\, h(t) + b_y$$

$$p(t) = \operatorname{softmax}(y(t)) = \frac{e^{y(t)}}{\sum_k e^{y_k(t)}}$$

These same forward equations also appear in the sampling function.
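A single forward step, matching the equations of this section, can be sketched as below (the tiny sizes and seeded random weights are illustrative assumptions):

```python
import numpy as np

hidden_size, vocab_size = 4, 3          # illustrative sizes
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

def forward_step(ix, h_prev):
    x = np.zeros((vocab_size, 1)); x[ix] = 1   # one-hot x(t)
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)   # h(t)
    y = Why @ h + by                           # y(t), unnormalized scores
    p = np.exp(y) / np.sum(np.exp(y))          # p(t), softmax probabilities
    return h, p

h, p = forward_step(0, np.zeros((hidden_size, 1)))
```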

## Backward pass

Let the cost function be the cross-entropy over the sequence: $J = -\sum_t \log p_{c(t+1)}(t)$, where $p_{c(t+1)}(t)$ is the probability the model assigns to the correct next character.

Let $ \Delta y(t) = \hat{y}(t) - y(t) $, where $\hat{y}(t) = p(t)$ is the predicted distribution and $y(t)$ is the one-hot encoding of the target character $c(t+1)$.

The minimization of the cost function using gradient descent gives:

$$\frac{\partial J}{\partial W_{hy}} = \sum_t \Delta y(t)\, h(t)^T$$

where the gradient is backpropagated through time into the hidden states:

$$\Delta h(t) = W_{hy}^T\, \Delta y(t) + W_{hh}^T\, \Delta h_{raw}(t+1), \qquad \Delta h_{raw}(t) = \left(1 - h(t)^2\right) \odot \Delta h(t)$$

Same for $W_{xh}$ and $W_{hh}$:

$$\frac{\partial J}{\partial W_{xh}} = \sum_t \Delta h_{raw}(t)\, x(t)^T, \qquad \frac{\partial J}{\partial W_{hh}} = \sum_t \Delta h_{raw}(t)\, h(t-1)^T$$
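The loss and gradient computation can be sketched as below, following the structure of Karpathy's `lossFun` (the tiny sizes and random weights are illustrative assumptions):

```python
import numpy as np

hidden_size, vocab_size = 4, 3          # illustrative sizes
rng = np.random.default_rng(1)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01
bh = np.zeros((hidden_size, 1)); by = np.zeros((vocab_size, 1))

def loss_fun(inputs, targets, hprev):
    xs, hs, ps = {}, {-1: np.copy(hprev)}, {}
    loss = 0.0
    # forward pass over the whole sequence
    for t, ix in enumerate(inputs):
        xs[t] = np.zeros((vocab_size, 1)); xs[t][ix] = 1
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)
        y = Why @ hs[t] + by
        ps[t] = np.exp(y) / np.sum(np.exp(y))
        loss += -np.log(ps[t][targets[t], 0])       # cross-entropy term
    # backward pass: backpropagation through time
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros_like(hprev)
    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t]); dy[targets[t]] -= 1    # Δy(t) = p(t) - y(t)
        dWhy += dy @ hs[t].T; dby += dy
        dh = Why.T @ dy + dh_next                   # gradient into h(t)
        dhraw = (1 - hs[t] ** 2) * dh               # through the tanh
        dbh += dhraw
        dWxh += dhraw @ xs[t].T
        dWhh += dhraw @ hs[t - 1].T
        dh_next = Whh.T @ dhraw
    return loss, (dWxh, dWhh, dWhy, dbh, dby), hs[len(inputs) - 1]

loss, grads, h_last = loss_fun([0, 1, 2], [1, 2, 0], np.zeros((hidden_size, 1)))
```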

## Sampling

The forward propagation equations are used.

Then the probability distribution estimated by the RNN is used to randomly select the character sent to the output, which is fed back as the next input.
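A sketch of the sampling loop, following the structure of Karpathy's `sample` function (tiny sizes and a seeded generator are illustrative assumptions):

```python
import numpy as np

hidden_size, vocab_size = 4, 3          # illustrative sizes
rng = np.random.default_rng(2)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01
bh = np.zeros((hidden_size, 1)); by = np.zeros((vocab_size, 1))

def sample(h, seed_ix, n):
    """Generate n character indices, starting from seed_ix and state h."""
    x = np.zeros((vocab_size, 1)); x[seed_ix] = 1
    ixes = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h + bh)        # forward equations
        y = Why @ h + by
        p = np.exp(y) / np.sum(np.exp(y))
        ix = rng.choice(vocab_size, p=p.ravel())   # random draw from p(t)
        x = np.zeros((vocab_size, 1)); x[ix] = 1   # feed the sample back in
        ixes.append(int(ix))
    return ixes

ixes = sample(np.zeros((hidden_size, 1)), 0, 10)
```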

## Main code

Main code of the program, which successively:

- Reads the input and target sequences
- Occasionally draws a sample from the model (Monte Carlo-type random drawing)
- Updates the neural network parameters from the computed gradients
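The structure of the main loop can be sketched as below. Karpathy's script uses an Adagrad update; here a dummy gradient function stands in for the real `lossFun` so that the sketch is self-contained, and all sizes are illustrative assumptions:

```python
import numpy as np

hidden_size, vocab_size, seq_length, learning_rate = 4, 3, 5, 1e-1
rng = np.random.default_rng(3)
params = {name: rng.standard_normal(shape) * 0.01 for name, shape in
          [('Wxh', (hidden_size, vocab_size)), ('Whh', (hidden_size, hidden_size)),
           ('Why', (vocab_size, hidden_size)), ('bh', (hidden_size, 1)),
           ('by', (vocab_size, 1))]}
mem = {name: np.zeros_like(v) for name, v in params.items()}  # Adagrad accumulators

def dummy_grads(params):
    # Placeholder for the gradients returned by the real lossFun.
    return {name: 0.01 * np.ones_like(v) for name, v in params.items()}

data = "hello world"   # stand-in corpus
p = 0                  # data pointer
for n in range(20):                      # a few training iterations
    if p + seq_length + 1 >= len(data):  # wrap around the end of the corpus
        p = 0                            # (real code also resets the hidden state)
    # inputs/targets would be data[p:p+seq_length] vs data[p+1:p+seq_length+1]
    grads = dummy_grads(params)          # real code: loss, grads, hprev = lossFun(...)
    for name in params:                  # Adagrad parameter update
        mem[name] += grads[name] ** 2
        params[name] += -learning_rate * grads[name] / np.sqrt(mem[name] + 1e-8)
    p += seq_length                      # move data pointer
```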