This diagram shows how information flows through an encoder-decoder model. It uses a word embedding layer to get a better word representation. A word embedding layer is usually built with an algorithm like GloVe or Word2Vec, which takes a large collection of words and creates a weight matrix in which similar words end up correlated with each other. Using an embedding layer generally makes your RNN more accurate because it already encodes how similar words are, so the network has less to infer.
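As a rough sketch of that idea, here is how pretrained vectors can be loaded into an embedding layer in PyTorch. The vocabulary and the random matrix below are hypothetical stand-ins; in practice the rows would come from a downloaded GloVe or Word2Vec file.

```python
import torch
import torch.nn as nn

# Hypothetical 5-word vocabulary; in a real model the vectors would be
# read from a pretrained GloVe/Word2Vec file, not generated randomly.
vocab = {"<pad>": 0, "hello": 1, "hi": 2, "world": 3, "earth": 4}
embedding_dim = 50

# Stand-in for a pretrained matrix: one row per vocabulary word.
pretrained = torch.randn(len(vocab), embedding_dim)

# Load the matrix into an embedding layer and freeze it, so the RNN
# starts from the richer representation instead of learning word
# similarity from scratch.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Look up the vectors for a sentence: shape (sequence_length, embedding_dim).
sentence = torch.tensor([vocab["hello"], vocab["world"]])
vectors = embedding(sentence)
print(vectors.shape)  # torch.Size([2, 50])
```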
Another amazing application of RNNs is machine translation. This method is interesting because it involves training two RNNs simultaneously. In these networks the inputs are pairs of sentences in different languages; for example, you can feed the network an English sentence paired with its French translation. With enough training, you can give the network an English sentence and it will translate it to French! This model is called a Sequence to Sequence (seq2seq) model, or encoder-decoder model.
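To make the two-RNN setup concrete, here is a minimal encoder-decoder sketch. All of the sizes and vocabularies are made up for illustration; the point is only that the encoder compresses the source sentence into a final hidden state, and the decoder is seeded with that state to generate the target sentence.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        _, hidden = self.rnn(self.embed(src))
        return hidden  # (1, batch, hidden_dim): a summary of the sentence

class Decoder(nn.Module):
    def __init__(self, tgt_vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, tgt, hidden):
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden  # scores over the target vocabulary

# English in, French out: the decoder starts from the encoder's state.
encoder = Encoder(src_vocab_size=1000)   # hypothetical English vocab size
decoder = Decoder(tgt_vocab_size=1200)   # hypothetical French vocab size
src = torch.randint(0, 1000, (1, 5))     # one English sentence, 5 tokens
tgt = torch.randint(0, 1200, (1, 6))     # its French translation, 6 tokens
logits, _ = decoder(tgt, encoder(src))
print(logits.shape)  # torch.Size([1, 6, 1200])
```

During training, the logits are compared against the French sentence shifted by one position, so the decoder learns to predict each next word of the translation.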
This diagram, taken from the article referenced above, shows how the model would predict “hello”. It gives a good visualization of how these networks take in a word character by character and predict the likelihood of the next character.
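A toy version of that “hello” setup might look like the sketch below (a hypothetical reconstruction, not the exact model from the article): the network sees one character at a time, and after reading the prefix “hell” it outputs a probability distribution over which character comes next, which training would push toward “o”.

```python
import torch
import torch.nn as nn

# Tiny character vocabulary, just enough to spell "hello".
chars = ["h", "e", "l", "o"]
char_to_ix = {c: i for i, c in enumerate(chars)}

rnn = nn.GRU(input_size=len(chars), hidden_size=16, batch_first=True)
head = nn.Linear(16, len(chars))

# One-hot encode the prefix "hell"; shape: (1, 4, vocab_size).
prefix = torch.stack([
    nn.functional.one_hot(torch.tensor(char_to_ix[c]), len(chars)).float()
    for c in "hell"
]).unsqueeze(0)

# The last hidden output gives the distribution over the next character.
output, _ = rnn(prefix)
probs = torch.softmax(head(output[:, -1]), dim=-1)
for c, p in zip(chars, probs[0]):
    print(f"P({c!r}) = {p:.3f}")  # untrained, so roughly uniform for now
```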