Another task that we may want to perform is listing the words that appear in a text. To list the words in the first chapter of Frankenstein or the first book of Herodotus, use the command frank.words[,5] or hdt.words. Since many lemmata will be repeated in any text, you can generate a list of unique lemmata with the unique() function, for example unique(hdt.words).
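
As a minimal sketch of the difference between the full word list and the list of unique lemmata, the example below uses a small invented vector, toy.words, as a stand-in for hdt.words. The data are hypothetical and only assume that the real object is a character vector of lemmata.

```r
# Hypothetical stand-in for hdt.words: a character vector of lemmata
toy.words <- c("the", "king", "send", "the", "herald", "to", "the", "king")

# Every word, in the order it occurs in the text
toy.words

# Each lemma once, in order of first appearance
toy.unique <- unique(toy.words)
toy.unique

# Number of distinct lemmata versus number of words
length(toy.unique)
length(toy.words)
```

Comparing the two lengths gives a quick sense of how much repetition a text contains.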

If you want a frequency list for these lemmata, you can build one with the table command, in the same way that we counted the words in each segment of Books 9-12 of the Odyssey. Following that recipe, the command hdt.frq <- table(hdt.words) generates a table showing each word and its frequency. This table can be sorted with the sort command: hdt.frq <- sort(hdt.frq, decreasing=TRUE). After this, hdt.frq[1:20] will show us the 20 most frequent lemmata in the first book of Herodotus.
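
The table-then-sort steps can be tried on a small scale as follows. This is only a sketch: toy.words is invented data standing in for hdt.words, and toy.frq plays the role of hdt.frq.

```r
# Hypothetical stand-in for hdt.words
toy.words <- c("the", "king", "send", "the", "herald", "to", "the", "king")

# Count how often each lemma occurs
toy.frq <- table(toy.words)

# Put the most frequent lemmata first
toy.frq <- sort(toy.frq, decreasing = TRUE)

# Show the two most frequent lemmata
toy.frq[1:2]

# head() stops at the end of the table, so it is a safer choice than
# [1:20] when a text has fewer than 20 distinct lemmata
head(toy.frq, 20)
```

With a real text such as the first book of Herodotus there will be far more than 20 distinct lemmata, so hdt.frq[1:20] and head(hdt.frq, 20) should show the same result.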



We can combine several of these commands into a single line: frank.frq <- sort(table(frank.words[,5]), decreasing=TRUE) builds the frequency table for the first chapter of Frankenstein and sorts it so that the most frequent lemmata come first.
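
The combined command can be sketched on invented data as well. The object toy.frank below is a hypothetical stand-in for frank.words: the only thing assumed about the real object is that its fifth column holds the lemma of each word, and the other columns here are placeholders.

```r
# Hypothetical tokens and their lemmata
toy.tokens <- c("we", "wrote", "letters", "and", "we", "write", "a", "letter")
toy.lemmas <- c("we", "write", "letter", "and", "we", "write", "a", "letter")

# Hypothetical stand-in for frank.words: one row per word,
# with the lemma in column 5 (columns 1 to 3 are placeholders)
toy.frank <- cbind(seq_along(toy.tokens), 1, 1, toy.tokens, toy.lemmas)

# Extract the lemmata, count them, and sort the counts in one step
toy.frq <- sort(table(toy.frank[, 5]), decreasing = TRUE)
toy.frq
```

Reading the command from the inside out mirrors the separate steps: toy.frank[, 5] selects the lemmata, table() counts them, and sort() orders the counts.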

