Generate A Word List

Another task we may want to perform is creating a list of the words that appear in a text. To list every word in the first chapter of Frankenstein or the first book of Herodotus, use the command frank.words[,5] or hdt.words. Since many lemmata are repeated in any text, you can reduce either list to its unique lemmata with the unique() function, for example unique(hdt.words).
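As a minimal sketch of what unique() does, the example below uses a small made-up vector, toy.words, as a stand-in for hdt.words. The vector and its contents are assumptions for illustration only, not data from the tutorial.

```r
# Hypothetical stand-in for hdt.words: a short character vector of lemmata.
toy.words <- c("the", "monster", "the", "ice", "monster", "the", "ship")

# unique() keeps one copy of each lemma, in order of first appearance.
toy.unique <- unique(toy.words)

print(toy.unique)
length(toy.words)   # number of tokens in the toy text
length(toy.unique)  # number of distinct lemmata
```

Comparing the two lengths gives a quick sense of how much repetition a text contains.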

To generate a frequency list for these lemmata, use the table command, just as we did when counting the words in each segment of Books 9-12 of the Odyssey. Following the recipe from http://johnvictoranderson.org/?p=115, the command hdt.frq <- table(hdt.words) generates a table showing each word and its frequency. This table can be sorted with the sort command: hdt.frq <- sort(hdt.frq, decreasing=TRUE). After this, the command hdt.frq[1:20] shows the 20 most frequent lemmata in the first book of Herodotus.

ἵημι  εἰμί  ὅς  δέω2
2712  1770  1561  1527  1453
δέω  δεῖ  δέομαι  εἰς  εἰ
1451  1438  1437  1330  1299
εἰ2  αἴ  αἴ2  δέ  καίω
1299  1221  1221  1185  1178
ἀκή  ἀκή2  ἀκή3  καί  καί2
1175  1175  1175  1175  1175
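The table, sort, and indexing steps described above can be tried on a small made-up vector. This is only a sketch: toy.words and toy.frq are hypothetical names standing in for hdt.words and hdt.frq, and the words are invented for illustration.

```r
# Hypothetical stand-in for hdt.words.
toy.words <- c("the", "monster", "the", "ice", "monster", "the", "ship")

# Count how often each lemma occurs.
toy.frq <- table(toy.words)

# Put the most frequent lemmata first.
toy.frq <- sort(toy.frq, decreasing = TRUE)

# Show the two most frequent lemmata with their counts.
print(toy.frq[1:2])

# head() stops at the end of the table, so it is a safer choice than
# [1:20] when a text may contain fewer than 20 distinct lemmata.
print(head(toy.frq, 20))
```

Indexing with [1:20] assumes the table has at least 20 entries, which holds for the first book of Herodotus but not for a very short text.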

 

We can combine several of these commands in a single line: frank.frq <- sort(table(frank.words[,5]), decreasing=TRUE) generates a frequency table of the lemmata in the first chapter of Frankenstein, sorted from most to least frequent.
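The combined command can be sketched on a toy data frame. The real layout of frank.words is not shown here, so the data frame toy.frank, its column names, and its contents are all assumptions. The only feature taken from the text is that the lemmata sit in the fifth column.

```r
# Hypothetical stand-in for frank.words: one row per token, with the
# lemma in the fifth column. All column names and values are invented.
toy.frank <- data.frame(
  sentence = c(1, 1, 1, 2, 2, 2),
  position = c(1, 2, 3, 1, 2, 3),
  form     = c("I", "was", "happy", "He", "is", "happy"),
  pos      = c("PRON", "VERB", "ADJ", "PRON", "VERB", "ADJ"),
  lemma    = c("I", "be", "happy", "he", "be", "happy"),
  stringsAsFactors = FALSE
)

# Select column 5, count each lemma, and sort in one step.
toy.frq <- sort(table(toy.frank[, 5]), decreasing = TRUE)

print(toy.frq)
```

Nesting the calls gives the same result as running table and sort separately. It simply avoids storing the unsorted table.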
