Generate A Word List
Another task that we may want to perform is creating a list of the words that appear in a text. If you simply want to list the words that appear in the first chapter of Frankenstein or the first book of Herodotus, you would use the command frank.words[,5] or hdt.words. Since many lemmas will be repeated in any text, you can generate a list of unique lemmas using the unique() command.
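As a minimal sketch of what unique() does, the example below uses a small made-up vector, toy.words, as a stand-in for hdt.words. The vector and its contents are assumptions for illustration only, not data from either text.

```r
# toy.words is a hypothetical stand-in for hdt.words:
# a character vector with one lemma per token.
toy.words <- c("the", "monster", "see", "the", "creator", "see", "the")

# unique() drops repeated values and keeps the first occurrence of each,
# in order of appearance.
toy.unique <- unique(toy.words)
print(toy.unique)

# length() then reports how many distinct lemmas the vector contains.
print(length(toy.unique))
```

Applied to the real data, the same call would be unique(hdt.words) or unique(frank.words[,5]).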
If you want to generate a frequency list for each of these lemmas, you can do this with the table command, in the same way that we calculated the number of words that appeared in each segment of Books 9-12 of the Odyssey. Following the recipe from http://johnvictoranderson.org/?p=115, we can issue the command hdt.frq <- table(hdt.words) to generate a table showing each word and its frequency. This table can be sorted with the sort command: hdt.frq <- sort(hdt.frq, decreasing=TRUE). After this, the command hdt.frq[1:20] will show us the 20 most frequent lemmata in the first book of Herodotus.
ὁ | ἵημι | εἰμί | ὅς | δέω2 |
2712 | 1770 | 1561 | 1527 | 1453 |
δέω | δεῖ | δέομαι | εἰς | εἰ |
1451 | 1438 | 1437 | 1330 | 1299 |
εἰ2 | αἴ | αἴ2 | δέ | καίω |
1299 | 1221 | 1221 | 1185 | 1178 |
ἀκή | ἀκή2 | ἀκή3 | καί | καί2 |
1175 | 1175 | 1175 | 1175 | 1175 |
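The table, sort, and subsetting steps can be tried out on a small made-up vector before running them on a whole book. In this sketch toy.words, toy.frq, and top.n are hypothetical names, and the data are invented for illustration.

```r
# toy.words is a hypothetical stand-in for hdt.words.
toy.words <- c("the", "monster", "see", "the", "creator", "see", "the")

# table() counts how many times each distinct value occurs.
toy.frq <- table(toy.words)

# sort() with decreasing = TRUE puts the most frequent entries first.
toy.frq <- sort(toy.frq, decreasing = TRUE)

# Take the first two entries; min() guards against asking for more
# entries than the table actually holds.
top.n <- toy.frq[1:min(2, length(toy.frq))]
print(top.n)
```

With the real data, replacing 2 with 20 gives the same result as hdt.frq[1:20], provided the text has at least 20 distinct lemmata.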
 
We can combine several of these commands into one: frank.frq <- sort(table(frank.words[,5]), decreasing=TRUE) generates a frequency table of the lemmata in the first chapter of Frankenstein, sorted from most to least frequent.
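The combined command can be sketched on a toy data frame as well. The example assumes, as frank.words[,5] suggests, that the fifth column holds the lemma for each token. The frame toy.frame and its column names are made up for illustration and may not match the real layout of frank.words.

```r
# toy.frame is a hypothetical stand-in for frank.words.
# Only the position of the lemma column (the fifth) matters here.
toy.frame <- data.frame(
  id       = 1:6,
  sentence = c(1, 1, 1, 2, 2, 2),
  position = c(1, 2, 3, 1, 2, 3),
  form     = c("was", "saw", "is", "seen", "am", "monster"),
  lemma    = c("be", "see", "be", "see", "be", "monster"),
  stringsAsFactors = FALSE
)

# Select column 5, count each lemma, and sort by frequency in one step.
toy.frq <- sort(table(toy.frame[, 5]), decreasing = TRUE)
print(toy.frq)
```

Nesting the calls avoids storing the unsorted table, at the cost of a line that is slightly harder to read.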