Generate A Word List

Another task that we may want to perform is creating a list of the words that appear in a text. To list the words that appear in the first chapter of Frankenstein or the first book of Herodotus, use the command frank.words[,5] or hdt.words, respectively. Since many lemmas are repeated in any text, you can generate a list of unique lemmas with the unique() command, for example unique(hdt.words).
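As a minimal sketch of the difference between the full word list and the list of unique lemmas, the example below uses a small made-up vector in place of the real hdt.words object (the toy data and the name hdt.unique are assumptions for illustration, not part of the tutorial's data).

```r
# Hypothetical stand-in for hdt.words: a character vector of lemmas
hdt.words <- c("the", "king", "the", "war", "king", "the")

# unique() drops repeats, keeping each lemma in order of first appearance
hdt.unique <- unique(hdt.words)
print(hdt.unique)

# Compare the number of tokens with the number of distinct lemmas
length(hdt.words)
length(hdt.unique)
```

With the real data, the same two length() calls give a quick sense of how much repetition a text contains.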

If you want to generate a frequency list for each of these lemmas, you can do this with the table command, in the same way that we calculated the number of words that appeared in each segment of Books 9-12 of the Odyssey. Following the recipe from http://johnvictoranderson.org/?p=115, we can issue the command hdt.frq <- table(hdt.words) to generate a table showing each word and its frequency. This table can then be sorted from most to least frequent with the sort command: hdt.frq <- sort(hdt.frq, decreasing=TRUE). After this, the command hdt.frq[1:20] will show us the 20 most frequent lemmas in the first book of Herodotus.

ὁ	ἵημι	εἰμί	ὅς	δέω2
2712	1770	1561	1527	1453
δέω	δεῖ	δέομαι	εἰς	εἰ
1451	1438	1437	1330	1299
εἰ2	αἴ	αἴ2	δέ	καίω
1299	1221	1221	1185	1178
ἀκή	ἀκή2	ἀκή3	καί	καί2
1175	1175	1175	1175	1175
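The table-then-sort steps described above can be sketched on a small made-up vector standing in for hdt.words (the toy data is an assumption for illustration). The sketch uses head() rather than hdt.frq[1:20] only because the toy table has fewer than 20 entries, and indexing past the end of a table yields NA values.

```r
# Hypothetical stand-in for hdt.words
hdt.words <- c("the", "king", "the", "war", "king", "the", "and")

# table() counts how often each distinct lemma occurs
hdt.frq <- table(hdt.words)

# sort() with decreasing = TRUE puts the most frequent lemmas first
hdt.frq <- sort(hdt.frq, decreasing = TRUE)

# Show up to the first 20 entries without running past the end of the table
head(hdt.frq, 20)
```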

We can combine several of these commands into a single line: frank.frq <- sort(table(frank.words[,5]), decreasing=TRUE) generates a list of the lemmas in the first chapter of Frankenstein, ordered from most to least frequent.
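The combined command can be tried on a toy data frame. The sketch below assumes only what the text states, namely that the fifth column of frank.words holds the lemma. The other columns, their names, and the sample rows are invented for illustration.

```r
# Hypothetical stand-in for frank.words: a data frame whose fifth column
# holds the lemma; the other columns and their names are invented here
frank.words <- data.frame(
  chapter  = c(1, 1, 1, 1, 1),
  sentence = c(1, 1, 1, 2, 2),
  position = c(1, 2, 3, 1, 2),
  token    = c("I", "am", "was", "I", "is"),
  lemma    = c("I", "be", "be", "I", "be"),
  stringsAsFactors = FALSE
)

# Count the lemmas in column 5 and sort from most to least frequent
frank.frq <- sort(table(frank.words[, 5]), decreasing = TRUE)
frank.frq
```

Nesting table() inside sort() avoids the intermediate assignment, at the cost of a slightly denser line.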


Statistical Methods for Studying Literature Using R

Jeff Rydberg-Cox, The University of Missouri-Kansas City
