Free Download — 467 current fiction substrings (fiction.txt) The most frequently occurring 467 character sequences (n-grams) occurring in a best-selling novel by Amy Tan in 1990.
Offsite — Here are the datasets backing the Google Books Ngram Viewer. These datasets were generated in July 2009; we will update these datasets as our book scanning continues, and the updated versions will have distinct and persistent version identifiers (20090715 for the current set). Each of the links below will directly download a fragment of the given corpus. For instance, ...