Corpus of Erotica Stories

Overview

This corpus is a useful resource for natural language processing and machine learning work. It consists of 4771 raw-text erotica stories collected from www.textfiles.com/sex/EROTICA. As a natural outgrowth of the writing that BBSes encouraged, people have been writing some form of erotica or sexual narrative for others for quite some time. With the advent of Fidonet and later Usenet, these stories achieved ever wider distribution. Unfortunately, erotica is often uncredited and undated, which makes it hard to fix in time. As a result, a given story may be much older or much newer than it appears.

Format

A very large number of these stories came to textfiles.com in memory of Universal Joint BBS, which had collected many thousands of them before it went down. The data are arranged as one document per line. Each line holds two tab-separated fields: the document id, which is the filename used in the scrape, followed by the document text. The text of each document has been stripped of newline and carriage-return characters (“\n” and “\r”), so every document occupies exactly one line.
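The layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the corpus distribution: the function name load_corpus is hypothetical, and it assumes each line is a document id, a single tab, and then the text. It splits on the first tab only, so any tab that happens to survive inside a story stays part of the text.

```python
import io
from typing import Dict, Iterable


def load_corpus(lines: Iterable[str]) -> Dict[str, str]:
    """Map document id -> document text from tab-separated lines.

    Assumes the layout "<document id>\t<text>", one document per line.
    """
    corpus: Dict[str, str] = {}
    for line in lines:
        # Drop only the line terminator; keep any other whitespace intact.
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so later tabs remain in the text.
        doc_id, sep, text = line.partition("\t")
        if not sep:
            continue  # no tab found: not a well-formed record
        corpus[doc_id] = text
    return corpus


# Tiny in-memory stand-in for the real file, with made-up ids and text.
sample = io.StringIO(
    "a.txt\tFirst document text.\n"
    "b.txt\tSecond\twith a tab.\n"
)
docs = load_corpus(sample)
print(len(docs))
```

To read the real data, pass an open file handle instead of the in-memory sample, for example open(path, encoding="utf-8", errors="replace"). The encoding is an assumption: the source does not state one, and text of this age may well use something else.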

WARNING: Some people may find the content of this corpus offensive.