Corpus of Erotica Stories
Overview
This corpus consists of 4771 raw-text erotica stories collected from www.textfiles.com/sex/EROTICA, and is a useful resource for natural language processing and machine learning work. As a natural outgrowth of the writing that BBSes encouraged, people have been writing some form of erotica or sexual narrative for others for quite some time. With the advent of Fidonet and later Usenet, these stories achieved ever wider distribution. Unfortunately, erotica is often uncredited and undated, which makes it hard to fix in time. As a result, the stories may be much older or much newer than they appear.
Format
A very large number of these stories came to textfiles.com in memory of Universal Joint BBS, which had collected many thousands of them before it went down. The data are arranged as one document per line, with tab-separated fields; the filename from the scrape is used as the document id. The text of each document has been stripped of newline and carriage-return characters ("\n" and "\r"), so every document fits on a single line.
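The layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference loader: it assumes the document id is the first field and the text is everything after the first tab (the field order is not stated here), and the function name `parse_corpus_lines` and the sample rows are hypothetical placeholders.

```python
from typing import Iterable, Iterator, Tuple


def parse_corpus_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, text) pairs from tab-separated corpus lines.

    Assumption: the id (the scraped filename) comes first, followed by
    a tab and then the document text.
    """
    for line in lines:
        # Drop the line terminator that separates one document from the next.
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the text are preserved.
        doc_id, sep, text = line.partition("\t")
        if not sep:
            continue  # no tab found: not a well-formed record
        yield doc_id, text


# Placeholder rows standing in for real corpus lines.
sample = [
    "story001.txt\tFirst document text.\n",
    "\n",
    "story002.txt\tSecond document,\twith an extra tab.\n",
]
docs = dict(parse_corpus_lines(sample))
```

With a real download, an open file handle can be passed in place of `sample`, since iterating over a text file yields one line at a time. Splitting on the first tab only is a deliberate choice: it keeps any stray tabs inside a story from truncating the text.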
WARNING: Some people may find the content of this corpus offensive.
Stats
| Added by: | Ganglion |
|---|---|
| Created: | about 1 year ago |
| Updated: | about 1 year ago |
