Introduction

Welcome to Just Fanfiction Statistics. In this mini-project, we gather data from the website fanfiction.net to gain some insight into the online fanfiction community. For those unfamiliar with the term, fanfiction is any form of fiction in which the characters or settings are taken directly from another existing work. These works are typically created by fans of the original work, hence the name "fanfiction".

What is exciting about fanfiction is that, ever since the rise of the internet, it has essentially operated as a free market. Unlike the standard publishing industry, there is no middleman, no gatekeeper, no moderators, no contract, little to no regulation, and very few barriers to entry. Common inefficiencies, distortions, and biases (e.g. sexism, racism) are also limited or absent under this system. Anyone can write a story, and anyone can read it... for free!

What this creates is something known in economics as perfect competition. Stories are "priced" exactly where supply (the stories authors are willing to write) meets demand (the stories readers are willing to read). We will go into what "priced" means later, given that no money is being exchanged. This gives us invaluable insight into, say, what our next bestseller will be. Indeed, for those who didn't know, E.L. James' Fifty Shades of Grey began as an extremely popular fanfiction... and then reached massive commercial success with over 125 million copies sold.

Project breakdown

The outline below lists our final objectives for this project.

Understanding the fanfiction community

  • Characterize the users on the site
  • Predict who will write what stories
  • Predict who will read what stories

Understanding fanfiction stories

  • Characterize the stories being published
  • Predict what types of stories will be successful
  • Tell how well a story is performing relative to its peers

Cool things to figure out

  • Figure out the sentiment of a review
  • Find the most influential author on the site
  • Recommend stories based on what a user has favorited
  • Characterize the content of successful stories

Note that this is an ongoing project, and the results will be published in the order in which the analyses are performed. Also note that some of the objectives on the list require technologies that we do not currently have. We will post an update when this changes.