An analysis of the top 4,000 most backed projects ever on Kickstarter

Francisca Dias

In this analysis I will explore the Top 4000 most backed projects ever on Kickstarter and search for interesting insights and trends.

Do these entrepreneurs launch more than one successful project?

Which type of industry is the most successful on Kickstarter? And which is the least successful?

Which project has raised the most pledged money so far?

Which words appear most often in titles and descriptions?

I will answer these and many other questions in this analysis.

Kickstarter is a platform for millions of creators to present their innovative ideas to the public.

This is a win-win situation: creators can raise initial funding, while the public gets access to cutting-edge prototype products that are not yet available on the market.

This dataset can be found here.

It consists of the following features:

  • amt.pledged: amount pledged

  • by: author

  • category: project category (string)

  • currency: currency type of amount pledged

  • goal: original pledge goal

  • location

  • num.backers: total number of backers

  • num.backers.tier: number of backers in each pledge tier

  • pledge.tier: pledge tiers in USD

  • title

  • url

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline
plt.style.use('fivethirtyeight')

from wordcloud import WordCloud, STOPWORDS

import warnings
warnings.filterwarnings("ignore")

import re
In [2]:
df = pd.read_csv('most_backed.csv')
df.head(2)
Out[2]:
Unnamed: 0 amt.pledged blurb by category currency goal location num.backers num.backers.tier pledge.tier title url
0 0 8782571.0 \nThis is a card game for people who are into ... Elan Lee Tabletop Games usd 10000.0 Los Angeles, CA 219382 [15505, 202934, 200, 5] [20.0, 35.0, 100.0, 500.0] Exploding Kittens /projects/elanlee/exploding-kittens
1 1 6465690.0 \nAn unusually addicting, high-quality desk to... Matthew and Mark McLachlan Product Design usd 15000.0 Denver, CO 154926 [788, 250, 43073, 21796, 41727, 21627, 12215, ... [1.0, 14.0, 19.0, 19.0, 35.0, 35.0, 79.0, 79.0... Fidget Cube: A Vinyl Desk Toy /projects/antsylabs/fidget-cube-a-vinyl-desk-toy
In [3]:
df.columns
Out[3]:
Index(['Unnamed: 0', 'amt.pledged', 'blurb', 'by', 'category', 'currency',
       'goal', 'location', 'num.backers', 'num.backers.tier', 'pledge.tier',
       'title', 'url'],
      dtype='object')
  • Rename the columns.
  • Check the data types.
  • List all currencies represented in this dataset.
  • Build a data frame with the exchange rates between USD and the other currencies.
  • Convert the pledged amount and the goal to a single currency, USD.
  • Normalize the columns that contain text, that is, blurb and title.
  • Select the columns that I will use from now on and discard the ones I will not need for my analysis.
In [4]:
df.rename(columns={'amt.pledged': 'amt_pledged',
                    'end.time': 'end_time',
                    'num.backers': 'num_backers',
                   'num.backers.tier': 'num_backers_tier',
                    'pledge.tier' : 'pledge_tier'}, inplace=True)
In [5]:
df.dtypes
Out[5]:
Unnamed: 0            int64
amt_pledged         float64
blurb                object
by                   object
category             object
currency             object
goal                float64
location             object
num_backers           int64
num_backers_tier     object
pledge_tier          object
title                object
url                  object
dtype: object

What is the most common currency on Kickstarter?

In [6]:
df['currency'].value_counts()
Out[6]:
usd    3438
gbp     252
cad     128
eur      96
aud      52
sek      14
nzd      10
dkk       7
chf       3
Name: currency, dtype: int64

The exchange rates as of November 22, 2017

In [7]:
# conversion rates: USD value of one unit of each currency

currencies = ['usd', 'sek', 'nzd', 'gbp', 'eur', 'dkk', 'chf', 'cad', 'aud']
conversion = [1.00, 0.11, 0.68, 1.32, 1.17, 0.15, 1.01, 0.78, 0.75 ]

Convert the exchange rate between USD and other currencies to a dataframe

In [8]:
# pd.DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
exchange_rate_table = pd.DataFrame({'currency': currencies,
                                    'conversion_rate': conversion})
exchange_rate_table
Out[8]:
currency conversion_rate
0 usd 1.00
1 sek 0.11
2 nzd 0.68
3 gbp 1.32
4 eur 1.17
5 dkk 0.15
6 chf 1.01
7 cad 0.78
8 aud 0.75

Merge the currency dataframe with the dataset

In [9]:
final_df = df.merge(exchange_rate_table, on='currency')
In [10]:
final_df.head(2)
Out[10]:
Unnamed: 0 amt_pledged blurb by category currency goal location num_backers num_backers_tier pledge_tier title url conversion_rate
0 0 8782571.0 \nThis is a card game for people who are into ... Elan Lee Tabletop Games usd 10000.0 Los Angeles, CA 219382 [15505, 202934, 200, 5] [20.0, 35.0, 100.0, 500.0] Exploding Kittens /projects/elanlee/exploding-kittens 1.0
1 1 6465690.0 \nAn unusually addicting, high-quality desk to... Matthew and Mark McLachlan Product Design usd 15000.0 Denver, CO 154926 [788, 250, 43073, 21796, 41727, 21627, 12215, ... [1.0, 14.0, 19.0, 19.0, 35.0, 35.0, 79.0, 79.0... Fidget Cube: A Vinyl Desk Toy /projects/antsylabs/fidget-cube-a-vinyl-desk-toy 1.0
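
The merge above uses the default inner join, which would silently drop any project whose currency is missing from the rate table. All nine currencies listed earlier are covered here, so nothing should be lost, but a left join with a check is a safer pattern. A minimal sketch on made-up rows (the `projects` and `rates` frames and the 'jpy' row are hypothetical and not part of the dataset):

```python
import pandas as pd

# Toy stand-ins for the notebook's `df` and `exchange_rate_table` (made-up rows).
projects = pd.DataFrame({
    "title": ["a", "b", "c"],
    "currency": ["usd", "gbp", "jpy"],   # 'jpy' deliberately has no rate
    "amt_pledged": [100.0, 200.0, 300.0],
})
rates = pd.DataFrame({
    "currency": ["usd", "gbp"],
    "conversion_rate": [1.00, 1.32],
})

# how="left" keeps every project; validate="m:1" raises an error
# if a currency appears more than once in `rates`.
merged = projects.merge(rates, on="currency", how="left", validate="m:1")

# Projects whose currency had no match end up with a missing conversion rate.
unmatched = merged[merged["conversion_rate"].isna()]
print(unmatched[["title", "currency"]])
```

With the default inner join, the 'jpy' row would simply disappear from the result instead of showing up in `unmatched`.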

Converting all amounts to one common currency

In [11]:
# Convert amt_pledged and goal

final_df['amount_pledged_usd'] = final_df.amt_pledged * final_df.conversion_rate
In [12]:
final_df['goal_usd'] = final_df.goal * final_df.conversion_rate
In [13]:
final_df.columns
Out[13]:
Index(['Unnamed: 0', 'amt_pledged', 'blurb', 'by', 'category', 'currency',
       'goal', 'location', 'num_backers', 'num_backers_tier', 'pledge_tier',
       'title', 'url', 'conversion_rate', 'amount_pledged_usd', 'goal_usd'],
      dtype='object')
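
As a quick sanity check on the conversion, the same arithmetic can be done by hand for a single non-USD project. The sketch below uses the rates from this notebook and the EUR figures of the 'Fieldwork Fail' project that is displayed later in the analysis; `RATES_TO_USD` and `to_usd` are hypothetical names introduced only for this illustration.

```python
# Rates copied from the notebook: USD value of one unit of each currency.
RATES_TO_USD = {'usd': 1.00, 'sek': 0.11, 'nzd': 0.68, 'gbp': 1.32, 'eur': 1.17,
                'dkk': 0.15, 'chf': 1.01, 'cad': 0.78, 'aud': 0.75}


def to_usd(amount, currency):
    """Convert an amount in the given currency code to USD (hypothetical helper)."""
    return amount * RATES_TO_USD[currency]


# 'Fieldwork Fail' raised 32,699 EUR against a goal of 8,700 EUR.
pledged_usd = round(to_usd(32699.0, 'eur'), 2)
goal_usd = round(to_usd(8700.0, 'eur'), 2)
print(pledged_usd, goal_usd)
```

The two results should agree with the amount_pledged_usd and goal_usd values shown for that project further down.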

Function to process the text in our dataset

In [14]:
def normalize_text(s):
    s = s.lower()

    # remove punctuation that touches whitespace; word-internal characters
    # (e.g., hyphens, apostrophes) are kept
    s = re.sub(r'\s\W', ' ', s)
    s = re.sub(r'\W\s', ' ', s)

    # collapse any run of whitespace (including newlines) into a single space
    s = re.sub(r'\s+', ' ', s)

    return s
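
To make the effect of the function concrete, here is a small self-contained sketch that applies the same three substitutions to a title and to a shortened blurb taken from the first rows of the dataset. The function is repeated (with raw-string patterns) only so that the snippet runs on its own.

```python
import re


def normalize_text(s):
    s = s.lower()
    # drop punctuation that touches whitespace; keep word-internal characters
    s = re.sub(r'\s\W', ' ', s)
    s = re.sub(r'\W\s', ' ', s)
    # collapse runs of whitespace, including the leading newline of a blurb
    s = re.sub(r'\s+', ' ', s)
    return s


title = normalize_text("Fidget Cube: A Vinyl Desk Toy")
blurb = normalize_text("\nAn unusually addicting, high-quality desk toy")

print(repr(title))
print(repr(blurb))
```

Note that the hyphen in "high-quality" survives, and that the leading newline of the blurb becomes a leading space. Calling `.strip()` on the result would remove that space if it is not wanted.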

Clean Blurb

In [15]:
final_df['clean_blurb'] = [normalize_text(s) for s in final_df['blurb']]

Clean Title

In [16]:
final_df['clean_title'] = [normalize_text(s) for s in final_df['title']]

Define the columns that I need and discard the ones I will not need from now on

In [17]:
cols_to_use = ['amount_pledged_usd', 'goal_usd', 'clean_blurb', 
            'clean_title', 'by', 'category', 'location', 'num_backers']
In [18]:
clean_df = final_df[cols_to_use]

This is a rich dataset, and many insights can be drawn from it.

I will narrow down to just a few questions that I would like to have answered by the end of the analysis.

  • Do entrepreneurs commonly launch more than one successful project, or is that rare?
  • What types of products are sold on this platform, and which type of industry is the most successful on Kickstarter?
  • Are entrepreneurs successful after these crowdfunding campaigns?
  • Interesting findings made during the analysis!

In this dataset we have projects from all around the world.

Fortunately we also have the cities where these projects come from, so below are the ten cities with the most projects among the most backed projects ever on Kickstarter.

In [19]:
plt.subplots(figsize=(11,6))
# recent seaborn versions require the column to be passed as a keyword (x=...)
sns.countplot(x='location',data=clean_df,palette='inferno',edgecolor=sns.color_palette('dark',7),order=clean_df['location'].value_counts()[:10].index)
plt.xticks(rotation=90)
plt.title('Top Locations with more projects', fontsize=20)
plt.ylabel('Number of Kickstarter Projects')
plt.xlabel('Location')
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.show()

San Francisco leads the ranking, followed by LA and NY. Seattle and London come next.

The top 4 cities by number of projects are all in the US.

In fact, if London were not here, the top 9 (at least) would all be in the United States.

Which industry has the most projects?

In [20]:
plt.subplots(figsize=(11,6))
sns.countplot(x='category',data=clean_df,palette='inferno',edgecolor=sns.color_palette('dark',7),order=clean_df['category'].value_counts()[:10].index)
plt.xticks(rotation=90)
plt.title('Top Industry with more backed projects', fontsize=20)
plt.ylabel('Number of Kickstarter Projects')
plt.xlabel('Industry')
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.show()

There are 115 categories (types of industry) represented here.

Among those 115, Product Design is number one, followed by Tabletop Games and Video Games. Hardware and Technology come next.

Which industries have the fewest projects?

Below are the 12 least represented industries.

Please note the y axis: each of these industries has just one project.

In [21]:
plt.subplots(figsize=(11,6))
sns.countplot(x='category',data=clean_df,order=clean_df['category'].value_counts()[-12:].index, palette='RdBu')
plt.xticks(rotation=90)
plt.title('Top Industry with less backed projects', fontsize=20)
plt.ylabel('Number of Kickstarter Projects')
plt.xlabel('Industry')
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.show()

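More than one project ratio

There are 461 authors with more than one project.

That is 11.5% when divided by the 4,000 projects in the dataset; note that the denominator in the calculation below is the number of projects, not the number of distinct authors.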

In [22]:
author_more_one_project = len(clean_df['by'].value_counts()[clean_df['by'].value_counts()>1])
In [23]:
author_more_one_project
Out[23]:
461
In [24]:
percent = (author_more_one_project/len(clean_df))*100
percent
Out[24]:
11.525
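
The figure above divides the number of repeat authors by the number of projects. If the share of distinct authors is what is wanted, the denominator should be the number of unique authors instead. A minimal sketch with a made-up list of author names (in the notebook, the equivalent denominator would be `clean_df['by'].nunique()`):

```python
from collections import Counter

# Hypothetical author column: one entry per project.
authors = ["A", "A", "B", "C", "C", "C", "D"]

projects_per_author = Counter(authors)
repeat_authors = sum(1 for n in projects_per_author.values() if n > 1)

# Denominator = number of projects (what the notebook computes).
share_of_projects = repeat_authors / len(authors)

# Denominator = number of distinct authors.
share_of_authors = repeat_authors / len(projects_per_author)

print(repeat_authors, share_of_projects, share_of_authors)
```

Because there are never more unique authors than projects, the second ratio is always at least as large as the first.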

Top authors with more projects

In [25]:
pd.set_option('display.float_format', lambda x: '%.3f' % x)
In [26]:
plt.subplots(figsize=(11,6))
sns.countplot(x='by',data=clean_df,order=clean_df['by'].value_counts()[:10].index, palette='RdBu')
plt.xticks(rotation=90)
plt.title('Top Authors with more projects', fontsize=20)
# authors are on the x axis and project counts on the y axis
plt.ylabel('Number of Kickstarter Projects')
plt.xlabel('Author')
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.show()

As we can see from the graph above, each of the top 10 entrepreneurs has more than 8 projects.

CoolMiniOrNot leads the ranking with an impressive number of projects: 24.

According to the web, this company is the Internet's largest gallery of painted miniatures, with a large repository of how-to articles on miniature painting.

So far the analysis has been based on absolute counts; that is, we have not yet looked at the average amount pledged or the average number of backers per category.

Let's do this!

Average pledge by each category

In [27]:
average_pledge_each_category = pd.pivot_table(clean_df, index= 'category', values= "amount_pledged_usd")
In [28]:
average_pledge_each_category.sort_values('amount_pledged_usd', ascending=False)
Out[28]:
amount_pledged_usd
category
Television 5764229.000
Gaming Hardware 2215906.667
World Music 785111.000
Sound 782088.588
3D Printing 748282.441
Typography 747961.000
Flight 698532.333
Narrative Film 632290.675
Action 630019.000
Sculpture 550389.410
Faith 538103.000
Architecture 517252.000
Space Exploration 496919.354
Web 491656.745
Wearables 446895.453
Mixed Media 445630.720
Drama 442313.000
Camera Equipment 423681.653
Technology 410683.455
Hardware 379238.890
Drinks 360616.327
Product Design 358966.076
Robots 348233.757
Food 308440.077
Science Fiction 303154.552
Photobooks 301918.350
Apparel 300264.534
Video Games 275573.504
Gadgets 273833.193
Tabletop Games 269771.701
... ...
Electronic Music 104204.304
Audio 103539.640
Fabrication Tools 102864.000
Software 101226.913
Shorts 100161.895
Nonfiction 99016.917
Anthologies 98088.618
Playing Cards 96679.680
Academic 91474.000
Comic Books 90432.828
Poetry 87370.060
Food Trucks 85470.000
Video 85176.000
Periodicals 84678.816
Apps 82937.071
Cookbooks 80451.360
Photo 70301.000
Journalism 66065.999
Conceptual Art 65955.280
Digital Art 65652.120
Jazz 60526.000
Zines 60431.000
Installations 58916.000
Calendars 58664.000
Fiction 58263.846
Literary Journals 53128.500
Vegan 52907.000
Stationery 47165.000
Young Adult 38257.830
Music Videos 33837.000

115 rows × 1 columns

Who would have guessed that Television is the category with the highest average pledge?

On closer inspection, it makes sense: there is just one project in this category, and the amount pledged was 5,764,229 USD.
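
Since a category with a single project can top a ranking of averages, it helps to show the project count (and the median) next to the mean. A minimal sketch on a toy frame, where only the Television amount is taken from the dataset and the other rows are made up for illustration:

```python
import pandas as pd

# Toy data: the Television amount is real, the Tabletop Games rows are invented.
toy = pd.DataFrame({
    "category": ["Television", "Tabletop Games", "Tabletop Games", "Tabletop Games"],
    "amount_pledged_usd": [5764229.0, 900000.0, 100000.0, 50000.0],
})

summary = (
    toy.groupby("category")["amount_pledged_usd"]
       .agg(["mean", "median", "count"])       # one row per category
       .sort_values("mean", ascending=False)   # rank by average pledge
)
print(summary)
```

The `count` column makes it immediately visible when an average rests on just one project.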

In [29]:
final_df[final_df['category'] == 'Television']
Out[29]:
Unnamed: 0 amt_pledged blurb by category currency goal location num_backers num_backers_tier pledge_tier title url conversion_rate amount_pledged_usd goal_usd clean_blurb clean_title
17 19 5764229.000 \nYou did it: you brought back MYSTERY SCIENCE... Joel Hodgson Television usd 2000000.000 Minneapolis, MN 48270 [5319, 3024, 5269, 7452, 1649, 3543, 11148, 8,... [10.0, 25.0, 35.0, 50.0, 75.0, 85.0, 100.0, 10... Bring Back MYSTERY SCIENCE THEATER 3000 /projects/mst3k/bringbackmst3k 1.000 5764229.000 2000000.000 you did it you brought back mystery science t... bring back mystery science theater 3000

Therefore the winner is:

Joel Gordon Hodgson

According to Wikipedia, Joel Gordon Hodgson is an American writer, comedian and television actor.

He is best known for creating Mystery Science Theater 3000 and starring in it as the character Joel Robinson.

In [30]:
final_df[final_df['category'] == 'Music Videos']
Out[30]:
Unnamed: 0 amt_pledged blurb by category currency goal location num_backers num_backers_tier pledge_tier title url conversion_rate amount_pledged_usd goal_usd clean_blurb clean_title
2329 2706 33837.000 \n'If I Were Enlightened' is funded. OMGoodnes... Donnalou Stevens Music Videos usd 10000.000 Oakland, CA 1541 [147, 291, 405, 295, 78, 44, 3, 9, 1, 2, 0, 0] [1.0, 5.0, 10.0, 25.0, 50.0, 100.0, 100.0, 250... If I Were Enlightened /projects/418425358/if-i-were-enlightened 1.000 33837.000 10000.000 if i were enlightened is funded omgoodness le... if i were enlightened

The least "fortunate" is:

Donnalou Stevens

According to Google, she is a singer with a couple of hits already.

Average backers by each category

What do backers like the most?

In [31]:
average_backers_each_category = pd.pivot_table(clean_df, index= 'category', values= "num_backers")
In [32]:
average_backers_each_category
Out[32]:
num_backers
category
3D Printing 3394.130
Academic 1461.000
Accessories 3097.550
Action 17713.000
Animation 3650.837
Anthologies 2001.217
Apparel 2673.781
Apps 2589.353
Architecture 3575.000
Art 2786.600
Art Books 2110.971
Audio 1572.500
Calendars 1142.500
Camera Equipment 2300.115
Children's Books 3088.842
Childrenswear 2317.500
Chiptune 4187.333
Civic Design 2081.400
Classical Music 2324.500
Comedy 3628.533
Comic Books 1618.600
Comics 2674.810
Conceptual Art 1391.000
Cookbooks 2292.429
Country & Folk 1675.667
Crafts 3214.000
DIY Electronics 3326.452
Dance 4133.000
Design 2827.626
Digital Art 1383.000
... ...
Ready-to-wear 2052.333
Restaurants 1880.250
Robots 2363.722
Rock 2654.917
Romance 1334.000
Science Fiction 3717.400
Sculpture 1784.500
Shorts 1767.250
Small Batch 2817.750
Software 2539.037
Sound 4068.538
Space Exploration 6947.800
Spaces 1512.500
Stationery 1326.000
Tabletop Games 3316.513
Technology 3742.000
Television 48270.000
Theater 2093.600
Thrillers 2147.000
Typography 8609.000
Vegan 1302.000
Video 1183.000
Video Games 5725.193
Wearables 3175.172
Web 9830.077
Webcomics 2009.576
Webseries 2584.562
World Music 6518.500
Young Adult 1199.000
Zines 1153.000

115 rows × 1 columns

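Note that this table is sorted alphabetically by category, not by the number of backers, so the first and last rows are not the best and worst performers.

3D Printing, the first row, attracts about 3,394 backers per project on average.

Among the rows displayed, Television (48,270, from its single project), Action (17,713) and Web (9,830) have the highest averages, while Calendars (1,142.5), Zines (1,153), Video (1,183) and Young Adult (1,199) have the lowest.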
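
To rank categories by their average number of backers, the table needs to be sorted by value, for example by calling `sort_values('num_backers', ascending=False)` on the pivot table, as was done for the pledge amounts. A minimal sketch using five of the averages displayed above (the Series `avg_backers` is a hypothetical stand-in for the full table):

```python
import pandas as pd

# Five averages copied from the table above, in alphabetical order.
avg_backers = pd.Series(
    {
        "3D Printing": 3394.130,
        "Academic": 1461.000,
        "Television": 48270.000,
        "Web": 9830.077,
        "Zines": 1153.000,
    },
    name="num_backers",
)

# Sort by value instead of by label.
ranked = avg_backers.sort_values(ascending=False)

top3 = ranked.head(3).index.tolist()
bottom1 = ranked.tail(1).index.tolist()
print(top3, bottom1)
```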

In [33]:
final_df[final_df['category'] == 'Young Adult']
Out[33]:
Unnamed: 0 amt_pledged blurb by category currency goal location num_backers num_backers_tier pledge_tier title url conversion_rate amount_pledged_usd goal_usd clean_blurb clean_title
3960 3649 32699.000 \nEver been glued to a crocodile? The most emb... Jim Jourdane Young Adult eur 8700.000 Angouleme, France 1199 [127, 648, 144, 184, 69, 9, 5] [8.0, 26.0, 33.0, 39.0, 61.0, 84.0, 195.0] Fieldwork Fail: When Science Goes Messy - BOOK /projects/953074743/fieldwork-fail-when-scienc... 1.170 38257.830 10179.000 ever been glued to a crocodile the most embar... fieldwork fail when science goes messy book

Since we previously cleaned the title and blurb with the normalize_text function, we are now ready to extract some insights from both features.

I will create a new dataframe containing only the text columns of the dataset.

I will count the words for both title and blurb.

As expected, the blurb (that is, the description) has more words than the title.

On average, a description has about 21 words. The most common are: game, new, world, first, card.

On average, a title has about 6 words. The most common words here are: game, card, smart, first, playing.

In [34]:
final_df.clean_blurb.head(2)
Out[34]:
0     this is a card game for people who are into k...
1     an unusually addicting high-quality desk toy ...
Name: clean_blurb, dtype: object
In [35]:
final_df.clean_title.head(2)
Out[35]:
0               exploding kittens
1    fidget cube a vinyl desk toy
Name: clean_title, dtype: object

Create a new dataframe : title_and_blurb_analysis dataframe

In [36]:
cols_to_analyse = ['category', 'clean_blurb', 'clean_title']
In [37]:
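# .copy() gives an independent dataframe, so new columns can be added without a SettingWithCopyWarning
title_and_blurb_analysis = final_df[cols_to_analyse].copy()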
In [38]:
title_and_blurb_analysis.head()
Out[38]:
category clean_blurb clean_title
0 Tabletop Games this is a card game for people who are into k... exploding kittens
1 Product Design an unusually addicting high-quality desk toy ... fidget cube a vinyl desk toy
2 Web bring reading rainbow’s library of interactiv... bring reading rainbow back for every child eve...
3 Narrative Film updated this is it we're making a veronica ma... the veronica mars movie project
4 Video Games an adventure game from tim schafer double fin... double fine adventure

Blurb analysis

In [39]:
#Counts the number of characters 
title_and_blurb_analysis['char_count_blurb'] = title_and_blurb_analysis['clean_blurb'].str.len()

# Splits each blurb into a list of words
title_and_blurb_analysis['words_blurb'] = title_and_blurb_analysis['clean_blurb'].str.split(' ')

# Counts the number of words 
title_and_blurb_analysis['word_count_blurb'] = title_and_blurb_analysis['words_blurb'].str.len()

Average number of words in the description

In [40]:
title_and_blurb_analysis.word_count_blurb.mean()
Out[40]:
20.7705
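
One caveat about this count: the cleaned blurbs shown earlier start with a space, and `str.split(' ')` turns a leading space into an empty first token, so the word count of such blurbs is one higher than the number of actual words. A minimal sketch on a shortened sample string, comparing it with a whitespace split (`split()` with no argument):

```python
# Shortened sample in the same shape as the cleaned blurbs (note the leading space).
blurb = " this is a card game"

tokens_single_space = blurb.split(' ')   # splits on every single space
tokens_whitespace = blurb.split()        # splits on runs of whitespace, no empty tokens

print(tokens_single_space)
print(tokens_whitespace)
print(len(tokens_single_space), len(tokens_whitespace))
```

In pandas, the equivalent of the second variant is `.str.split()` without an argument.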

Concatenate all descriptions into a single string

In [41]:
class1 = title_and_blurb_analysis['clean_blurb'].tolist()
# join with a space so the last word of one blurb cannot fuse with the first word of the next
string1 = ' '.join(class1)

WordCloud on Blurb: the most common words in descriptions

In [42]:
wordcloud1 = WordCloud(stopwords=STOPWORDS,
                       background_color='#007599',
                        max_words=30
                        ).generate(string1)
plt.figure(figsize=(10,8))
plt.imshow(wordcloud1)
plt.axis('off')
plt.show()

Title analysis

In [43]:
#Counts the number of characters 
title_and_blurb_analysis['char_count_title'] = title_and_blurb_analysis['clean_title'].str.len()

# Splits each title into a list of words
title_and_blurb_analysis['words_title'] = title_and_blurb_analysis['clean_title'].str.split(' ')

# Counts the number of words 
title_and_blurb_analysis['word_count_title'] = title_and_blurb_analysis['words_title'].str.len()

Average number of words in the title

In [44]:
title_and_blurb_analysis.word_count_title.mean()
Out[44]:
5.94625

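Concatenate all titles into a single string

In [45]:
class2 = title_and_blurb_analysis['clean_title'].tolist()
# join with a space; plain concatenation would fuse neighbouring titles
# (e.g. "exploding kittens" + "fidget cube ..." -> "kittensfidget")
string2 = ' '.join(class2)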

WordCloud on Title: the most common words on Titles

In [46]:
wordcloud2 = WordCloud(stopwords=STOPWORDS,
                       background_color='#b30086',
                        max_words=30

                      ).generate(string2)
plt.figure(figsize=(10,8))
plt.imshow(wordcloud2)
plt.axis('off')
plt.show()
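
A word cloud shows relative frequency only through font size. For exact counts, the same text can be tallied with `collections.Counter`. The sketch below is a minimal illustration: the first two titles come from the dataset, the third is made up, and the tiny `STOP` set is a hypothetical stand-in for the `STOPWORDS` list used above.

```python
from collections import Counter

titles = [
    "exploding kittens",
    "fidget cube a vinyl desk toy",
    "the card game for kittens",      # invented title, for illustration only
]
STOP = {"a", "the", "for"}            # hypothetical, much smaller than STOPWORDS

# Join with a space so words from neighbouring titles stay separate.
text = " ".join(titles)
words = [w for w in text.split() if w not in STOP]

freq = Counter(words)
top = freq.most_common(1)
print(top)
```

Joining with an empty string instead would produce the fused token "kittensfidget", which is why the separator matters.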