An analysis of more than 200,000 Indiegogo project listings

Francisca Dias


In this report I will explore the projects listed on Indiegogo's crowdfunding platform as of October 2017.

I will also use interactive visualizations: you can hover the mouse over a chart to get instant information about a particular item.

I would like to answer the following questions:

  • Which categories get the most investment? And the most backers?
  • Which currencies are represented in this dataset?
  • How many backers are on this platform? What is their average contribution?
  • Which projects got the most backers?
  • Have any projects partnered with a corporation?

These and other questions will be answered in this report.

Indiegogo is an international crowdfunding platform. It allows people to solicit funds for an idea, a charity, or a start-up business.

Its business model is based on charging a 5% fee on contributions.

This dataset was taken from https://webrobots.io/indiegogo-dataset/.

The data was collected on 2017-10-15.

Below are the main features in this dataset:

  • title: title of the project

  • tagline: short description of the project

  • cached_collected_pledges_count: number of backers

  • balance: the amount pledged by backers so far, in the project's currency

  • currency_code: the currency in which the project is listed

  • amt_time_left: the time left before the project closes to funding

  • category_name: the project's category

  • collected_percentage: the percentage of the project's goal collected so far

  • partner_name: the name of a corporation that has backed the project; it is null when there is none

  • in_forever_funding: whether the project is in Forever Funding, i.e. keeps receiving funds with no deadline

In [1]:
import pandas as pd
indie1 = pd.read_csv('Indiegogo001.csv')
indie2 = pd.read_csv('Indiegogo002.csv')
indie3 = pd.read_csv('Indiegogo003.csv')
indie4 = pd.read_csv('Indiegogo004.csv')
indie5 = pd.read_csv('Indiegogo005.csv')
indie6 = pd.read_csv('Indiegogo006.csv')
indie7 = pd.read_csv('Indiegogo007.csv')
In [2]:
frames = [indie1, indie2, indie3, indie4, indie5, indie6, indie7]
In [3]:
result = pd.concat(frames)

There are 216,283 rows in this dataset. We could assume that each row represents a project, but as we will see shortly, that is not the case.
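The seven read_csv calls and the concatenation above can be collapsed into one step. This is a sketch that assumes all files sit in one folder and follow the Indiegogo00N.csv naming; load_indiegogo is a hypothetical helper name. Passing ignore_index=True to pd.concat also gives the combined frame a fresh 0..n-1 index, whereas the plain concat above keeps each file's own index, which is why the labels in Out[9] end at 30383 even though there are 216,283 rows.

```python
import glob
import os
import tempfile

import pandas as pd


def load_indiegogo(folder):
    """Read every Indiegogo00*.csv file in folder into a single DataFrame.

    ignore_index=True builds a fresh 0..n-1 index instead of keeping
    each file's own index (which restarts at 0 in every file).
    """
    paths = sorted(glob.glob(os.path.join(folder, "Indiegogo00*.csv")))
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)


# Demo with two tiny stand-in files (hypothetical content, not the real data)
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"id": [1, 2]}).to_csv(
        os.path.join(folder, "Indiegogo001.csv"), index=False)
    pd.DataFrame({"id": [3, 4, 5]}).to_csv(
        os.path.join(folder, "Indiegogo002.csv"), index=False)
    combined = load_indiegogo(folder)

print(len(combined), combined.index.is_unique)
```

In the notebook itself, the equivalent call would be load_indiegogo('.') if the CSV files live in the working directory.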

In [4]:
result.head()
Out[4]:
id title nearest_five_percent tagline cached_collected_pledges_count igg_image_url compressed_image_url balance currency_code amt_time_left ... category_url category_name category_slug card_type collected_percentage partner_name in_forever_funding friend_contributors friend_team_members source_url
0 1561807 Alger - Anti Lost & Forget System 2 Never forget the things that are important to ... 15 https://c1.iggcdn.com/indiegogo-media-prod-cld... https://c1.iggcdn.com/indiegogo-media-prod-cld... $273 USD No time left ... /explore/travel-outdoors Travel & Outdoors travel-outdoors project 1% null False [] [] https://www.indiegogo.com/explore/all?project_...
1 1582714 Portfolio and Lookbook 20 I am reaching the end of my college journey. ... 4 https://c1.iggcdn.com/indiegogo-media-prod-cld... https://c1.iggcdn.com/indiegogo-media-prod-cld... $230 USD No time left ... /explore/fashion-wearables Fashion & Wearables fashion-wearables project 23% null False [] [] https://www.indiegogo.com/explore/all?project_...
2 1616114 dsfpejejejejejejejejejejejeje 2 Banana the potassium for every sport 4 https://c1.iggcdn.com/indiegogo-media-prod-cld... https://c1.iggcdn.com/indiegogo-media-prod-cld... £26 GBP No time left ... /explore/health-fitness Health & Fitness health-fitness project 0% null False [] [] https://www.indiegogo.com/explore/all?project_...
3 1767164 Med-West Legal Defense Fund: Part 2 2 Legal San Diego based medical cannabis company... 5 https://c1.iggcdn.com/indiegogo-media-prod-cld... https://c1.iggcdn.com/indiegogo-media-prod-cld... $1,135 USD No time left ... /explore/health-fitness Health & Fitness health-fitness project 1% null False [] [] https://www.indiegogo.com/explore/all?project_...
4 1776704 The Groggle: Always There Safety Eye Wear 2 An innovative safety eye wear product that bot... 5 https://c1.iggcdn.com/indiegogo-media-prod-cld... https://c1.iggcdn.com/indiegogo-media-prod-cld... $326 CAD No time left ... /explore/fashion-wearables Fashion & Wearables fashion-wearables project 1% null False [] [] https://www.indiegogo.com/explore/all?project_...

5 rows × 21 columns

Below are the data types.

Note that balance is stored as an object (string) column, because the currency symbol is attached to the amount. I will therefore have to clean this column and convert it to a numeric type.

In [5]:
result.dtypes
Out[5]:
id                                 int64
title                             object
nearest_five_percent               int64
tagline                           object
cached_collected_pledges_count     int64
igg_image_url                     object
compressed_image_url              object
balance                           object
currency_code                     object
amt_time_left                     object
url                               object
category_url                      object
category_name                     object
category_slug                     object
card_type                         object
collected_percentage              object
partner_name                      object
in_forever_funding                  bool
friend_contributors               object
friend_team_members               object
source_url                        object
dtype: object

Much of the time spent working with a dataset goes into making sure the data is ready for analysis.

By ready I mean that there are no nulls, all column types are correct, there are no duplicates, only the columns I need are selected, and so on.
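A quick way to check those conditions before cleaning is sketched below. quality_report is a hypothetical helper, not part of pandas, and the toy rows are made up for illustration. One detail is specific to this dataset: a project without a partner has partner_name set to the literal string 'null' rather than a real missing value, so isna() alone would not flag it.

```python
import pandas as pd


def quality_report(df, null_tokens=("null",)):
    """Summarise basic readiness checks for a DataFrame (hypothetical helper).

    null_tokens are string values counted as missing on top of NaN/None;
    in this dataset a project without a partner has partner_name 'null'.
    """
    missing = df.isna() | df.isin(list(null_tokens))
    return {
        "rows": len(df),
        "missing_per_column": {c: int(n) for c, n in missing.sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
        # columns that still need converting before any arithmetic
        "non_numeric_columns": [c for c in df.columns
                                if not pd.api.types.is_numeric_dtype(df[c])],
    }


# Hypothetical toy rows, shaped like the raw data
toy = pd.DataFrame({
    "title": ["Project A", "Project A", "Project B"],
    "balance": ["$273", "$273", None],
    "partner_name": ["null", "null", "Fractured Atlas"],
})
print(quality_report(toy))
```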

Below are the steps I have taken to clean this dataset:

  • Concatenate all files

  • Strip all non-numeric characters from the balance column

  • Convert balance to a numeric type (it was object)

  • Get the exchange rates for the 5 currencies represented here

  • Convert the balance amount to a single currency: USD

  • Select the columns I will be using

  • Rename those columns

Columns I will be working with

  • title
  • tagline
  • cached_collected_pledges_count
  • balance
  • currency_code
  • amt_time_left
  • category_name
  • collected_percentage
  • partner_name
  • in_forever_funding

Strip the currency symbols from the balance column

In [6]:
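# lstrip removes any leading characters found in this set, i.e. the currency prefix
result['balance'] = result['balance'].map(lambda x: x.lstrip('/€$£ACEhNU,.'))
In [7]:
# Remove thousands separators; regex=False makes ',' and '.' literal characters
result['balance'] = result['balance'].str.replace(',', '', regex=False)
In [8]:
result['balance'] = result['balance'].str.replace('.', '', regex=False)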
In [9]:
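import numpy as np
# Sanity check: every cleaned balance converts to an integer.
# The result is only displayed, not stored; the actual conversion happens after the merge (In [13]).
result['balance'].astype(np.int64)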
Out[9]:
0         273
1         230
2          26
3        1135
4         326
5           0
6           0
7          15
8         290
9         104
10       7386
11        145
12        777
13        316
14        549
15        500
16        790
17         60
18       6624
19         31
20         20
21          0
22         10
23        120
24        200
25        450
26        290
27       2147
28         50
29        305
         ...
30354    3801
30355    3800
30356    3798
30357    3796
30358    3796
30359    3795
30360    3791
30361    3790
30362    3788
30363    3785
30364    3784
30365    3783
30366    3781
30367    3780
30368    3779
30369    3775
30370    3775
30371    3775
30372    3774
30373    3771
30374    3770
30375    3770
30376    3765
30377    3764
30378    3757
30379    3756
30380    3756
30381    3755
30382    3752
30383    3751
Name: balance, Length: 216283, dtype: int64
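The three cleaning cells above can also be written as a single regular-expression pass that drops every non-digit character wherever it appears (lstrip only removes characters at the start of the string). This is a sketch under the same assumption the original steps make: balances are whole amounts, so dropping ',' and '.' only removes thousands separators. A value with decimals such as '1.50' would be misread as 150 by either approach. parse_balance is a hypothetical helper name; the sample strings are the balances shown in Out[4].

```python
import pandas as pd


def parse_balance(balances):
    """Turn strings such as '$1,135' or '£26' into numbers (hypothetical helper).

    Assumes whole amounts: every non-digit character (currency symbols,
    thousands separators) is dropped before conversion.
    """
    digits = balances.str.replace(r"\D", "", regex=True)
    # errors='coerce' turns anything left unparseable (e.g. '') into NaN
    return pd.to_numeric(digits, errors="coerce")


# Balance strings from the first rows shown in Out[4]
sample = pd.Series(["$273", "$230", "£26", "$1,135", "$326"])
print(parse_balance(sample).tolist())
```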

Steps I have taken here:

  • Make a new dataframe with the exchange rates as of November 26, 2017

  • Merge this new dataframe with the original one, matching on the currency code

  • Change the balance data type from object to numeric

  • Convert balance to US dollars, so that amounts are comparable across all projects

  • Select the columns I will be using

  • Rename those columns for better readability

In [10]:
# USD value of one unit of each currency, as of November 26, 2017

currencies = ['USD', 'GBP', 'EUR', 'CAD', 'AUD']
conversion = [1.00, 1.33, 1.19, 0.78, 0.76 ]
In [11]:
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
exchange_rate_table = pd.DataFrame({'currency_code': currencies,
                                    'conversion_rate': conversion})
exchange_rate_table
Out[11]:
currency_code conversion_rate
0 USD 1.00
1 GBP 1.33
2 EUR 1.19
3 CAD 0.78
4 AUD 0.76
In [12]:
new_result = result.merge(exchange_rate_table, on='currency_code')
In [13]:
new_result['balance'] = pd.to_numeric(new_result['balance'], errors='coerce')
In [14]:
# Convert Balance to US Dollars

new_result['Amount Pledged USD'] = new_result.balance * new_result.conversion_rate
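An alternative to the merge, sketched below with the same November 2017 rates, is to map each currency code to its rate with Series.map. Unlike the inner merge above, this keeps the original row order and index, and a row whose currency is missing from the table gets NaN instead of being silently dropped. RATES_TO_USD and to_usd are hypothetical names; the three sample rows are balances from Out[4].

```python
import pandas as pd

# USD value of one unit of each currency, as of November 26, 2017 (rates from In [10])
RATES_TO_USD = {"USD": 1.00, "GBP": 1.33, "EUR": 1.19, "CAD": 0.78, "AUD": 0.76}


def to_usd(df, amount_col="balance", currency_col="currency_code"):
    """Return the amounts converted to USD, aligned with df's index.

    Currencies missing from RATES_TO_USD map to NaN, so those rows are
    kept (with a NaN amount) instead of disappearing as in an inner merge.
    """
    return df[amount_col] * df[currency_col].map(RATES_TO_USD)


# Three of the preview rows from Out[4]
sample = pd.DataFrame({"balance": [273, 26, 326],
                       "currency_code": ["USD", "GBP", "CAD"]})
sample["Amount Pledged USD"] = to_usd(sample)
print(sample["Amount Pledged USD"].round(2).tolist())
```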
In [15]:
cols_to_use = ['title', 'tagline', 'cached_collected_pledges_count',
            'currency_code', 'amt_time_left', 'category_name',
            'collected_percentage', 'partner_name', 'in_forever_funding', 'Amount Pledged USD']
In [16]:
new_result = new_result[cols_to_use]
In [17]:
new_result.rename(columns={'title': 'Title',
                    'tagline': 'Description',
                    'cached_collected_pledges_count': 'Numb of Backers',
                    'currency_code': 'Currency Code',
                    'amt_time_left' : 'Any time left',
                    'category_name': 'Category Name',
                    'collected_percentage' : 'Collected Percentage',
                    'partner_name' : 'Partner Name',
                    'in_forever_funding' : 'Forever Funding'}, inplace=True)
In [18]:
new_result.head()
Out[18]:
Title Description Numb of Backers Currency Code Any time left Category Name Collected Percentage Partner Name Forever Funding Amount Pledged USD
0 Alger - Anti Lost & Forget System Never forget the things that are important to ... 15 USD No time left Travel & Outdoors 1% null False 273.0
1 Portfolio and Lookbook I am reaching the end of my college journey. ... 4 USD No time left Fashion & Wearables 23% null False 230.0
2 Med-West Legal Defense Fund: Part 2 Legal San Diego based medical cannabis company... 5 USD No time left Health & Fitness 1% null False 1135.0
3 Customized Titanium Signet Rings Finely milled titanium rings personalized with... 0 USD No time left Fashion & Wearables 0% null False 0.0
4 Scooter Dock - Awesome Kick Scooter Storage Worlds best kick scooter storage solution - Ea... 0 USD No time left Transportation 0% null False 0.0

I decided to give duplicates special attention, since most datasets don't come clean.

Duplicates can and will ruin an analysis: they inflate counts and sums and jeopardize our estimates. We have to be very careful, otherwise we will produce an analysis that is biased and wrong.

We would expect each ID to be unique, that is, each ID corresponds to exactly one project.

If we look at the titles, though, we see that some titles repeat with different IDs, even though they refer to the same project.

We could have assumed that we were dealing with more than 200,000 projects, but after removing the duplicates we are left with only about 70,000 (70,491) projects.

Let us look at these duplicates.

If we sort the dataframe by number of backers, we can see that the project "Super Troopers 2" appears 4 times with identical information in every row. We should therefore treat these rows as duplicates.

In [19]:
check_duplicates = new_result.sort_values(['Numb of Backers'], ascending=[False])
In [20]:
check_duplicates.head()
Out[20]:
Title Description Numb of Backers Currency Code Any time left Category Name Collected Percentage Partner Name Forever Funding Amount Pledged USD
138239 Super Troopers 2 The #SuperTroopers2 campaign is over, but you ... 54554 USD No time left Film 208% null True 4615178.0
133967 Super Troopers 2 The #SuperTroopers2 campaign is over, but you ... 54554 USD No time left Film 208% null True 4615178.0
158050 Super Troopers 2 The #SuperTroopers2 campaign is over, but you ... 54554 USD No time left Film 208% null True 4615178.0
135876 Super Troopers 2 The #SuperTroopers2 campaign is over, but you ... 54554 USD No time left Film 208% null True 4615178.0
126159 Protective iPhone X and 8 Cases - Mous Limitless LAST ORDERS October 16th with FREE Screen Prot... 53134 USD No time left Phones & Accessories 2,780% null True 2395193.0

Remove duplicates

In [21]:
new_result.drop_duplicates('Title', inplace=True)
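To see why deduplicating on Title (rather than on whole rows) matters here, the toy frame below reuses backer counts visible in Out[20] and Out[39]: the Mous case appears with both 53,134 and 53,133 backers, presumably snapshots scraped at slightly different times, so an exact-row drop_duplicates() would keep both. This is an illustration on hand-built rows, not a replay of the notebook.

```python
import pandas as pd

# Toy rows modelled on Out[20] and Out[39]: four identical "Super Troopers 2"
# rows and two rows for the Mous case whose backer counts differ by one.
mous = "Protective iPhone X and 8 Cases - Mous Limitless"
rows = pd.DataFrame({
    "Title": ["Super Troopers 2"] * 4 + [mous] * 2 + ["Solar Roadways"],
    "Numb of Backers": [54554] * 4 + [53134, 53133] + [50162],
})

exact = rows.drop_duplicates()             # drops only fully identical rows
by_title = rows.drop_duplicates("Title")   # keeps the first row for each title

print(len(rows), len(exact), len(by_title))
print(by_title["Numb of Backers"].tolist())
```

Which row survives depends on row order (keep='first' is the default); in the notebook the surviving Mous row happens to be the one with 53,133 backers. The flip side of title-based deduplication is that two unrelated projects sharing a title would also be collapsed into one.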
In [22]:
# new_result.to_csv('new_result.csv')
In [23]:
agg_funcs = {'Amount Pledged USD': 'sum',
             'Numb of Backers': 'sum'}
In [24]:
grouped = new_result.groupby('Category Name', as_index=False).agg(agg_funcs)
In [25]:
pd.set_option('display.float_format', lambda x: '%.3f' % x)

Create a new column with the amount pledged in millions of USD

In [26]:
grouped['Amount Pledged in Million USD'] = grouped['Amount Pledged USD'] / 1000000
In [27]:
grouped['Amount Pledged in Million USD'] = grouped['Amount Pledged in Million USD'].round()

Create a new column with the number of backers in units of 100,000

In [28]:
grouped['Numb of Backers per 100.000'] = grouped['Numb of Backers'] / 100000
In [29]:
grouped['Numb of Backers per 100.000'] = grouped['Numb of Backers per 100.000'].round()
In [30]:
grouped.head()
Out[30]:
Category Name Amount Pledged USD Numb of Backers Amount Pledged in Million USD Numb of Backers per 100.000
0 Animal Rights 4565619.210 75892 5.000 1.000
1 Art 15973945.170 175971 16.000 2.000
2 Audio 64673166.770 247954 65.000 2.000
3 Camera Gear 26924975.100 76843 27.000 1.000
4 Comics 2968013.730 46732 3.000 0.000
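The scaling and rounding steps can be checked on the five rows of Out[30]. Rounding to whole units is what produces the 0 for Comics: 46,732 backers is only 0.47 in units of 100,000. Keeping one decimal, as in the last line of the sketch, would keep small categories distinguishable; that is a suggested variation, not what the report does.

```python
import pandas as pd

# The five rows of Out[30]
sample = pd.DataFrame({
    "Category Name": ["Animal Rights", "Art", "Audio", "Camera Gear", "Comics"],
    "Amount Pledged USD": [4565619.21, 15973945.17, 64673166.77, 26924975.10, 2968013.73],
    "Numb of Backers": [75892, 175971, 247954, 76843, 46732],
})

millions = sample["Amount Pledged USD"] / 1_000_000
backers_100k = sample["Numb of Backers"] / 100_000

print(millions.round().tolist())        # whole units, as in the report
print(backers_100k.round().tolist())
print(backers_100k.round(1).tolist())   # one decimal keeps small categories visible
```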
In [31]:
# grouped.to_csv('grouped.csv')
In [32]:
currency_distribution = new_result.groupby(['Currency Code'])[['Currency Code']].count()
In [33]:
# currency_distribution.to_csv('currency.csv')

Here I will answer the questions I asked at the beginning.

I will also use interactive visualizations that make the results easier to interpret.

Below are the first 5 rows in our dataset:

In [34]:
new_result.head()
Out[34]:
Title Description Numb of Backers Currency Code Any time left Category Name Collected Percentage Partner Name Forever Funding Amount Pledged USD
0 Alger - Anti Lost & Forget System Never forget the things that are important to ... 15 USD No time left Travel & Outdoors 1% null False 273.000
1 Portfolio and Lookbook I am reaching the end of my college journey. ... 4 USD No time left Fashion & Wearables 23% null False 230.000
2 Med-West Legal Defense Fund: Part 2 Legal San Diego based medical cannabis company... 5 USD No time left Health & Fitness 1% null False 1135.000
3 Customized Titanium Signet Rings Finely milled titanium rings personalized with... 0 USD No time left Fashion & Wearables 0% null False 0.000
4 Scooter Dock - Awesome Kick Scooter Storage Worlds best kick scooter storage solution - Ea... 0 USD No time left Transportation 0% null False 0.000

Note that here I use the grouped dataset, in which the number of backers and the amount pledged are expressed in units of 100,000 backers and 1 million US dollars, respectively.

I also rounded these numbers to integers.

In [35]:
from plotly.graph_objs import *
import plotly.offline as py
py.init_notebook_mode(connected=True)



trace1 = {
  # numeric values: recent plotly versions validate marker size and color and reject numeric strings
  "x": [5.0, 16.0, 65.0, 27.0, 3.0, 2.0, 1.0, 1.0, 12.0, 44.0, 25.0, 9.0, 129.0, 133.0, 19.0, 99.0, 100.0, 2.0, 53.0, 35.0, 122.0, 3.0, 2.0, 24.0, 0.0, 7.0, 29.0, 48.0, 94.0, 20.0, 5.0, 1.0, 12.0],
  "y": [1.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 4.0, 1.0, 1.0, 4.0, 17.0, 1.0, 5.0, 4.0, 0.0, 5.0, 5.0, 9.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 3.0, 2.0, 1.0, 0.0, 2.0],
  "marker": {
    "autocolorscale": False,
    "cauto": True,
    "cmax": 32,
    "cmin": 0,
    "color": list(range(33)),  # one color index (0 to 32) per category
    "size": [5.0, 16.0, 65.0, 27.0, 3.0, 2.0, 1.0, 1.0, 12.0, 44.0, 25.0, 9.0, 129.0, 133.0, 19.0, 99.0, 100.0, 2.0, 53.0, 35.0, 122.0, 3.0, 2.0, 24.0, 0.0, 7.0, 29.0, 48.0, 94.0, 20.0, 5.0, 1.0, 12.0],  # bubble area follows the amount pledged
    "sizemode": "area",
    "sizeref": 0.11875,

  },
  "mode": "markers",
  "name": "Numb of Backers per 100.000",
  "text": ["Animal Rights", "Art", "Audio", "Camera Gear", "Comics", "Community Projects", "Creative Works", "Culture", "Dance & Theater", "Education", "Energy & Green Tech", "Environment", "Fashion & Wearables", "Film", "Food & Beverages", "Health & Fitness", "Home", "Human Rights", "Local Businesses", "Music", "Phones & Accessories", "Photography", "Podcasts, Blogs & Vlogs", "Productivity", "Spirituality", "Tabletop Games", "Tech & Innovation", "Transportation", "Travel & Outdoors", "Video Games", "Web Series & TV Shows", "Wellness", "Writing & Publishing"],
  "textsrc": "FranciscaDias:28:774b15",
  "type": "scatter",
  "uid": "8f8dce"
}
data = [trace1]  # a plain list; graph_objs.Data is deprecated
layout = {
  "autosize": True,
  "hovermode": "closest",
  "title": "Distribution of Amount Pledged and # Backers, per Category",
  "xaxis": {
    "autorange": True,
    "range": [-32.5795788669, 150],
    "title": "Amount Pledged in Million USD",
    "type": "linear"
  },
  "yaxis": {
    "autorange": False,
    "range": [-8.5, 30],
    "title": "Numb of Backers per 100.000",
    "type": "linear"
  }
}
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='bubble')

If you hover over the bubbles, you can see each bubble's category, its number of backers (in units of 100,000) and its amount pledged (in millions of US dollars).

  • The category with both the most investment and the most backers is Film.
  • Fashion & Wearables follows with the second-highest investment.
  • Phones & Accessories comes second in terms of backers, with roughly 900,000 backers.
  • Categories in the middle include Home, Health & Fitness, Travel & Outdoors, Audio, Local Businesses and Transportation.
  • The categories that receive the least attention, in both investment and backers, include Spirituality, Wellness and Podcasts, Blogs & Vlogs.
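These rankings can be read off programmatically. The sketch rebuilds a small table from the rounded values plotted in In [35]; in the notebook itself, the grouped DataFrame holds the unrounded figures and could be ranked directly with the same nlargest calls.

```python
import pandas as pd

# Rounded per-category values plotted in In [35], in the same (alphabetical) order
categories = [
    "Animal Rights", "Art", "Audio", "Camera Gear", "Comics", "Community Projects",
    "Creative Works", "Culture", "Dance & Theater", "Education", "Energy & Green Tech",
    "Environment", "Fashion & Wearables", "Film", "Food & Beverages", "Health & Fitness",
    "Home", "Human Rights", "Local Businesses", "Music", "Phones & Accessories",
    "Photography", "Podcasts, Blogs & Vlogs", "Productivity", "Spirituality",
    "Tabletop Games", "Tech & Innovation", "Transportation", "Travel & Outdoors",
    "Video Games", "Web Series & TV Shows", "Wellness", "Writing & Publishing",
]
millions_usd = [5, 16, 65, 27, 3, 2, 1, 1, 12, 44, 25, 9, 129, 133, 19, 99, 100,
                2, 53, 35, 122, 3, 2, 24, 0, 7, 29, 48, 94, 20, 5, 1, 12]
backers_100k = [1, 2, 2, 1, 0, 0, 0, 0, 2, 4, 1, 1, 4, 17, 1, 5, 4,
                0, 5, 5, 9, 0, 0, 1, 0, 0, 1, 1, 3, 2, 1, 0, 2]

by_category = pd.DataFrame(
    {"millions_usd": millions_usd, "backers_100k": backers_100k},
    index=categories,
)

# Top categories by investment and by number of backers
print(by_category.nlargest(3, "millions_usd").index.tolist())
print(by_category.nlargest(2, "backers_100k").index.tolist())
```

Because these values are rounded, ten categories share 0 backers (in units of 100,000), so a reliable bottom ranking would need the unrounded columns.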

Now let us take a look at how currencies are represented.

It is reasonable to assume that projects listed in Canadian dollars come from Canadian entrepreneurs.

So I will use the currency as a proxy for the country of origin and look at how projects are distributed across currencies.

In [36]:
trace1 = {
  "labels": ["AUD", "CAD", "EUR", "GBP", "USD"],
  "labelssrc": "FranciscaDias:25:f35bf0",
  "marker": {"colors": ["rgb(255, 255, 204)", "rgb(161, 218, 180)", "rgb(65, 182, 196)", "rgb(44, 127, 184)", "rgb(8, 104, 172)", "rgb(37, 52, 148)"]},
  "name": "count",
  "textfont": {"size": 14},
  "type": "pie",
  "uid": "ac71bc",
  "values": ["912", "3673", "3958", "5403", "56545"],
  "valuessrc": "FranciscaDias:25:ca40ef"
}
data = [trace1]  # a plain list; graph_objs.Data is deprecated
layout = {
  "title": "Currency Distribution per Projects",
  "autosize": True,
  "hovermode": "closest"
}
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='pie')
  • 80% of all projects are listed in US dollars. Although Indiegogo is an international crowdfunding website, 4 in 5 projects appear to come from the US.
  • After the United States comes the United Kingdom (GBP), with about 8% of projects.
  • Then come projects in euros, Canadian dollars and Australian dollars.

  • There are 7,638,792 backers in this dataset (summed over projects, so a person who backs several projects is counted once for each).
  • Each project has on average 108 backers.
  • Each backer gives on average 150 US dollars per project.
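The currency percentages follow from the project counts in the pie chart; those counts add up to 70,491, the number of projects left after deduplication. A short check:

```python
import pandas as pd

# Project counts per currency, as plotted in the pie chart (In [36])
counts = pd.Series({"AUD": 912, "CAD": 3673, "EUR": 3958, "GBP": 5403, "USD": 56545})

# Share of projects per currency, in percent, largest first
shares = (100 * counts / counts.sum()).round(1).sort_values(ascending=False)
print(counts.sum())
print(shares)
```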
In [37]:
print('There are', new_result['Numb of Backers'].sum(), 'backers in this dataset')
print('Each project has on average', round(new_result['Numb of Backers'].sum() / len(new_result)), 'backers')
print('Each backer gives on average', round(new_result['Amount Pledged USD'].sum() / new_result['Numb of Backers'].sum()), 'US dollars per project')
There are 7638792 backers in this dataset
Each project has on average 108 backers
Each backer gives on average 150 US dollars per project
In [38]:
projects_more_backers = new_result.sort_values(['Numb of Backers'], ascending=[False])
In [39]:
projects_more_backers.head()
Out[39]:
Title Description Numb of Backers Currency Code Any time left Category Name Collected Percentage Partner Name Forever Funding Amount Pledged USD
133967 Super Troopers 2 The #SuperTroopers2 campaign is over, but you ... 54554 USD No time left Film 208% null True 4615178.000
124359 Protective iPhone X and 8 Cases - Mous Limitless LAST ORDERS October 16th with FREE Screen Prot... 53133 USD No time left Phones & Accessories 2,780% null True 2395147.000
128416 Solar Roadways Solar panels that you can drive, park and walk... 50162 USD No time left Energy & Green Tech 220% null True 2283418.000
142492 Con Man A new comedy from Alan Tudyk and Nathan Fillio... 46992 USD No time left Web Series & TV Shows 735% null False 3156178.000
130284 Nimuno Loops - The Original Toy Block Tape Instantly transforms virtually any surface int... 42368 USD No time left Home 20,052% null False 1648114.000

Super Troopers 2!

This project received more support in terms of backers than any other project.

54,554 backers rushed into this investment. According to Wikipedia, Super Troopers 2 is an upcoming American crime comedy mystery film.

Also popular is the Mous protective case for the iPhone X: 53,133 backers want this product!

In [40]:
import plotly.graph_objs as go
data = [go.Bar(
            x = ["Super Troopers 2", "Protective iPhone X", "Solar Roadways", "Con Man", "Nimuno Loops"],
            y = ["54554", "53133", "50162", "46992", "42368"]
    )]

layout = go.Layout(
    title='Projects with more Backers',

)

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='basic-bar')

According to Indiegogo's website, “Partner” is a corporation or other legal entity that includes approved Campaigns on its dedicated page (“Partner Page”) using the Service.

Let's see how many of these projects are backed by a corporation or partner, and which partner has the most projects.

About 3% of all projects have a partner:

In [41]:
print('The percentage of projects that has a partner is:'
      ,round((len(new_result[new_result['Partner Name']!='null'])/len(new_result))*100), '%')
The percentage of projects that has a partner is: 3 %

Who are they?

In [42]:
new_result['Partner Name'].value_counts().nlargest(10)
Out[42]:
null                                                                   68268
Fractured Atlas                                                          444
From the Heart Productions                                               127
Backstage                                                                 68
#GivingTuesday                                                            57
UP Global                                                                 55
Enventys Partners                                                         48
Documentary Organization of Canada                                        41
Agency | 2.0 - The Premier Crowdfunding Agency (Certified Campaign)       36
LaunchBoom                                                                33
Name: Partner Name, dtype: int64
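As a cross-check, the 3% figure follows from the 68,268 'null' entries above and the 70,491 deduplicated projects (the sum of the two Forever Funding counts shown further below). Inside the notebook, (new_result['Partner Name'] != 'null').mean() * 100 should express the same ratio directly.

```python
# Counts from the outputs: 68,268 projects have partner 'null', and the
# deduplicated dataset has 64,224 + 6,267 = 70,491 projects (Forever Funding counts).
total_projects = 64224 + 6267
without_partner = 68268

with_partner = total_projects - without_partner
share = 100 * with_partner / total_projects  # percentage of projects with a partner
print(with_partner, round(share, 1))
```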

Top 5 Partners Profile

  • Fractured Atlas has 444 projects; it helps artists and arts organizations raise money through grants and tax-deductible donations.
  • From the Heart Productions has 127 projects; it is a non-profit that helps filmmakers get their films funded through grants, fiscal sponsorship, and film funding classes.
  • Backstage has 68 projects; it connects content creators with the talent needed to take their projects to the next level.
  • #GivingTuesday has 57 projects; it is a movement to create an international day of giving at the beginning of the Christmas and holiday season.
  • UP Global has 55 projects; it is a non-profit dedicated to fostering entrepreneurship, grassroots leadership, and strong communities.

Indiegogo has a program called Forever Funding, which lets backers keep funding a project with no deadline.

The Forever Funding model is for companies who have already met their funding goal and want to keep raising money.

How common is it?

In [43]:
new_result['Forever Funding'].value_counts()
Out[43]:
False    64224
True      6267
Name: Forever Funding, dtype: int64

About 9% of all projects (6,267 out of 70,491) have this feature!
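The share can be computed with value_counts(normalize=True), which returns proportions instead of counts. The sketch rebuilds a boolean column from the counts in Out[43]; on the real data, new_result['Forever Funding'].value_counts(normalize=True) would be the equivalent call. For a boolean column, .mean() gives the share of True values directly.

```python
import pandas as pd

# Rebuild a boolean column with the counts from Out[43]
forever = pd.Series([False] * 64224 + [True] * 6267, name="Forever Funding")

proportions = forever.value_counts(normalize=True)  # proportions instead of counts
print(proportions.round(3))
print(round(100 * proportions.loc[True], 1))  # percentage of projects in Forever Funding
print(forever.mean())  # same share: True counts as 1, False as 0
```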

I have analysed all projects listed on Indiegogo as of October 2017.

Here are some interesting findings:

  • The category that gets both the most investment and the most backers is Film. It is followed by Fashion & Wearables in terms of investment and by Phones & Accessories in terms of backers;
  • There are 5 currencies represented in this dataset, and the most common is the US dollar, used by 80% of all projects;
  • There are 7,638,792 backers in this dataset, and each backer gives on average 150 US dollars per project;
  • Super Troopers 2 was the project with the most backers: 54,554 backers rushed into this investment;
  • 3% of all projects have a partner, and Fractured Atlas has more projects than any other partner.