An analysis of more than 200,000 Indiegogo Projects

Francisca Dias

Table of Contents

Introduction

Overview

Libraries

Data Cleaning

Data wrangling

Duplicates

Exploratory Data Analysis

Project Profile

Backers Profile

The Winner project is...

Partner Name

Forever Funding

Conclusions

Introduction

In this report I explore the projects listed on Indiegogo's crowdfunding platform as of October 2017.

The visualizations are interactive: you can hover the mouse over a chart to get instant information about a particular item.

I would like to answer the following questions:

Which categories get the most investment? And the most backers?

What currencies are represented in this dataset?

How many backers are on this platform? What is their average contribution?

Which projects got the most backers?

Are there projects that have partnered with a corporation?

These and other questions will be answered in this report.

Overview

Indiegogo is an international crowdfunding platform. It allows people to solicit funds for an idea, charity, or start-up business.

Its business model is based on charging a 5% fee on contributions.

This dataset was taken from https://webrobots.io/indiegogo-dataset/.

The data was collected on 2017-10-15.

Below are the features that come in this dataset:

title: title of project
tagline: project's description
cached_collected_pledges_count: number of backers
balance: the amount pledged by backers so far, with the currency symbol attached
currency_code: the currency in which the project is listed
amt_time_left: the time left before the project closes to funding
category_name: project's category
collected_percentage: the percentage of the project's goal collected so far
partner_name: the name of a corporation that has backed the project; it is null when there is none
in_forever_funding: whether the project is in Forever Funding, i.e. continuously open to funding

Libraries

import pandas as pd

# Read the seven CSV files that make up the dataset (Indiegogo001.csv ... Indiegogo007.csv)
files = ['Indiegogo{:03d}.csv'.format(i) for i in range(1, 8)]
frames = [pd.read_csv(f) for f in files]

# Stack them into a single dataframe
result = pd.concat(frames)

There are 216,283 rows in this dataset. We could assume that each represents a project, but as we will see shortly, that is not the case.
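pd.concat keeps each file's own row labels. This is why the index labels of the converted balance column shown later only run up to 30383, even though there are 216,283 rows. Below is a minimal sketch on two small hypothetical frames. It compares the default behaviour with ignore_index=True, which builds a fresh 0..N-1 index instead:

```python
import pandas as pd

# Two tiny stand-ins for the Indiegogo00X.csv files (hypothetical data)
part1 = pd.DataFrame({'title': ['A', 'B'], 'balance': ['$10', '$20']})
part2 = pd.DataFrame({'title': ['C', 'D'], 'balance': ['£5', '€7']})

# Default concat keeps each frame's own 0..n-1 labels, so labels repeat
stacked = pd.concat([part1, part2])

# ignore_index=True discards the old labels and numbers the rows 0..N-1
renumbered = pd.concat([part1, part2], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

A fresh index is not strictly needed in this analysis, because the merge with the exchange rate table later returns a dataframe with a new index anyway.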

result.head()

Below are the data types.

Note that balance has the object (string) type. This is because the currency symbol is attached to the amount. I will therefore have to clean this feature and convert it to a numeric type.

result.dtypes

id                                 int64
title                             object
nearest_five_percent               int64
tagline                           object
cached_collected_pledges_count     int64
igg_image_url                     object
compressed_image_url              object
balance                           object
currency_code                     object
amt_time_left                     object
url                               object
category_url                      object
category_name                     object
category_slug                     object
card_type                         object
collected_percentage              object
partner_name                      object
in_forever_funding                  bool
friend_contributors               object
friend_team_members               object
source_url                        object
dtype: object

Data Cleaning

Much of the time spent analysing and working with a dataset comes from making sure the data is ready for analysis.

By ready I mean that there are no nulls, all column types are correct, there are no duplicates, only the columns I will be working with are selected, and so on.
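Some of these readiness checks can be scripted. Below is a minimal sketch that uses a hypothetical helper, readiness_report, on a tiny made-up sample. It counts nulls per column, lists the column types and counts fully duplicated rows:

```python
import pandas as pd

def readiness_report(df):
    """Summarise basic readiness checks: nulls, column types and duplicate rows."""
    return {
        'nulls_per_column': df.isna().sum().to_dict(),
        'dtypes': df.dtypes.astype(str).to_dict(),
        # A row counts as a duplicate if an identical row appeared earlier
        'duplicate_rows': int(df.duplicated().sum()),
    }

# Hypothetical sample: one missing title, string-typed amounts, one repeated row
sample = pd.DataFrame({
    'title': ['A', 'B', 'B', None],
    'balance': ['$10', '$20', '$20', '$5'],
})

report = readiness_report(sample)
print(report)
```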

Below is a description of all steps I have taken to clean this dataset:

Concatenate all files
Strip the balance column of all non-numeric characters
Convert balance to a numeric type (it was object before)
Get the exchange rates for the 5 currencies represented here
Convert the balance amount to a single currency: USD
Select the columns I will be using
Rename those columns

Columns I will be working with

title
tagline
cached_collected_pledges_count
balance
currency_code
amt_time_left
category_name
collected_percentage
partner_name
in_forever_funding

Remove the currency symbols and separators from the balance column

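# Strip the leading currency symbols and prefixes from every balance
result['balance'] = result['balance'].map(lambda x: x.lstrip('/€$£ACEhNU,.'))

# Remove the separators; ',' and '.' are treated as thousands separators (amounts are whole numbers)
result['balance'] = result['balance'].str.replace(',', '', regex=False)

result['balance'] = result['balance'].str.replace('.', '', regex=False)

import numpy as np
# astype returns a new Series, so assign it back to actually change the column type
result['balance'] = result['balance'].astype(np.int64)
result['balance']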

0         273
1         230
2          26
3        1135
4         326
5           0
6           0
7          15
8         290
9         104
10       7386
11        145
12        777
13        316
14        549
15        500
16        790
17         60
18       6624
19         31
20         20
21          0
22         10
23        120
24        200
25        450
26        290
27       2147
28         50
29        305
         ...
30354    3801
30355    3800
30356    3798
30357    3796
30358    3796
30359    3795
30360    3791
30361    3790
30362    3788
30363    3785
30364    3784
30365    3783
30366    3781
30367    3780
30368    3779
30369    3775
30370    3775
30371    3775
30372    3774
30373    3771
30374    3770
30375    3770
30376    3765
30377    3764
30378    3757
30379    3756
30380    3756
30381    3755
30382    3752
30383    3751
Name: balance, Length: 216283, dtype: int64
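A single regular expression can replace the lstrip call and the two replace calls by dropping every non-digit character at once. This is a sketch on hypothetical values. Like the code above, it assumes balances are whole amounts, so ',' and '.' only ever act as thousands separators:

```python
import pandas as pd

# Hypothetical raw values in the style of the 'balance' column
raw = pd.Series(['$273', '£26', '$1,135', '€1.200'])

# Remove every character that is not a digit, then convert to integers.
# Assumption: no balance has a decimal part.
cleaned = raw.str.replace(r'[^0-9]', '', regex=True).astype('int64')

print(cleaned.tolist())  # [273, 26, 1135, 1200]
```

Unlike lstrip, which only removes characters from the left end, the regular expression also removes any trailing or embedded symbols.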

Data wrangling

Steps I have taken here:

Make a new dataframe with the exchange rates as of November 26, 2017
Merge this new dataframe with the original one on the currency code
Change the balance data type from object to numeric
Convert balance to US dollars, so the amounts are comparable across all projects
Select the columns I will be using
Rename those columns for better understanding

# Exchange rates for each currency (USD per unit of currency), as of November 26, 2017
currencies = ['USD', 'GBP', 'EUR', 'CAD', 'AUD']
conversion = [1.00, 1.33, 1.19, 0.78, 0.76]

# Build the lookup table from a dict; the columns keep the insertion order
exchange_rate_table = pd.DataFrame({'currency_code': currencies,
                                    'conversion_rate': conversion})
exchange_rate_table

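# Attach the matching conversion rate to every project (inner join on currency_code)
new_result = result.merge(exchange_rate_table, on='currency_code')

# Make sure balance is numeric; anything that cannot be parsed becomes NaN
new_result['balance'] = pd.to_numeric(new_result['balance'], errors='coerce')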

# Convert Balance to US Dollars

new_result['Amount Pledged USD'] = new_result.balance * new_result.conversion_rate
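The merge above is an inner join, which is the pandas default. A project listed in a currency missing from exchange_rate_table would therefore be dropped silently. The sketch below uses hypothetical projects, with 'JPY' standing in for an unlisted currency. It compares the default with how='left' and then applies the same USD conversion:

```python
import pandas as pd

# Same rates as the exchange rate table (USD per unit, November 26, 2017)
rates = pd.DataFrame({
    'currency_code': ['USD', 'GBP', 'EUR', 'CAD', 'AUD'],
    'conversion_rate': [1.00, 1.33, 1.19, 0.78, 0.76],
})

# Hypothetical projects; 'JPY' has no rate in the table
projects = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'balance': [273, 26, 326],
    'currency_code': ['USD', 'GBP', 'JPY'],
})

# Inner merge (default): the JPY project disappears
inner = projects.merge(rates, on='currency_code')
# Left merge: every project is kept, with NaN where no rate exists
left = projects.merge(rates, on='currency_code', how='left')

# Convert to US dollars: local amount times USD per unit
inner['Amount Pledged USD'] = inner['balance'] * inner['conversion_rate']

print(inner[['title', 'Amount Pledged USD']])
print(left['conversion_rate'].isna().sum())  # 1
```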


cols_to_use = ['title', 'tagline', 'cached_collected_pledges_count',
            'currency_code', 'amt_time_left', 'category_name',
            'collected_percentage', 'partner_name', 'in_forever_funding', 'Amount Pledged USD']

new_result = new_result[cols_to_use]

new_result.rename(columns={'title': 'Title',
                    'tagline': 'Description',
                    'cached_collected_pledges_count': 'Numb of Backers',
                    'currency_code': 'Currency Code',
                    'amt_time_left' : 'Any time left',
                    'category_name': 'Category Name',
                    'collected_percentage' : 'Collected Percentage',
                    'partner_name' : 'Partner Name',
                    'in_forever_funding' : 'Forever Funding'}, inplace=True)

new_result.head()

Duplicates

I decided to give this problem its own section, since most datasets don't come clean.

Left in place, duplicates in this dataset would ruin the analysis.

Duplicates can jeopardize our estimates. We have to be very careful, otherwise we will produce an analysis that is biased and wrong.

We would expect each ID to be unique, that is, each ID corresponds to exactly one project.

If we look at the titles, though, we see that some titles repeat with different IDs, even though they refer to the same project.

We could have assumed that we were dealing with more than 200,000 projects, but once the duplicates are removed we are left with only about 70,000 (70,491) projects.

Let us see these duplicates.

If we sort the initial dataframe by the number of backers, we can see that the project "Super Troopers 2" repeats 4 times, with the same information in all these rows. Therefore we should consider these rows duplicates.

check_duplicates = new_result.sort_values(['Numb of Backers'], ascending=[False])

check_duplicates.head()
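The sorted view above only shows the first rows. duplicated(keep=False) lists every row involved in a repeated title. The sketch below uses hypothetical rows to illustrate a trade-off of deduplicating on Title alone. Two distinct projects that happen to share a title collapse into one row, while deduplicating on several columns keeps them apart:

```python
import pandas as pd

# Hypothetical rows: one project scraped three times under different IDs,
# plus two different projects that share a title
rows = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Title': ['Super Troopers 2'] * 3 + ['My Album', 'My Album'],
    'Numb of Backers': [54554, 54554, 54554, 12, 300],
})

# Every row whose title occurs more than once (keep=False marks all copies)
print(rows[rows.duplicated('Title', keep=False)])

# Title only: keeps the first row per title, so one 'My Album' project is lost
by_title = rows.drop_duplicates('Title')

# Title plus backers: the two 'My Album' projects survive as separate rows
by_content = rows.drop_duplicates(subset=['Title', 'Numb of Backers'])

print(len(by_title), len(by_content))  # 2 3
```

Neither choice is perfect. The same project scraped at different times may differ slightly in its backer count. In that case, the multi-column version would keep both copies.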

Remove duplicates

new_result.drop_duplicates('Title', inplace=True)

# new_result.to_csv('new_result.csv')

# Total amount pledged and total number of backers per category
agg_funcs = {'Amount Pledged USD': 'sum',
             'Numb of Backers': 'sum'}

grouped = new_result.groupby('Category Name', as_index=False).agg(agg_funcs)

pd.set_option('display.float_format', lambda x: '%.3f' % x)

Create a new column with the amount pledged in millions of USD

grouped['Amount Pledged in Million USD'] = grouped['Amount Pledged USD'] / 1000000

grouped['Amount Pledged in Million USD'] = grouped['Amount Pledged in Million USD'].round()

Create a new column with the number of backers in units of 100,000

grouped['Numb of Backers per 100.000'] = grouped['Numb of Backers'] / 100000

grouped['Numb of Backers per 100.000'] = grouped['Numb of Backers per 100.000'].round()

grouped.head()
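Pandas named aggregation computes the same per-category totals and gives the result flat, explicitly named columns. Below is a sketch on hypothetical project rows, followed by the same rescaling and rounding used above. It is an alternative to the agg_funcs dictionary, not the code that produced the table shown here:

```python
import pandas as pd

# Hypothetical per-project rows
projects = pd.DataFrame({
    'Category Name': ['Film', 'Film', 'Art', 'Art', 'Audio'],
    'Amount Pledged USD': [4_615_178.0, 2_000_000.0, 300_000.0, 150_000.0, 900_000.0],
    'Numb of Backers': [54554, 20000, 3000, 1500, 8000],
})

# Named aggregation: new_column=(source_column, function)
grouped = projects.groupby('Category Name', as_index=False).agg(
    amount_usd=('Amount Pledged USD', 'sum'),
    backers=('Numb of Backers', 'sum'),
)

# Rescale to millions of USD and to units of 100,000 backers, then round
grouped['Amount Pledged in Million USD'] = (grouped['amount_usd'] / 1_000_000).round()
grouped['Numb of Backers per 100.000'] = (grouped['backers'] / 100_000).round()

print(grouped)
```

Rounding to whole units sends small totals to 0. In this toy example Art ends up at 0 million USD. In the bubble chart, several categories likewise sit at 0 backers per 100,000.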

# grouped.to_csv('grouped.csv')

# Number of projects listed in each currency
currency_distribution = new_result['Currency Code'].value_counts()

# currency_distribution.to_csv('currency.csv')

Exploratory Data Analysis

Here I will answer the questions that I asked in the beginning.

I will also introduce some interactive visualizations, which make the results easier to interpret.

Below are the first 5 rows in our dataset:

new_result.head()

Project Profile

Note that here I use the grouped dataset, in which the number of backers is expressed in units of 100,000 and the amount pledged in millions of US dollars.

I also rounded the numbers, so we are left with integers.

from plotly.graph_objs import *
import plotly.offline as py
py.init_notebook_mode(connected=True)



# Values from the grouped table, one entry per category (33 categories, alphabetical order)
categories = ["Animal Rights", "Art", "Audio", "Camera Gear", "Comics", "Community Projects", "Creative Works", "Culture", "Dance & Theater", "Education", "Energy & Green Tech", "Environment", "Fashion & Wearables", "Film", "Food & Beverages", "Health & Fitness", "Home", "Human Rights", "Local Businesses", "Music", "Phones & Accessories", "Photography", "Podcasts, Blogs & Vlogs", "Productivity", "Spirituality", "Tabletop Games", "Tech & Innovation", "Transportation", "Travel & Outdoors", "Video Games", "Web Series & TV Shows", "Wellness", "Writing & Publishing"]
# Amount Pledged in Million USD
amount_musd = [5, 16, 65, 27, 3, 2, 1, 1, 12, 44, 25, 9, 129, 133, 19, 99, 100,
               2, 53, 35, 122, 3, 2, 24, 0, 7, 29, 48, 94, 20, 5, 1, 12]
# Numb of Backers per 100.000
backers_100k = [1, 2, 2, 1, 0, 0, 0, 0, 2, 4, 1, 1, 4, 17, 1, 5, 4,
                0, 5, 5, 9, 0, 0, 1, 0, 0, 1, 1, 3, 2, 1, 0, 2]

trace1 = {
  "type": "scatter",
  "mode": "markers",
  "name": "Numb of Backers per 100.000",
  "x": amount_musd,
  "y": backers_100k,
  "text": categories,                       # category name shown on hover
  "marker": {
    "size": amount_musd,                    # bubble area proportional to the amount pledged
    "sizemode": "area",
    "sizeref": 0.11875,
    "color": list(range(len(categories))),  # one color per category
    "autocolorscale": False,
    "cauto": True
  }
}
layout = {
  "autosize": True,
  "hovermode": "closest",
  "title": "Distribution of Amount Pledged and # Backers, per Category",
  "xaxis": {
    "autorange": True,
    "title": "Amount Pledged in Million USD",
    "type": "linear"
  },
  "yaxis": {
    "autorange": False,
    "range": [-8.5, 30],
    "title": "Numb of Backers per 100.000",
    "type": "linear"
  }
}
# Plotly expects numbers (not strings) for marker size and color, and a plain list of traces
fig = Figure(data=[trace1], layout=layout)
py.iplot(fig, filename='bubble')

If you hover over the bubbles, you can see not only the category to which each belongs, but also the number of backers per 100,000 and the amount pledged in millions of US dollars.

The category with both the most investment and the most backers is Film.

It is followed by Fashion & Wearables in terms of investment, with the second highest amount pledged.

Phones & Accessories also comes second, but in terms of backers, with about 900,000 backers.

The categories in between include Home, Health & Fitness, Travel & Outdoors, Audio, Local Businesses and Transportation.

Among the categories that receive the least attention, both in terms of investment and backers, are Spirituality, Wellness and Podcasts, Blogs & Vlogs.
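The ranking described above comes from sorting the category totals. The sketch below uses a subset of the values from the bubble chart. In the notebook, the same sort_values calls could be applied to grouped directly:

```python
import pandas as pd

# A few categories with their values from the bubble chart
chart = pd.DataFrame({
    'Category Name': ['Film', 'Fashion & Wearables', 'Phones & Accessories',
                      'Home', 'Health & Fitness', 'Spirituality'],
    'Amount Pledged in Million USD': [133, 129, 122, 100, 99, 0],
    'Numb of Backers per 100.000': [17, 4, 9, 4, 5, 0],
})

# Rank categories separately by investment and by backers
by_amount = chart.sort_values('Amount Pledged in Million USD', ascending=False)
by_backers = chart.sort_values('Numb of Backers per 100.000', ascending=False)

print(by_amount['Category Name'].head(2).tolist())   # ['Film', 'Fashion & Wearables']
print(by_backers['Category Name'].head(2).tolist())  # ['Film', 'Phones & Accessories']
```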

Now let us take a look at the currency representation.

It is reasonable to assume that projects listed in Canadian dollars most likely come from Canadian entrepreneurs.

Therefore, using the listing currency as a proxy, I want to see which countries are represented in this dataset and how the projects are distributed among them.

trace1 = {
  "type": "pie",
  "name": "count",
  "labels": ["AUD", "CAD", "EUR", "GBP", "USD"],
  # Number of projects per currency (from currency_distribution)
  "values": [912, 3673, 3958, 5403, 56545],
  # One color per label
  "marker": {"colors": ["rgb(255, 255, 204)", "rgb(161, 218, 180)", "rgb(65, 182, 196)",
                        "rgb(44, 127, 184)", "rgb(8, 104, 172)"]},
  "textfont": {"size": 14}
}
layout = {
  "title": "Distribution of Projects per Currency",
  "autosize": True,
  "hovermode": "closest"
}
fig = Figure(data=[trace1], layout=layout)
py.iplot(fig, filename='pie')

80% of all projects are listed in US dollars, which suggests that they come from the United States. Even though Indiegogo is an international crowdfunding website, 4 in 5 projects belong to the US.

After the United States comes the United Kingdom (GBP), with about 8% of the projects.

It is followed by the euro, the Canadian dollar and the Australian dollar.
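These percentages follow from the per-currency counts in the pie chart. The sketch below rebuilds those counts as a Series. In the notebook, new_result['Currency Code'].value_counts(normalize=True) would give the same shares as fractions:

```python
import pandas as pd

# Project counts per currency, as shown in the pie chart
counts = pd.Series({'USD': 56545, 'GBP': 5403, 'EUR': 3958, 'CAD': 3673, 'AUD': 912})

# Share of all projects, in percent, rounded to one decimal place
shares = (counts / counts.sum() * 100).round(1)

print(counts.sum())  # 70491
print(shares)
```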

Backers Profile

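There are 7,638,792 backers in this dataset. This is the sum of the backer counts of all projects, so the same person may be counted more than once.

Each project has on average 108 backers.

On average, each backer contributes 150 US dollars to a project.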

print('There are',new_result['Numb of Backers'].sum(),'Backers in this dataset')
print('Each project has on average',round(new_result['Numb of Backers'].sum()/len(new_result)),'Backers')
print('Each Backers gives on average',round(new_result['Amount Pledged USD'].sum()/new_result['Numb of Backers'].sum()),'US Dollars for each Project')

There are 7638792 Backers in this dataset
Each project has on average 108 Backers
Each Backers gives on average 150 US Dollars for each Project
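Both averages printed above are ratios of totals. The sketch below uses three hypothetical projects to make the two formulas explicit. It also shows that the average pledge per backer is weighted by backers: averaging each project's own pledge per backer gives a different figure.

```python
import pandas as pd

# Hypothetical projects
df = pd.DataFrame({
    'Numb of Backers': [100, 10, 0],
    'Amount Pledged USD': [10_000.0, 3_000.0, 0.0],
})

# Average backers per project: total backers / number of projects (equivalent to .mean())
backers_per_project = df['Numb of Backers'].sum() / len(df)

# Average pledge per backer: total amount / total backers (every backer weighs the same)
pledge_per_backer = df['Amount Pledged USD'].sum() / df['Numb of Backers'].sum()

# Mean of each project's own pledge per backer (projects with 0 backers are skipped)
per_project = (df['Amount Pledged USD'] / df['Numb of Backers'].replace(0, float('nan'))).mean()

print(backers_per_project, pledge_per_backer, per_project)
```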

projects_more_backers = new_result.sort_values(['Numb of Backers'], ascending=[False])

projects_more_backers.head()

The Winner project is...

Super Troopers 2!

This project received more support in terms of backers than any other project.

54,554 backers rushed into this investment. According to Wikipedia, Super Troopers 2 is (at the time of writing) an upcoming American crime comedy mystery film.

Also popular is this protective case for the iPhone X: 53,133 backers want this product!

import plotly.graph_objs as go

# The 5 projects with the most backers
data = [go.Bar(
    x=["Super Troopers 2", "Protective iPhone X", "Solar Roadways", "Con Man", "Nimuno Loops"],
    y=[54554, 53133, 50162, 46992, 42368]
)]

layout = go.Layout(title='Projects with the most backers')

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='basic-bar')

Partner Name

According to Indiegogo's website, a “Partner” is a corporation or other legal entity that includes approved Campaigns on its dedicated page (“Partner Page”) using the Service.

Let's see how many of these projects are backed by a corporation or partner, and which partner has the most projects.

3 % of all projects have a partner:

print('The percentage of projects that has a partner is:'
      ,round((len(new_result[new_result['Partner Name']!='null'])/len(new_result))*100), '%')

The percentage of projects that has a partner is: 3 %
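In this dataset a missing partner is stored as the string 'null' rather than as a real missing value, which is why the code above compares against 'null'. The sketch below uses hypothetical values and converts the placeholder to NaN first, so that notna() and the other pandas missing-data tools work as usual:

```python
import numpy as np
import pandas as pd

# Hypothetical column where missing partners are the string 'null'
partners = pd.Series(['null', 'Fractured Atlas', 'null', 'Backstage', 'null'])

# Replace the placeholder string with a real missing value
partners = partners.replace('null', np.nan)

# The mean of a boolean mask is the share of True values
share_with_partner = partners.notna().mean() * 100

print(round(share_with_partner), '%')  # 40 %
```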

Who are they?

new_result['Partner Name'].value_counts().nlargest(10)

null                                                                   68268
Fractured Atlas                                                          444
From the Heart Productions                                               127
Backstage                                                                 68
#GivingTuesday                                                            57
UP Global                                                                 55
Enventys Partners                                                         48
Documentary Organization of Canada                                        41
Agency | 2.0 - The Premier Crowdfunding Agency (Certified Campaign)       36
LaunchBoom                                                                33
Name: Partner Name, dtype: int64

Profiles of the Top 5 Partners

Fractured Atlas has 444 projects; it helps artists and arts organizations raise money through grants and tax-deductible donations.

From the Heart Productions has 127 projects; it is a non-profit that helps filmmakers get their films funded through grants, fiscal sponsorship, and film funding classes.

Backstage has 68 projects; it connects content creators with the talent needed to take their projects to the next level.

#GivingTuesday has 57 projects; it is a movement to create an international day of giving at the beginning of the Christmas and holiday season.

UP Global has 55 projects; it is a non-profit company dedicated to fostering entrepreneurship, grassroots leadership, and strong communities.

Forever Funding

Indiegogo has a program called Forever Funding, which lets backers keep funding a project without a deadline.

The Forever Funding model is for companies who’ve already met their funding goal and want to keep raising money.

How many projects use it?

new_result['Forever Funding'].value_counts()

False    64224
True      6267
Name: Forever Funding, dtype: int64

About 9% of all projects (6,267 out of 70,491) use this feature!
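The mean of a boolean column is the share of True values, so the Forever Funding share takes a single line to compute. The sketch below rebuilds the column from the value_counts() output above (64,224 False and 6,267 True):

```python
import pandas as pd

# Boolean column rebuilt from the counts above
forever = pd.Series([False] * 64224 + [True] * 6267)

# Share of projects with Forever Funding, in percent
share = forever.mean() * 100
print(round(share, 1))  # 8.9

# value_counts(normalize=True) gives the share of every value as a fraction
print(forever.value_counts(normalize=True))
```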

Conclusions

This report analysed all projects that were listed on Indiegogo in October 2017.

Here are the most interesting findings:

The category that gets both the most investment and the most backers is Film. It is followed by Fashion & Wearables in terms of investment and by Phones & Accessories in terms of backers;

There are 5 currencies represented in this dataset, and the most common is the US dollar, used by 80% of all projects;

There are 7,638,792 backers in this dataset, and each backer gives on average 150 US dollars per project;

Super Troopers 2 was the project that got the most backers: 54,554 backers rushed into this investment.

3% of all projects have a partner, and Fractured Atlas has more projects than any other partner.

Output of result.head() (pandas elides the middle columns with ...):

	id	title	nearest_five_percent	tagline	cached_collected_pledges_count	igg_image_url	compressed_image_url	balance	currency_code	amt_time_left	...	category_url	category_name	category_slug	card_type	collected_percentage	partner_name	in_forever_funding	friend_contributors	friend_team_members	source_url
0	1561807	Alger - Anti Lost & Forget System	2	Never forget the things that are important to ...	15	https://c1.iggcdn.com/indiegogo-media-prod-cld...	https://c1.iggcdn.com/indiegogo-media-prod-cld...	$273	USD	No time left	...	/explore/travel-outdoors	Travel & Outdoors	travel-outdoors	project	1%	null	False	[]	[]	https://www.indiegogo.com/explore/all?project_...
1	1582714	Portfolio and Lookbook	20	I am reaching the end of my college journey. ...	4	https://c1.iggcdn.com/indiegogo-media-prod-cld...	https://c1.iggcdn.com/indiegogo-media-prod-cld...	$230	USD	No time left	...	/explore/fashion-wearables	Fashion & Wearables	fashion-wearables	project	23%	null	False	[]	[]	https://www.indiegogo.com/explore/all?project_...
2	1616114	dsfpejejejejejejejejejejejeje	2	Banana the potassium for every sport	4	https://c1.iggcdn.com/indiegogo-media-prod-cld...	https://c1.iggcdn.com/indiegogo-media-prod-cld...	£26	GBP	No time left	...	/explore/health-fitness	Health & Fitness	health-fitness	project	0%	null	False	[]	[]	https://www.indiegogo.com/explore/all?project_...
3	1767164	Med-West Legal Defense Fund: Part 2	2	Legal San Diego based medical cannabis company...	5	https://c1.iggcdn.com/indiegogo-media-prod-cld...	https://c1.iggcdn.com/indiegogo-media-prod-cld...	$1,135	USD	No time left	...	/explore/health-fitness	Health & Fitness	health-fitness	project	1%	null	False	[]	[]	https://www.indiegogo.com/explore/all?project_...
4	1776704	The Groggle: Always There Safety Eye Wear	2	An innovative safety eye wear product that bot...	5	https://c1.iggcdn.com/indiegogo-media-prod-cld...	https://c1.iggcdn.com/indiegogo-media-prod-cld...	$326	CAD	No time left	...	/explore/fashion-wearables	Fashion & Wearables	fashion-wearables	project	1%	null	False	[]	[]	https://www.indiegogo.com/explore/all?project_...

Output of grouped.head():

	Category Name	Amount Pledged USD	Numb of Backers	Amount Pledged in Million USD	Numb of Backers per 100.000
0	Animal Rights	4565619.210	75892	5.000	1.000
1	Art	15973945.170	175971	16.000	2.000
2	Audio	64673166.770	247954	65.000	2.000
3	Camera Gear	26924975.100	76843	27.000	1.000
4	Comics	2968013.730	46732	3.000	0.000

Output of check_duplicates.head():

	Title	Description	Numb of Backers	Currency Code	Any time left	Category Name	Collected Percentage	Partner Name	Forever Funding	Amount Pledged USD
138239	Super Troopers 2	The #SuperTroopers2 campaign is over, but you ...	54554	USD	No time left	Film	208%	null	True	4615178.0
133967	Super Troopers 2	The #SuperTroopers2 campaign is over, but you ...	54554	USD	No time left	Film	208%	null	True	4615178.0
158050	Super Troopers 2	The #SuperTroopers2 campaign is over, but you ...	54554	USD	No time left	Film	208%	null	True	4615178.0
135876	Super Troopers 2	The #SuperTroopers2 campaign is over, but you ...	54554	USD	No time left	Film	208%	null	True	4615178.0
126159	Protective iPhone X and 8 Cases - Mous Limitless	LAST ORDERS October 16th with FREE Screen Prot...	53134	USD	No time left	Phones & Accessories	2,780%	null	True	2395193.0

Output of exchange_rate_table:

	currency_code	conversion_rate
0	USD	1.00
1	GBP	1.33
2	EUR	1.19
3	CAD	0.78
4	AUD	0.76