Customer Segmentation Analysis¶
Introduction¶
In this section, we will analyze the customers of AdventureWorks Bike Sales to identify different customer segments based on their purchasing behavior, demographics, and other relevant factors. For example, the demographic segments that drive the highest sales will reveal the importance of customer education and income levels in our sales performance. Additionally, we will analyze customer commute preferences, financial situation, age groups and spend patterns, as well as the one-done and repeat customer segments. These all will reveal the insights about the potential customers in the future to target and the marketing strategies to adopt for each segment. By understanding our customers better, we can tailor our marketing efforts, improve customer satisfaction, and ultimately increase sales and profitability.
Customer Segmentation Analysis¶
Income and Education Segmentation¶
6. Which demographic segments (based on Income and Education) drive the highest overall revenue?
Solution Method: To answer this question, we will first calculate the minimum, average, maximum, standard deviation, and median of the YearlyIncome column from the Customers table. Then, we will use these statistics to segment customers into three groups: low-budget customers, mid-budget customers, and high-budget customers. Finally, we will analyze the revenue generated by each segment based on their education level to identify which demographic segments drive the highest overall revenue. This will allow us to group the customers based on dynamic thresholds derived from the data itself, rather than using arbitrary cutoffs, providing a more accurate and meaningful segmentation of our customer base.
Show SQL Code
-- define the thresholds for low, mid, and high budget customers based on the median and standard deviation of YearlyIncome
with median_cte as (
select YearlyIncome,
percentile_cont(0.5) within group (order by YearlyIncome)
over () as median_income
from Customers
)
select
min(YearlyIncome) as min_income,
avg(YearlyIncome) as avg_income,
max(YearlyIncome) as max_income,
stdev(YearlyIncome) as std_income,
max(median_income) as median_income
from median_cte;
-- MIN INCOME : 10,000
-- MAX INCOME : 170,000
-- AVG INCOME : 57,305
-- MEDIAN INCOME : 60,000
-- STD INCOME : 32,285
/* RULES IF-THEN:
- INCOME <= (MEDIAN - STD) -> LOW BUDGET
- INCOME BETWEEN (MEDIAN - STD) AND (MEDIAN + STD) -> MID BUDGET
- INCOME >= (MEDIAN + STD) -> HIGH BUDGET
*/
with customer_info as
(
select
c.CustomerKey as customer_key,
c.YearlyIncome as yearly_income,
c.education,
sum(fis.SalesAmount) as revenue,
count(distinct fis.SalesOrderNumber) as num_orders
from Customers c
left join FactInternetSales fis
on c.CustomerKey = fis.CustomerKey
group by
c.CustomerKey,
c.YearlyIncome,
c.education
),
customer_stats as (
select
cast(avg(YearlyIncome) as float) as avg_income,
cast(STDEV(YearlyIncome) as float) as std_income
from Customers
)
select
case
when yearly_income <= (avg_income - 0.7 * std_income)
then 'low_budget customer'
when yearly_income >= (avg_income + 0.8 * std_income)
then 'high_budget customer'
else 'mid-budget customer'
end as customer_budget_segmentation,
education,
count(distinct customer_key) as num_customers,
cast(sum(revenue) as float) as total_revenue,
cast(sum(num_orders) as float) as total_orders
from customer_info
cross join customer_stats
group by
case
when yearly_income <= (avg_income - 0.7 * std_income)
then 'low_budget customer'
when yearly_income >= (avg_income + 0.8 * std_income)
then 'high_budget customer'
else 'mid-budget customer'
end,
education
order by
total_revenue desc;
Results:
| customer_budget_segmentation | education | num_customers | total_revenue | total_orders |
|---|---|---|---|---|
| mid-budget customer | Bachelors | 3486 | 6018294.6094 | 5427.0 |
| mid-budget customer | Partial College | 2774 | 4108704.5055 | 4151.0 |
| mid-budget customer | Graduate Degree | 2376 | 3833801.3398 | 3515.0 |
| high_budget customer | Bachelors | 1056 | 2427726.4932 | 1758.0 |
| low_budget customer | Partial College | 1526 | 2050975.3804 | 2120.0 |
| mid-budget customer | High School | 1210 | 1768160.6359 | 1776.0 |
| low_budget customer | High School | 1513 | 1617237.6002 | 2043.0 |
| high_budget customer | Partial College | 764 | 1563862.9989 | 1237.0 |
| low_budget customer | Bachelors | 814 | 1454121.6545 | 1223.0 |
| high_budget customer | High School | 571 | 1252627.8325 | 952.0 |
| high_budget customer | Graduate Degree | 471 | 1176592.8865 | 835.0 |
| low_budget customer | Partial High School | 1014 | 756784.115 | 1289.0 |
| high_budget customer | Partial High School | 178 | 470206.8625 | 322.0 |
| low_budget customer | Graduate Degree | 342 | 450166.025 | 486.0 |
| mid-budget customer | Partial High School | 389 | 409414.2814 | 525.0 |
Insights: - The mid-budget customers with a Bachelor's degree contribute the highest overall revenue, followed by mid-budget customers with Partial College education and mid-budget customers with a Graduate Degree. This indicates that customers in the mid-budget segment with higher education levels are driving significant revenue for the company. - High-budget customers with a Bachelor's degree also contribute a substantial amount of revenue, indicating that there is a strong correlation between higher income levels and revenue generation, especially among those with a Bachelor's degree. - Low-budget customers with Partial College education and High School education also contribute a significant amount of revenue, suggesting that there is a sizable customer base in the lower income segment that is generating revenue for the company, particularly those with some level of college education. - Overall, the analysis reveals that both income and education levels play a crucial role in driving revenue for the company, with mid-budget customers with higher education levels being the most significant contributors to overall revenue. This insight can help the company tailor its marketing strategies and product offerings to target these high-value customer segments more effectively.
Commute Distance and Vehicle Ownership Effect¶
7. Does the customer's commute distance and vehicle ownership impact their lifetime value?
Solution Method: To answer this question, we will analyze the relationship between customers' commute distance, vehicle ownership, and their lifetime value (LTV). We will segment customers based on their commute distance (e.g., 0-1 Miles, 1-2 Miles, 2-5 Miles, 5-10 Miles, 10+ Miles) and vehicle ownership status (e.g., no car, 1 car, >1 car). We will then calculate the total revenue generated by each segment and the average revenue per customer to determine if there is a significant impact of commute distance and vehicle ownership on customer lifetime value.
Show SQL Code
with customer_info as (
select
cus.CustomerKey,
cus.CommuteDistance as commute_distance,
cus.NumberCarsOwned as num_cars,
coalesce(sum(fis.SalesAmount),0) as revenue,
coalesce(count(distinct fis.SalesOrderNumber),0) as num_orders
from Customer cus
left join FactInternetSales fis
on fis.CustomerKey = cus.CustomerKey
group by
cus.CustomerKey,
cus.CommuteDistance,
cus.NumberCarsOwned
)
select
case
when num_cars = 1 then '1 car'
when num_cars > 1 then '>1 car'
else 'no car'
end as 'car_ownership',
commute_distance,
count(distinct customerkey) as num_customers,
cast(sum(revenue) as decimal(20,3)) as total_revenue,
sum(num_orders) as total_orders,
cast(sum(revenue) / nullif(count(distinct customerkey),0) as decimal(10,3))
as avg_revenue_per_customer
from customer_info
group by
case
when num_cars = 1 then '1 car'
when num_cars > 1 then '>1 car'
else 'no car'
end,
commute_distance
order by
total_revenue desc,
car_ownership;
Results:
| car_ownership | commute_distance | num_customers | total_revenue | total_orders | avg_revenue_per_customer |
|---|---|---|---|---|---|
| no car | 0-1 Miles | 2708 | 5197358.990 | 4173 | 1919.261 |
| >1 car | 5-10 Miles | 2390 | 3553012.641 | 3619 | 1486.616 |
| >1 car | 10+ Miles | 2175 | 3439578.624 | 3387 | 1581.415 |
| 1 car | 0-1 Miles | 1541 | 3202421.394 | 2465 | 2078.145 |
| >1 car | 0-1 Miles | 2061 | 2807811.841 | 3030 | 1362.354 |
| >1 car | 1-2 Miles | 1794 | 2206559.472 | 2516 | 1229.966 |
| 1 car | 2-5 Miles | 1593 | 2154224.305 | 2188 | 1352.307 |
| >1 car | 2-5 Miles | 943 | 1494274.803 | 1402 | 1584.597 |
| no car | 2-5 Miles | 698 | 1317015.329 | 1030 | 1886.841 |
| 1 car | 5-10 Miles | 725 | 1199003.362 | 1156 | 1653.798 |
| no car | 1-2 Miles | 684 | 1191139.352 | 1053 | 1741.432 |
| 1 car | 1-2 Miles | 754 | 1143909.326 | 1097 | 1517.121 |
| 1 car | 10+ Miles | 270 | 270236.227 | 354 | 1000.875 |
| no car | 5-10 Miles | 99 | 141132.039 | 128 | 1425.576 |
| no car | 10+ Miles | 49 | 40999.518 | 61 | 836.725 |
Insights:
- Customers with no car and a commute distance of 0-1 Miles generate the highest total revenue and have the highest average revenue per customer, indicating that this segment has a strong lifetime value. It is maybe due to the fact that the customers with short distance and no car will likely purchase our bikes for commuting purposes, which may lead to higher frequency of purchases and higher overall spending on bike-related products and services.
- Customers with more than 1 car and a commute distance of 5-10 Miles also generate significant revenue, but their average revenue per customer is lower than the no car segment, suggesting that while they contribute to overall revenue, their individual lifetime value may be lower.
- Customers with 1 car and a commute distance of 0-1 Miles also have a high average revenue per customer, indicating that they are a valuable segment, although their total revenue is lower than the no car segment.
- Customers with longer commute distances (10+ Miles) tend to have lower average revenue per customer, regardless of car ownership, suggesting that longer commute distances may negatively impact customer lifetime value.
- Overall, the analysis indicates that customers with no car and shorter commute distances tend to have higher lifetime value, while those with longer commute distances and more cars may have lower lifetime value. This insight can help the company target marketing efforts towards segments with higher lifetime value and develop strategies to engage customers with longer commute distances to increase their lifetime value.
"Whale" Customers¶
8. Who are our "Whales" (Top 100 Most Valuable Customers), and what is their purchasing frequency?
*Solution Method: Solving this question is straightforward since we just need to calculate the total revenue generated by each customer and their purchasing frequency, then rank them based on their total revenue and identify the top 100 customers. We will also calculate the average number of orders per month for these top customers to understand their purchasing frequency. The Fact Internet Sales table contains the necessary information to perform this analysis, as it includes details about each customer's purchases, including the sales amount and order dates. Therefore, calculating the financials and then grouping by customer id will allow us to identify the top 100 most valuable customers and analyze their purchasing frequency effectively.
Show SQL Code
select top 100
cus.CustomerKey,
cus.FirstName,
cus.LastName,
sum(fis.salesamount) as revenue,
count(distinct fis.salesordernumber) as num_orders,
count(distinct cast(format(fis.orderdate, 'yyyy-MM') as varchar(20)) ) as 'num_months_purchased',
cast(count(distinct fis.salesordernumber) * 1.0 /
nullif(count(distinct format(fis.orderdate, 'yyyy-MM')),0) as decimal(10,3))
as avg_orders_per_month
from Customer cus
left join FactInternetSales fis
on fis.customerkey = cus.customerkey
group by
cus.CustomerKey,
cus.FirstName,
cus.LastName
order by
revenue desc;
Results:
| CustomerKey | FirstName | LastName | revenue | num_orders | num_months_purchased | avg_orders_per_month |
|---|---|---|---|---|---|---|
| 12301 | Nichole | Nara | 13295.3800 | 5 | 5 | 1.000 |
| 12132 | Kaitlyn | Henderson | 13294.2700 | 5 | 5 | 1.000 |
| 12308 | Margaret | He | 13269.2700 | 5 | 5 | 1.000 |
| 12131 | Randall | Dominguez | 13265.9900 | 5 | 4 | 1.250 |
| 12300 | Adriana | Gonzalez | 13242.7000 | 5 | 5 | 1.000 |
| 12321 | Rosa | Hu | 13215.6500 | 5 | 4 | 1.250 |
| 12124 | Brandi | Gill | 13195.6400 | 5 | 5 | 1.000 |
| 12307 | Brad | She | 13173.1900 | 5 | 5 | 1.000 |
| 12296 | Francisco | Sara | 13164.6400 | 5 | 5 | 1.000 |
| 11433 | Maurice | Shan | 12909.6682 | 6 | 5 | 1.200 |
| 11439 | Janet | Munoz | 12489.1696 | 6 | 4 | 1.500 |
| 11241 | Lisa | Cai | 11469.1882 | 7 | 7 | 1.000 |
| 11417 | Lacey | Zheng | 11248.4582 | 7 | 7 | 1.000 |
| 11420 | Jordan | Turner | 11200.7696 | 7 | 6 | 1.167 |
| 11242 | Larry | Munoz | 11068.0082 | 7 | 7 | 1.000 |
| 12655 | Larry | Vazquez | 10899.6200 | 4 | 2 | 2.000 |
| 13263 | Kate | Anand | 10872.0600 | 4 | 3 | 1.333 |
| 12323 | Lawrence | Alonso | 10836.9000 | 4 | 3 | 1.333 |
| 12333 | Terrance | Rodriguez | 10829.2200 | 4 | 4 | 1.000 |
| 12650 | Aaron | Wright | 10813.6300 | 4 | 2 | 2.000 |
| 12631 | Clarence | Gao | 10799.5200 | 4 | 3 | 1.333 |
| 12632 | Bonnie | Nath | 10793.2700 | 4 | 3 | 1.333 |
| 12332 | Andres | Nara | 10789.5300 | 4 | 4 | 1.000 |
| 13405 | Ethan | Bryant | 10778.6100 | 4 | 3 | 1.333 |
| 11245 | Ricky | Vazquez | 10580.3500 | 4 | 4 | 1.000 |
| 11246 | Latasha | Rubio | 10575.3300 | 4 | 4 | 1.000 |
| 11237 | Clarence | Anand | 10566.3800 | 4 | 4 | 1.000 |
| 11425 | Ariana | Gray | 10528.6282 | 6 | 6 | 1.000 |
| 11429 | Marco | Lopez | 10468.3796 | 6 | 4 | 1.500 |
| 11428 | Deanna | Perez | 9954.2600 | 4 | 4 | 1.000 |
| 11427 | Desiree | Dominguez | 9917.9900 | 4 | 4 | 1.000 |
| 11431 | Bryant | Garcia | 9912.9800 | 4 | 3 | 1.333 |
| 11423 | Jasmine | Stewart | 9904.9700 | 4 | 4 | 1.000 |
| 11249 | Cindy | Patel | 9890.2700 | 4 | 4 | 1.000 |
| 11412 | Sydney | Bryant | 9879.9600 | 4 | 4 | 1.000 |
| 14186 | Katrina | Tang | 9796.3200 | 4 | 4 | 1.000 |
| 11421 | Amy | Sun | 9780.0400 | 4 | 4 | 1.000 |
| 13592 | Gabriella | Collins | 9695.7300 | 4 | 3 | 1.333 |
| 13605 | Gerrit | Straatsma | 9614.7300 | 4 | 4 | 1.000 |
| 14185 | Frank | Vazquez | 9523.0025 | 4 | 3 | 1.333 |
| 13575 | Alicia | Shen | 9458.1825 | 4 | 3 | 1.333 |
| 14192 | Ronald | Kapoor | 9451.9125 | 4 | 4 | 1.000 |
| 13577 | Theodore | Torres | 9450.3225 | 4 | 3 | 1.333 |
| 14200 | Colleen | Goel | 9421.9125 | 4 | 4 | 1.000 |
| 13595 | Cole | Stewart | 9390.5325 | 4 | 4 | 1.000 |
| 11432 | Dominique | Prasad | 9330.1996 | 6 | 6 | 1.000 |
| 13257 | Jon | Chander | 8941.2800 | 4 | 4 | 1.000 |
| 13258 | Anne | Dominguez | 8886.5300 | 4 | 4 | 1.000 |
| 12128 | Kristy | Munoz | 8681.1082 | 4 | 4 | 1.000 |
| 12125 | Diana | Ortega | 8595.4096 | 4 | 3 | 1.333 |
| 12129 | Wendy | Alvarez | 8557.4096 | 4 | 4 | 1.000 |
| 13600 | Marie | Sanz | 8460.2500 | 3 | 3 | 1.000 |
| 14427 | Emmanuel | Patel | 8451.2600 | 3 | 3 | 1.000 |
| 15106 | Shannon | Navarro | 8421.2800 | 3 | 3 | 1.000 |
| 14775 | Bonnie | Xie | 8404.2800 | 3 | 3 | 1.000 |
| 14429 | Lacey | He | 8394.2900 | 3 | 3 | 1.000 |
| 15692 | Ryan | Garcia | 8392.3000 | 3 | 2 | 1.500 |
| 15097 | Morgan | Bennett | 8377.0900 | 3 | 3 | 1.000 |
| 13602 | Virginia | Mehta | 8376.2800 | 3 | 2 | 1.500 |
| 13583 | Blake | Butler | 8375.2900 | 3 | 3 | 1.000 |
| 13581 | Albert | Blanco | 8374.7900 | 3 | 3 | 1.000 |
| 13585 | Savannah | Morris | 8368.2900 | 3 | 3 | 1.000 |
| 14940 | Carmen | Rana | 8367.5500 | 3 | 3 | 1.000 |
| 14939 | Cory | Kapoor | 8367.3000 | 3 | 3 | 1.000 |
| 14207 | José | Saraiva | 8361.3000 | 3 | 3 | 1.000 |
| 13584 | Isaiah | Cox | 8361.2900 | 3 | 3 | 1.000 |
| 13591 | Latasha | Alonso | 8355.2700 | 3 | 3 | 1.000 |
| 14830 | Isabella | Ward | 8346.2800 | 3 | 3 | 1.000 |
| 15691 | Julian | Henderson | 8341.2900 | 3 | 3 | 1.000 |
| 14425 | Victor | Carlson | 8332.3000 | 3 | 3 | 1.000 |
| 13576 | Tamara | Lal | 8330.7900 | 3 | 3 | 1.000 |
| 14181 | Adriana | Chandra | 8326.3100 | 3 | 3 | 1.000 |
| 15695 | Bruce | Gomez | 8324.2400 | 3 | 3 | 1.000 |
| 14428 | Cassie | Andersen | 8321.3100 | 3 | 3 | 1.000 |
| 14183 | Robyn | Carlson | 8321.3100 | 3 | 3 | 1.000 |
| 11767 | Meagan | Madan | 8319.5100 | 3 | 3 | 1.000 |
| 14426 | Madison | Hughes | 8319.3000 | 3 | 3 | 1.000 |
| 14189 | Krista | Martin | 8317.3200 | 3 | 3 | 1.000 |
| 14195 | Clinton | Blanco | 8316.2900 | 3 | 3 | 1.000 |
| 14198 | Daisy | Romero | 8308.5900 | 3 | 3 | 1.000 |
| 13590 | Louis | Xie | 8306.3000 | 3 | 3 | 1.000 |
| 15080 | Evelyn | Chandra | 8305.3000 | 3 | 3 | 1.000 |
| 15118 | Nelson | Ortega | 8301.3100 | 3 | 3 | 1.000 |
| 14191 | Derrick | Martin | 8301.3100 | 3 | 3 | 1.000 |
| 15354 | Corey | Kumar | 8297.2900 | 3 | 3 | 1.000 |
| 11112 | Crystal | Wang | 8295.0200 | 3 | 3 | 1.000 |
| 11766 | Candace | Raman | 8280.4800 | 3 | 3 | 1.000 |
| 12338 | Monica | Vance | 8265.0000 | 3 | 3 | 1.000 |
| 11101 | Abby | Sai | 8262.5200 | 3 | 3 | 1.000 |
| 11900 | Byron | Carlson | 8257.0100 | 3 | 3 | 1.000 |
| 12003 | Audrey | Munoz | 8256.5000 | 3 | 3 | 1.000 |
| 11451 | Ruben | Muñoz | 8249.4900 | 3 | 3 | 1.000 |
| 11995 | Kelvin | Carson | 8249.0100 | 3 | 3 | 1.000 |
| 11000 | Jon | Yang | 8248.9900 | 3 | 3 | 1.000 |
| 11058 | Marc | Diaz | 8248.0200 | 3 | 3 | 1.000 |
| 11901 | Stacy | Alvarez | 8248.0100 | 3 | 3 | 1.000 |
| 11986 | Max | Alvarez | 8245.2300 | 3 | 3 | 1.000 |
| 11109 | Ruben | Kapoor | 8245.0100 | 3 | 3 | 1.000 |
| 11120 | Beth | Jiménez | 8242.4900 | 3 | 3 | 1.000 |
| 11446 | Bethany | Chander | 8237.0100 | 3 | 3 | 1.000 |
Insights:
- The top 100 most valuable customers, or "Whales," generate a significant amount of revenue for the company, with the highest revenue being
$13,295.38and the lowest being$8,237.01among the top 100 customers. This indicates that these customers are highly valuable to the company and contribute significantly to its overall revenue. - The purchasing frequency of these top customers varies, with some customers making purchases consistently over several months, while others have a more sporadic purchasing pattern. For example, some customers have an average of 1 order per month, while others have an average of 1.5 or more orders per month. This suggests that while all of these customers are valuable, their purchasing behavior may differ, and the company may need to tailor its marketing strategies to engage with these customers effectively based on their purchasing frequency.
- The majority of the top 100 customers have a consistent purchasing pattern, with many of them making purchases in multiple months, indicating that they are likely loyal customers who have a strong affinity for the company's products. If we are somehow able to make these customers even more loyal, we will be able to increase their lifetime value and further boost the company's revenue.
Age Groups and Spend Analysis¶
9. What is our generational spend distribution (Age Brackets)?
Solution Method: To do accurate analysis I did not consider only the age groups but also the gender and marital status of the customers since the age groups alone will not give us the complete picture of the generational spend distribution. By segmenting customers based on their age, gender and marital status will allow us to understand the purchasing patterns of each group. This is a crucial step as the 30 years old single man will definitely have different purchasing behavior than the 30 years old married man.
To calculate the age of the customers, I have taken the last order date in the sales table as the reference point and then calculated the age of the customers using the DATEDIFF() function which find the difference in years between the birth date and the reference date. This approach allows us to determine the age of the customers at the time of their most recent purchase, which is more relevant for analyzing their spending behavior. Then, I have segmented the customers into different age groups (e.g., <25, 25-35, 36-45, etc.), gender and marital status groups and calculated the total revenue, total orders, and total products purchased for each segment. Segmentation based on the age groups, gender and marital status require just simple, straightforward CASE-WHEN sQL statements, and then we can group by these segments to get the desired results. This analysis will provide insights into which generational segments are driving the most revenue and how their purchasing behavior
Show SQL Code
with get_max_date as (
select
max(OrderDate) as reference_day
from FactInternetSales
),
customer_info as (
select
cus.CustomerKey,
-- we should find age using datediff() function
DATEDIFF(year, cus.BirthDate, reference_day) as customer_age,
cus.Gender as customer_gender,
cus.MaritalStatus as customer_marital_status,
sum(fis.SalesAmount) as customer_revenue,
count(distinct fis.SalesOrderNumber) as customer_total_orders,
sum(fis.OrderQuantity) as customer_total_products
from Customer cus
left join FactInternetSales fis
on cus.CustomerKey = fis.CustomerKey
cross join get_max_date
group by
cus.CustomerKey,
DATEDIFF(year, cus.BirthDate, reference_day),
cus.Gender,
cus.MaritalStatus
)
select
-- segment the customers into ages
case
when customer_age < 25 then '<25'
when Customer_age between 25 and 35 then '25-35'
when customer_age between 36 and 45 then '36-45'
when customer_age between 46 and 55 then '46-55'
when customer_age between 56 and 65 then '56-65'
else '66+'
end as age_group,
case
when customer_gender = 'F' then 'Female'
when customer_gender = 'M' then 'Male'
else 'Other'
end as gender,
case
when customer_marital_status = 'S' then 'Single'
when customer_marital_status = 'M' then 'Married'
else 'Other'
end as marital_status,
count(distinct CustomerKey) as num_customers,
coalesce(sum(customer_revenue),0) as total_revenue,
coalesce(sum(customer_total_orders),0) as total_orders,
coalesce(sum(customer_total_products),0) as total_products
from customer_info
group by
case
when customer_age < 25 then '<25'
when Customer_age between 25 and 35 then '25-35'
when customer_age between 36 and 45 then '36-45'
when customer_age between 46 and 55 then '46-55'
when customer_age between 56 and 65 then '56-65'
else '66+'
end,
case
when customer_gender = 'F' then 'Female'
when customer_gender = 'M' then 'Male'
else 'Other'
end,
customer_marital_status,
case
when customer_marital_status = 'S' then 'Single'
when customer_marital_status = 'M' then 'Married'
else 'Other'
end
order by
total_revenue desc;
*Results:
| age_group | gender | marital_status | num_customers | total_revenue | total_orders | total_products |
|---|---|---|---|---|---|---|
| 36-45 | Female | Single | 1570 | 3220642.3561 | 2450 | 5214 |
| 36-45 | Male | Married | 1653 | 2690691.2759 | 2479 | 5369 |
| 36-45 | Male | Single | 1496 | 2636742.2451 | 2197 | 4850 |
| 36-45 | Female | Married | 1519 | 2540527.0699 | 2400 | 5302 |
| 46-55 | Male | Married | 1431 | 2177651.6181 | 2254 | 4992 |
| 46-55 | Female | Married | 1222 | 2031548.5099 | 1866 | 4071 |
| 25-35 | Female | Single | 1341 | 1868358.5366 | 1960 | 4257 |
| 25-35 | Male | Single | 1267 | 1853233.8674 | 1880 | 4036 |
| 46-55 | Female | Single | 841 | 1663352.2746 | 1265 | 2773 |
| 25-35 | Male | Married | 946 | 1478339.5669 | 1436 | 3060 |
| 46-55 | Male | Single | 785 | 1423610.6286 | 1120 | 2375 |
| 25-35 | Female | Married | 834 | 1300527.9517 | 1260 | 2705 |
| 56-65 | Male | Married | 838 | 1181146.1335 | 1238 | 2740 |
| 56-65 | Female | Married | 790 | 1052145.0857 | 1152 | 2569 |
| 56-65 | Female | Single | 446 | 569557.8937 | 651 | 1462 |
| 56-65 | Male | Single | 353 | 495231.9280 | 489 | 1089 |
| 66+ | Male | Married | 398 | 392528.7789 | 582 | 1342 |
| 66+ | Female | Married | 380 | 342269.7483 | 491 | 1123 |
| 66+ | Female | Single | 190 | 224689.2487 | 247 | 541 |
| 66+ | Male | Single | 184 | 215882.5031 | 242 | 528 |
Insights:
- The age group of 36-45 years old contributes the highest total revenue, with both single and married customers leading in terms of spending. This suggests that this age group is a key demographic for the company, and marketing efforts should be focused on targeting both single and married customers within this age range.
- The 46-55 age group also contributes significantly to total revenue, with married customers leading in terms of spending. This indicates that married customers in this age group are an important segment for the company, and marketing strategies should be tailored to appeal to their preferences and needs.
- The 25-35 age group shows a strong presence of single customers contributing to total revenue, indicating that this segment may have different purchasing behavior compared to married customers. Marketing efforts targeting single customers in this age group may be effective in driving sales.
- The 56-65 and 66+ age groups contribute less to total revenue compared to the younger age groups, but married customers in these age groups still show a significant contribution. This suggests that while these age groups may not be the primary target for the company, there is still potential for growth by targeting married customers within these age ranges.
- Overall, the analysis reveals that the 36-45 age group, particularly single and married customers, is a key demographic for the company in terms of revenue generation. Marketing strategies should be tailored to target this age group effectively, while also considering the preferences and needs of other age groups to maximize overall revenue.
Customer Retention Rate Analysis¶
10. What is our Customer Retention Rate (One-and-Done vs. Repeat Buyers)?
Solution Method: To calculate the customer retention rate, we will first categorize customers into two groups based on their purchasing behavior: "One-and-Done" customers who have made only one purchase and "Repeat Buyers" who have made more than one purchase. We will then calculate the total number of customers in each group and the total revenue generated by each group. Finally, we will calculate the retention rate for each group by dividing the number of customers in that group by the total number of customers and multiplying by 100 to get a percentage. This analysis will provide insights into the proportion of customers who are repeat buyers versus those who are one-and-done, which can help the company understand its customer retention dynamics and identify opportunities for improving customer loyalty. There is an important consideration in the calculation of retention rate, which is to ensure that we are not dividing by zero when calculating the retention rate percentage. To address this, we will use the NULLIF function in SQL to handle cases where the total number of customers is zero, preventing any division by zero errors and ensuring that our retention rate calculation is accurate and robust.
Show SQL Code
with customer_info as
(
select
cus.CustomerKey,
coalesce(count(distinct fis.SalesOrderNumber),0) as num_orders,
sum(fis.SalesAmount) as revenue
from customer cus
left join FactInternetSales fis
on fis.CustomerKey = cus.CustomerKey
group by cus.CustomerKey
),
retention_groups as (
select
case
when num_orders > 1 then 'repeat-buyers'
when num_orders = 1 then 'one-and-done'
else 'no order'
end as retention_group,
count(distinct CustomerKey) as num_customers ,
sum(revenue) as total_revenue
from customer_info
group by
case
when num_orders > 1 then 'repeat-buyers'
when num_orders = 1 then 'one-and-done'
else 'no order'
end
)
select
retention_group,
num_customers,
total_revenue,
cast(num_customers * 1.0 / nullif(sum(num_customers) over (),0) as decimal(10,3))
as retention_rate_pct
from retention_groups
order by
total_revenue desc;
Results:
| retention_group | num_customers | total_revenue | retention_rate_pct |
|---|---|---|---|
| repeat-buyers | 6865 | 22610711.7164 | 0.371 |
| one-and-done | 11619 | 6747965.5043 | 0.629 |
Insights:
- The majority of customers (62.9%) are "One-and-Done" buyers, meaning they have made only one purchase, while 37.1% of customers are "Repeat Buyers" who have made more than one purchase. This indicates that a significant portion of the customer base consists of customers who do not return for additional purchases, which may suggest that there are opportunities to improve customer retention and encourage repeat business.
- Despite being a smaller percentage of the customer base, "Repeat Buyers" contribute a significantly higher total revenue compared to "One-and-Done" buyers. This suggests that "Repeat Buyers" are more valuable to the company in terms of revenue generation, and efforts to increase the retention rate and convert "One-and-Done" buyers into "Repeat Buyers" could have a substantial impact on overall revenue.
- The retention rate percentage indicates that there is room for improvement in customer retention, as a large portion of customers are not returning for additional purchases. The company may want to consider implementing strategies such as loyalty programs, personalized marketing, or improved customer service to encourage repeat purchases and increase the retention rate, ultimately driving more revenue from the existing customer base.
Conclusion¶
In this domain we obtained the following key insights:
- Initially, we analyzed the purchasing behavior of customers segmented by education and income. We have found out that customers with mid-level income and higher education levels (Bachelor's and Graduate Degrees) contribute the most to the overall revenue, indicating that this segment is a key demographic for the company.
- Then, we considered the effect of commmute distance and vehicle ownerships on the purchasing behavior of customers. We have clearly found out that the customers with short distance and no car are our main revenue drivers. That's completely logical as they need bikes for commuting short distances, so they prefer purchasing bikes from our company.
- Moreover, we analyzed the order volume, lifetime value and retention rate of the whale customers - the customers that are in the top 100 based on the money spent for our products. We have found out that there is not a straight pattern among these as some of them purchase frequently but some purchase less. We should aim to make them more loyal and hence our main targeted customers are those in this case.
- Next, we analyzed the generational spend distribution by segmenting customers based on their age, gender and marital status. We have found out that the 36-45 age group, particularly single and married customers, is a key demographic for the company in terms of revenue generation. Marketing strategies should be tailored to target this age group effectively, while also considering the preferences and needs of other age groups to maximize overall revenue.
- Finally, we analyzed the customer retention rate by categorizing customers into "One-and-Done" and "Repeat Buyers". We have found out that the majority of customers are "One-and-Done" buyers, but "Repeat Buyers" contribute significantly higher total revenue. This indicates that there are opportunities to improve customer retention and encourage repeat business, which could have a substantial impact on overall revenue.
In the next section, we will analyze our products and determine which products are the best sellers, which products have the highest profit margins, and how product performance varies across different customer segments. This analysis will help us optimize our product offerings and marketing strategies to maximize revenue and profitability.