Skip to content

Customer Segmentation Analysis

Introduction

In this section, we will analyze the customers of AdventureWorks Bike Sales to identify different customer segments based on their purchasing behavior, demographics, and other relevant factors. For example, the demographic segments that drive the highest sales will reveal the importance of customer education and income levels in our sales performance. Additionally, we will analyze customer commute preferences, financial situation, age groups and spend patterns, as well as the one-done and repeat customer segments. These all will reveal the insights about the potential customers in the future to target and the marketing strategies to adopt for each segment. By understanding our customers better, we can tailor our marketing efforts, improve customer satisfaction, and ultimately increase sales and profitability.

Customer Segmentation Analysis

Income and Education Segmentation

6. Which demographic segments (based on Income and Education) drive the highest overall revenue?

Solution Method: To answer this question, we will first calculate the minimum, average, maximum, standard deviation, and median of the YearlyIncome column from the Customers table. Then, we will use these statistics to segment customers into three groups: low-budget customers, mid-budget customers, and high-budget customers. Finally, we will analyze the revenue generated by each segment based on their education level to identify which demographic segments drive the highest overall revenue. This will allow us to group the customers based on dynamic thresholds derived from the data itself, rather than using arbitrary cutoffs, providing a more accurate and meaningful segmentation of our customer base.

Show SQL Code
-- define the thresholds for low, mid, and high budget customers based on the median and standard deviation of YearlyIncome

with median_cte as (
    select YearlyIncome,
           percentile_cont(0.5) within group (order by YearlyIncome) 
               over () as median_income
    from Customers
)
select 
    min(YearlyIncome) as min_income,
    avg(YearlyIncome) as avg_income,
    max(YearlyIncome) as max_income,
    stdev(YearlyIncome) as std_income,
    max(median_income) as median_income
from median_cte;


-- MIN INCOME : 10,000
-- MAX INCOME : 170,000
-- AVG INCOME : 57,305
-- MEDIAN INCOME : 60,000
-- STD INCOME : 32,285

/* RULES IF-THEN:
- INCOME <= (MEDIAN - STD) -> LOW BUDGET
- INCOME BETWEEN (MEDIAN - STD) AND (MEDIAN + STD) -> MID BUDGET
- INCOME >= (MEDIAN + STD) -> HIGH BUDGET
*/

with customer_info as 
(
    select 
        c.CustomerKey as customer_key,
        c.YearlyIncome as yearly_income,
        c.education,
        sum(fis.SalesAmount) as revenue,
        count(distinct fis.SalesOrderNumber) as num_orders
    from Customers c
    left join FactInternetSales fis
        on c.CustomerKey = fis.CustomerKey
    group by
        c.CustomerKey,
        c.YearlyIncome,
        c.education
),

customer_stats as (
    select 
        cast(avg(YearlyIncome) as float) as avg_income,
        cast(STDEV(YearlyIncome) as float) as std_income
    from Customers
)

select
    case 
        when yearly_income <= (avg_income - 0.7 * std_income) 
            then 'low_budget customer'
        when yearly_income >= (avg_income + 0.8 * std_income) 
            then 'high_budget customer'
        else 'mid-budget customer'
    end as customer_budget_segmentation, 

    education, 
    count(distinct customer_key) as num_customers,
    cast(sum(revenue) as float) as total_revenue, 
    cast(sum(num_orders) as float) as total_orders
from customer_info 
cross join customer_stats
group by
    case 
        when yearly_income <= (avg_income - 0.7 * std_income) 
            then 'low_budget customer'
        when yearly_income >= (avg_income + 0.8 * std_income) 
            then 'high_budget customer'
        else 'mid-budget customer'
    end,
    education
order by 
    total_revenue desc;

Results:

customer_budget_segmentation education num_customers total_revenue total_orders
mid-budget customer Bachelors 3486 6018294.6094 5427.0
mid-budget customer Partial College 2774 4108704.5055 4151.0
mid-budget customer Graduate Degree 2376 3833801.3398 3515.0
high_budget customer Bachelors 1056 2427726.4932 1758.0
low_budget customer Partial College 1526 2050975.3804 2120.0
mid-budget customer High School 1210 1768160.6359 1776.0
low_budget customer High School 1513 1617237.6002 2043.0
high_budget customer Partial College 764 1563862.9989 1237.0
low_budget customer Bachelors 814 1454121.6545 1223.0
high_budget customer High School 571 1252627.8325 952.0
high_budget customer Graduate Degree 471 1176592.8865 835.0
low_budget customer Partial High School 1014 756784.115 1289.0
high_budget customer Partial High School 178 470206.8625 322.0
low_budget customer Graduate Degree 342 450166.025 486.0
mid-budget customer Partial High School 389 409414.2814 525.0

Insights: - The mid-budget customers with a Bachelor's degree contribute the highest overall revenue, followed by mid-budget customers with Partial College education and mid-budget customers with a Graduate Degree. This indicates that customers in the mid-budget segment with higher education levels are driving significant revenue for the company. - High-budget customers with a Bachelor's degree also contribute a substantial amount of revenue, indicating that there is a strong correlation between higher income levels and revenue generation, especially among those with a Bachelor's degree. - Low-budget customers with Partial College education and High School education also contribute a significant amount of revenue, suggesting that there is a sizable customer base in the lower income segment that is generating revenue for the company, particularly those with some level of college education. - Overall, the analysis reveals that both income and education levels play a crucial role in driving revenue for the company, with mid-budget customers with higher education levels being the most significant contributors to overall revenue. This insight can help the company tailor its marketing strategies and product offerings to target these high-value customer segments more effectively.

Commute Distance and Vehicle Ownership Effect

7. Does the customer's commute distance and vehicle ownership impact their lifetime value?

Solution Method: To answer this question, we will analyze the relationship between customers' commute distance, vehicle ownership, and their lifetime value (LTV). We will segment customers based on their commute distance (e.g., 0-1 Miles, 1-2 Miles, 2-5 Miles, 5-10 Miles, 10+ Miles) and vehicle ownership status (e.g., no car, 1 car, >1 car). We will then calculate the total revenue generated by each segment and the average revenue per customer to determine if there is a significant impact of commute distance and vehicle ownership on customer lifetime value.

Show SQL Code
with customer_info as (
    select
        cus.CustomerKey,
        cus.CommuteDistance as commute_distance,
        cus.NumberCarsOwned as num_cars,
        coalesce(sum(fis.SalesAmount),0) as revenue, 
        coalesce(count(distinct fis.SalesOrderNumber),0) as num_orders
    from Customer cus
    left join FactInternetSales fis
        on fis.CustomerKey = cus.CustomerKey
    group by 
        cus.CustomerKey,
        cus.CommuteDistance,
        cus.NumberCarsOwned
)
select
    case  
        when num_cars = 1 then '1 car'
        when num_cars > 1 then '>1 car'
        else 'no car'
    end as 'car_ownership',
    commute_distance,
    count(distinct customerkey) as num_customers,
    cast(sum(revenue) as decimal(20,3)) as total_revenue,
    sum(num_orders) as total_orders,
    cast(sum(revenue) / nullif(count(distinct customerkey),0) as decimal(10,3)) 
        as avg_revenue_per_customer
from customer_info
group by 
    case  
        when num_cars = 1 then '1 car'
        when num_cars > 1 then '>1 car'
        else 'no car'
    end,
    commute_distance
order by
    total_revenue desc,
    car_ownership;

Results:

car_ownership commute_distance num_customers total_revenue total_orders avg_revenue_per_customer
no car 0-1 Miles 2708 5197358.990 4173 1919.261
>1 car 5-10 Miles 2390 3553012.641 3619 1486.616
>1 car 10+ Miles 2175 3439578.624 3387 1581.415
1 car 0-1 Miles 1541 3202421.394 2465 2078.145
>1 car 0-1 Miles 2061 2807811.841 3030 1362.354
>1 car 1-2 Miles 1794 2206559.472 2516 1229.966
1 car 2-5 Miles 1593 2154224.305 2188 1352.307
>1 car 2-5 Miles 943 1494274.803 1402 1584.597
no car 2-5 Miles 698 1317015.329 1030 1886.841
1 car 5-10 Miles 725 1199003.362 1156 1653.798
no car 1-2 Miles 684 1191139.352 1053 1741.432
1 car 1-2 Miles 754 1143909.326 1097 1517.121
1 car 10+ Miles 270 270236.227 354 1000.875
no car 5-10 Miles 99 141132.039 128 1425.576
no car 10+ Miles 49 40999.518 61 836.725

Insights:

  • Customers with no car and a commute distance of 0-1 Miles generate the highest total revenue and have the highest average revenue per customer, indicating that this segment has a strong lifetime value. It is maybe due to the fact that the customers with short distance and no car will likely purchase our bikes for commuting purposes, which may lead to higher frequency of purchases and higher overall spending on bike-related products and services.
  • Customers with more than 1 car and a commute distance of 5-10 Miles also generate significant revenue, but their average revenue per customer is lower than the no car segment, suggesting that while they contribute to overall revenue, their individual lifetime value may be lower.
  • Customers with 1 car and a commute distance of 0-1 Miles also have a high average revenue per customer, indicating that they are a valuable segment, although their total revenue is lower than the no car segment.
  • Customers with longer commute distances (10+ Miles) tend to have lower average revenue per customer, regardless of car ownership, suggesting that longer commute distances may negatively impact customer lifetime value.
  • Overall, the analysis indicates that customers with no car and shorter commute distances tend to have higher lifetime value, while those with longer commute distances and more cars may have lower lifetime value. This insight can help the company target marketing efforts towards segments with higher lifetime value and develop strategies to engage customers with longer commute distances to increase their lifetime value.

"Whale" Customers

8. Who are our "Whales" (Top 100 Most Valuable Customers), and what is their purchasing frequency?

*Solution Method: Solving this question is straightforward since we just need to calculate the total revenue generated by each customer and their purchasing frequency, then rank them based on their total revenue and identify the top 100 customers. We will also calculate the average number of orders per month for these top customers to understand their purchasing frequency. The Fact Internet Sales table contains the necessary information to perform this analysis, as it includes details about each customer's purchases, including the sales amount and order dates. Therefore, calculating the financials and then grouping by customer id will allow us to identify the top 100 most valuable customers and analyze their purchasing frequency effectively.

Show SQL Code
select top 100

    cus.CustomerKey,
    cus.FirstName,
    cus.LastName,
    sum(fis.salesamount) as revenue, 
    count(distinct fis.salesordernumber) as num_orders,
    count(distinct cast(format(fis.orderdate, 'yyyy-MM') as varchar(20)) ) as 'num_months_purchased',
    cast(count(distinct fis.salesordernumber) * 1.0 / 
        nullif(count(distinct format(fis.orderdate, 'yyyy-MM')),0) as decimal(10,3))
        as avg_orders_per_month

from Customer cus
    left join FactInternetSales fis 
        on fis.customerkey = cus.customerkey
group by 
    cus.CustomerKey,
    cus.FirstName,
    cus.LastName
order by 
    revenue desc;

Results:

CustomerKey FirstName LastName revenue num_orders num_months_purchased avg_orders_per_month
12301 Nichole Nara 13295.3800 5 5 1.000
12132 Kaitlyn Henderson 13294.2700 5 5 1.000
12308 Margaret He 13269.2700 5 5 1.000
12131 Randall Dominguez 13265.9900 5 4 1.250
12300 Adriana Gonzalez 13242.7000 5 5 1.000
12321 Rosa Hu 13215.6500 5 4 1.250
12124 Brandi Gill 13195.6400 5 5 1.000
12307 Brad She 13173.1900 5 5 1.000
12296 Francisco Sara 13164.6400 5 5 1.000
11433 Maurice Shan 12909.6682 6 5 1.200
11439 Janet Munoz 12489.1696 6 4 1.500
11241 Lisa Cai 11469.1882 7 7 1.000
11417 Lacey Zheng 11248.4582 7 7 1.000
11420 Jordan Turner 11200.7696 7 6 1.167
11242 Larry Munoz 11068.0082 7 7 1.000
12655 Larry Vazquez 10899.6200 4 2 2.000
13263 Kate Anand 10872.0600 4 3 1.333
12323 Lawrence Alonso 10836.9000 4 3 1.333
12333 Terrance Rodriguez 10829.2200 4 4 1.000
12650 Aaron Wright 10813.6300 4 2 2.000
12631 Clarence Gao 10799.5200 4 3 1.333
12632 Bonnie Nath 10793.2700 4 3 1.333
12332 Andres Nara 10789.5300 4 4 1.000
13405 Ethan Bryant 10778.6100 4 3 1.333
11245 Ricky Vazquez 10580.3500 4 4 1.000
11246 Latasha Rubio 10575.3300 4 4 1.000
11237 Clarence Anand 10566.3800 4 4 1.000
11425 Ariana Gray 10528.6282 6 6 1.000
11429 Marco Lopez 10468.3796 6 4 1.500
11428 Deanna Perez 9954.2600 4 4 1.000
11427 Desiree Dominguez 9917.9900 4 4 1.000
11431 Bryant Garcia 9912.9800 4 3 1.333
11423 Jasmine Stewart 9904.9700 4 4 1.000
11249 Cindy Patel 9890.2700 4 4 1.000
11412 Sydney Bryant 9879.9600 4 4 1.000
14186 Katrina Tang 9796.3200 4 4 1.000
11421 Amy Sun 9780.0400 4 4 1.000
13592 Gabriella Collins 9695.7300 4 3 1.333
13605 Gerrit Straatsma 9614.7300 4 4 1.000
14185 Frank Vazquez 9523.0025 4 3 1.333
13575 Alicia Shen 9458.1825 4 3 1.333
14192 Ronald Kapoor 9451.9125 4 4 1.000
13577 Theodore Torres 9450.3225 4 3 1.333
14200 Colleen Goel 9421.9125 4 4 1.000
13595 Cole Stewart 9390.5325 4 4 1.000
11432 Dominique Prasad 9330.1996 6 6 1.000
13257 Jon Chander 8941.2800 4 4 1.000
13258 Anne Dominguez 8886.5300 4 4 1.000
12128 Kristy Munoz 8681.1082 4 4 1.000
12125 Diana Ortega 8595.4096 4 3 1.333
12129 Wendy Alvarez 8557.4096 4 4 1.000
13600 Marie Sanz 8460.2500 3 3 1.000
14427 Emmanuel Patel 8451.2600 3 3 1.000
15106 Shannon Navarro 8421.2800 3 3 1.000
14775 Bonnie Xie 8404.2800 3 3 1.000
14429 Lacey He 8394.2900 3 3 1.000
15692 Ryan Garcia 8392.3000 3 2 1.500
15097 Morgan Bennett 8377.0900 3 3 1.000
13602 Virginia Mehta 8376.2800 3 2 1.500
13583 Blake Butler 8375.2900 3 3 1.000
13581 Albert Blanco 8374.7900 3 3 1.000
13585 Savannah Morris 8368.2900 3 3 1.000
14940 Carmen Rana 8367.5500 3 3 1.000
14939 Cory Kapoor 8367.3000 3 3 1.000
14207 José Saraiva 8361.3000 3 3 1.000
13584 Isaiah Cox 8361.2900 3 3 1.000
13591 Latasha Alonso 8355.2700 3 3 1.000
14830 Isabella Ward 8346.2800 3 3 1.000
15691 Julian Henderson 8341.2900 3 3 1.000
14425 Victor Carlson 8332.3000 3 3 1.000
13576 Tamara Lal 8330.7900 3 3 1.000
14181 Adriana Chandra 8326.3100 3 3 1.000
15695 Bruce Gomez 8324.2400 3 3 1.000
14428 Cassie Andersen 8321.3100 3 3 1.000
14183 Robyn Carlson 8321.3100 3 3 1.000
11767 Meagan Madan 8319.5100 3 3 1.000
14426 Madison Hughes 8319.3000 3 3 1.000
14189 Krista Martin 8317.3200 3 3 1.000
14195 Clinton Blanco 8316.2900 3 3 1.000
14198 Daisy Romero 8308.5900 3 3 1.000
13590 Louis Xie 8306.3000 3 3 1.000
15080 Evelyn Chandra 8305.3000 3 3 1.000
15118 Nelson Ortega 8301.3100 3 3 1.000
14191 Derrick Martin 8301.3100 3 3 1.000
15354 Corey Kumar 8297.2900 3 3 1.000
11112 Crystal Wang 8295.0200 3 3 1.000
11766 Candace Raman 8280.4800 3 3 1.000
12338 Monica Vance 8265.0000 3 3 1.000
11101 Abby Sai 8262.5200 3 3 1.000
11900 Byron Carlson 8257.0100 3 3 1.000
12003 Audrey Munoz 8256.5000 3 3 1.000
11451 Ruben Muñoz 8249.4900 3 3 1.000
11995 Kelvin Carson 8249.0100 3 3 1.000
11000 Jon Yang 8248.9900 3 3 1.000
11058 Marc Diaz 8248.0200 3 3 1.000
11901 Stacy Alvarez 8248.0100 3 3 1.000
11986 Max Alvarez 8245.2300 3 3 1.000
11109 Ruben Kapoor 8245.0100 3 3 1.000
11120 Beth Jiménez 8242.4900 3 3 1.000
11446 Bethany Chander 8237.0100 3 3 1.000

Insights:

  • The top 100 most valuable customers, or "Whales," generate a significant amount of revenue for the company, with the highest revenue being $13,295.38 and the lowest being $8,237.01 among the top 100 customers. This indicates that these customers are highly valuable to the company and contribute significantly to its overall revenue.
  • The purchasing frequency of these top customers varies, with some customers making purchases consistently over several months, while others have a more sporadic purchasing pattern. For example, some customers have an average of 1 order per month, while others have an average of 1.5 or more orders per month. This suggests that while all of these customers are valuable, their purchasing behavior may differ, and the company may need to tailor its marketing strategies to engage with these customers effectively based on their purchasing frequency.
  • The majority of the top 100 customers have a consistent purchasing pattern, with many of them making purchases in multiple months, indicating that they are likely loyal customers who have a strong affinity for the company's products. If we are somehow able to make these customers even more loyal, we will be able to increase their lifetime value and further boost the company's revenue.

Age Groups and Spend Analysis

9. What is our generational spend distribution (Age Brackets)?

Solution Method: To do accurate analysis I did not consider only the age groups but also the gender and marital status of the customers since the age groups alone will not give us the complete picture of the generational spend distribution. By segmenting customers based on their age, gender and marital status will allow us to understand the purchasing patterns of each group. This is a crucial step as the 30 years old single man will definitely have different purchasing behavior than the 30 years old married man.

To calculate the age of the customers, I have taken the last order date in the sales table as the reference point and then calculated the age of the customers using the DATEDIFF() function which find the difference in years between the birth date and the reference date. This approach allows us to determine the age of the customers at the time of their most recent purchase, which is more relevant for analyzing their spending behavior. Then, I have segmented the customers into different age groups (e.g., <25, 25-35, 36-45, etc.), gender and marital status groups and calculated the total revenue, total orders, and total products purchased for each segment. Segmentation based on the age groups, gender and marital status require just simple, straightforward CASE-WHEN sQL statements, and then we can group by these segments to get the desired results. This analysis will provide insights into which generational segments are driving the most revenue and how their purchasing behavior

Show SQL Code
with get_max_date as (
    select
        max(OrderDate) as reference_day
    from FactInternetSales
),

customer_info as (
    select
        cus.CustomerKey,
        -- we should find age using datediff() function
        DATEDIFF(year, cus.BirthDate, reference_day) as customer_age,
        cus.Gender as customer_gender,
        cus.MaritalStatus as customer_marital_status,
        sum(fis.SalesAmount) as customer_revenue,
        count(distinct fis.SalesOrderNumber) as customer_total_orders,
        sum(fis.OrderQuantity) as customer_total_products
    from Customer cus
    left join FactInternetSales fis
        on cus.CustomerKey = fis.CustomerKey
    cross  join get_max_date
    group by 
        cus.CustomerKey,
        DATEDIFF(year, cus.BirthDate, reference_day),
        cus.Gender,
        cus.MaritalStatus
)
select
    -- segment the customers into ages
    case  
        when customer_age < 25 then '<25'
        when Customer_age between 25 and 35  then '25-35'
        when customer_age between 36 and 45 then '36-45'
        when customer_age between 46 and 55 then '46-55'
        when customer_age between 56 and 65 then '56-65'
        else '66+'
    end as age_group,

    case 
        when customer_gender = 'F' then 'Female'
        when customer_gender = 'M' then 'Male'
        else 'Other'
    end as gender,
    case    
        when customer_marital_status = 'S' then 'Single' 
        when customer_marital_status = 'M' then 'Married'
        else 'Other'
    end as marital_status,
    count(distinct CustomerKey) as num_customers,
    coalesce(sum(customer_revenue),0) as total_revenue,
    coalesce(sum(customer_total_orders),0) as total_orders,
    coalesce(sum(customer_total_products),0) as total_products
from customer_info
group by 
    case  
        when customer_age < 25 then '<25'
        when Customer_age between 25 and 35  then '25-35'
        when customer_age between 36 and 45 then '36-45'
        when customer_age between 46 and 55 then '46-55'
        when customer_age between 56 and 65 then '56-65'
        else '66+'
    end,

    case 
        when customer_gender = 'F' then 'Female'
        when customer_gender = 'M' then 'Male'
        else 'Other'
    end,

    customer_marital_status,
    case    
        when customer_marital_status = 'S' then 'Single' 
        when customer_marital_status = 'M' then 'Married'
        else 'Other'
    end
order by
    total_revenue desc;

*Results:

age_group gender marital_status num_customers total_revenue total_orders total_products
36-45 Female Single 1570 3220642.3561 2450 5214
36-45 Male Married 1653 2690691.2759 2479 5369
36-45 Male Single 1496 2636742.2451 2197 4850
36-45 Female Married 1519 2540527.0699 2400 5302
46-55 Male Married 1431 2177651.6181 2254 4992
46-55 Female Married 1222 2031548.5099 1866 4071
25-35 Female Single 1341 1868358.5366 1960 4257
25-35 Male Single 1267 1853233.8674 1880 4036
46-55 Female Single 841 1663352.2746 1265 2773
25-35 Male Married 946 1478339.5669 1436 3060
46-55 Male Single 785 1423610.6286 1120 2375
25-35 Female Married 834 1300527.9517 1260 2705
56-65 Male Married 838 1181146.1335 1238 2740
56-65 Female Married 790 1052145.0857 1152 2569
56-65 Female Single 446 569557.8937 651 1462
56-65 Male Single 353 495231.9280 489 1089
66+ Male Married 398 392528.7789 582 1342
66+ Female Married 380 342269.7483 491 1123
66+ Female Single 190 224689.2487 247 541
66+ Male Single 184 215882.5031 242 528

Insights:

  • The age group of 36-45 years old contributes the highest total revenue, with both single and married customers leading in terms of spending. This suggests that this age group is a key demographic for the company, and marketing efforts should be focused on targeting both single and married customers within this age range.
  • The 46-55 age group also contributes significantly to total revenue, with married customers leading in terms of spending. This indicates that married customers in this age group are an important segment for the company, and marketing strategies should be tailored to appeal to their preferences and needs.
  • The 25-35 age group shows a strong presence of single customers contributing to total revenue, indicating that this segment may have different purchasing behavior compared to married customers. Marketing efforts targeting single customers in this age group may be effective in driving sales.
  • The 56-65 and 66+ age groups contribute less to total revenue compared to the younger age groups, but married customers in these age groups still show a significant contribution. This suggests that while these age groups may not be the primary target for the company, there is still potential for growth by targeting married customers within these age ranges.
  • Overall, the analysis reveals that the 36-45 age group, particularly single and married customers, is a key demographic for the company in terms of revenue generation. Marketing strategies should be tailored to target this age group effectively, while also considering the preferences and needs of other age groups to maximize overall revenue.

Customer Retention Rate Analysis

10. What is our Customer Retention Rate (One-and-Done vs. Repeat Buyers)?

Solution Method: To calculate the customer retention rate, we will first categorize customers into two groups based on their purchasing behavior: "One-and-Done" customers who have made only one purchase and "Repeat Buyers" who have made more than one purchase. We will then calculate the total number of customers in each group and the total revenue generated by each group. Finally, we will calculate the retention rate for each group by dividing the number of customers in that group by the total number of customers and multiplying by 100 to get a percentage. This analysis will provide insights into the proportion of customers who are repeat buyers versus those who are one-and-done, which can help the company understand its customer retention dynamics and identify opportunities for improving customer loyalty. There is an important consideration in the calculation of retention rate, which is to ensure that we are not dividing by zero when calculating the retention rate percentage. To address this, we will use the NULLIF function in SQL to handle cases where the total number of customers is zero, preventing any division by zero errors and ensuring that our retention rate calculation is accurate and robust.

Show SQL Code
with customer_info as 
(
    select
        cus.CustomerKey,
        coalesce(count(distinct fis.SalesOrderNumber),0) as num_orders,
        sum(fis.SalesAmount) as revenue
    from customer cus
    left join FactInternetSales fis 
        on fis.CustomerKey = cus.CustomerKey
    group by cus.CustomerKey

),

retention_groups as (
    select
        case 
            when num_orders > 1 then 'repeat-buyers' 
            when num_orders = 1 then 'one-and-done'
            else 'no order'
        end as retention_group,

        count(distinct CustomerKey) as num_customers ,
        sum(revenue) as total_revenue
    from customer_info
    group by 
        case 
            when num_orders > 1 then 'repeat-buyers' 
            when num_orders = 1 then 'one-and-done'
            else 'no order'
        end
)

select
    retention_group,
    num_customers,
    total_revenue,
    cast(num_customers * 1.0 / nullif(sum(num_customers) over (),0) as decimal(10,3)) 
        as retention_rate_pct
from retention_groups
order by
    total_revenue desc;

Results:

retention_group num_customers total_revenue retention_rate_pct
repeat-buyers 6865 22610711.7164 0.371
one-and-done 11619 6747965.5043 0.629

Insights:

  • The majority of customers (62.9%) are "One-and-Done" buyers, meaning they have made only one purchase, while 37.1% of customers are "Repeat Buyers" who have made more than one purchase. This indicates that a significant portion of the customer base consists of customers who do not return for additional purchases, which may suggest that there are opportunities to improve customer retention and encourage repeat business.
  • Despite being a smaller percentage of the customer base, "Repeat Buyers" contribute a significantly higher total revenue compared to "One-and-Done" buyers. This suggests that "Repeat Buyers" are more valuable to the company in terms of revenue generation, and efforts to increase the retention rate and convert "One-and-Done" buyers into "Repeat Buyers" could have a substantial impact on overall revenue.
  • The retention rate percentage indicates that there is room for improvement in customer retention, as a large portion of customers are not returning for additional purchases. The company may want to consider implementing strategies such as loyalty programs, personalized marketing, or improved customer service to encourage repeat purchases and increase the retention rate, ultimately driving more revenue from the existing customer base.

Conclusion

In this domain we obtained the following key insights:

  • Initially, we analyzed the purchasing behavior of customers segmented by education and income. We have found out that customers with mid-level income and higher education levels (Bachelor's and Graduate Degrees) contribute the most to the overall revenue, indicating that this segment is a key demographic for the company.
  • Then, we considered the effect of commmute distance and vehicle ownerships on the purchasing behavior of customers. We have clearly found out that the customers with short distance and no car are our main revenue drivers. That's completely logical as they need bikes for commuting short distances, so they prefer purchasing bikes from our company.
  • Moreover, we analyzed the order volume, lifetime value and retention rate of the whale customers - the customers that are in the top 100 based on the money spent for our products. We have found out that there is not a straight pattern among these as some of them purchase frequently but some purchase less. We should aim to make them more loyal and hence our main targeted customers are those in this case.
  • Next, we analyzed the generational spend distribution by segmenting customers based on their age, gender and marital status. We have found out that the 36-45 age group, particularly single and married customers, is a key demographic for the company in terms of revenue generation. Marketing strategies should be tailored to target this age group effectively, while also considering the preferences and needs of other age groups to maximize overall revenue.
  • Finally, we analyzed the customer retention rate by categorizing customers into "One-and-Done" and "Repeat Buyers". We have found out that the majority of customers are "One-and-Done" buyers, but "Repeat Buyers" contribute significantly higher total revenue. This indicates that there are opportunities to improve customer retention and encourage repeat business, which could have a substantial impact on overall revenue.

In the next section, we will analyze our products and determine which products are the best sellers, which products have the highest profit margins, and how product performance varies across different customer segments. This analysis will help us optimize our product offerings and marketing strategies to maximize revenue and profitability.