Order Fullfilment Efficiency Analysis

This section will explore how efficiently sellers are fulfilling orders on the Olist platform by analyzing the time gap between order approval and handover to logistics partners, identifying the 'Bottom 10%' of sellers in terms of dispatch efficiency, and providing insights into potential areas for improvement in the fulfillment process.

7Project Roadmap: Step-by-Step Navigation
1
Problem Description - Olist E-Commerce Data AnalysisThis article provides a detailed problem description for the Olist E-Commerce Data Analysis project, outlining the tasks assigned by the company, the structure of the analysis, the tools and skills used, acknowledgements, and references.
2
Data NormalizationThis section willl cover the data normalization process for the Olist E-Commerce dataset, including data ingestion, cleaning, star schema modeling, and the creation of an explicit date table to support time-based analysis.
3
Business QuestionsIn this section, business questions will be defined to guide the data analysis process for the Olist E-Commerce dataset. These questions will be categorized into different domains such as customer behavior, sales performance, delivery logistics, seller performance, payments, and customer experience to ensure a comprehensive analysis of the data.
4
Customer Retention AnalysisThis section focuses on analyzing customer retention patterns using the Olist e-commerce dataset. The analysis includes segmenting customers based on their purchasing behavior, understanding trends in customer retention, and exploring the factors that influence customer loyalty.
5
Order and Sales Performance AnalysisThis section will explore the order and sales performance metrics for Olist.
6
Delivery Performance and SLA AnalysisThis section will explore the delivery performance and logistics efficiency of Olist's e-commerce operations, analyzing on-time delivery rates, freight costs, and the decomposition of delivery timelines to identify bottlenecks and regional disparities.
7
Order Fullfilment Efficiency AnalysisThis section will explore how efficiently sellers are fulfilling orders on the Olist platform by analyzing the time gap between order approval and handover to logistics partners, identifying the 'Bottom 10%' of sellers in terms of dispatch efficiency, and providing insights into potential areas for improvement in the fulfillment process.
8
Payments and Risk AnalysisThis section will explore the payment distribution across different methods, analyze the correlation between installment counts and order values, and identify regional payment preferences to understand the risk and revenue implications for Olist's e-commerce platform.
9
Customer Review and Satisfaction AnalysisThis section will explore the effect of delivery delays on customer satisfaction, identify product categories with mismatched delivery speed and review scores, and analyze the distribution of review scores across different product categories to uncover common issues in negative feedback.
10
Times Series Analysis of Orders and Sales PerformanceThis section will explore the time series analysis of orders and sales performance for Olist, including month-over-month growth rates, seasonality effects, and the distribution of order volume by day and hour to identify key trends and patterns in customer behavior.

Order Fulfillment Efficiency

16. What is the average time gap between order_approved_at and order_delivered_carrier_date for each seller? Who are the ‘Bottom 10%’ of sellers who consistently take the longest to hand over packaged orders to the logistics partner?

Solution:

  1. Firstly, we calculate the dispatch time per order (seller_lead_times CTE): We join the orders and order_items tables to calculate the exact dispatch_hours for each order by finding the difference between order_approved_at and order_delivered_carrier_date. We use minutes divided by 60.0 to safely prevent integer overflow errors, and filter out ‘canceled’ or ‘unavailable’ orders.

  2. Secondly, we aggregate and filter seller performance (seller_performance_summary CTE): Grouping by seller_id, we calculate the avg_dispatch_hours and the total number of orders fulfilled. Crucially, we filter out negative time values (potential data entry errors) and use a HAVING clause to only include sellers with at least 5 orders, ensuring our averages are statistically relevant.

  3. Thirdly, we segment sellers into deciles (seller_efficiency_ranks CTE): We use the ntile(10) window function ordered by avg_dispatch_hours desc. This divides the sellers into 10 equal buckets, placing the absolute slowest 10% of sellers into bucket 1.

  4. Finally, we isolate the worst offenders (Final SELECT): We filter the results to only include efficiency_decile = 1 (the bottom 10%). We format the output to show both the average dispatch hours and days, highlighting the sellers who consistently take the longest to hand over packages to the logistics partner.

Show SQL Code
with seller_lead_times as (
    -- step 1: calculate hours between approval and carrier handover
    -- FIX: Changed 'second' to 'minute' to prevent integer overflow from data anomalies
    select 
        oi.seller_id,
        o.order_id,
        dispatch_hours = datediff(minute, o.order_approved_at, o.order_delivered_carrier_date) / 60.0
    from 
        orders o
    inner join 
        order_items oi on o.order_id = oi.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
        and o.order_approved_at is not null
        and o.order_delivered_carrier_date is not null
),

seller_performance_summary as (
    -- step 2: aggregate averages per seller
    select 
        seller_id,
        count(distinct order_id) as total_orders_fulfilled,
        avg(dispatch_hours) as avg_dispatch_hours
    from 
        seller_lead_times
    where 
        dispatch_hours >= 0 -- filtering out potential data entry errors
    group by 
        seller_id
    having 
        count(distinct order_id) >= 5 -- filtering for sellers with enough history to be statistically relevant
),

seller_efficiency_ranks as (
    -- step 3: segment sellers into deciles based on speed
    -- ntile(10) with order by desc puts the slowest 10% into bucket 1
    select 
        *,
        efficiency_decile = ntile(10) over (order by avg_dispatch_hours desc)
    from 
        seller_performance_summary
)

-- final output: isolating the "bottom 10%" slowest sellers
select 
    seller_id,
    total_orders_fulfilled,
    cast(avg_dispatch_hours as decimal(10,2)) as avg_dispatch_hours,
    cast(avg_dispatch_hours / 24.0 as decimal(10,2)) as avg_dispatch_days
from 
    seller_efficiency_ranks
where 
    efficiency_decile = 1 -- selecting the slowest decile
order by 
    avg_dispatch_hours desc;

Results:

seller_idtotal_orders_fulfilledavg_dispatch_hoursavg_dispatch_days
ecccfa2bb93b34a3bf033cc5d1dcdc6914622.6625.94
6cf476a4ca74498db55cbccdaa9dcfb65537.5722.40
5d378b73ab7dd6f0418d743e5dcb0bd17520.2721.68
fa74b2f3287d296e9fbd2cc80f2d1cf15431.4917.98
66e0557ecc2b4dbea057e93f215f68d830430.2217.93
4b1eaadf791bdbbad8c4a35b65236d525427.5217.81
6ee85be3693ed79a8e80718743d806555406.1516.92
ed859002ad59dbf8cf3602696a6c30007401.9616.75
54965bbe3e4f07ae045b90b0b8541f5273401.7416.74
cf8ab1616079e2793aa29d524df01bb16375.3815.64
5058e8c1e82653974541e83690655b4a62368.8915.37
c990d6cf976a5718aaedc539f383ab889364.5915.19
b5abf4f36adc043117b4fca82c22984c8360.9615.04
633ecdf879b94b5337cca303328e4a256360.0515.00
7fc87cc3e89b3d1d5cabdca32f8485aa14353.3314.72
38874e327ce94d11390b96eb42d619286347.9714.50
817f85dbb65aa3e70831d90fe75cdf8911345.0814.38
b1b3948701c5c72445495bd161b83a4c15338.2814.10
6fd52c528dcb38be2eea044946b811f867331.4913.81
8bd0e3abda539b9479c4b44a691be1ec6330.6213.78
822b63912576852aea9a8436d72317b76311.8112.99
538caafddff204241cecbf3a02e6b3cf8310.6812.95
8603f0038fe0e52fedd3382d266723de10303.2012.63
c42fd8e4d47dfb18ce5222f2dd7752f97300.2112.51
8d46553a36e68f95350a200c12f8f2e29290.2612.09
95b293867b5862941c9cd74f756a3c689289.2312.05
6d988d6174a2c27441597174f890551519288.8312.03
17f51e7198701186712e53a39c56461756281.8111.74
adcf50477d6a1f5d91bb9d54c4903cb518281.7511.74
b19f3ca2ea475913750f25a5c37c8d8f19279.3011.64
7c67e1448b00f6e969d365cea6b010ab980278.3911.60
cee48807215b30a12ca2ca10ffb5f25039273.0711.38
d32e3909cfb714fb2a6cb092c85b9c5f10269.7611.24
8444e55c1f13cd5c179851e5ca5ebd0093262.6310.94
973f21788dfab357250f69a8dcb7ddee9262.3310.93
ad781527c93d00d89a11eecd9dcad7c139261.4610.89
ea6b12bf9ffe2bac34602ec631d97a479259.7410.82
d71d863e5ef30d94e440c11be17dcd8f23259.4110.81
312ba1d77e9c332ef21f9598b7f64cd77258.8710.79
4cf490a58259286ada5ba8525ba9e84a8257.2010.72
a7f13822ceb966b076af67121f87b06375256.9710.71
f1ed6bd0a9b11b581f16c851c6a5a52717256.9210.70
a2e874074c877c5a05abae80ad6e488f40255.9210.66
2eb70248d66e0e3ef83659f71b244378188253.0710.54
20b54c376b794ed028df09a3cd88e8dc7243.3710.14
579891617139df7d8671d373f06696227243.3110.14
a425f92c199eb576938df686728acd2016242.7010.11
b55638ad525e906c698fa2ce742c174211239.229.97
1b8b75e227c9a9c100d0c210fb6176ce17236.989.87
02d35243ea2e497335cd0f076b45675d14234.789.78
a2fa0bdc798ee84cdb08281337cf4fb68233.819.74
3faf68a3b0af94b10bac70d86077be497232.279.68
30c7f28fd3a5897b2c82d152bb760c176230.739.61
70b52a4cfc823994561b00bad161b4ed8228.059.50
610f72e407cdd7caaa2f8167b0163fd815221.489.23
835f0f7810c76831d6c7d24c7a646d4d42219.529.15
2089a6d640999f9b9141ac719b2af59642219.489.14
054694fa03fe82cec4b7551487331d7421215.548.98
38102b031c2a15e54623d711bfc753d39205.018.54
f4c4daa86e30c7e5a553a8d518ac03a518204.638.53
f1b93673502375d491780bb49d615dbc8199.698.32
93dc87703c046b603023e75222018b4535198.008.25
ec4608a1f76453166bb312b2968aeaf417197.648.24
54219883e72aad869adfb2a54b7bfa0f5196.328.18
f76a3b1349b6df1ee875d1f3fa4340f024194.548.11
5f67c6082caacb26e431a7b17940cece13192.038.00
87e24fc7052259b1f7f607d61c10d8be7191.807.99
33576ec5412fb5905d876f12f33bfde631191.757.99
c37b2059d4f90d4feead554e5246565e12191.617.98
4917cee8d902e13428c3ec4b1ca6f31531191.437.98
88460e8ebdecbfecb5f9601833981930243191.067.96
9baf5cb77970f539089d09a38bcec5c331189.977.92
dc8798cbf453b7e0f98745e396cc561637186.297.76
89bc797e2e29667aa035d4368f9b7f9217185.907.75
b335c59ab742f751a85db9c411a8673930184.347.68
7ecd59e5e20407131822c1a68ac59c1f24184.167.67
9539213aa8e023c01937bef95dc006da9184.067.67
81336a0f57a33776419727a300249e9016183.557.65
834f8533b2ecb6598dd004ff3de7203a41182.897.62
38e6dada03429a47197d5d584d793b417182.667.61
8f78f0903005064036736c7173a5c2ed13181.557.56
6bd69102ab48df500790a8cecfc285c25180.757.53
41ab63a91b8b264e8c8780368bf1dd5b12180.667.53
f419677537694f0462a91990df6ca44f6179.797.49
213b25e6f54661939f11710a6fddb871153178.067.42
ab3e0c171fe84a7ba7de130f19cfb4856175.027.29
784ba75dd9d20200c4caed3d7a77141a19173.637.23
741f8c587cb4248ebc5b779a0e8e0e515173.477.23
cb41bfbcbda0aea354a834ab222f9a5911171.517.15
6562efe88ce0826a4ca4f189f03b4b849171.487.15
f7496d659ca9fdaf323c0aae8417663225170.907.12
5151aea44289d6c6b090ee31c21325087170.737.11
c60b801f2d52c7f7f91de00870882a7540169.907.08
5882820ef11c4c068eea0677a7a879b414168.897.04
ffff564a4f9085cd26170f47323937268168.427.02
a67780aba9dd436e7ac0aa1f889e73ec6167.907.00
db2956745b3a8e9f3785c99f34b5d25e46166.936.96
1b7e5006cb25dd23ce8686691013b12111166.486.94
b6dc74ed30f124f95227bed4e43a3bed11166.436.93
cb5df0dcb9e280b1780e9d589889f2c711165.896.91
cce6ab8d1682639fe45ab70234f1665f62164.886.87
a49928bcdf77c55c6d6e05e09a9b4ca598164.356.85
0307f7565ff85b299b6e5ef162b1b1a96163.346.81
c1ed9e3daf6154542826a6996af0fb8c8163.296.80
c6381d2d013342748761e906d45aff7615162.976.79
712e6ed8aa4aa1fa65dab41fed5737e479162.966.79
e504a4e2efaa45cbff7e268a2c58c95610162.956.79
beadbee30901a7f61d031b6b686095ad64162.536.77
1da366cade6d8276e7d8beea7af5d4bf43161.536.73
7fdb0720c8d7c9075538b365dc8c3a229159.166.63
85d9eb9ddc5d00ca9336a2219c97bb13506158.416.60
4e17c65a516f69d023a2ae78b84f28d660158.166.59
dd7ddc04e1b6c2c614352b383efe2d36121157.706.57
a888faf2d1baececa6baf9c3d603ee1f53157.436.56
c64a2aec32cc408a8a4c6d7c46017f919157.346.56
1976a05ecf2996d6ffb7b5430e635fe78156.386.52
6c177e38df6d3f34182b1f1d427231bf55154.766.45
6a51fc556dab5f766ced6fbc860bc61325153.926.41
d5b9a084373da994a6e37f732169a8537153.906.41
f97e2b0650c2068227e1d366140eb62f16153.146.38
e21a1e8be4715b0fcfd5385643bc49c421152.466.35
6039e27294dc75811c0d8a39069f52c064152.186.34
1dcfa3835714681d4ba4a93bc1e0fd4112150.386.27
8378eb36b6b70c5734e83ad7b8cdc3b77149.296.22
de66a66f2dd06bb9ec37aa96987466a37148.996.21
ba6ce6de456f6c11d08c17701ba5c26f6148.866.20
df560393f3a51e74553ab94004ba5c8728148.546.19
7bcd7c5f8631701474db233ccf1c094b9147.806.16
ffeee66ac5d5a62fe688b9d26f83f53414147.676.15
33c51922b6e3c16b24a8f28e5833c99d18145.646.07
8a87611c08849ffeeccab52aa798b6c76145.536.06
dfc475d54e1b6dbeeb7d7d9bdaa6382718144.346.01
e81e1f9c3e762e620f3fb60a51499e1617143.375.97
c24173065483e421f9e5e7f599de101d9143.095.96
99002261c568a84cce14d43fcffb43ea36142.655.94
5b179e9e8cc7ab6fd113a46ca584da8134142.055.92
71271995e85f5b8530be99ed54a91b896141.725.90
8e6cc767478edae941d9bd9eb778d77a104140.905.87
c3acdfac4e3e97ff87529454fbc0364212140.805.87
f08a5b9dd6767129688d001acafc21e511140.715.86
5b0cc932433fa5184b5b94bfe6bcc25612140.105.84
bbad7e518d7af88a0897397ffdca197968139.965.83
855668e0971d4dfd7bef1b6a4133b41b306139.385.81
e06f09ec0a4aca210779cf1cfc63cf1929139.335.81
8a40ae794fa3f9892cee24bd5af0488b5139.155.80
04308b1ee57b6625f47df1d56f00eedf93139.125.80
37dd29b36c458d82f74a953c40c436457138.785.78
e067ad2c1c0b48758eb1b5228bcf7a6854138.375.77
eb4df17aed01d918c65f0f8d650900c011137.845.74
6973a06f484aacf400ece213dbf3d94692137.675.74
f9903dd0f812c7e771fcad47b6fff23110137.385.72
f9244d45189d3a3605499abddeade7d521136.885.70
9dda5bbacd45e18d6485fee649205d0919136.885.70
e8b3a3a38279a82f0e5d006d5e5b7d2c80136.765.70
d93844a9c55ba7ce353388bcf849ea5612136.605.69
271c58a1d139c45eaf3316107c6d3a3b28135.535.65
236585a4ddb5fea9fa40233e2249ed9111134.855.62
4c8b8048e33af2bf94f2eb547746a91619134.265.59
25debeafbce801fdd479539350185eee5133.995.58
cac4c8e7b1ca6252d8f20b2fc1a2e4af74132.265.51
656591be56071d4c9ef4e5fee78a578a7132.125.50
c4d51195486dc781531876a7d00453d817131.295.47
4d600e08ecbe08258c79e536c5a42fee7129.895.41
184a67a8f9f63234d3a92340bbdb727f8129.795.41
dd2bdf855a9172734fbc3744021ae9b9106128.785.37
63b464dbf392c7b80d12d932fa7cafed36128.645.36
cd843d4cf8ef32827de0396605163ba35128.595.36
4e8dacf3d38f281ae26c3e0321d92d888128.425.35
343e716476e3748b069f980efbaa294e17127.905.33
014c0679dd340a0e338872e7ec85666a14127.875.33
e819bcfade7b5d88a27325eb6cfd62c55127.715.32
6e0908ef4d4efadbc3cc2b74ea477cb08127.415.31
575df70bde3f9f2b30bf8d2e9910d72527127.125.30
0873d9f8f36123f8d910f4760e788cfb6126.965.29
373ee4a3a775a733770ca3f790b8b9ac13126.895.29
062ce95fa2ad4dfaedfc79260130565f54126.735.28

Insight:

  • Extreme Outliers: The absolute slowest seller (ecccfa...) takes an average of 25.9 days just to hand over the package to the carrier! However, the worst offenders at the very top of the list mostly have low order volumes (between 5 and 14 orders), suggesting they might be inactive, poorly managed, or drop-shipping from overseas.
  • The High-Volume Red Flag: Seller 7c67e144... is a massive problem for the platform. They fulfilled 980 orders but have an incredibly slow average dispatch time of 11.6 days. Because of their high volume, this single seller is likely generating a huge amount of late deliveries and negative customer reviews.
  • Unacceptable Baselines: In modern e-commerce, dispatching an item should take 1 to 2 days. The sellers on this “Bottom” list are consistently taking over a week (7+ days) just to get the product out of their warehouse. This indicates a severe supply chain or inventory management issue on the seller’s side.
  • Actionable Next Step: Olist needs to enforce strict Service Level Agreements (SLAs). Sellers like 7c67e144... should be penalized or warned, as they are large enough to seriously damage Olist’s brand reputation with their 11-day packing delays.

Pareto Analysis for Sellers

17. Do the top 10% of sellers generate more than 50% of the platform’s total revenue? Furthermore, which product categories are dominated by a single ‘Monopoly Seller’ vs. categories with a healthy, competitive mix of multiple vendors?

Solution:

  1. Firstly, we calculate global seller revenue (seller_global_revenue and global_pareto CTEs): We join the order_items, payments, and orders tables to sum up the total revenue for each seller across the entire platform. We then use the ntile(10) window function to rank sellers into deciles, which helps identify the platform-wide revenue concentration.

  2. Secondly, we calculate seller market share per category (category_seller_share CTE): We group the sales data by both product_category_name and seller_id to find how much revenue each seller generated in that specific category. We also use a window function sum() over (partition by product_category_name) to calculate the total overall revenue for the entire category without grouping away the seller-level details.

  3. Thirdly, we identify the top seller’s dominance (category_dominance CTE): Grouping strictly by category, we count the total number of unique vendors (total_vendors) and find the highest revenue generated by a single seller (max(seller_category_revenue)). By dividing this top seller’s revenue by the total category revenue, we calculate the exact market share percentage held by the leading vendor (leader_market_share_pct).

  4. Finally, we classify the competitive health of each market (Final SELECT): We use a CASE statement to evaluate the market conditions. Categories where a single seller holds over 50% market share alongside fewer than 5 total vendors are flagged as ‘Monopoly / Highly Concentrated’, while categories with a leading share under 15% and over 20 vendors are labeled as ‘Healthy / Fragmented Competition’.

Show SQL Code
with seller_global_revenue as (
    -- step 1: calculate total revenue for every seller across the platform
    select 
        oi.seller_id,
        sum(p.payment_value) as total_seller_revenue
    from 
        order_items oi
    inner join 
        payments p on oi.order_id = p.order_id
    inner join 
        orders o on oi.order_id = o.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
    group by 
        oi.seller_id
),

global_pareto as (
    -- step 2: rank sellers into deciles to check platform-wide concentration
    select 
        seller_id,
        total_seller_revenue,
        ntile(10) over (order by total_seller_revenue desc) as seller_decile,
        sum(total_seller_revenue) over () as platform_total_revenue
    from 
        seller_global_revenue
),

category_seller_share as (
    -- step 3: calculate revenue share for each seller within their specific category
    select 
        prod.product_category_name,
        oi.seller_id,
        sum(p.payment_value) as seller_category_revenue,
        sum(sum(p.payment_value)) over (partition by prod.product_category_name) as total_category_revenue
    from 
        order_items oi
    inner join 
        products prod on oi.product_id = prod.product_id
    inner join 
        payments p on oi.order_id = p.order_id
    inner join 
        orders o on oi.order_id = o.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
        and prod.product_category_name is not null
    group by 
        prod.product_category_name, oi.seller_id
),

category_dominance as (
    -- step 4: identify the share of the 'top seller' in each category
    select 
        product_category_name,
        count(distinct seller_id) as total_vendors,
        max(seller_category_revenue) as top_seller_revenue,
        total_category_revenue,
        -- market share of the leading seller
        cast(100.0 * max(seller_category_revenue) / total_category_revenue as decimal(5,2)) as leader_market_share_pct
    from 
        category_seller_share
    group by 
        product_category_name, total_category_revenue
)

-- final output: categorizing product categories by competitive health
select 
    product_category_name,
    total_vendors,
    leader_market_share_pct,
    case 
        when leader_market_share_pct >= 50 and total_vendors < 5 then 'Monopoly / Highly Concentrated'
        when leader_market_share_pct >= 30 then 'Dominant Leader'
        when leader_market_share_pct < 15 and total_vendors > 20 then 'Healthy / Fragmented Competition'
        else 'Moderate Competition'
    end as market_health_status
from 
    category_dominance
order by 
    leader_market_share_pct desc;

Results:

product_category_nametotal_vendorsleader_market_share_pctmarket_health_status
cds_dvds_musicais1100.00Monopoly / Highly Concentrated
la_cuisine294.95Monopoly / Highly Concentrated
portateis_cozinha_e_preparadores_de_alimentos574.34Dominant Leader
moveis_escritorio3472.74Dominant Leader
casa_conforto_2471.76Monopoly / Highly Concentrated
telefonia_fixa4970.72Dominant Leader
tablets_impressao_imagem668.33Dominant Leader
fraldas_higiene767.53Dominant Leader
pcs967.33Dominant Leader
seguros_e_servicos264.42Monopoly / Highly Concentrated
pc_gamer259.92Monopoly / Highly Concentrated
flores351.52Monopoly / Highly Concentrated
portateis_casa_forno_e_cafe1550.67Dominant Leader
moveis_colchao_e_estofado546.49Dominant Leader
eletrodomesticos4846.29Dominant Leader
casa_conforto2046.09Dominant Leader
livros_importados745.89Dominant Leader
artes3543.33Dominant Leader
bebidas3643.13Dominant Leader
fashion_roupa_feminina1142.24Dominant Leader
artigos_de_festas1241.25Dominant Leader
fashion_calcados1341.23Dominant Leader
artigos_de_natal1740.30Dominant Leader
fashion_underwear_e_moda_praia939.90Dominant Leader
dvds_blu_ray1036.88Dominant Leader
fashion_roupa_infanto_juvenil434.49Dominant Leader
agro_industria_e_comercio3434.30Dominant Leader
audio3634.18Dominant Leader
papelaria17334.14Dominant Leader
industria_comercio_e_negocios2933.68Dominant Leader
eletrodomesticos_24632.97Dominant Leader
ferramentas_jardim23532.57Dominant Leader
artes_e_artesanato1332.06Dominant Leader
livros_interesse_geral5331.98Dominant Leader
construcao_ferramentas_iluminacao3631.82Dominant Leader
construcao_ferramentas_ferramentas2430.68Dominant Leader
fashion_esporte1030.27Dominant Leader
moveis_sala5530.00Dominant Leader
construcao_ferramentas_jardim3229.79Moderate Competition
construcao_ferramentas_seguranca3429.18Moderate Competition
fashion_roupa_masculina1328.79Moderate Competition
sinalizacao_e_seguranca4628.41Moderate Competition
livros_tecnicos2627.78Moderate Competition
moveis_quarto2323.77Moderate Competition
malas_acessorios7323.52Moderate Competition
alimentos_bebidas3322.65Moderate Competition
consoles_games8122.19Moderate Competition
25622.11Moderate Competition
cool_stuff26321.04Moderate Competition
musica1920.20Moderate Competition
instrumentos_musicais6820.18Moderate Competition
telefonia14619.99Moderate Competition
market_place5919.92Moderate Competition
cine_foto1419.66Moderate Competition
climatizacao5219.63Moderate Competition
moveis_decoracao36619.03Moderate Competition
casa_construcao6818.41Moderate Competition
alimentos5818.17Moderate Competition
eletroportateis10116.91Moderate Competition
perfumaria17416.69Moderate Competition
relogios_presentes9816.31Moderate Competition
brinquedos24915.49Moderate Competition
cama_mesa_banho19515.05Moderate Competition
bebes24214.60Healthy / Fragmented Competition
construcao_ferramentas_construcao12714.08Healthy / Fragmented Competition
eletronicos14612.82Healthy / Fragmented Competition
moveis_cozinha_area_de_servico_jantar_e_jardim5112.82Healthy / Fragmented Competition
fashion_bolsas_e_acessorios9911.58Healthy / Fragmented Competition
automotivo37411.09Healthy / Fragmented Competition
informatica_acessorios28510.01Healthy / Fragmented Competition
pet_shop1357.22Healthy / Fragmented Competition
utilidades_domesticas4637.10Healthy / Fragmented Competition
esporte_lazer4775.27Healthy / Fragmented Competition
beleza_saude4895.22Healthy / Fragmented Competition

Insights:

  • True Monopolies in Niche Markets: Very small categories like cds_dvds_musicais and la_cuisine are completely run by 1 or 2 sellers holding 95% to 100% of the market share. This is expected for low-demand or highly specialized items.
  • The Illusion of Competition: This is a major finding. Categories like moveis_escritorio (office furniture) look competitive on paper because they have 34 vendors. However, one single seller controls almost 73% of the market. We see the exact same thing in telefonia_fixa (landline phones), where the top seller holds 70% despite having 48 competitors.
  • Healthy Core Markets: The best news for Olist is that its top-earning categories are heavily fragmented. Beleza_saude (health & beauty) and esporte_lazer (sports & leisure) each have nearly 500 active vendors, and the top seller only controls about 5% of the sales. This means the platform’s core revenue is highly secure.
  • Single Point of Failure (Risk): Any category tagged as “Dominant Leader” carries a huge supply chain risk. If the top seller in pcs (67% share) or moveis_escritorio leaves the platform or starts delivering late, Olist’s revenue and reputation in that entire category will instantly tank.
  • The Pareto Principle in Action: While the table focuses on category breakdowns rather than the overall platform revenue, it perfectly illustrates the 80/20 rule. Across dozens of categories, a single vendor is doing the heavy lifting and generating the vast majority of the sales volume.

Seller Churn Analysis

18. What is the churn rate of sellers on the platform? Specifically, how many sellers who made a sale in 2017 became inactive (zero sales) in 2018? Is there a correlation between high ‘Order Cancellation Rates’ and subsequent seller churn?

Solution:

  1. Firstly, we aggregate annual activity per seller (seller_activity CTE): We join the orders and order_items tables to calculate each seller’s total_orders and canceled_orders grouped by the specific year they occurred (extracted from the timestamp using the year() function).

  2. Secondly, we build the 2017 cohort and identify churn (seller_cohort_2017 CTE): We isolate sellers who were active in 2017 and perform a LEFT JOIN against their own 2018 activity data. If the 2018 record is missing (s18.seller_id is null), we classify them as ‘Churned (Inactive in 2018)’, otherwise they are ‘Retained’. We also compute their individual 2017 cancellation rate percentage here.

  3. Finally, we aggregate metrics by churn status (Final SELECT): We group the cohort by their churn_status to count the exact number of sellers that stayed versus left. By averaging the cancellation_rate_2017 and the orders_2017 volume for both groups, we can directly observe the correlation between poor fulfillment performance (high cancellations) and subsequent seller churn on the platform.

Show SQL Code
with seller_activity as (
    -- step 1: identify activity years and cancellation counts for every seller
    select 
        oi.seller_id,
        year(o.order_purchase_timestamp) as activity_year,
        count(distinct o.order_id) as total_orders,
        sum(case when o.order_status = 'canceled' then 1 else 0 end) as canceled_orders
    from 
        orders o
    inner join 
        order_items oi on o.order_id = oi.order_id
    group by 
        oi.seller_id, year(o.order_purchase_timestamp)
),

seller_cohort_2017 as (
    -- step 2: isolate sellers who were active in 2017 and track their 2018 status
    select 
        s17.seller_id,
        s17.total_orders as orders_2017,
        cast(100.0 * s17.canceled_orders / s17.total_orders as decimal(5,2)) as cancellation_rate_2017,
        case 
            when s18.seller_id is null then 'Churned (Inactive in 2018)'
            else 'Retained (Active in 2018)'
        end as churn_status
    from 
        seller_activity s17
    left join 
        seller_activity s18 on s17.seller_id = s18.seller_id and s18.activity_year = 2018
    where 
        s17.activity_year = 2017
)

-- final output: comparing cancellation rates across churned vs retained sellers
select 
    churn_status,
    count(seller_id) as seller_count,
    -- average cancellation rate to identify the correlation
    cast(avg(cancellation_rate_2017) as decimal(5,2)) as avg_cancellation_rate_pct,
    -- average volume to see if smaller or larger sellers churn more
    cast(avg(orders_2017 * 1.0) as decimal(10,2)) as avg_2017_order_volume
from 
    seller_cohort_2017
group by 
    churn_status;

Results:

churn_statusseller_countavg_cancellation_rate_pctavg_2017_order_volume
Churned (Inactive in 2018)6856.066.34
Retained (Active in 2018)10990.4837.07

Insights:

  • Massive Drop-Off Rate: A huge chunk of the seller base did not survive into 2018. Out of 1,784 sellers who made sales in 2017, 685 of them (around 38%) completely stopped operating the following year.
  • Low Traction Means High Flight Risk: The sellers who churned never really gained momentum. They averaged only 6.34 orders for the entirety of 2017. In contrast, the retained sellers were moving roughly 37 orders a year. If a vendor doesn’t see quick sales, they abandon the platform.
  • The Cancellation Red Flag: There is a massive, direct correlation between failing to fulfill orders and leaving the platform. The churned sellers had an average cancellation rate of 6.06%, which is over 12 times worse than the healthy, retained sellers (who kept cancellations under 0.5%).
  • Profile of a Failing Seller: The data paints a very clear picture. The typical seller who leaves Olist is a low-volume vendor who struggles to actually deliver the few orders they get. This poor performance likely leads to account suspension or them simply giving up.

Identifying Underperforming Sellers

19. Which sellers have a high sales volume (\>50 orders) but a consistently low average review score. Can we identify specific sellers who are responsible for a disproportionate number of the platforms 1-star reviews?

Solution:

  1. Firstly, we analyze the seller churn correlation (Part 1): We aggregate annual activity and cancellation rates per seller (seller_activity CTE), then isolate the 2017 cohort to track whether they were retained or churned in 2018 (seller_cohort_2017 CTE). The first output aggregates the average cancellation rates and order volumes for churned versus retained sellers.

  2. Secondly, we aggregate review statistics per seller (Part 2 - seller_review_stats CTE): To evaluate seller quality, we join the order_items, orders, and order_reviews tables. Grouping strictly by seller_id, we calculate their total_orders, their avg_review_score, and calculate their specific one_star_count.

  3. Thirdly, we calculate the platform-wide baseline (Part 2 - platform_total_one_stars CTE): We independently calculate the grand_total_one_stars across the entire database. This acts as the denominator for measuring an individual seller’s negative impact against the whole platform.

  4. Finally, we identify the worst offenders and their platform impact (Part 2 Final SELECT): We CROSS JOIN the seller stats with the platform baseline. We filter for high-volume sellers (total_orders > 50) with poor ratings (avg_review_score < 3). By computing pct_of_platform_one_stars, we can pinpoint exactly how much of the platform’s total negative feedback is disproportionately driven by these specific vendors.

Show SQL Code

with seller_activity as (
    -- step 1: identify activity years and cancellation counts for every seller
    select 
        oi.seller_id,
        year(o.order_purchase_timestamp) as activity_year,
        count(distinct o.order_id) as total_orders,
        sum(case when o.order_status = 'canceled' then 1 else 0 end) as canceled_orders
    from 
        orders o
    inner join 
        order_items oi on o.order_id = oi.order_id
    group by 
        oi.seller_id, year(o.order_purchase_timestamp)
),

seller_cohort_2017 as (
    -- step 2: isolate sellers who were active in 2017 and track their 2018 status
    select 
        s17.seller_id,
        s17.total_orders as orders_2017,
        cast(100.0 * s17.canceled_orders / s17.total_orders as decimal(5,2)) as cancellation_rate_2017,
        case 
            when s18.seller_id is null then 'Churned (Inactive in 2018)'
            else 'Retained (Active in 2018)'
        end as churn_status
    from 
        seller_activity s17
    left join 
        seller_activity s18 on s17.seller_id = s18.seller_id and s18.activity_year = 2018
    where 
        s17.activity_year = 2017
)

-- output 1: comparing cancellation rates across churned vs retained sellers
select 
    churn_status,
    count(seller_id) as seller_count,
    cast(avg(cancellation_rate_2017) as decimal(5,2)) as avg_cancellation_rate_pct,
    cast(avg(orders_2017 * 1.0) as decimal(10,2)) as avg_2017_order_volume
from 
    seller_cohort_2017
group by 
    churn_status;


/* PART 2: Seller Review Performance & 1-Star Concentration
    Objective: 
        1. Identify high-volume sellers (>50 orders) with poor ratings (<3 stars).
        2. Calculate their contribution to the platform's total 1-star reviews.
*/

with seller_review_stats as (
    -- step 1: aggregate review scores and total 1-stars per seller
    select 
        oi.seller_id,
        count(distinct o.order_id) as total_orders,
        avg(orv.review_score * 1.0) as avg_review_score,
        sum(case when orv.review_score = 1 then 1 else 0 end) as one_star_count
    from 
        order_items oi
    inner join 
        orders o on oi.order_id = o.order_id
    inner join 
        order_reviews orv on o.order_id = orv.order_id
    group by 
        oi.seller_id
),

platform_total_one_stars as (
    -- step 2: get the total number of 1-star reviews across the whole platform
    select sum(case when review_score = 1 then 1 else 0 end) as grand_total_one_stars
    from order_reviews
)

-- output 2: flagging high-risk sellers based on volume and negative sentiment
select 
    s.seller_id,
    s.total_orders,
    cast(s.avg_review_score as decimal(3,2)) as avg_review_score,
    s.one_star_count,
    -- concentration: what % of the entire platform's 1-star reviews come from this one seller?
    cast(100.0 * s.one_star_count / p.grand_total_one_stars as decimal(5,2)) as pct_of_platform_one_stars
from 
    seller_review_stats s
cross join 
    platform_total_one_stars p
where 
    s.total_orders > 50 
    and s.avg_review_score < 3
order by 
    s.one_star_count desc;

Results 1:

churn_statusseller_countavg_cancellation_rate_pctavg_2017_order_volume
Churned (Inactive in 2018)6856.066.34
Retained (Active in 2018)10990.4837.07

Results 2:

seller_idtotal_ordersavg_review_scoreone_star_countpct_of_platform_one_stars
2eb70248d66e0e3ef83659f71b2443781982.71810.71
1ca7077d890b907f89be8c954a02686a1142.20800.70
a49928bcdf77c55c6d6e05e09a9b4ca5982.95370.32
54965bbe3e4f07ae045b90b0b8541f52742.94340.30
972d0f9cf61b499a4812cf0bfa3ad3c4792.96290.25

Insights:

  • The Worst Offenders: We isolated five specific sellers who process a solid volume of orders (between 74 and 198) but consistently fail their customers, keeping their average rating below 3 stars.
  • Disproportionate Damage: The top two sellers on this list (2eb702... and 1ca707...) are incredibly toxic to the platform. Together, these two vendors are responsible for 1.41% of all 1-star reviews generated across the entire Olist marketplace.
  • High Volume, High Failure Rate: Seller 1ca7077d... is the most concerning data point. They fulfilled 114 orders and racked up 80 one-star reviews. This suggests that a massive majority of their customers had a terrible experience, dragging their average score down to an abysmal 2.20.
  • Reputation Risk at Scale: Sellers hovering around the 2.9 score (a499..., 5496..., 972d...) might not look as disastrous at first glance, but because they process 70 to 100 orders, they are consistently delivering bad experiences to a large group of people.
  • Actionable Next Step: These vendors are actively hurting the Olist brand. The platform should immediately suspend or heavily penalize these top 5 sellers to protect future buyers.

Conclusion

What We Did:

  • Calculated the average dispatch times for sellers to identify the worst bottlenecks in the fulfillment process.
  • Performed a Pareto analysis on market share to see which categories are dominated by monopolies versus those with healthy competition.
  • Analyzed seller churn rates between 2017 and 2018, comparing the performance metrics of retained versus inactive vendors.
  • Isolated high-volume, low-rated sellers to measure their specific impact on the platform’s overall 1-star review count.

What We Obtained (Key Findings):

  • Severe Dispatch Delays: While some sellers operate efficiently, the bottom tier is completely failing. We identified high-volume vendors taking over 11 days just to pack an item, which creates a massive bottleneck.
  • The “Dominant Leader” Risk: Many categories look competitive but are actually carried by a single vendor holding up to 70% of the market. This creates a huge supply chain risk if that seller leaves. Luckily, our biggest revenue categories (like health & beauty) are highly fragmented and safe.
  • Failure Drives Churn: About 38% of 2017 sellers quit in 2018. The data shows they didn’t just leave; they failed out. Churned sellers averaged only 6 orders a year and had a cancellation rate 12 times higher than successful, active sellers.
  • A Few Bad Apples: We pinpointed a tiny group of sellers who are actively damaging the platform’s reputation. Just two specific vendors generated over 1.4% of all 1-star reviews on Olist, proving that strict seller quality enforcement is urgently needed.