Order Fullfilment Efficiency Analysis

This section will explore how efficiently sellers are fulfilling orders on the Olist platform by analyzing the time gap between order approval and handover to logistics partners, identifying the 'Bottom 10%' of sellers in terms of dispatch efficiency, and providing insights into potential areas for improvement in the fulfillment process.

7Project Roadmap: Step-by-Step Navigation

Problem Description - Olist E-Commerce Data AnalysisThis article provides a detailed problem description for the Olist E-Commerce Data Analysis project, outlining the tasks assigned by the company, the structure of the analysis, the tools and skills used, acknowledgements, and references.

Data NormalizationThis section willl cover the data normalization process for the Olist E-Commerce dataset, including data ingestion, cleaning, star schema modeling, and the creation of an explicit date table to support time-based analysis.

Business QuestionsIn this section, business questions will be defined to guide the data analysis process for the Olist E-Commerce dataset. These questions will be categorized into different domains such as customer behavior, sales performance, delivery logistics, seller performance, payments, and customer experience to ensure a comprehensive analysis of the data.

Customer Retention AnalysisThis section focuses on analyzing customer retention patterns using the Olist e-commerce dataset. The analysis includes segmenting customers based on their purchasing behavior, understanding trends in customer retention, and exploring the factors that influence customer loyalty.

Order and Sales Performance AnalysisThis section will explore the order and sales performance metrics for Olist.

Delivery Performance and SLA AnalysisThis section will explore the delivery performance and logistics efficiency of Olist's e-commerce operations, analyzing on-time delivery rates, freight costs, and the decomposition of delivery timelines to identify bottlenecks and regional disparities.

Order Fullfilment Efficiency AnalysisThis section will explore how efficiently sellers are fulfilling orders on the Olist platform by analyzing the time gap between order approval and handover to logistics partners, identifying the 'Bottom 10%' of sellers in terms of dispatch efficiency, and providing insights into potential areas for improvement in the fulfillment process.

Payments and Risk AnalysisThis section will explore the payment distribution across different methods, analyze the correlation between installment counts and order values, and identify regional payment preferences to understand the risk and revenue implications for Olist's e-commerce platform.

Customer Review and Satisfaction AnalysisThis section will explore the effect of delivery delays on customer satisfaction, identify product categories with mismatched delivery speed and review scores, and analyze the distribution of review scores across different product categories to uncover common issues in negative feedback.

Times Series Analysis of Orders and Sales PerformanceThis section will explore the time series analysis of orders and sales performance for Olist, including month-over-month growth rates, seasonality effects, and the distribution of order volume by day and hour to identify key trends and patterns in customer behavior.

Order Fulfillment Efficiency

16. What is the average time gap between order_approved_at and order_delivered_carrier_date for each seller? Who are the ‘Bottom 10%’ of sellers who consistently take the longest to hand over packaged orders to the logistics partner?

Solution:

Firstly, we calculate the dispatch time per order (seller_lead_times CTE): We join the orders and order_items tables to calculate the exact dispatch_hours for each order by finding the difference between order_approved_at and order_delivered_carrier_date. We use minutes divided by 60.0 to safely prevent integer overflow errors, and filter out ‘canceled’ or ‘unavailable’ orders.
Secondly, we aggregate and filter seller performance (seller_performance_summary CTE): Grouping by seller_id, we calculate the avg_dispatch_hours and the total number of orders fulfilled. Crucially, we filter out negative time values (potential data entry errors) and use a HAVING clause to only include sellers with at least 5 orders, ensuring our averages are statistically relevant.
Thirdly, we segment sellers into deciles (seller_efficiency_ranks CTE): We use the ntile(10) window function ordered by avg_dispatch_hours desc. This divides the sellers into 10 equal buckets, placing the absolute slowest 10% of sellers into bucket 1.
Finally, we isolate the worst offenders (Final SELECT): We filter the results to only include efficiency_decile = 1 (the bottom 10%). We format the output to show both the average dispatch hours and days, highlighting the sellers who consistently take the longest to hand over packages to the logistics partner.

Show SQL Code

with seller_lead_times as (
    -- step 1: calculate hours between approval and carrier handover
    -- FIX: Changed 'second' to 'minute' to prevent integer overflow from data anomalies
    select 
        oi.seller_id,
        o.order_id,
        dispatch_hours = datediff(minute, o.order_approved_at, o.order_delivered_carrier_date) / 60.0
    from 
        orders o
    inner join 
        order_items oi on o.order_id = oi.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
        and o.order_approved_at is not null
        and o.order_delivered_carrier_date is not null
),

seller_performance_summary as (
    -- step 2: aggregate averages per seller
    select 
        seller_id,
        count(distinct order_id) as total_orders_fulfilled,
        avg(dispatch_hours) as avg_dispatch_hours
    from 
        seller_lead_times
    where 
        dispatch_hours >= 0 -- filtering out potential data entry errors
    group by 
        seller_id
    having 
        count(distinct order_id) >= 5 -- filtering for sellers with enough history to be statistically relevant
),

seller_efficiency_ranks as (
    -- step 3: segment sellers into deciles based on speed
    -- ntile(10) with order by desc puts the slowest 10% into bucket 1
    select 
        *,
        efficiency_decile = ntile(10) over (order by avg_dispatch_hours desc)
    from 
        seller_performance_summary
)

-- final output: isolating the "bottom 10%" slowest sellers
select 
    seller_id,
    total_orders_fulfilled,
    cast(avg_dispatch_hours as decimal(10,2)) as avg_dispatch_hours,
    cast(avg_dispatch_hours / 24.0 as decimal(10,2)) as avg_dispatch_days
from 
    seller_efficiency_ranks
where 
    efficiency_decile = 1 -- selecting the slowest decile
order by 
    avg_dispatch_hours desc;

Results:

seller_id	total_orders_fulfilled	avg_dispatch_hours	avg_dispatch_days
ecccfa2bb93b34a3bf033cc5d1dcdc69	14	622.66	25.94
6cf476a4ca74498db55cbccdaa9dcfb6	5	537.57	22.40
5d378b73ab7dd6f0418d743e5dcb0bd1	7	520.27	21.68
fa74b2f3287d296e9fbd2cc80f2d1cf1	5	431.49	17.98
66e0557ecc2b4dbea057e93f215f68d8	30	430.22	17.93
4b1eaadf791bdbbad8c4a35b65236d52	5	427.52	17.81
6ee85be3693ed79a8e80718743d80655	5	406.15	16.92
ed859002ad59dbf8cf3602696a6c3000	7	401.96	16.75
54965bbe3e4f07ae045b90b0b8541f52	73	401.74	16.74
cf8ab1616079e2793aa29d524df01bb1	6	375.38	15.64
5058e8c1e82653974541e83690655b4a	62	368.89	15.37
c990d6cf976a5718aaedc539f383ab88	9	364.59	15.19
b5abf4f36adc043117b4fca82c22984c	8	360.96	15.04
633ecdf879b94b5337cca303328e4a25	6	360.05	15.00
7fc87cc3e89b3d1d5cabdca32f8485aa	14	353.33	14.72
38874e327ce94d11390b96eb42d61928	6	347.97	14.50
817f85dbb65aa3e70831d90fe75cdf89	11	345.08	14.38
b1b3948701c5c72445495bd161b83a4c	15	338.28	14.10
6fd52c528dcb38be2eea044946b811f8	67	331.49	13.81
8bd0e3abda539b9479c4b44a691be1ec	6	330.62	13.78
822b63912576852aea9a8436d72317b7	6	311.81	12.99
538caafddff204241cecbf3a02e6b3cf	8	310.68	12.95
8603f0038fe0e52fedd3382d266723de	10	303.20	12.63
c42fd8e4d47dfb18ce5222f2dd7752f9	7	300.21	12.51
8d46553a36e68f95350a200c12f8f2e2	9	290.26	12.09
95b293867b5862941c9cd74f756a3c68	9	289.23	12.05
6d988d6174a2c27441597174f8905515	19	288.83	12.03
17f51e7198701186712e53a39c564617	56	281.81	11.74
adcf50477d6a1f5d91bb9d54c4903cb5	18	281.75	11.74
b19f3ca2ea475913750f25a5c37c8d8f	19	279.30	11.64
7c67e1448b00f6e969d365cea6b010ab	980	278.39	11.60
cee48807215b30a12ca2ca10ffb5f250	39	273.07	11.38
d32e3909cfb714fb2a6cb092c85b9c5f	10	269.76	11.24
8444e55c1f13cd5c179851e5ca5ebd00	93	262.63	10.94
973f21788dfab357250f69a8dcb7ddee	9	262.33	10.93
ad781527c93d00d89a11eecd9dcad7c1	39	261.46	10.89
ea6b12bf9ffe2bac34602ec631d97a47	9	259.74	10.82
d71d863e5ef30d94e440c11be17dcd8f	23	259.41	10.81
312ba1d77e9c332ef21f9598b7f64cd7	7	258.87	10.79
4cf490a58259286ada5ba8525ba9e84a	8	257.20	10.72
a7f13822ceb966b076af67121f87b063	75	256.97	10.71
f1ed6bd0a9b11b581f16c851c6a5a527	17	256.92	10.70
a2e874074c877c5a05abae80ad6e488f	40	255.92	10.66
2eb70248d66e0e3ef83659f71b244378	188	253.07	10.54
20b54c376b794ed028df09a3cd88e8dc	7	243.37	10.14
579891617139df7d8671d373f0669622	7	243.31	10.14
a425f92c199eb576938df686728acd20	16	242.70	10.11
b55638ad525e906c698fa2ce742c1742	11	239.22	9.97
1b8b75e227c9a9c100d0c210fb6176ce	17	236.98	9.87
02d35243ea2e497335cd0f076b45675d	14	234.78	9.78
a2fa0bdc798ee84cdb08281337cf4fb6	8	233.81	9.74
3faf68a3b0af94b10bac70d86077be49	7	232.27	9.68
30c7f28fd3a5897b2c82d152bb760c17	6	230.73	9.61
70b52a4cfc823994561b00bad161b4ed	8	228.05	9.50
610f72e407cdd7caaa2f8167b0163fd8	15	221.48	9.23
835f0f7810c76831d6c7d24c7a646d4d	42	219.52	9.15
2089a6d640999f9b9141ac719b2af596	42	219.48	9.14
054694fa03fe82cec4b7551487331d74	21	215.54	8.98
38102b031c2a15e54623d711bfc753d3	9	205.01	8.54
f4c4daa86e30c7e5a553a8d518ac03a5	18	204.63	8.53
f1b93673502375d491780bb49d615dbc	8	199.69	8.32
93dc87703c046b603023e75222018b45	35	198.00	8.25
ec4608a1f76453166bb312b2968aeaf4	17	197.64	8.24
54219883e72aad869adfb2a54b7bfa0f	5	196.32	8.18
f76a3b1349b6df1ee875d1f3fa4340f0	24	194.54	8.11
5f67c6082caacb26e431a7b17940cece	13	192.03	8.00
87e24fc7052259b1f7f607d61c10d8be	7	191.80	7.99
33576ec5412fb5905d876f12f33bfde6	31	191.75	7.99
c37b2059d4f90d4feead554e5246565e	12	191.61	7.98
4917cee8d902e13428c3ec4b1ca6f315	31	191.43	7.98
88460e8ebdecbfecb5f9601833981930	243	191.06	7.96
9baf5cb77970f539089d09a38bcec5c3	31	189.97	7.92
dc8798cbf453b7e0f98745e396cc5616	37	186.29	7.76
89bc797e2e29667aa035d4368f9b7f92	17	185.90	7.75
b335c59ab742f751a85db9c411a86739	30	184.34	7.68
7ecd59e5e20407131822c1a68ac59c1f	24	184.16	7.67
9539213aa8e023c01937bef95dc006da	9	184.06	7.67
81336a0f57a33776419727a300249e90	16	183.55	7.65
834f8533b2ecb6598dd004ff3de7203a	41	182.89	7.62
38e6dada03429a47197d5d584d793b41	7	182.66	7.61
8f78f0903005064036736c7173a5c2ed	13	181.55	7.56
6bd69102ab48df500790a8cecfc285c2	5	180.75	7.53
41ab63a91b8b264e8c8780368bf1dd5b	12	180.66	7.53
f419677537694f0462a91990df6ca44f	6	179.79	7.49
213b25e6f54661939f11710a6fddb871	153	178.06	7.42
ab3e0c171fe84a7ba7de130f19cfb485	6	175.02	7.29
784ba75dd9d20200c4caed3d7a77141a	19	173.63	7.23
741f8c587cb4248ebc5b779a0e8e0e51	5	173.47	7.23
cb41bfbcbda0aea354a834ab222f9a59	11	171.51	7.15
6562efe88ce0826a4ca4f189f03b4b84	9	171.48	7.15
f7496d659ca9fdaf323c0aae84176632	25	170.90	7.12
5151aea44289d6c6b090ee31c2132508	7	170.73	7.11
c60b801f2d52c7f7f91de00870882a75	40	169.90	7.08
5882820ef11c4c068eea0677a7a879b4	14	168.89	7.04
ffff564a4f9085cd26170f4732393726	8	168.42	7.02
a67780aba9dd436e7ac0aa1f889e73ec	6	167.90	7.00
db2956745b3a8e9f3785c99f34b5d25e	46	166.93	6.96
1b7e5006cb25dd23ce8686691013b121	11	166.48	6.94
b6dc74ed30f124f95227bed4e43a3bed	11	166.43	6.93
cb5df0dcb9e280b1780e9d589889f2c7	11	165.89	6.91
cce6ab8d1682639fe45ab70234f1665f	62	164.88	6.87
a49928bcdf77c55c6d6e05e09a9b4ca5	98	164.35	6.85
0307f7565ff85b299b6e5ef162b1b1a9	6	163.34	6.81
c1ed9e3daf6154542826a6996af0fb8c	8	163.29	6.80
c6381d2d013342748761e906d45aff76	15	162.97	6.79
712e6ed8aa4aa1fa65dab41fed5737e4	79	162.96	6.79
e504a4e2efaa45cbff7e268a2c58c956	10	162.95	6.79
beadbee30901a7f61d031b6b686095ad	64	162.53	6.77
1da366cade6d8276e7d8beea7af5d4bf	43	161.53	6.73
7fdb0720c8d7c9075538b365dc8c3a22	9	159.16	6.63
85d9eb9ddc5d00ca9336a2219c97bb13	506	158.41	6.60
4e17c65a516f69d023a2ae78b84f28d6	60	158.16	6.59
dd7ddc04e1b6c2c614352b383efe2d36	121	157.70	6.57
a888faf2d1baececa6baf9c3d603ee1f	53	157.43	6.56
c64a2aec32cc408a8a4c6d7c46017f91	9	157.34	6.56
1976a05ecf2996d6ffb7b5430e635fe7	8	156.38	6.52
6c177e38df6d3f34182b1f1d427231bf	55	154.76	6.45
6a51fc556dab5f766ced6fbc860bc613	25	153.92	6.41
d5b9a084373da994a6e37f732169a853	7	153.90	6.41
f97e2b0650c2068227e1d366140eb62f	16	153.14	6.38
e21a1e8be4715b0fcfd5385643bc49c4	21	152.46	6.35
6039e27294dc75811c0d8a39069f52c0	64	152.18	6.34
1dcfa3835714681d4ba4a93bc1e0fd41	12	150.38	6.27
8378eb36b6b70c5734e83ad7b8cdc3b7	7	149.29	6.22
de66a66f2dd06bb9ec37aa96987466a3	7	148.99	6.21
ba6ce6de456f6c11d08c17701ba5c26f	6	148.86	6.20
df560393f3a51e74553ab94004ba5c87	28	148.54	6.19
7bcd7c5f8631701474db233ccf1c094b	9	147.80	6.16
ffeee66ac5d5a62fe688b9d26f83f534	14	147.67	6.15
33c51922b6e3c16b24a8f28e5833c99d	18	145.64	6.07
8a87611c08849ffeeccab52aa798b6c7	6	145.53	6.06
dfc475d54e1b6dbeeb7d7d9bdaa63827	18	144.34	6.01
e81e1f9c3e762e620f3fb60a51499e16	17	143.37	5.97
c24173065483e421f9e5e7f599de101d	9	143.09	5.96
99002261c568a84cce14d43fcffb43ea	36	142.65	5.94
5b179e9e8cc7ab6fd113a46ca584da81	34	142.05	5.92
71271995e85f5b8530be99ed54a91b89	6	141.72	5.90
8e6cc767478edae941d9bd9eb778d77a	104	140.90	5.87
c3acdfac4e3e97ff87529454fbc03642	12	140.80	5.87
f08a5b9dd6767129688d001acafc21e5	11	140.71	5.86
5b0cc932433fa5184b5b94bfe6bcc256	12	140.10	5.84
bbad7e518d7af88a0897397ffdca1979	68	139.96	5.83
855668e0971d4dfd7bef1b6a4133b41b	306	139.38	5.81
e06f09ec0a4aca210779cf1cfc63cf19	29	139.33	5.81
8a40ae794fa3f9892cee24bd5af0488b	5	139.15	5.80
04308b1ee57b6625f47df1d56f00eedf	93	139.12	5.80
37dd29b36c458d82f74a953c40c43645	7	138.78	5.78
e067ad2c1c0b48758eb1b5228bcf7a68	54	138.37	5.77
eb4df17aed01d918c65f0f8d650900c0	11	137.84	5.74
6973a06f484aacf400ece213dbf3d946	92	137.67	5.74
f9903dd0f812c7e771fcad47b6fff231	10	137.38	5.72
f9244d45189d3a3605499abddeade7d5	21	136.88	5.70
9dda5bbacd45e18d6485fee649205d09	19	136.88	5.70
e8b3a3a38279a82f0e5d006d5e5b7d2c	80	136.76	5.70
d93844a9c55ba7ce353388bcf849ea56	12	136.60	5.69
271c58a1d139c45eaf3316107c6d3a3b	28	135.53	5.65
236585a4ddb5fea9fa40233e2249ed91	11	134.85	5.62
4c8b8048e33af2bf94f2eb547746a916	19	134.26	5.59
25debeafbce801fdd479539350185eee	5	133.99	5.58
cac4c8e7b1ca6252d8f20b2fc1a2e4af	74	132.26	5.51
656591be56071d4c9ef4e5fee78a578a	7	132.12	5.50
c4d51195486dc781531876a7d00453d8	17	131.29	5.47
4d600e08ecbe08258c79e536c5a42fee	7	129.89	5.41
184a67a8f9f63234d3a92340bbdb727f	8	129.79	5.41
dd2bdf855a9172734fbc3744021ae9b9	106	128.78	5.37
63b464dbf392c7b80d12d932fa7cafed	36	128.64	5.36
cd843d4cf8ef32827de0396605163ba3	5	128.59	5.36
4e8dacf3d38f281ae26c3e0321d92d88	8	128.42	5.35
343e716476e3748b069f980efbaa294e	17	127.90	5.33
014c0679dd340a0e338872e7ec85666a	14	127.87	5.33
e819bcfade7b5d88a27325eb6cfd62c5	5	127.71	5.32
6e0908ef4d4efadbc3cc2b74ea477cb0	8	127.41	5.31
575df70bde3f9f2b30bf8d2e9910d725	27	127.12	5.30
0873d9f8f36123f8d910f4760e788cfb	6	126.96	5.29
373ee4a3a775a733770ca3f790b8b9ac	13	126.89	5.29
062ce95fa2ad4dfaedfc79260130565f	54	126.73	5.28

Insight:

Extreme Outliers: The absolute slowest seller (ecccfa...) takes an average of 25.9 days just to hand over the package to the carrier! However, the worst offenders at the very top of the list mostly have low order volumes (between 5 and 14 orders), suggesting they might be inactive, poorly managed, or drop-shipping from overseas.
The High-Volume Red Flag: Seller 7c67e144... is a massive problem for the platform. They fulfilled 980 orders but have an incredibly slow average dispatch time of 11.6 days. Because of their high volume, this single seller is likely generating a huge amount of late deliveries and negative customer reviews.
Unacceptable Baselines: In modern e-commerce, dispatching an item should take 1 to 2 days. The sellers on this “Bottom” list are consistently taking over a week (7+ days) just to get the product out of their warehouse. This indicates a severe supply chain or inventory management issue on the seller’s side.
Actionable Next Step: Olist needs to enforce strict Service Level Agreements (SLAs). Sellers like 7c67e144... should be penalized or warned, as they are large enough to seriously damage Olist’s brand reputation with their 11-day packing delays.

Pareto Analysis for Sellers

17. Do the top 10% of sellers generate more than 50% of the platform’s total revenue? Furthermore, which product categories are dominated by a single ‘Monopoly Seller’ vs. categories with a healthy, competitive mix of multiple vendors?

Solution:

Firstly, we calculate global seller revenue (seller_global_revenue and global_pareto CTEs): We join the order_items, payments, and orders tables to sum up the total revenue for each seller across the entire platform. We then use the ntile(10) window function to rank sellers into deciles, which helps identify the platform-wide revenue concentration.
Secondly, we calculate seller market share per category (category_seller_share CTE): We group the sales data by both product_category_name and seller_id to find how much revenue each seller generated in that specific category. We also use a window function sum() over (partition by product_category_name) to calculate the total overall revenue for the entire category without grouping away the seller-level details.
Thirdly, we identify the top seller’s dominance (category_dominance CTE): Grouping strictly by category, we count the total number of unique vendors (total_vendors) and find the highest revenue generated by a single seller (max(seller_category_revenue)). By dividing this top seller’s revenue by the total category revenue, we calculate the exact market share percentage held by the leading vendor (leader_market_share_pct).
Finally, we classify the competitive health of each market (Final SELECT): We use a CASE statement to evaluate the market conditions. Categories where a single seller holds over 50% market share alongside fewer than 5 total vendors are flagged as ‘Monopoly / Highly Concentrated’, while categories with a leading share under 15% and over 20 vendors are labeled as ‘Healthy / Fragmented Competition’.

Show SQL Code

with seller_global_revenue as (
    -- step 1: calculate total revenue for every seller across the platform
    select 
        oi.seller_id,
        sum(p.payment_value) as total_seller_revenue
    from 
        order_items oi
    inner join 
        payments p on oi.order_id = p.order_id
    inner join 
        orders o on oi.order_id = o.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
    group by 
        oi.seller_id
),

global_pareto as (
    -- step 2: rank sellers into deciles to check platform-wide concentration
    select 
        seller_id,
        total_seller_revenue,
        ntile(10) over (order by total_seller_revenue desc) as seller_decile,
        sum(total_seller_revenue) over () as platform_total_revenue
    from 
        seller_global_revenue
),

category_seller_share as (
    -- step 3: calculate revenue share for each seller within their specific category
    select 
        prod.product_category_name,
        oi.seller_id,
        sum(p.payment_value) as seller_category_revenue,
        sum(sum(p.payment_value)) over (partition by prod.product_category_name) as total_category_revenue
    from 
        order_items oi
    inner join 
        products prod on oi.product_id = prod.product_id
    inner join 
        payments p on oi.order_id = p.order_id
    inner join 
        orders o on oi.order_id = o.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
        and prod.product_category_name is not null
    group by 
        prod.product_category_name, oi.seller_id
),

category_dominance as (
    -- step 4: identify the share of the 'top seller' in each category
    select 
        product_category_name,
        count(distinct seller_id) as total_vendors,
        max(seller_category_revenue) as top_seller_revenue,
        total_category_revenue,
        -- market share of the leading seller
        cast(100.0 * max(seller_category_revenue) / total_category_revenue as decimal(5,2)) as leader_market_share_pct
    from 
        category_seller_share
    group by 
        product_category_name, total_category_revenue
)

-- final output: categorizing product categories by competitive health
select 
    product_category_name,
    total_vendors,
    leader_market_share_pct,
    case 
        when leader_market_share_pct >= 50 and total_vendors < 5 then 'Monopoly / Highly Concentrated'
        when leader_market_share_pct >= 30 then 'Dominant Leader'
        when leader_market_share_pct < 15 and total_vendors > 20 then 'Healthy / Fragmented Competition'
        else 'Moderate Competition'
    end as market_health_status
from 
    category_dominance
order by 
    leader_market_share_pct desc;

Results:

product_category_name	total_vendors	leader_market_share_pct	market_health_status
cds_dvds_musicais	1	100.00	Monopoly / Highly Concentrated
la_cuisine	2	94.95	Monopoly / Highly Concentrated
portateis_cozinha_e_preparadores_de_alimentos	5	74.34	Dominant Leader
moveis_escritorio	34	72.74	Dominant Leader
casa_conforto_2	4	71.76	Monopoly / Highly Concentrated
telefonia_fixa	49	70.72	Dominant Leader
tablets_impressao_imagem	6	68.33	Dominant Leader
fraldas_higiene	7	67.53	Dominant Leader
pcs	9	67.33	Dominant Leader
seguros_e_servicos	2	64.42	Monopoly / Highly Concentrated
pc_gamer	2	59.92	Monopoly / Highly Concentrated
flores	3	51.52	Monopoly / Highly Concentrated
portateis_casa_forno_e_cafe	15	50.67	Dominant Leader
moveis_colchao_e_estofado	5	46.49	Dominant Leader
eletrodomesticos	48	46.29	Dominant Leader
casa_conforto	20	46.09	Dominant Leader
livros_importados	7	45.89	Dominant Leader
artes	35	43.33	Dominant Leader
bebidas	36	43.13	Dominant Leader
fashion_roupa_feminina	11	42.24	Dominant Leader
artigos_de_festas	12	41.25	Dominant Leader
fashion_calcados	13	41.23	Dominant Leader
artigos_de_natal	17	40.30	Dominant Leader
fashion_underwear_e_moda_praia	9	39.90	Dominant Leader
dvds_blu_ray	10	36.88	Dominant Leader
fashion_roupa_infanto_juvenil	4	34.49	Dominant Leader
agro_industria_e_comercio	34	34.30	Dominant Leader
audio	36	34.18	Dominant Leader
papelaria	173	34.14	Dominant Leader
industria_comercio_e_negocios	29	33.68	Dominant Leader
eletrodomesticos_2	46	32.97	Dominant Leader
ferramentas_jardim	235	32.57	Dominant Leader
artes_e_artesanato	13	32.06	Dominant Leader
livros_interesse_geral	53	31.98	Dominant Leader
construcao_ferramentas_iluminacao	36	31.82	Dominant Leader
construcao_ferramentas_ferramentas	24	30.68	Dominant Leader
fashion_esporte	10	30.27	Dominant Leader
moveis_sala	55	30.00	Dominant Leader
construcao_ferramentas_jardim	32	29.79	Moderate Competition
construcao_ferramentas_seguranca	34	29.18	Moderate Competition
fashion_roupa_masculina	13	28.79	Moderate Competition
sinalizacao_e_seguranca	46	28.41	Moderate Competition
livros_tecnicos	26	27.78	Moderate Competition
moveis_quarto	23	23.77	Moderate Competition
malas_acessorios	73	23.52	Moderate Competition
alimentos_bebidas	33	22.65	Moderate Competition
consoles_games	81	22.19	Moderate Competition
	256	22.11	Moderate Competition
cool_stuff	263	21.04	Moderate Competition
musica	19	20.20	Moderate Competition
instrumentos_musicais	68	20.18	Moderate Competition
telefonia	146	19.99	Moderate Competition
market_place	59	19.92	Moderate Competition
cine_foto	14	19.66	Moderate Competition
climatizacao	52	19.63	Moderate Competition
moveis_decoracao	366	19.03	Moderate Competition
casa_construcao	68	18.41	Moderate Competition
alimentos	58	18.17	Moderate Competition
eletroportateis	101	16.91	Moderate Competition
perfumaria	174	16.69	Moderate Competition
relogios_presentes	98	16.31	Moderate Competition
brinquedos	249	15.49	Moderate Competition
cama_mesa_banho	195	15.05	Moderate Competition
bebes	242	14.60	Healthy / Fragmented Competition
construcao_ferramentas_construcao	127	14.08	Healthy / Fragmented Competition
eletronicos	146	12.82	Healthy / Fragmented Competition
moveis_cozinha_area_de_servico_jantar_e_jardim	51	12.82	Healthy / Fragmented Competition
fashion_bolsas_e_acessorios	99	11.58	Healthy / Fragmented Competition
automotivo	374	11.09	Healthy / Fragmented Competition
informatica_acessorios	285	10.01	Healthy / Fragmented Competition
pet_shop	135	7.22	Healthy / Fragmented Competition
utilidades_domesticas	463	7.10	Healthy / Fragmented Competition
esporte_lazer	477	5.27	Healthy / Fragmented Competition
beleza_saude	489	5.22	Healthy / Fragmented Competition

Insights:

True Monopolies in Niche Markets: Very small categories like cds_dvds_musicais and la_cuisine are completely run by 1 or 2 sellers holding 95% to 100% of the market share. This is expected for low-demand or highly specialized items.
The Illusion of Competition: This is a major finding. Categories like moveis_escritorio (office furniture) look competitive on paper because they have 34 vendors. However, one single seller controls almost 73% of the market. We see the exact same thing in telefonia_fixa (landline phones), where the top seller holds 70% despite having 48 competitors.
Healthy Core Markets: The best news for Olist is that its top-earning categories are heavily fragmented. Beleza_saude (health & beauty) and esporte_lazer (sports & leisure) each have nearly 500 active vendors, and the top seller only controls about 5% of the sales. This means the platform’s core revenue is highly secure.
Single Point of Failure (Risk): Any category tagged as “Dominant Leader” carries a huge supply chain risk. If the top seller in pcs (67% share) or moveis_escritorio leaves the platform or starts delivering late, Olist’s revenue and reputation in that entire category will instantly tank.
The Pareto Principle in Action: While the table focuses on category breakdowns rather than the overall platform revenue, it perfectly illustrates the 80/20 rule. Across dozens of categories, a single vendor is doing the heavy lifting and generating the vast majority of the sales volume.

Seller Churn Analysis

18. What is the churn rate of sellers on the platform? Specifically, how many sellers who made a sale in 2017 became inactive (zero sales) in 2018? Is there a correlation between high ‘Order Cancellation Rates’ and subsequent seller churn?

Solution:

Firstly, we aggregate annual activity per seller (seller_activity CTE): We join the orders and order_items tables to calculate each seller’s total_orders and canceled_orders grouped by the specific year they occurred (extracted from the timestamp using the year() function).
Secondly, we build the 2017 cohort and identify churn (seller_cohort_2017 CTE): We isolate sellers who were active in 2017 and perform a LEFT JOIN against their own 2018 activity data. If the 2018 record is missing (s18.seller_id is null), we classify them as ‘Churned (Inactive in 2018)’, otherwise they are ‘Retained’. We also compute their individual 2017 cancellation rate percentage here.
Finally, we aggregate metrics by churn status (Final SELECT): We group the cohort by their churn_status to count the exact number of sellers that stayed versus left. By averaging the cancellation_rate_2017 and the orders_2017 volume for both groups, we can directly observe the correlation between poor fulfillment performance (high cancellations) and subsequent seller churn on the platform.

Show SQL Code

with seller_activity as (
    -- step 1: identify activity years and cancellation counts for every seller
    select 
        oi.seller_id,
        year(o.order_purchase_timestamp) as activity_year,
        count(distinct o.order_id) as total_orders,
        sum(case when o.order_status = 'canceled' then 1 else 0 end) as canceled_orders
    from 
        orders o
    inner join 
        order_items oi on o.order_id = oi.order_id
    group by 
        oi.seller_id, year(o.order_purchase_timestamp)
),

seller_cohort_2017 as (
    -- step 2: isolate sellers who were active in 2017 and track their 2018 status
    select 
        s17.seller_id,
        s17.total_orders as orders_2017,
        cast(100.0 * s17.canceled_orders / s17.total_orders as decimal(5,2)) as cancellation_rate_2017,
        case 
            when s18.seller_id is null then 'Churned (Inactive in 2018)'
            else 'Retained (Active in 2018)'
        end as churn_status
    from 
        seller_activity s17
    left join 
        seller_activity s18 on s17.seller_id = s18.seller_id and s18.activity_year = 2018
    where 
        s17.activity_year = 2017
)

-- final output: comparing cancellation rates across churned vs retained sellers
select 
    churn_status,
    count(seller_id) as seller_count,
    -- average cancellation rate to identify the correlation
    cast(avg(cancellation_rate_2017) as decimal(5,2)) as avg_cancellation_rate_pct,
    -- average volume to see if smaller or larger sellers churn more
    cast(avg(orders_2017 * 1.0) as decimal(10,2)) as avg_2017_order_volume
from 
    seller_cohort_2017
group by 
    churn_status;

Results:

churn_status	seller_count	avg_cancellation_rate_pct	avg_2017_order_volume
Churned (Inactive in 2018)	685	6.06	6.34
Retained (Active in 2018)	1099	0.48	37.07

Insights:

Massive Drop-Off Rate: A huge chunk of the seller base did not survive into 2018. Out of 1,784 sellers who made sales in 2017, 685 of them (around 38%) completely stopped operating the following year.
Low Traction Means High Flight Risk: The sellers who churned never really gained momentum. They averaged only 6.34 orders for the entirety of 2017. In contrast, the retained sellers were moving roughly 37 orders a year. If a vendor doesn’t see quick sales, they abandon the platform.
The Cancellation Red Flag: There is a massive, direct correlation between failing to fulfill orders and leaving the platform. The churned sellers had an average cancellation rate of 6.06%, which is over 12 times worse than the healthy, retained sellers (who kept cancellations under 0.5%).
Profile of a Failing Seller: The data paints a very clear picture. The typical seller who leaves Olist is a low-volume vendor who struggles to actually deliver the few orders they get. This poor performance likely leads to account suspension or them simply giving up.

Identifying Underperforming Sellers

19. Which sellers have a high sales volume (\>50 orders) but a consistently low average review score. Can we identify specific sellers who are responsible for a disproportionate number of the platforms 1-star reviews?

Solution:

Firstly, we analyze the seller churn correlation (Part 1): We aggregate annual activity and cancellation rates per seller (seller_activity CTE), then isolate the 2017 cohort to track whether they were retained or churned in 2018 (seller_cohort_2017 CTE). The first output aggregates the average cancellation rates and order volumes for churned versus retained sellers.
Secondly, we aggregate review statistics per seller (Part 2 - seller_review_stats CTE): To evaluate seller quality, we join the order_items, orders, and order_reviews tables. Grouping strictly by seller_id, we calculate their total_orders, their avg_review_score, and calculate their specific one_star_count.
Thirdly, we calculate the platform-wide baseline (Part 2 - platform_total_one_stars CTE): We independently calculate the grand_total_one_stars across the entire database. This acts as the denominator for measuring an individual seller’s negative impact against the whole platform.
Finally, we identify the worst offenders and their platform impact (Part 2 Final SELECT): We CROSS JOIN the seller stats with the platform baseline. We filter for high-volume sellers (total_orders > 50) with poor ratings (avg_review_score < 3). By computing pct_of_platform_one_stars, we can pinpoint exactly how much of the platform’s total negative feedback is disproportionately driven by these specific vendors.

Show SQL Code


with seller_activity as (
    -- step 1: identify activity years and cancellation counts for every seller
    select 
        oi.seller_id,
        year(o.order_purchase_timestamp) as activity_year,
        count(distinct o.order_id) as total_orders,
        sum(case when o.order_status = 'canceled' then 1 else 0 end) as canceled_orders
    from 
        orders o
    inner join 
        order_items oi on o.order_id = oi.order_id
    group by 
        oi.seller_id, year(o.order_purchase_timestamp)
),

seller_cohort_2017 as (
    -- step 2: isolate sellers who were active in 2017 and track their 2018 status
    select 
        s17.seller_id,
        s17.total_orders as orders_2017,
        cast(100.0 * s17.canceled_orders / s17.total_orders as decimal(5,2)) as cancellation_rate_2017,
        case 
            when s18.seller_id is null then 'Churned (Inactive in 2018)'
            else 'Retained (Active in 2018)'
        end as churn_status
    from 
        seller_activity s17
    left join 
        seller_activity s18 on s17.seller_id = s18.seller_id and s18.activity_year = 2018
    where 
        s17.activity_year = 2017
)

-- output 1: comparing cancellation rates across churned vs retained sellers
select 
    churn_status,
    count(seller_id) as seller_count,
    cast(avg(cancellation_rate_2017) as decimal(5,2)) as avg_cancellation_rate_pct,
    cast(avg(orders_2017 * 1.0) as decimal(10,2)) as avg_2017_order_volume
from 
    seller_cohort_2017
group by 
    churn_status;


/* PART 2: Seller Review Performance & 1-Star Concentration
    Objective: 
        1. Identify high-volume sellers (>50 orders) with poor ratings (<3 stars).
        2. Calculate their contribution to the platform's total 1-star reviews.
*/

with seller_review_stats as (
    -- step 1: aggregate review scores and total 1-stars per seller
    select 
        oi.seller_id,
        count(distinct o.order_id) as total_orders,
        avg(orv.review_score * 1.0) as avg_review_score,
        sum(case when orv.review_score = 1 then 1 else 0 end) as one_star_count
    from 
        order_items oi
    inner join 
        orders o on oi.order_id = o.order_id
    inner join 
        order_reviews orv on o.order_id = orv.order_id
    group by 
        oi.seller_id
),

platform_total_one_stars as (
    -- step 2: get the total number of 1-star reviews across the whole platform
    select sum(case when review_score = 1 then 1 else 0 end) as grand_total_one_stars
    from order_reviews
)

-- output 2: flagging high-risk sellers based on volume and negative sentiment
select 
    s.seller_id,
    s.total_orders,
    cast(s.avg_review_score as decimal(3,2)) as avg_review_score,
    s.one_star_count,
    -- concentration: what % of the entire platform's 1-star reviews come from this one seller?
    cast(100.0 * s.one_star_count / p.grand_total_one_stars as decimal(5,2)) as pct_of_platform_one_stars
from 
    seller_review_stats s
cross join 
    platform_total_one_stars p
where 
    s.total_orders > 50 
    and s.avg_review_score < 3
order by 
    s.one_star_count desc;

Results 1:

churn_status	seller_count	avg_cancellation_rate_pct	avg_2017_order_volume
Churned (Inactive in 2018)	685	6.06	6.34
Retained (Active in 2018)	1099	0.48	37.07

Results 2:

seller_id	total_orders	avg_review_score	one_star_count	pct_of_platform_one_stars
2eb70248d66e0e3ef83659f71b244378	198	2.71	81	0.71
1ca7077d890b907f89be8c954a02686a	114	2.20	80	0.70
a49928bcdf77c55c6d6e05e09a9b4ca5	98	2.95	37	0.32
54965bbe3e4f07ae045b90b0b8541f52	74	2.94	34	0.30
972d0f9cf61b499a4812cf0bfa3ad3c4	79	2.96	29	0.25

Insights:

The Worst Offenders: We isolated five specific sellers who process a solid volume of orders (between 74 and 198) but consistently fail their customers, keeping their average rating below 3 stars.
Disproportionate Damage: The top two sellers on this list (2eb702... and 1ca707...) are incredibly toxic to the platform. Together, these two vendors are responsible for 1.41% of all 1-star reviews generated across the entire Olist marketplace.
High Volume, High Failure Rate: Seller 1ca7077d... is the most concerning data point. They fulfilled 114 orders and racked up 80 one-star reviews. This suggests that a massive majority of their customers had a terrible experience, dragging their average score down to an abysmal 2.20.
Reputation Risk at Scale: Sellers hovering around the 2.9 score (a499..., 5496..., 972d...) might not look as disastrous at first glance, but because they process 70 to 100 orders, they are consistently delivering bad experiences to a large group of people.
Actionable Next Step: These vendors are actively hurting the Olist brand. The platform should immediately suspend or heavily penalize these top 5 sellers to protect future buyers.

Conclusion

What We Did:

Calculated the average dispatch times for sellers to identify the worst bottlenecks in the fulfillment process.
Performed a Pareto analysis on market share to see which categories are dominated by monopolies versus those with healthy competition.
Analyzed seller churn rates between 2017 and 2018, comparing the performance metrics of retained versus inactive vendors.
Isolated high-volume, low-rated sellers to measure their specific impact on the platform’s overall 1-star review count.

What We Obtained (Key Findings):

Severe Dispatch Delays: While some sellers operate efficiently, the bottom tier is completely failing. We identified high-volume vendors taking over 11 days just to pack an item, which creates a massive bottleneck.
The “Dominant Leader” Risk: Many categories look competitive but are actually carried by a single vendor holding up to 70% of the market. This creates a huge supply chain risk if that seller leaves. Luckily, our biggest revenue categories (like health & beauty) are highly fragmented and safe.
Failure Drives Churn: About 38% of 2017 sellers quit in 2018. The data shows they didn’t just leave; they failed out. Churned sellers averaged only 6 orders a year and had a cancellation rate 12 times higher than successful, active sellers.
A Few Bad Apples: We pinpointed a tiny group of sellers who are actively damaging the platform’s reputation. Just two specific vendors generated over 1.4% of all 1-star reviews on Olist, proving that strict seller quality enforcement is urgently needed.

← Previous Delivery Performance and SLA Analysis

Next → Payments and Risk Analysis