Customer Review and Satisfaction Analysis
Delivery Delay & Customer Satisfaction Analysis
23. What is the quantitative correlation between Delivery Delay (Actual Delivery Date - Estimated Delivery Date) and the Average Review Score? Specifically, at what 'delay threshold' (e.g., +1 day, +3 days) does the average customer rating strictly drop below 3 stars?
Solution:
Analytical Methodology for Delivery Delay and Sentiment Correlation
- Quantification of Logistical Latency (`order_delay_metrics` CTE): The initial phase of the analysis involves calculating the temporal variance between the `order_estimated_delivery_date` and the `order_delivered_customer_date`. By performing an inner join between the `orders` and `order_reviews` datasets, we isolate the `delay_days` (where positive integers signify late deliveries) and associate them directly with the corresponding customer `review_score`.
- Aggregation of Sentiment Baselines (`aggregated_scores` CTE): To evaluate the impact of fulfillment delays on customer perception, the data is aggregated at the `delay_days` level. This step computes the arithmetic mean of review scores and the total transaction volume for each specific interval, establishing the "On-Time" benchmark required to measure the rate of sentiment degradation.
- Identification of the 'Satisfaction Cliff' (Final Query): The final logic categorizes results into hierarchical satisfaction tiers via a `CASE` statement. We distinguish between 'Tolerable' delays (where the average score remains \(\geq 3.0\)) and 'Critical' delays (where scores fall below the \(3.0\) threshold). Furthermore, a percentage variance from the "On-Time" benchmark is calculated to quantify the financial and reputational risk associated with extended latencies.
- Statistical Significance and Boundary Control: To maintain the integrity of the findings and prevent outliers from skewing the commercial insights, the output is restricted to intervals with a `volume > 5`. The temporal scope is further constrained to a range of \(-10\) to \(+30\) days to focus the business intelligence on the most statistically significant and actionable delivery scenarios.
SQL Code
/* delivery satisfaction threshold analysis
objective: identify the "cliff" where delivery delays result in failing satisfaction scores (< 3 stars)
logic:
1. calculate 'delay_days' as the delta between actual and estimated delivery.
2. aggregate average review scores for every day of delay.
3. identify the first day where the average rating strictly falls below 3.0.
granularity: delay_days
*/
with order_delay_metrics as (
-- step 1: calculate the delay for every delivered order with a review
select
o.order_id,
-- delay: positive = late, 0 = on time, negative = early
datediff(day, o.order_estimated_delivery_date, o.order_delivered_customer_date) as delay_days,
r.review_score
from
orders o
inner join
order_reviews r on o.order_id = r.order_id
where
o.order_status = 'delivered'
and o.order_delivered_customer_date is not null
and o.order_estimated_delivery_date is not null
),
aggregated_scores as (
-- step 2: find average score and volume per delay day
select
delay_days,
count(order_id) as volume,
avg(review_score * 1.0) as avg_review_score
from
order_delay_metrics
group by
delay_days
)
-- final output: categorizing the impact of delays on sentiment
select
delay_days,
volume as order_volume,
cast(avg_review_score as decimal(10,2)) as avg_rating,
-- identifying the threshold status
case
when delay_days < 0 then 'Early (Bonus Rating)'
when delay_days = 0 then 'On-Time (High Satisfaction)'
when delay_days > 0 and avg_review_score >= 3.0 then 'Late (Tolerable)'
when delay_days > 0 and avg_review_score < 3.0 then 'Critical Delay (Satisfaction Cliff)'
else 'Standard'
end as satisfaction_tier,
-- calculating the percentage drop from the "On-Time" benchmark
cast(100.0 * (avg_review_score - (select avg_review_score from aggregated_scores where delay_days = 0))
/ (select avg_review_score from aggregated_scores where delay_days = 0) as decimal(10,2)) as pct_drop_from_ontime
from
aggregated_scores
where
volume > 5 -- filter for statistical relevance
and delay_days between -10 and 30 -- focusing on the most relevant window
order by
delay_days asc;
Results:
| delay_days | order_volume | avg_rating | satisfaction_tier | pct_drop_from_ontime |
|---|---|---|---|---|
| -10 | 4652 | 4.33 | Early (Bonus Rating) | 7.47 |
| -9 | 4632 | 4.29 | Early (Bonus Rating) | 6.44 |
| -8 | 4818 | 4.26 | Early (Bonus Rating) | 5.53 |
| -7 | 4841 | 4.26 | Early (Bonus Rating) | 5.55 |
| -6 | 3616 | 4.25 | Early (Bonus Rating) | 5.26 |
| -5 | 2221 | 4.19 | Early (Bonus Rating) | 3.96 |
| -4 | 1910 | 4.15 | Early (Bonus Rating) | 2.80 |
| -3 | 1726 | 4.12 | Early (Bonus Rating) | 2.16 |
| -2 | 1548 | 4.12 | Early (Bonus Rating) | 2.23 |
| -1 | 1457 | 4.15 | Early (Bonus Rating) | 2.97 |
| 0 | 1291 | 4.03 | On-Time (High Satisfaction) | 0.00 |
| 1 | 823 | 3.73 | Late (Tolerable) | -7.51 |
| 2 | 537 | 3.18 | Late (Tolerable) | -21.28 |
| 3 | 496 | 2.68 | Critical Delay (Satisfaction Cliff) | -33.47 |
| 4 | 437 | 2.49 | Critical Delay (Satisfaction Cliff) | -38.16 |
| 5 | 436 | 2.19 | Critical Delay (Satisfaction Cliff) | -45.58 |
| 6 | 408 | 1.81 | Critical Delay (Satisfaction Cliff) | -55.15 |
| 7 | 475 | 1.92 | Critical Delay (Satisfaction Cliff) | -52.50 |
| 8 | 332 | 1.72 | Critical Delay (Satisfaction Cliff) | -57.36 |
| 9 | 219 | 1.67 | Critical Delay (Satisfaction Cliff) | -58.68 |
| 10 | 208 | 1.59 | Critical Delay (Satisfaction Cliff) | -60.66 |
| 11 | 179 | 1.73 | Critical Delay (Satisfaction Cliff) | -57.20 |
| 12 | 155 | 1.66 | Critical Delay (Satisfaction Cliff) | -58.73 |
| 13 | 190 | 1.75 | Critical Delay (Satisfaction Cliff) | -56.68 |
| 14 | 169 | 1.59 | Critical Delay (Satisfaction Cliff) | -60.54 |
| 15 | 157 | 1.71 | Critical Delay (Satisfaction Cliff) | -57.68 |
| 16 | 95 | 1.68 | Critical Delay (Satisfaction Cliff) | -58.24 |
| 17 | 90 | 1.67 | Critical Delay (Satisfaction Cliff) | -58.68 |
| 18 | 73 | 1.73 | Critical Delay (Satisfaction Cliff) | -57.21 |
| 19 | 74 | 1.51 | Critical Delay (Satisfaction Cliff) | -62.47 |
| 20 | 87 | 1.66 | Critical Delay (Satisfaction Cliff) | -58.96 |
| 21 | 76 | 1.67 | Critical Delay (Satisfaction Cliff) | -58.57 |
| 22 | 64 | 1.72 | Critical Delay (Satisfaction Cliff) | -57.39 |
| 23 | 34 | 1.47 | Critical Delay (Satisfaction Cliff) | -63.54 |
| 24 | 53 | 1.53 | Critical Delay (Satisfaction Cliff) | -62.11 |
| 25 | 40 | 1.60 | Critical Delay (Satisfaction Cliff) | -60.33 |
| 26 | 34 | 1.35 | Critical Delay (Satisfaction Cliff) | -66.46 |
| 27 | 40 | 1.53 | Critical Delay (Satisfaction Cliff) | -62.19 |
| 28 | 46 | 1.61 | Critical Delay (Satisfaction Cliff) | -60.11 |
| 29 | 37 | 1.32 | Critical Delay (Satisfaction Cliff) | -67.17 |
| 30 | 14 | 1.29 | Critical Delay (Satisfaction Cliff) | -68.12 |
Insights:
- Early Delivery Bonus: On-time deliveries average a solid 4.03 rating. Beating the estimated delivery date by 1 to 10 days gives a slight "bonus," pushing the average score up to 4.33.
- The Tolerable Delay: Customers show a small amount of patience for minor hiccups. Packages arriving 1 or 2 days late take a hit but still manage to maintain an average score above 3 stars (3.73 and 3.18, respectively).
- The 3-Day Cliff: The exact threshold where satisfaction falls apart is 3 days late. At this point, the average rating strictly drops below 3 stars, crashing to 2.68 (a 33.5% drop compared to on-time deliveries). A sketch for extracting this threshold programmatically follows this list.
- The Point of No Return: Once a delivery hits 6 days late, the average score plummets below 2.0. From that day onward, ratings mostly flatline near the 1.5 mark. At this stage, the customer experience is completely ruined, regardless of whether the package takes 7 days or 20 days to finally arrive.
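Rather than eyeballing the table for the cliff, the threshold can be pulled out programmatically. Below is a minimal sketch that reuses the same delay logic from the query above and returns the first late day whose average rating strictly drops below 3.0 (with the data above, it should return 3):

with delay_scores as (
    -- same delay calculation as the order_delay_metrics CTE above
    select
        datediff(day, o.order_estimated_delivery_date, o.order_delivered_customer_date) as delay_days,
        r.review_score
    from orders o
    inner join order_reviews r on o.order_id = r.order_id
    where o.order_status = 'delivered'
      and o.order_delivered_customer_date is not null
      and o.order_estimated_delivery_date is not null
)
select min(delay_days) as satisfaction_cliff_day
from (
    select delay_days, count(*) as volume, avg(review_score * 1.0) as avg_score
    from delay_scores
    group by delay_days
) agg
where delay_days > 0
  and volume > 5          -- same statistical-relevance filter as above
  and avg_score < 3.0;    -- strictly below 3 stars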
Quality vs. Logistics Matrix Analysis
24. Can we identify product categories that consistently have Fast Delivery (Top 25% speed) but Low Review Scores (Bottom 25% rating)? Conversely, which categories have slow delivery but high scores?
Solution:
- The initial phase of the analysis involves the aggregation of delivery lead times and customer satisfaction scores at the category level. By joining the `orders`, `order_items`, `products`, and `order_reviews` datasets, we calculate the arithmetic mean of the `avg_delivery_lead_time` (measuring the duration from purchase to customer delivery) and the `avg_review_score`. A `HAVING` clause restricts the results to categories with `total_orders > 30`, ensuring the statistical integrity of the benchmarks.
- To establish relative performance tiers, we apply the `percent_rank()` window function to both metrics (a small illustration of its 0.0 to 1.0 scale follows this list). The `speed_rank` is calculated in ascending order of lead time (0.0 is the fastest category on the platform), while the `rating_rank` is calculated in ascending order of score (0.0 is the lowest-rated). This normalization allows for a direct comparison across diverse product lines regardless of varying volume.
- The categorical data is then segmented into a four-quadrant matrix using a conditional `CASE` statement. We specifically isolate 'Product/Catalog Issues', defined as categories within the top 25% for speed (`speed_rank <= 0.25`) but the bottom 25% for satisfaction (`rating_rank <= 0.25`). Conversely, 'Logistics Bottlenecks' are identified as categories in the bottom 25% for speed (`speed_rank >= 0.75`) that maintain top 25% sentiment scores (`rating_rank >= 0.75`).
- The resulting classification provides actionable business intelligence for supply chain and catalog management. By identifying the 'Systemic Failure' quadrant (slow and hated) versus the 'Gold Standard' (fast and loved), stakeholders can prioritize interventions, distinguishing between the need for logistical optimization and fundamental improvements in product quality or catalog accuracy.
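Because the quadrant cutoffs depend on how `percent_rank()` spreads values across a 0.0 to 1.0 scale, here is a tiny self-contained illustration; the five lead-time values are hypothetical, not taken from the dataset:

select v as avg_delivery_lead_time_days,
       percent_rank() over (order by v) as speed_rank
from (values (7.5), (9.1), (11.8), (13.5), (20.8)) as t(v);
-- percent_rank = (rank - 1) / (rows - 1), so the five rows score 0.00, 0.25, 0.50, 0.75, 1.00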
SQL Code
/* quality vs. logistics matrix analysis
objective: segment categories to identify if low scores are caused by product quality or delivery speed
logic:
1. calculate average lead time (purchase to delivery) and average review score per category.
2. use percent_rank() to identify categories in the top/bottom 25th percentiles.
3. categorize into the 4 quadrants of the quality/logistics matrix.
granularity: product_category_name
*/
with category_metrics as (
-- step 1: aggregate logistics and satisfaction metrics at the category level
select
p.product_category_name,
count(o.order_id) as total_orders,
-- lead time speed (days)
avg(datediff(day, o.order_purchase_timestamp, o.order_delivered_customer_date) * 1.0) as avg_delivery_lead_time,
-- customer satisfaction
avg(r.review_score * 1.0) as avg_review_score
from
orders o
inner join
order_items oi on o.order_id = oi.order_id
inner join
products p on oi.product_id = p.product_id
inner join
order_reviews r on o.order_id = r.order_id
where
o.order_status = 'delivered'
and o.order_delivered_customer_date is not null
and p.product_category_name is not null
group by
p.product_category_name
having
count(o.order_id) > 30 -- filter for statistical significance
),
category_percentiles as (
-- step 2: assign percentile ranks (0.0 to 1.0)
select
*,
-- speed: 0.0 is fastest, 1.0 is slowest
percent_rank() over (order by avg_delivery_lead_time asc) as speed_rank,
-- rating: 0.0 is lowest, 1.0 is highest
percent_rank() over (order by avg_review_score asc) as rating_rank
from
category_metrics
)
-- final output: matrix segmentation
select
product_category_name,
total_orders,
cast(avg_delivery_lead_time as decimal(10,2)) as avg_lead_time_days,
cast(avg_review_score as decimal(10,2)) as avg_rating,
-- step 3: segmenting into the quality/logistics quadrants
case
-- fast delivery (top 25%) + low score (bottom 25%)
when speed_rank <= 0.25 and rating_rank <= 0.25 then 'Product/Catalog Issue (Fast but Hated)'
-- slow delivery (bottom 25%) + high score (top 25%)
when speed_rank >= 0.75 and rating_rank >= 0.75 then 'Logistics Bottleneck (Slow but Loved)'
-- fast delivery + high score
when speed_rank <= 0.25 and rating_rank >= 0.75 then 'Gold Standard (Fast & Loved)'
-- slow delivery + low score
when speed_rank >= 0.75 and rating_rank <= 0.25 then 'Systemic Failure (Slow & Hated)'
else 'Standard Performance'
end as matrix_segment
from
category_percentiles
order by
avg_rating asc;
Results:
| product_category_name | total_orders | avg_lead_time_days | avg_rating | matrix_segment |
|---|---|---|---|---|
| fraldas_higiene | 37 | 10.65 | 3.38 | Standard Performance |
| moveis_escritorio | 1664 | 20.78 | 3.52 | Systemic Failure (Slow & Hated) |
| fashion_roupa_masculina | 124 | 12.74 | 3.76 | Standard Performance |
| telefonia_fixa | 253 | 12.60 | 3.76 | Standard Performance |
| audio | 359 | 13.27 | 3.83 | Systemic Failure (Slow & Hated) |
| artigos_de_festas | 42 | 9.14 | 3.83 | Product/Catalog Issue (Fast but Hated) |
| casa_conforto | 430 | 13.47 | 3.85 | Systemic Failure (Slow & Hated) |
| moveis_colchao_e_estofado | 37 | 14.41 | 3.89 | Systemic Failure (Slow & Hated) |
| cama_mesa_banho | 10985 | 12.71 | 3.92 | Standard Performance |
| moveis_sala | 494 | 13.80 | 3.94 | Systemic Failure (Slow & Hated) |
| | 1533 | 12.71 | 3.94 | Standard Performance |
| moveis_decoracao | 8159 | 12.77 | 3.95 | Standard Performance |
| fashion_roupa_feminina | 46 | 11.85 | 3.96 | Standard Performance |
| casa_construcao | 593 | 13.18 | 3.96 | Systemic Failure (Slow & Hated) |
| construcao_ferramentas_seguranca | 182 | 11.83 | 3.97 | Standard Performance |
| informatica_acessorios | 7672 | 7.49 | 3.98 | Product/Catalog Issue (Fast but Hated) |
| telefonia | 4408 | 12.80 | 4.00 | Standard Performance |
| moveis_cozinha_area_de_servico_jantar_e_jardim | 272 | 11.83 | 4.03 | Standard Performance |
| fashion_underwear_e_moda_praia | 126 | 13.56 | 4.05 | Standard Performance |
| climatizacao | 284 | 12.13 | 4.05 | Standard Performance |
| eletronicos | 2711 | 12.79 | 4.07 | Standard Performance |
| artigos_de_natal | 143 | 14.48 | 4.07 | Standard Performance |
| relogios_presentes | 5825 | -2.33 | 4.07 | Standard Performance |
| market_place | 303 | 12.09 | 4.07 | Standard Performance |
| bebes | 2967 | 12.36 | 4.08 | Standard Performance |
| ferramentas_jardim | 4254 | 13.59 | 4.08 | Standard Performance |
| artes | 195 | 11.21 | 4.08 | Standard Performance |
| construcao_ferramentas_iluminacao | 293 | 9.51 | 4.09 | Standard Performance |
| agro_industria_e_comercio | 206 | 11.58 | 4.09 | Standard Performance |
| construcao_ferramentas_construcao | 913 | 10.57 | 4.09 | Standard Performance |
| sinalizacao_e_seguranca | 196 | 10.31 | 4.09 | Standard Performance |
| utilidades_domesticas | 6780 | 10.86 | 4.11 | Standard Performance |
| automotivo | 4117 | 1.63 | 4.12 | Standard Performance |
| tablets_impressao_imagem | 81 | 12.80 | 4.12 | Standard Performance |
| construcao_ferramentas_jardim | 234 | 11.84 | 4.12 | Standard Performance |
| bebidas | 365 | 10.35 | 4.13 | Standard Performance |
| industria_comercio_e_negocios | 263 | -154.02 | 4.13 | Standard Performance |
| consoles_games | 1080 | 13.51 | 4.14 | Standard Performance |
| eletrodomesticos_2 | 232 | 13.80 | 4.16 | Standard Performance |
| esporte_lazer | 8436 | 6.97 | 4.17 | Standard Performance |
| dvds_blu_ray | 61 | 12.85 | 4.18 | Standard Performance |
| beleza_saude | 9456 | 11.88 | 4.19 | Standard Performance |
| fashion_bolsas_e_acessorios | 1994 | 11.03 | 4.19 | Standard Performance |
| cool_stuff | 3698 | 12.29 | 4.19 | Standard Performance |
| brinquedos | 4007 | 0.76 | 4.21 | Standard Performance |
| musica | 38 | 11.55 | 4.21 | Standard Performance |
| perfumaria | 3344 | 11.71 | 4.22 | Standard Performance |
| pet_shop | 1917 | 11.13 | 4.22 | Standard Performance |
| pcs | 196 | 13.25 | 4.22 | Logistics Bottleneck (Slow but Loved) |
| instrumentos_musicais | 646 | 12.93 | 4.22 | Logistics Bottleneck (Slow but Loved) |
| eletroportateis | 657 | 10.72 | 4.23 | Standard Performance |
| eletrodomesticos | 789 | 11.18 | 4.23 | Standard Performance |
| papelaria | 2459 | 12.66 | 4.24 | Standard Performance |
| moveis_quarto | 103 | 13.17 | 4.25 | Logistics Bottleneck (Slow but Loved) |
| alimentos | 487 | 9.52 | 4.26 | Gold Standard (Fast & Loved) |
| fashion_calcados | 256 | 15.44 | 4.29 | Logistics Bottleneck (Slow but Loved) |
| cine_foto | 71 | 10.42 | 4.30 | Gold Standard (Fast & Loved) |
| malas_acessorios | 1073 | 10.60 | 4.35 | Standard Performance |
| alimentos_bebidas | 271 | 10.78 | 4.37 | Standard Performance |
| livros_tecnicos | 264 | 10.58 | 4.39 | Gold Standard (Fast & Loved) |
| flores | 31 | 10.81 | 4.42 | Standard Performance |
| portateis_casa_forno_e_cafe | 73 | 9.79 | 4.44 | Gold Standard (Fast & Loved) |
| construcao_ferramentas_ferramentas | 99 | 11.71 | 4.44 | Standard Performance |
| livros_importados | 57 | 7.96 | 4.51 | Gold Standard (Fast & Loved) |
| livros_interesse_geral | 533 | -69.70 | 4.51 | Gold Standard (Fast & Loved) |
Insights:
- The "Fast but Hated" Problem: Categories like IT accessories (
informatica_acessorios) and party supplies (artigos_de_festas) arrive faster than average (7 to 9 days) but still get bottom-tier reviews. Since delivery speed isn't the issue, this strongly points to catalog problems—these items are likely of poor quality, arrive broken, or don't match their online descriptions. - The "Slow but Loved" Forgiveness: Customers are surprisingly patient when buying specific, high-value, or specialized goods. Personal computers (
pcs), musical instruments, and shoes take a long time to deliver (13 to 15 days), yet they maintain high ratings (above 4.2). People are clearly willing to wait as long as the product is exactly what they wanted. - The "Systemic Failure" Zone: Office furniture (
moveis_escritorio) is the worst performer. It takes almost 21 days to deliver on average and holds a terrible 3.52 rating. Bulky, heavy items are clearly causing huge headaches for our logistics partners, and the slow delivery is destroying customer satisfaction. - The "Gold Standard" Champions: Books (imported, technical, and general) and food (
alimentos) sit in the perfect quadrant. They are easy to package, ship fast, and have highly predictable quality. As a result, they enjoy the highest ratings on the platform (4.26 to 4.51). - Data Quality Warnings: While looking at the lead times, I noticed some categories like watches (
relogios_presentes) and industry/business goods have negative lead times (e.g., -154 days or -69 days). This tells me there are data entry errors in the system where the delivery date was somehow recorded before the purchase date. This is something I'll need to filter out in my data cleaning process.
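Following up on the data quality warning, here is a minimal sketch of the cleaning step mentioned above: the same category-level aggregation with a sanity predicate that drops rows where the recorded delivery precedes the purchase, which is what produces the negative lead times.

select
    p.product_category_name,
    count(o.order_id) as total_orders,
    cast(avg(datediff(day, o.order_purchase_timestamp, o.order_delivered_customer_date) * 1.0) as decimal(10,2)) as avg_lead_time_days
from orders o
inner join order_items oi on o.order_id = oi.order_id
inner join products p on oi.product_id = p.product_id
where o.order_status = 'delivered'
  and o.order_delivered_customer_date is not null
  and p.product_category_name is not null
  -- sanity filter: excludes the impossible negative lead times flagged above
  and o.order_delivered_customer_date >= o.order_purchase_timestamp
group by p.product_category_name;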
Customer Review Deep-Dive Analysis
25. What is the distribution of review scores across different product categories? Specifically, which categories have the highest percentage of 1-star reviews, and can we identify any common issues or themes in the negative feedback for these categories?
Solution:
- The analysis commences by establishing a direct linkage between customer reviews and their corresponding product categories (`category_reviews` CTE). By performing inner joins across the `order_reviews`, `order_items`, and `products` datasets, and subsequently applying a left join to the English translation table, we accurately map each review to a standardized category name. A `DISTINCT` clause prevents the artificial inflation of review counts in instances where multiple identical items are purchased within a single transaction (a toy illustration follows this list).
- Following the data integration, the methodology aggregates the review score distributions at the category level (`category_stats` CTE). Using conditional summation, the query computes the absolute frequency of each star rating, from one to five, alongside the total review volume for the category. A `COALESCE` function ensures that any products missing categorical metadata are grouped into an 'unknown_category' rather than systematically omitted from the dataset.
- In the final phase, the absolute frequencies are converted into proportional metrics to facilitate equitable comparison across diverse product lines. The query calculates the exact percentage of one-star (`pct_one_star`) and five-star reviews. To keep the findings statistically robust and avoid skew from low-volume anomalies, the output is restricted to categories with at least 50 total reviews. The results are ordered descending by the one-star percentage, isolating the product lines that drive the highest proportion of negative customer sentiment on the platform.
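To make the DISTINCT rationale concrete, here is a toy illustration with hypothetical values: a single review on an order containing three units of the same product fans out to three identical join rows, and DISTINCT collapses them back to one.

-- without distinct: the one review is counted 3 times (hypothetical rows)
select count(*) as review_rows
from (values ('rev1', 'ord1', 5),
             ('rev1', 'ord1', 5),
             ('rev1', 'ord1', 5)) j(review_id, order_id, review_score);

-- with distinct: the duplicates collapse back to a single review row
select count(*) as review_rows
from (select distinct review_id, order_id, review_score
      from (values ('rev1', 'ord1', 5),
                   ('rev1', 'ord1', 5),
                   ('rev1', 'ord1', 5)) j(review_id, order_id, review_score)) d;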
SQL Code
with category_reviews as (
-- step 1: link reviews to categories.
-- we use distinct to prevent overcounting reviews if a customer bought 3 of the EXACT same item in one order.
-- however, if an order spans multiple different categories, the single review score applies to each of them.
select distinct
r.review_id,
r.order_id,
r.review_score,
t.product_category_name_english as category_name
from
order_reviews r
inner join
order_items oi on r.order_id = oi.order_id
inner join
products p on oi.product_id = p.product_id
left join
product_category_name_translation t on p.product_category_name = t.product_category_name
where
r.review_score is not null
),
category_stats as (
-- step 2: aggregate the counts for each star rating per category
select
coalesce(category_name, 'unknown_category') as category,
count(review_id) as total_reviews,
sum(case when review_score = 1 then 1 else 0 end) as one_star_reviews,
sum(case when review_score = 2 then 1 else 0 end) as two_star_reviews,
sum(case when review_score = 3 then 1 else 0 end) as three_star_reviews,
sum(case when review_score = 4 then 1 else 0 end) as four_star_reviews,
sum(case when review_score = 5 then 1 else 0 end) as five_star_reviews
from
category_reviews
group by
coalesce(category_name, 'unknown_category')
)
-- final output: calculate percentages and rank by the highest rate of 1-star reviews
select
category,
total_reviews,
one_star_reviews,
cast((one_star_reviews * 100.0 / total_reviews) as decimal(5,2)) as pct_one_star,
cast((five_star_reviews * 100.0 / total_reviews) as decimal(5,2)) as pct_five_star
from
category_stats
where
total_reviews >= 50 -- filtering out obscure categories with too few reviews to be statistically relevant
order by
pct_one_star desc;
Results:
| category | total_reviews | one_star_reviews | pct_one_star | pct_five_star |
|---|---|---|---|---|
| fashion_male_clothing | 111 | 26 | 23.42 | 52.25 |
| construction_tools_safety | 166 | 30 | 18.07 | 51.81 |
| office_furniture | 1268 | 227 | 17.90 | 39.59 |
| audio | 348 | 58 | 16.67 | 53.16 |
| unknown_category | 1470 | 242 | 16.46 | 53.74 |
| dvds_blu_ray | 58 | 9 | 15.52 | 60.34 |
| fixed_telephony | 215 | 33 | 15.35 | 51.16 |
| home_confort | 398 | 61 | 15.33 | 53.02 |
| air_conditioning | 249 | 34 | 13.65 | 57.03 |
| furniture_decor | 6457 | 831 | 12.87 | 55.85 |
| bed_bath_table | 9432 | 1201 | 12.73 | 53.28 |
| computers_accessories | 6702 | 851 | 12.70 | 55.60 |
| baby | 2869 | 360 | 12.55 | 57.09 |
| furniture_bedroom | 96 | 12 | 12.50 | 61.46 |
| art | 200 | 25 | 12.50 | 56.00 |
| construction_tools_construction | 744 | 89 | 11.96 | 57.80 |
| computers | 178 | 21 | 11.80 | 60.67 |
| kitchen_dining_laundry_garden_furniture | 247 | 29 | 11.74 | 54.25 |
| watches_gifts | 5586 | 655 | 11.73 | 57.30 |
| fashion_underwear_beach | 120 | 14 | 11.67 | 49.17 |
| telephony | 4173 | 486 | 11.65 | 52.96 |
| agro_industry_and_commerce | 182 | 21 | 11.54 | 50.55 |
| home_construction | 488 | 56 | 11.48 | 51.64 |
| consoles_games | 1053 | 120 | 11.40 | 56.60 |
| books_imported | 53 | 6 | 11.32 | 73.58 |
| auto | 3895 | 438 | 11.25 | 57.02 |
| market_place | 278 | 31 | 11.15 | 52.52 |
| christmas_supplies | 126 | 14 | 11.11 | 51.59 |
| drinks | 297 | 32 | 10.77 | 58.92 |
| electronics | 2532 | 271 | 10.70 | 56.24 |
| small_appliances | 628 | 67 | 10.67 | 61.46 |
| cine_photo | 66 | 7 | 10.61 | 69.70 |
| sports_leisure | 7721 | 810 | 10.49 | 60.58 |
| perfumery | 3165 | 332 | 10.49 | 63.92 |
| furniture_living_room | 420 | 44 | 10.48 | 54.29 |
| garden_tools | 3506 | 367 | 10.47 | 58.84 |
| home_appliances | 799 | 82 | 10.26 | 59.32 |
| costruction_tools_garden | 196 | 20 | 10.20 | 62.76 |
| health_beauty | 8826 | 899 | 10.19 | 61.51 |
| toys | 3861 | 393 | 10.18 | 61.98 |
| signaling_and_security | 138 | 14 | 10.14 | 56.52 |
| housewares | 5865 | 590 | 10.06 | 59.18 |
| musical_instruments | 623 | 60 | 9.63 | 60.67 |
| cool_stuff | 3609 | 340 | 9.42 | 59.46 |
| small_appliances_home_oven_and_coffee | 75 | 7 | 9.33 | 65.33 |
| food | 445 | 41 | 9.21 | 64.72 |
| fashion_bags_accessories | 1870 | 170 | 9.09 | 59.52 |
| home_appliances_2 | 234 | 21 | 8.97 | 55.56 |
| industry_commerce_and_business | 234 | 21 | 8.97 | 59.40 |
| pet_shop | 1703 | 152 | 8.93 | 63.12 |
| fashion_shoes | 239 | 21 | 8.79 | 62.76 |
| stationery | 2302 | 198 | 8.60 | 62.42 |
| costruction_tools_tools | 94 | 8 | 8.51 | 69.15 |
| construction_tools_lights | 242 | 19 | 7.85 | 54.55 |
| books_technical | 259 | 19 | 7.34 | 71.43 |
| books_general_interest | 508 | 36 | 7.09 | 73.23 |
| luggage_accessories | 1030 | 70 | 6.80 | 65.15 |
| tablets_printing_image | 77 | 4 | 5.19 | 53.25 |
| food_drink | 228 | 10 | 4.39 | 64.91 |
Insights:
- Male Fashion is the Most Problematic: Men's clothing (`fashion_male_clothing`) has the highest 1-star rate on the platform at 23.4%. In e-commerce, high negative feedback in clothing is typically driven by inconsistent sizing or the physical fabric not matching the online pictures.
- Office Furniture is a Confirmed Failure: Backing up our delivery speed findings, `office_furniture` performs terribly here as well. It has a nearly 18% 1-star rate and the absolute lowest 5-star rate on the entire list (just 39.6%). Bulky, hard-to-assemble items paired with slow logistics are heavily damaging this category.
- The "Unknown" Category Risk: Items tagged as `unknown_category` generate a surprisingly high 1-star rate (16.5% across 1,470 reviews). If a product is poorly classified, it likely lacks a clear description, which leads to confused buyers and mismatched expectations.
- Books and Food are the Safest Bets: Categories like `food_drink`, `books_general_interest`, and `books_technical` boast the lowest 1-star rates (all under 7.5%) and massive 5-star rates (up to 73%). These are standardized, easy-to-ship items that rarely surprise the buyer in a negative way. A sketch for digging into the actual complaint text follows this list.
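The "common issues or themes" half of the question ultimately requires reading the feedback text itself. Here is a minimal sketch for surfacing it, assuming the optional review_comment_message column in order_reviews is populated for these reviews:

select distinct top 50
    t.product_category_name_english as category,
    r.review_creation_date,
    r.review_comment_message
from order_reviews r
inner join order_items oi on r.order_id = oi.order_id
inner join products p on oi.product_id = p.product_id
inner join product_category_name_translation t
    on p.product_category_name = t.product_category_name
where r.review_score = 1
  and r.review_comment_message is not null
  -- focus on the two worst offenders identified above
  and t.product_category_name_english in ('fashion_male_clothing', 'office_furniture')
order by r.review_creation_date desc;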
Review Volume vs. Score Polarization Analysis
26. Is there a correlation between the number of reviews a product receives and its average review score? Specifically, do products with a higher volume of reviews tend to have more polarized ratings (more 1-star and 5-star reviews) compared to products with fewer reviews?
Solution:
- To analyze review polarization, customer reviews should first be linked back to the actual products. This is done in the first CTE by joining the order reviews table with the order items table. A distinct select is used here because sometimes a customer buys multiple units of the exact same item in one order, but only one review is left for the whole transaction. The distinct keyword ensures that the single review is not accidentally overcounted for the product.
- Next, the data is grouped by the product ID to pull baseline metrics for each individual item: the total number of reviews and the overall average score. In the same step, a simple case statement counts exactly how many one-star and five-star reviews each product received, isolating the extreme ratings needed for the analysis.
- Then, those product-level metrics are used to assign each product into a specific bucket based on its review volume, such as low, medium, or high. A bucket order column with simple numbers is also added so the final output is sorted logically from the lowest volume to the highest, rather than alphabetically by the bucket name.
- Finally, all the data is rolled up by those volume buckets in the main query. The share of one-star and five-star reviews is calculated against the total reviews in each bucket, and the two are added together into a polarized percentage metric. This metric shows whether highly reviewed products actually trigger more extreme reactions compared to items that only have a handful of reviews.
SQL Code
with product_reviews as (
-- step 1: link reviews to specific products.
-- we use distinct to prevent overcounting reviews if a customer bought multiple of the EXACT same item in one order.
select distinct
r.review_id,
r.order_id,
oi.product_id,
r.review_score
from
order_reviews r
inner join
order_items oi on r.order_id = oi.order_id
where
r.review_score is not null
),
product_stats as (
-- step 2: aggregate the review counts and score distributions per product
select
product_id,
count(review_id) as total_reviews,
avg(review_score * 1.0) as avg_review_score,
sum(case when review_score = 1 then 1 else 0 end) as count_1_star,
sum(case when review_score = 5 then 1 else 0 end) as count_5_star
from
product_reviews
group by
product_id
),
volume_buckets as (
-- step 3: segment products into review volume buckets
select
product_id,
total_reviews,
avg_review_score,
count_1_star,
count_5_star,
case
when total_reviews between 1 and 10 then '1-10 Reviews (Low)'
when total_reviews between 11 and 50 then '11-50 Reviews (Medium)'
when total_reviews between 51 and 100 then '51-100 Reviews (High)'
else '100+ Reviews (Very High)'
end as volume_bucket,
case
when total_reviews between 1 and 10 then 1
when total_reviews between 11 and 50 then 2
when total_reviews between 51 and 100 then 3
else 4
end as bucket_order
from
product_stats
)
-- final output: aggregate the metrics by bucket to analyze correlation and polarization
select
volume_bucket,
count(product_id) as number_of_products,
sum(total_reviews) as total_reviews_in_bucket,
cast(avg(avg_review_score) as decimal(5,2)) as avg_score_in_bucket,
cast((sum(count_1_star) * 100.0 / sum(total_reviews)) as decimal(5,2)) as pct_1_star,
cast((sum(count_5_star) * 100.0 / sum(total_reviews)) as decimal(5,2)) as pct_5_star,
-- measuring polarization by combining extreme scores
cast(((sum(count_1_star) + sum(count_5_star)) * 100.0 / sum(total_reviews)) as decimal(5,2)) as pct_polarized
from
volume_buckets
group by
volume_bucket,
bucket_order
order by
bucket_order;
Results:
| volume_bucket | number_of_products | total_reviews_in_bucket | avg_score_in_bucket | pct_1_star | pct_5_star | pct_polarized |
|---|---|---|---|---|---|---|
| 1-10 Reviews (Low) | 31331 | 61151 | 4.05 | 12.07 | 57.77 | 69.84 |
| 11-50 Reviews (Medium) | 1315 | 25711 | 4.09 | 11.07 | 57.19 | 68.25 |
| 51-100 Reviews (High) | 93 | 6449 | 4.12 | 10.51 | 57.99 | 68.51 |
| 100+ Reviews (Very High) | 50 | 8919 | 4.05 | 11.28 | 56.54 | 67.82 |
Insights:
- No Correlation with Average Score: Surprisingly, the number of reviews a product gets does not change its overall rating. Whether a product has 5 reviews or 500, the average score stays incredibly stable between 4.05 and 4.12 (a single-number correlation check is sketched after this list).
- Polarization is a Constant Behavior: The assumption that highly reviewed products get more polarized (lots of 1s and 5s) is false. Across every single volume bucket, roughly 68% to 70% of all ratings are extreme (either 1-star or 5-star). This tells us that e-commerce customers generally only leave reviews when they either absolutely love or completely hate an item, regardless of the product's popularity.
- The "Long Tail" Catalog: The data highlights how massive and fragmented the Olist catalog is. Out of the entire dataset, over 31,000 products fall into the "Low Volume" bucket (1-10 reviews), while only 50 products reached the "Very High" bucket (100+ reviews). This proves that Olist's revenue is driven by a "long tail" of thousands of niche items rather than a few massive blockbusters.
Seller Performance Analysis
27. Can we identify specific sellers or products that are responsible for a disproportionate number of 1-star reviews? Specifically, which sellers have the highest percentage of their orders resulting in 1-star reviews, and what common issues are cited in these reviews?
Solution:
- To pinpoint the sellers driving disproportionate negative feedback, reviews are first linked to sellers in the `seller_reviews` CTE by joining `order_reviews` to `order_items`. A distinct select is used because a single order can contain multiple items from the same seller, and the one review attached to that order must not be counted more than once.
- The data is then grouped by `seller_id` in the `seller_stats` CTE to compute each seller's total review count, the number of 1-star reviews (via conditional summation), and the overall average review score.
- The final query converts the 1-star count into a percentage of each seller's total reviews, restricts the output to sellers with at least 30 reviews to ensure statistical significance, and sorts descending by `pct_1_star` (and then by volume) so the worst offenders surface at the top of the list.
SQL SQL Code
with seller_reviews as (
-- step 1: link reviews to specific sellers.
-- we use distinct to prevent overcounting reviews if a customer bought multiple items from the same seller in one order.
select distinct
oi.seller_id,
r.review_id,
r.order_id,
r.review_score
from
order_reviews r
inner join
order_items oi on r.order_id = oi.order_id
where
r.review_score is not null
),
seller_stats as (
-- step 2: aggregate the total reviews and 1-star reviews per seller
select
seller_id,
count(review_id) as total_reviews,
sum(case when review_score = 1 then 1 else 0 end) as count_1_star,
avg(review_score * 1.0) as avg_review_score
from
seller_reviews
group by
seller_id
)
-- final output: calculate the percentage of 1-star reviews and identify the worst offenders
select
seller_id,
total_reviews,
count_1_star,
cast((count_1_star * 100.0 / total_reviews) as decimal(5,2)) as pct_1_star,
cast(avg_review_score as decimal(5,2)) as overall_avg_score
from
seller_stats
where
total_reviews >= 30 -- threshold: filtering out sellers with very few reviews to ensure statistical significance
order by
pct_1_star desc,
total_reviews desc;
Results:
Insights:
- The 50% Failure Rate: Seller `1ca7077d...` is the most toxic vendor in this dataset. Out of 114 total orders, 61 resulted in a 1-star review. Failing over 53% of the time is catastrophic for platform trust and drags their average rating down to an abysmal 2.33.
- High Volume, High Damage: Seller `2eb70248...` is a major platform risk. They process a solid amount of volume (199 orders) but maintain a 40.2% failure rate, dumping 80 one-star reviews into the marketplace.
- The Volume Trap: Seller `7c67e144...` sits lower on the percentage list with a 19.9% failure rate. However, because they handled 982 orders, they single-handedly created 196 one-star reviews. Even a "moderate" failure rate becomes a massive issue at high volumes.
- Perfect Sellers Exist: At the very bottom of the dataset, we can see a group of sellers (like `48efc9d9...` and `18e694e0...`) who processed 30+ orders with exactly zero 1-star reviews. This proves that maintaining a flawless customer experience is completely possible on Olist.
- Root Causes of 1-Star Reviews: Connecting this data to our previous geographic and logistics analysis, the most common issues driving these negative reviews are highly predictable. Customers leave 1-star ratings when packages arrive more than 3 days past the estimated delivery window, when sellers take over a week just to dispatch the item, or when products (especially in categories like men's clothing or office furniture) are poor quality and fail to match their online descriptions (a cross-check sketch follows this list).
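To validate that root-cause hypothesis against actual logistics data, the flagged sellers can be cross-checked on their own timestamps. This is a minimal sketch, assuming order_delivered_carrier_date in orders records the handoff to the carrier (i.e., the seller's dispatch date):

with seller_reviews as (
    -- same distinct linking as step 1 above
    select distinct oi.seller_id, r.review_id, r.review_score
    from order_reviews r
    inner join order_items oi on r.order_id = oi.order_id
    where r.review_score is not null
),
seller_stats as (
    select seller_id,
           count(review_id) as total_reviews,
           sum(case when review_score = 1 then 1 else 0 end) as count_1_star
    from seller_reviews
    group by seller_id
    having count(review_id) >= 30
)
select
    s.seller_id,
    cast(s.count_1_star * 100.0 / s.total_reviews as decimal(5,2)) as pct_1_star,
    -- seller-controlled: days from purchase to carrier handoff (dispatch speed)
    cast(avg(datediff(day, o.order_purchase_timestamp, o.order_delivered_carrier_date) * 1.0) as decimal(10,2)) as avg_dispatch_days,
    -- end-to-end: days late versus the promised delivery date
    cast(avg(datediff(day, o.order_estimated_delivery_date, o.order_delivered_customer_date) * 1.0) as decimal(10,2)) as avg_delay_days
from seller_stats s
inner join order_items oi on s.seller_id = oi.seller_id
inner join orders o on oi.order_id = o.order_id
where o.order_status = 'delivered'
  and o.order_delivered_customer_date is not null
  and o.order_delivered_carrier_date is not null
group by s.seller_id, s.total_reviews, s.count_1_star
order by pct_1_star desc;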
Customer Review Impact
28. What is the effect of customer review answers on the average review score? Specifically, do products that receive responses to customer reviews (e.g., seller replies) tend to have higher average review scores compared to products that do not receive any responses? Also the average response time to customer reviews and its correlation with review scores?
Solution:
- To analyze the effect of response times on review scores, the gap between when a review was created and when it was answered must first be calculated. This is done in the first CTE by computing the difference in hours between the creation date and the answer timestamp from the order reviews table. The data is also filtered to only include records where a review score actually exists.
- Next, those calculated response times are segmented into logical buckets to help spot trends. A case statement categorizes each response as fast, medium, slow, or very slow, based on the number of hours it took to reply. A specific category for reviews that never received a response is also included to serve as a baseline for comparison.
- Finally, the metrics are aggregated by these newly created response speed categories in the main query. The total number of reviews, the average response time, and the overall average review score are calculated for each bucket. The distribution of extreme scores is also captured by calculating the percentage of one-star and five-star reviews within each category. This final output is used to determine whether faster responses actually correlate with higher overall customer satisfaction and fewer negative ratings.
SQL Code
with review_response_times as (
-- step 1: calculate the time it took for the customer to answer the review survey
select
review_id,
review_score,
review_creation_date,
review_answer_timestamp,
-- calculating response time in hours to measure the gap
datediff(hour, review_creation_date, review_answer_timestamp) as response_time_hours
from
order_reviews
where
review_score is not null
),
response_buckets as (
-- step 2: segment the response times into logical buckets to spot trends
select
review_id,
review_score,
response_time_hours,
case
when review_answer_timestamp is null then 'No Response Recorded'
when response_time_hours <= 24 then '1. Fast (Under 24 Hours)'
when response_time_hours <= 72 then '2. Medium (1 - 3 Days)'
when response_time_hours <= 168 then '3. Slow (3 - 7 Days)'
else '4. Very Slow (Over 1 Week)'
end as response_speed_category
from
review_response_times
)
-- final output: aggregate the scores based on how quickly the survey was answered
select
response_speed_category,
count(review_id) as total_reviews,
cast(avg(response_time_hours * 1.0) as decimal(10,2)) as avg_response_time_hours,
cast(avg(review_score * 1.0) as decimal(5,2)) as avg_review_score,
-- getting the distribution of extreme scores for correlation analysis
cast((sum(case when review_score = 1 then 1 else 0 end) * 100.0 / count(review_id)) as decimal(5,2)) as pct_1_star,
cast((sum(case when review_score = 5 then 1 else 0 end) * 100.0 / count(review_id)) as decimal(5,2)) as pct_5_star
from
response_buckets
group by
response_speed_category
order by
response_speed_category;
Results:
| response_speed_category | total_reviews | avg_response_time_hours | avg_review_score | pct_1_star | pct_5_star |
|---|---|---|---|---|---|
| 1. Fast (Under 24 Hours) | 28072 | 18.90 | 3.94 | 15.76 | 56.19 |
| 2. Medium (1 - 3 Days) | 44618 | 46.06 | 4.16 | 9.53 | 59.16 |
| 3. Slow (3 - 7 Days) | 21683 | 100.95 | 4.13 | 9.68 | 57.33 |
| 4. Very Slow (Over 1 Week) | 4851 | 551.43 | 4.02 | 13.38 | 56.26 |
Insights:
- The "Damage Control" Anomaly: The most surprising finding is that the fastest responses (under 24 hours) actually correlate with the lowest average score (3.94) and the highest 1-star rate (15.76%). This does not mean fast replies cause bad reviews. Instead, it shows that furious customers trigger immediate "damage control." When a 1-star review drops, sellers rush to reply right away to apologize or fix the issue.
- The "Happy Batch" Effect: Reviews that are answered within 1 to 3 days (Medium speed) boast the best average score (4.16) and the lowest 1-star rate (9.5%). It is highly likely that sellers do not feel an urgent need to reply to positive 5-star reviews immediately. Instead, they probably batch-process their "thank you" replies a few days later when they have free time.
- Most Sellers are Active: The vast majority of reviews (over 72,000) are answered within 3 days. This shows that the platform's seller base is generally very active and monitors their customer feedback closely.
- The Cost of Neglect: When response times drag on for over a week (averaging 551 hours), the 1-star rate starts to climb back up to over 13%. This suggests that true neglect—where a seller is completely disengaged from their storefront—is linked to a generally poorer customer experience and lower product quality.
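Reading the same relationship from the other direction, the answer lag can be averaged per star rating. Below is a minimal companion query on the same table; if the "damage control" reading holds, 1-star reviews should show the shortest average lag:

select
    review_score,
    count(*) as total_reviews,
    cast(avg(datediff(hour, review_creation_date, review_answer_timestamp) * 1.0) as decimal(10,2)) as avg_answer_lag_hours
from order_reviews
where review_score is not null
  and review_answer_timestamp is not null
group by review_score
order by review_score;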
Conclusion

What We Did:
- Mapped out the exact correlation between delivery delays and average review scores to find the customer tolerance threshold.
- Categorized products into a matrix of delivery speed versus satisfaction (e.g., "Fast but Hated", "Slow but Loved").
- Analyzed review polarization across different order volumes to understand how customers rate products.
- Isolated the specific toxic sellers driving 1-star reviews and examined how response times correlate with ratings.

What We Obtained (Key Findings):
- The 3-Day Late Cliff: Customers will forgive a 1-to-2 day delivery delay. However, once a package hits exactly 3 days late, the average rating permanently crashes below 3 stars. At 6 days late, the customer experience is completely unrecoverable (under 2.0 stars).
- Quality Beats Speed: Fast shipping cannot save a bad product. Categories like IT accessories and male fashion arrive quickly but get terrible reviews, likely due to quality or sizing issues. Conversely, customers happily wait 13+ days for specific high-value items like PCs or musical instruments without punishing the seller's rating.
- The Polarization Rule: Customers generally only leave a review if they absolutely love or completely hate an item. Roughly 68% of all ratings across the platform are either 1-star or 5-star, regardless of whether a product is a niche item or a massive bestseller.
- Toxic Outliers Destroy Trust: A very small group of high-volume sellers is responsible for a massive chunk of negative feedback. The worst offender we found had an unacceptable 53% failure rate (1-star reviews).
- The Damage Control Effect: Sellers reply to negative reviews immediately (under 24 hours) to put out fires and manage angry buyers. Positive 5-star reviews, on the other hand, are usually answered a few days later in batches.