Order and Sales Performance Analysis

This section will explore the order and sales performance metrics for Olist.

5Project Roadmap: Step-by-Step Navigation
1
Problem Description - Olist E-Commerce Data AnalysisThis article provides a detailed problem description for the Olist E-Commerce Data Analysis project, outlining the tasks assigned by the company, the structure of the analysis, the tools and skills used, acknowledgements, and references.
2
Data NormalizationThis section willl cover the data normalization process for the Olist E-Commerce dataset, including data ingestion, cleaning, star schema modeling, and the creation of an explicit date table to support time-based analysis.
3
Business QuestionsIn this section, business questions will be defined to guide the data analysis process for the Olist E-Commerce dataset. These questions will be categorized into different domains such as customer behavior, sales performance, delivery logistics, seller performance, payments, and customer experience to ensure a comprehensive analysis of the data.
4
Customer Retention AnalysisThis section focuses on analyzing customer retention patterns using the Olist e-commerce dataset. The analysis includes segmenting customers based on their purchasing behavior, understanding trends in customer retention, and exploring the factors that influence customer loyalty.
5
Order and Sales Performance AnalysisThis section will explore the order and sales performance metrics for Olist.
6
Delivery Performance and SLA AnalysisThis section will explore the delivery performance and logistics efficiency of Olist's e-commerce operations, analyzing on-time delivery rates, freight costs, and the decomposition of delivery timelines to identify bottlenecks and regional disparities.
7
Order Fullfilment Efficiency AnalysisThis section will explore how efficiently sellers are fulfilling orders on the Olist platform by analyzing the time gap between order approval and handover to logistics partners, identifying the 'Bottom 10%' of sellers in terms of dispatch efficiency, and providing insights into potential areas for improvement in the fulfillment process.
8
Payments and Risk AnalysisThis section will explore the payment distribution across different methods, analyze the correlation between installment counts and order values, and identify regional payment preferences to understand the risk and revenue implications for Olist's e-commerce platform.
9
Customer Review and Satisfaction AnalysisThis section will explore the effect of delivery delays on customer satisfaction, identify product categories with mismatched delivery speed and review scores, and analyze the distribution of review scores across different product categories to uncover common issues in negative feedback.
10
Times Series Analysis of Orders and Sales PerformanceThis section will explore the time series analysis of orders and sales performance for Olist, including month-over-month growth rates, seasonality effects, and the distribution of order volume by day and hour to identify key trends and patterns in customer behavior.

Introduction

In this section, we will analyze the following aspects of Olist’s sales performance:

  1. Monthly Sales Trends. We will examine the monthly sales trends for Olist over the past few years to identify any seasonal patterns or significant flu ctuations in sales performance.
  2. Pareto Principle for Product Categories. We will identify which product categories constitute the top 20% of sales volume and whether they align with the top 20% of revenue generators. We will also classify categories into ‘High Volume / Low Value’ and ‘Low Volume / High Value’ segments.
  3. Rolling Average for Revenue. We will calculate the 3-month rolling average for total revenue to smooth out daily and weekly volatility and compare this trendline to the raw sales data to understand the true direction of business growth.
  4. Regional Sales Distribution. We will analyze how revenue is distributed across different states and cities, identifying the top 5 regions with the highest sales density per capita versus regions with high order volume but low total revenue.
  5. AOV and Basket Size Analysis. We will analyze the Average Order Value (AOV) and average basket size over time to understand customer purchasing behavior and identify any trends or shifts in these metrics.

6. Monthly Sales Trends: What are the monthly sales trends for Olist over the past few years? Are there any seasonal patterns or significant fluctuations in sales performance that can be identified?

Solution:

Solution Steps for Review Response Analysis

  1. Firstly, we calculate the time gap for survey responses (review_response_times CTE): We use the datediff function to calculate the response_time_hours between review_creation_date (when the satisfaction survey was sent) and review_answer_timestamp (when the customer submitted it). We filter out records where review_score is null.

  2. Secondly, we segment the response times into analytical cohorts (response_buckets CTE): We use a CASE statement to group the continuous hour values into logical buckets: “Fast (Under 24 Hours)”, “Medium (1 - 3 Days)”, “Slow (3 - 7 Days)”, and “Very Slow (Over 1 Week)”, alongside handling missing timestamps.

  3. Finally, we aggregate the scores to find correlations (Final SELECT): We group the results by the new response_speed_category. We calculate the avg_review_score for each bucket and use conditional summation (sum(case when...)) to find the exact percentage of 1-star and 5-star reviews, revealing how response speed correlates with customer satisfaction.

Show SQL Code
with monthly_sales as (
    -- step 1: aggregate core metrics by month
    -- we use the calendar table to ensure clean date grouping
    select 
        c.year,
        c.month_number,
        c.month_name,
        count(distinct o.order_id) as total_orders,
        sum(p.payment_value) as total_revenue
    from 
        orders o
    inner join 
        payments p on o.order_id = p.order_id
    inner join 
        Calendar c on o.purchase_datekey = c.datekey
    where 
        o.order_status not in ('canceled', 'unavailable')
    group by 
        c.year, c.month_number, c.month_name
),

growth_metrics as (
    -- step 2: use lag() to fetch values from the previous month and previous year
    select 
        *,
        -- previous month values
        prev_month_orders = lag(total_orders) over (order by year, month_number),
        prev_month_revenue = lag(total_revenue) over (order by year, month_number),
        
        -- previous year values (offset of 12 rows)
        prev_year_orders = lag(total_orders, 12) over (order by year, month_number),
        prev_year_revenue = lag(total_revenue, 12) over (order by year, month_number)
    from 
        monthly_sales
)

-- final output: calculating percentage growth
select 
    year,
    month_name,
    total_orders,
    total_revenue,
    
    -- month-over-month (mom) growth
    mom_revenue_growth_pct = cast(100.0 * (total_revenue - prev_month_revenue) / nullif(prev_month_revenue, 0) as decimal(10,2)),
    
    -- year-over-year (yoy) growth
    yoy_revenue_growth_pct = cast(100.0 * (total_revenue - prev_year_revenue) / nullif(prev_year_revenue, 0) as decimal(10,2)),
    
    -- identifying seasonal spikes
    case 
        when month_number = 11 then 'Black Friday Period'
        when month_number = 12 then 'Holiday Season'
        when month_number = 1 then 'New Year Peak'
        else 'Standard Period'
    end as seasonality_tag
from 
    growth_metrics
order by 
    year, month_number;
go

Results:

yearmonth_nametotal_orderstotal_revenuemom_revenue_growth_pctyoy_revenue_growth_pctseasonality_tag
2016September1136.22999572753906Standard Period
2016October29351657.5299947261837819.35Standard Period
2016December119.6200008392334-99.96Holiday Season
2017January787137006.76003867388698201.50New Year Peak
2017February1718283621.93943831325107.01Standard Period
2017March2617425656.399554163250.08Standard Period
2017April2377405988.3795110285-4.62Standard Period
2017May3640582926.16016700343.58Standard Period
2017June3205499827.47035080194-14.26Standard Period
2017July3946578858.579521149415.81Standard Period
2017August4272662071.769808424614.38Standard Period
2017September4227717195.22036169478.33Standard Period
2017October4547764785.27968363846.64561292.72Standard Period
2017November74231172639.229781104353.332170.03Black Friday Period
2017December5620861914.4599985629-26.504392939.87Holiday Season
2018January71871102639.409112945227.93704.81New Year Peak
2018February6625979966.2302345932-11.13245.52Standard Period
2018March71681152736.739690676317.63170.81Standard Period
2018April69191156303.91070481580.31184.81Standard Period
2018May68331145748.6310662013-0.9196.55Standard Period
2018June61451020494.2907593064-10.93104.17Standard Period
2018July62331039880.15903876161.9079.64Standard Period
2018August6421996896.1512343884-4.1350.57Standard Period
2018September1166.4600067138672-99.98-99.98Standard Period

Insights:

  • There is a clear seasonal pattern with significant spikes in revenue during the Black Friday period (November) and the Holiday Season (December). The New Year Peak in January also shows a notable increase in sales.
  • The month-over-month growth rates are highly volatile, especially around the holiday season, indicating that raw monthly revenue can be influenced by short-term factors. This underscores the importance of using rolling averages to identify long-term trends.
  • The year-over-year growth rates show a strong upward trajectory, particularly in the early months of 2018 compared to 2017, suggesting that Olist is experiencing significant growth year over year, despite the volatility in month-over-month performance.
  • The data reveals distinct patterns in product category performance, with some categories driving high sales volumes while others generate substantial revenue per unit sold. This highlights the need for a nuanced approach to inventory and marketing strategies, focusing on both high-volume and high-value categories to optimize overall sales performance.

Pareto Principles for Product Categories

7. Which product categories constitute the top 20% of sales volume, and do they align with the top 20% of revenue generators? Can we identify ‘High Volume / Low Value’ categories versus ‘Low Volume / High Value’ (Niche/Luxury) categories?

Solution:

  1. Firstly, we aggregate volume and revenue per category (category_metrics CTE): We join the order_items, products, and orders tables to calculate the total number of items sold (total_volume) and the sum of their prices (total_revenue) for each product_category_name. We also ensure to filter out any ‘canceled’ or ‘unavailable’ orders so our metrics reflect actual successful sales.

  2. Secondly, we calculate the percentile rankings (category_rankings CTE): We utilize the percent_rank() window function, ordering descendingly by volume and revenue respectively. This gives us a statistical ranking from 0 to 1 (volume_percentile and revenue_percentile), showing exactly where each category stands relative to all others.

  3. Finally, we segment the categories based on the top 20% thresholds (Final SELECT): We compute the avg_unit_price and use a CASE statement to classify each category. Since percent_rank() outputs 0 for the highest value, checking for <= 0.20 captures the top 20%. By intersecting the volume and revenue conditions, we can easily identify the ‘High Vol / Low Rev’ (cheap but popular) and ‘Low Vol / High Rev’ (niche/luxury) segments.

Show SQL Code
with category_metrics as  (
    select 
        products.product_category_name,
        count(order_items.order_item_id) as  total_volume,
        sum(order_items.price) as total_revenue       
    from 
        order_items
    inner join 
        products on products.product_id = order_items.product_id
    inner join 
        orders on orders.order_id = order_items.order_id
    where
        orders.order_status not in ('canceled', 'unavailable')
            and products.product_category_name is not null 
    group by 
        products.product_category_name
), 

category_rankings as (
    select
        *,
        percent_rank() over (order by total_volume desc) as volume_percentile,
        percent_rank() over (order by total_revenue desc) as revenue_percentile
    from  
        category_metrics
)

select
    product_category_name,
    total_volume,
    total_revenue,
    cast( total_revenue / total_volume as decimal(10,2)) as avg_unit_price,
    case
        when volume_percentile <= 0.20 and revenue_percentile <= 0.20 then 'High Vol / High Rev'
        when volume_percentile <= 0.20 and revenue_percentile > 0.20 then 'High Vol / Low Rev'
        when volume_percentile > 0.20 and revenue_percentile <= 0.20 then 'Low Vol / High Rev'
        else 'Standard'
    end as category_segmentation
from 
    category_rankings
order by 
    total_revenue desc;

Results:

product_category_nametotal_volumetotal_revenueavg_unit_pricecategory_segmentation
beleza_saude96341255695.130966425130.34High Vol / High Rev
relogios_presentes59701198185.207540512200.70High Vol / High Rev
cama_mesa_banho110971035964.060027122593.36High Vol / High Rev
esporte_lazer8590979740.9188661575114.06High Vol / High Rev
informatica_acessorios7781904322.0174894333116.22High Vol / High Rev
moveis_decoracao8298727465.052305221687.67High Vol / High Rev
utilidades_domesticas6915626825.800787448990.65High Vol / High Rev
cool_stuff3779620770.4918966293164.27High Vol / High Rev
automotivo4204586585.7307322025139.53High Vol / High Rev
ferramentas_jardim4328481009.94201755524111.14High Vol / High Rev
brinquedos4083479808.5404686928117.51High Vol / High Rev
bebes3043410312.2000389099134.84High Vol / High Rev
perfumaria3402396599.30858802795116.58High Vol / High Rev
telefonia4527322342.639416217871.20High Vol / High Rev
moveis_escritorio1690273580.70111846924161.88Low Vol / High Rev
papelaria2505229593.8398811817291.65Standard
pcs203222963.12988281251098.34Standard
pet_shop1942213766.63003492355110.08Standard
eletroportateis671187907.2599811554280.04Standard
instrumentos_musicais669187788.44005393982280.70Standard
1589178572.54968833923112.38Standard
eletronicos2755157079.5394055843457.02High Vol / Low Rev
consoles_games1127154927.27038812637137.47Standard
fashion_bolsas_e_acessorios2025151622.7504100799674.88Standard
construcao_ferramentas_construcao927144550.18987429142155.93Standard
malas_acessorios1088140111.37979125977128.78Standard
eletrodomesticos_2235110649.73986434937470.85Standard
casa_construcao60383010.11981487274137.66Standard
eletrodomesticos76980001.24019479752104.03Standard
agro_industria_e_comercio21272530.46979904175342.12Standard
moveis_sala50168638.55983161926137.00Standard
casa_conforto43458572.04043722153134.96Standard
telefonia_fixa26157824.20998239517221.55Standard
climatizacao29554723.15971946716185.50Standard
audio36350668.59962081909139.58Standard
portateis_casa_forno_e_cafe7647445.71008014679624.29Standard
moveis_cozinha_area_de_servico_jantar_e_jardim27946070.37002372742165.13Standard
livros_interesse_geral54645622.2904987335283.56Standard
construcao_ferramentas_iluminacao30240002.00014305115132.46Standard
industria_comercio_e_negocios26839669.61002922058148.02Standard
construcao_ferramentas_seguranca18939589.01976776123209.47Standard
alimentos50929243.50996971130457.45Standard
market_place31028288.47003936767691.25Standard
construcao_ferramentas_jardim23425552.180088043213109.20Standard
artes20824167.639976978302116.19Standard
fashion_calcados26023352.8799247741789.82Standard
bebidas37822378.7100467681959.20Standard
sinalizacao_e_seguranca19921509.230089187622108.09Standard
moveis_quarto10920028.77994632721183.75Standard
livros_tecnicos26719096.06010818481471.52Standard
construcao_ferramentas_ferramentas10315903.94997882843154.41Standard
alimentos_bebidas27615119.480049133354.78Standard
fashion_roupa_masculina13110748.820034027182.05Standard
fashion_underwear_e_moda_praia1319541.55010223388772.84Standard
artigos_de_natal1528787.83007287979157.81Standard
tablets_impressao_imagem837528.41004562377990.70Standard
cine_foto726933.45995807647796.30Standard
musica386034.349970340729158.80Standard
livros_importados604639.84999275207577.33Standard
dvds_blu_ray624559.489982604980573.54Standard
artigos_de_festas434485.180006027222104.31Standard
moveis_colchao_e_estofado384368.079982757568114.95Standard
portateis_cozinha_e_preparadores_de_alimentos143933.6299953460693280.97Standard
fashion_roupa_feminina482803.640010833740258.41Standard
fashion_esporte302119.51001930236870.65Standard
la_cuisine142054.9899978637695146.78Standard
artes_e_artesanato241814.010004997253475.58Standard
fraldas_higiene381538.789993286132840.49Standard
pc_gamer81306.9500274658203163.37Standard
flores331110.040006637573233.64Standard
casa_conforto_230760.269996643066425.34Standard
cds_dvds_musicais14730.052.14Standard
fashion_roupa_infanto_juvenil8569.850002288818471.23Standard
seguros_e_servicos2283.2899932861328141.64Standard

Insights: According to the results, we can obtain the following insights:

  • Volume and Revenue Align Well: The biggest sellers are also the biggest earners. Categories like beleza_saude (health & beauty), relogios_presentes (watches & gifts), and cama_mesa_banho (bed & bath) are the backbone of the platform, bringing in both massive order counts and top-tier revenue.
  • The “High Volume / Low Revenue” Trap: Eletronicos (electronics) fits this perfectly. People buy a lot of them (2,755 units), but because the average item is cheap (just $57.02), the total revenue stays relatively low compared to the effort of shipping that many boxes.
  • The “Low Volume / High Revenue” Winners: Moveis_escritorio (office furniture) is our official high-value niche. It only took 1,690 sales to generate over 273Kinrevenuethankstoasolid273K in revenue thanks to a solid 161.88 average price.
  • The Ultimate Premium Outlier: While the system tagged it as “Standard,” pcs (personal computers) is the perfect example of a luxury/high-ticket item. With an average price of 1,098,itonlytook203salestomake1,098, it only took 203 sales to make 222K—almost matching the revenue of categories that sold thousands of items.

3-Month Rolling Average for Total Revenue

8. Can we calculate the 3-month rolling average for total revenue to smooth out daily and weekly volatility? How does this long-term trendline compare to the raw sales data in identifying the true direction of business growth?

Solution:

  1. Firstly, we aggregate the raw monthly revenue (monthly_revenue CTE): We join the orders, payments, and Calendar tables to calculate the total payment_value (raw_monthly_revenue) for each month and year. We also filter out ‘canceled’ or ‘unavailable’ orders to ensure we only look at valid sales data.

  2. Secondly, we calculate the 3-month rolling average (Final SELECT): We use the avg() window function coupled with rows between 2 preceding and current row. By ordering chronologically by year and month, this smooths out short-term volatility by averaging the current month’s revenue with the two previous months to establish a trendline.

  3. Finally, we calculate the variance from the trend: We determine how much the current month’s raw revenue deviates from the calculated 3-month rolling average. We compute this as a percentage ((raw - average) / raw) * 100 using nullif to prevent division by zero, which helps easily identify months that significantly overperformed or underperformed the smoothed long-term growth trend.

Show SQL Code
with monthly_revenue as (
    select
        Calendar.year, 
        Calendar.month_number,
        Calendar.month_name,
        sum(payments.payment_value) as raw_monthly_revenue
    from 
        orders 
    inner join 
        payments on payments.order_id = orders.order_id 
    inner join 
        Calendar on Calendar.datekey = orders.purchase_datekey
    where 
        orders.order_status not in ('canceled', 'unavailable')
    group by 
        Calendar.year, calendar.month_number, Calendar.month_name
)

select 
    [year], 
    month_number,
    month_name,
    raw_monthly_revenue,

    -- average the current month and 2 preceding (last) months
    cast (avg(raw_monthly_revenue) over (order by [year], month_number 
            rows between 2 preceding and current row) as decimal(10,3)) as rolling_avg_3m,

    -- calculate deviation from the trend = (current_month - rolling_avg_3m) / current_month * 100.0 = var_dev %
    cast(100.0 * (raw_monthly_revenue - avg(raw_monthly_revenue) over (order by [year], month_number 
            rows between 2 preceding and current row)) / nullif(raw_monthly_revenue,0) as decimal (10,2)) as variance_from_trend
from 
    monthly_revenue
group by 
    [year], 
    month_number, 
    month_name,
    raw_monthly_revenue
order by
    [year], month_number;

Results:

yearmonth_numbermonth_nameraw_monthly_revenuerolling_avg_3mvariance_from_trend
20169September136.22999572753906136.2300.00
201610October51657.5299947261825896.88049.87
201612December19.620000839233417271.127-87928.16
20171January137006.7600386738862894.63754.09
20172February283621.93943831325140216.10650.56
20173March425656.3995541632282095.03333.73
20174April405988.3795110285371755.5738.43
20175May582926.160167003471523.64619.11
20176June499827.47035080194496247.3370.72
20177July578858.5795211494553870.7374.32
20178August662071.7698084246580252.60712.36
20179September717195.2203616947652708.5238.99
201710October764785.2796836384714684.0906.55
201711November1172639.2297811043884873.24324.54
201712December861914.4599985629933112.990-8.26
20181January1102639.40911294521045731.0335.16
20182February979966.2302345932981506.700-0.16
20183March1152736.73969067631078447.4606.44
20184April1156303.91070481581096335.6275.19
20185May1145748.63106620131151596.427-0.51
20186June1020494.29075930641107515.611-8.53
20187July1039880.15903876161068707.694-2.77
20188August996896.15123438841019090.200-2.23
20189September166.4600067138672678980.923-407794.33

Insights:

  • A Clearer Picture of Growth: The 3-month rolling average successfully smooths out the messy monthly jumps. It reveals a very clean, consistent upward curve throughout 2017 before settling into a steady, flat line in 2018.
  • Isolating the Black Friday Boom: In November 2017, raw revenue jumped 24.5% higher than the rolling trendline. This confirms that the Black Friday success was a temporary seasonal spike rather than a permanent shift in baseline sales.
  • The Post-Holiday Drop: December 2017 shows an 8.2% negative variance from the trend. Because November pulled the 3-month average so high, this negative variance perfectly illustrates the sudden “hangover” effect as customer spending dropped off.
  • Highly Predictable 2018: Between February and August 2018, the raw revenue tracks incredibly closely to the 3-month average. The variance is tiny (mostly between -2% and +6%), showing that the business matured into a very stable phase, predictably earning around 1Mto1M to 1.15M per month.
  • Spotting Bad Data Instantly: The rolling average makes data cutoffs obvious. The massive negative variance in September 2018 immediately flags that our dataset ends abruptly, preventing us from accidentally using that raw number in any serious forecasting.

Revenue Distribution across States and Cities

9. How is revenue distributed across different states (customer_state) and cities? specifically, identifying the top 5 regions with the highest sales density per capita versus regions with high order volume but low total revenue?

Solution:

  1. Firstly, we calculate regional baseline metrics (region_performance CTE): We join the orders, customers, and payments tables to aggregate the total_orders and total_revenue for each customer_state and customer_city. We also calculate the avg_order_value (revenue density) by dividing total revenue by the distinct order count, ensuring we exclude ‘canceled’ or ‘unavailable’ orders.

  2. Secondly, we rank the regions (region_rankings CTE): We use the rank() window function to assign a revenue_rank and a volume_rank to each city/state combination. By ordering descendingly, the regions with the highest revenue and order counts receive the top ranks (e.g., Rank 1, 2, 3).

  3. Finally, we segment the regions based on their rankings (Final SELECT): We use a CASE statement to classify each region’s profile by evaluating its volume and revenue ranks. This allows us to categorize regions into distinct business profiles such as ‘Major Economic Hub’ (top ranks in both volume and revenue) or ‘Affluent Niche’ (lower volume rank but high revenue rank), clearly identifying the sales density and distribution across different geographies.

Show SQL Code
with region_performance as (
    SELECT
        customers.customer_state,
        customers.customer_city,
        count(distinct orders.order_id) as total_orders,
        sum(payments.payment_value) as total_revenue,
        -- avg order value -> revenue density
        sum(payments.payment_value) / count(distinct orders.order_id) as avg_order_value
    FROM 
        orders
    join   
        customers on customers.customer_id = orders.customer_id
    join 
        payments on payments.order_id = orders.order_id
    where 
        orders.order_status not in ('canceled', 'unavailable')
    group by 
        customers.customer_state,
        customers.customer_city
), 

region_rankings as (
    select 
        *,
        rank() over (order by total_revenue desc) as revenue_rank, 
        rank() over ( order by total_orders desc) as volume_rank
    from 
        region_performance
)

select 
    customer_state,
    customer_city,
    total_orders,
    total_revenue,
    cast(avg_order_value as decimal(10,2)) as avg_order_value,
    case 
        when revenue_rank <= 10 and volume_rank <= 10 then 'Major Economic Hub (High Vol / High Rev)'
        when revenue_rank <= 20 and volume_rank > 50 then 'Affluent Niche (Low Vol / High Rev)'
        when revenue_rank <= 10 and volume_rank <= 10 then 'Mass Market (High Vol / Low Rev)'
        else 'Standard Market'
    end as regional_profile
from 
    region_rankings
order by 
    total_revenue desc;

Results:

customer_statecustomer_citytotal_orderstotal_revenueavg_order_valueregional_profile
SPsao paulo152912150534.770117907140.64Major Economic Hub (High Vol / High Rev)
RJrio de janeiro68011147675.809347108168.75Major Economic Hub (High Vol / High Rev)
MGbelo horizonte2737411571.5802227948150.37Major Economic Hub (High Vol / High Rev)
DFbrasilia2112350563.3297816515165.99Major Economic Hub (High Vol / High Rev)
PRcuritiba1502240889.5599949062160.38Major Economic Hub (High Vol / High Rev)
RSporto alegre1370220135.7902894616160.68Major Economic Hub (High Vol / High Rev)
BAsalvador1234216352.82025164366175.33Major Economic Hub (High Vol / High Rev)
SPcampinas1422211539.7600118816148.76Major Economic Hub (High Vol / High Rev)
SPguarulhos1167161739.5199765265138.59Major Economic Hub (High Vol / High Rev)
RJniteroi842137399.449943915163.18Standard Market
SPsao bernardo do campo921118343.4201908093128.49Standard Market
CEfortaleza647117983.99994421005182.36Standard Market
GOgoiania681117095.89010572433171.95Standard Market
SPsantos706111670.21002027392158.17Standard Market
PErecife610109421.82003313303179.38Standard Market
SPsanto andre786103892.71011769772132.18Standard Market
SCflorianopolis56399524.4301700592176.78Standard Market
PAbelem44095455.55000805855216.94Standard Market
SPosasco73693696.3600088805127.30Standard Market
SPjundiai55691147.71014773846163.93Standard Market
SPsao jose dos campos68389081.69994664192130.43Standard Market
SPsorocaba62186911.57992458344139.95Standard Market
SPribeirao preto50775379.01995331049148.68Standard Market
MGjuiz de fora42575089.71999931335176.68Standard Market
RJnova iguacu43870736.99004948139161.50Standard Market
MScampo grande31565520.08040642738208.00Standard Market
MAsao luis34563114.89991879463182.94Standard Market
PBjoao pessoa25362606.56971311569247.46Standard Market
ESvitoria37662057.10984182358165.05Standard Market
RJsao goncalo40561570.400188207626152.03Standard Market
PIteresina27860322.97010374069216.99Standard Market
SPpiracicaba36358216.87995111942160.38Standard Market
SPbarueri42558004.639814246446136.48Standard Market
SPmogi das cruzes37757979.230025827885153.79Standard Market
MGcontagem41957814.660032749176137.98Standard Market
MGuberlandia37256786.0901248306152.65Standard Market
ESvila velha32955557.45991837978168.87Standard Market
SPsao jose do rio preto33253719.34991595149161.81Standard Market
ALmaceio24653101.38010084629215.86Standard Market
PRlondrina30253029.720022678375175.60Standard Market
SPmaua31948421.219797849655151.79Standard Market
MTcuiaba24847117.16004541516189.99Standard Market
SPbauru26944903.49977505207166.93Standard Market
PRmaringa26444655.54022991657169.15Standard Market
SEaracaju22044598.79984438419202.72Standard Market
RScaxias do sul22344111.57005023956197.81Standard Market
SPpraia grande30444072.049874305725144.97Standard Market
ESserra28542965.76991789043150.76Standard Market
RNnatal20542851.00978708267209.03Standard Market
SPindaiatuba27741590.47012972832150.15Standard Market
SCjoinville26240905.22002220154156.13Standard Market
RJduque de caxias26339908.03999757767151.74Standard Market
RJcampos dos goytacazes23739898.530123472214168.35Standard Market
SPtaboao da serra28839766.34001151472138.08Standard Market
RJpetropolis23739532.84000623226166.81Standard Market
MGdivinopolis13438727.710107803345289.01Standard Market
RJvolta redonda22838645.94992351532169.50Standard Market
RJmacae23637543.26006150246159.08Standard Market
SPmarilia19637493.970249176025191.30Standard Market
RSsanta maria19437224.64966106415191.88Standard Market
SPcarapicuiba32437196.800077438354114.80Standard Market
SPsao caetano do sul27436571.90995502472133.47Standard Market
SPcotia24734772.90995979309140.78Standard Market
SPtaubate26634560.32997727394129.93Standard Market
MGbetim20134349.920018196106170.90Standard Market
SPsao carlos23133906.410032749176146.78Standard Market
SPguaruja22433690.860169410706150.41Standard Market
SPdiadema28233675.98999106884119.42Standard Market
BAfeira de santana18532466.80003336072175.50Standard Market
RScanoas20931982.830045431852153.03Standard Market
MGuberaba18731911.67006677389170.65Standard Market
MGmontes claros20931283.009807288647149.68Standard Market
RJnova friburgo14831206.370090961456210.85Standard Market
SPpresidente prudente16029916.460077762604186.98Standard Market
SCblumenau18629206.580057948828157.02Standard Market
SPsuzano23229022.78001856804125.10Standard Market
SPamericana21928545.730053901672130.35Standard Market
PRcascavel14827649.599861621857186.82Standard Market
RSpelotas18427356.58004850149148.68Standard Market
SPlimeira18726206.299997091293140.14Standard Market
ROporto velho10926121.920029640198239.65Standard Market
SPsantana de parnaiba18026074.63006234169144.86Standard Market
MGipatinga17026003.559985399246152.96Standard Market
AMmanaus13925793.0799767375185.56Standard Market
SPfranca15825736.110070228577162.89Standard Market
SPvalinhos18225607.430002212524140.70Standard Market
PEjaboatao dos guararapes13325458.900188446045191.42Standard Market
SPsumare17924509.429921627045136.92Standard Market
SCsao jose16824347.42995071411144.93Standard Market
MGpouso alegre13124195.410016059875184.70Standard Market
SPatibaia15723783.42003250122151.49Standard Market
RJrio das ostras13523584.38001394272174.70Standard Market
MGnova lima9823287.82988166809237.63Standard Market
SPrio claro15223181.3299202919152.51Standard Market
SPararaquara15323106.97009086609151.03Standard Market
RJbarra mansa12022938.179938793182191.15Standard Market
SPembu das artes16222846.049993515015141.02Standard Market
PBcampina grande6822783.29009628296335.05Standard Market
SPbraganca paulista14422683.220050811768157.52Standard Market
MGgovernador valadares13522368.88001060486165.70Standard Market
SPsao vicente16122237.13000050187138.12Standard Market
PRponta grossa14222152.279952526093156.00Standard Market
RJsao joao de meriti13121932.93999648094167.43Standard Market
TOpalmas9021790.370041251183242.12Standard Market
RSgravatai11221719.35986328125193.92Standard Market
SPitaquaquecetuba17021716.27007201314127.74Standard Market
RJangra dos reis8421567.900049209595256.76Standard Market
SCitajai11921465.519989013672180.38Standard Market
PAananindeua8821236.850120782852241.33Standard Market
SPitapevi17021190.409993935376124.65Standard Market
SPjacarei19121113.10000383854110.54Standard Market
EScariacica13820593.749937057495149.23Standard Market
RSnovo hamburgo12120527.589955329895169.65Standard Market
RJteresopolis11920266.340026408434170.31Standard Market
MGsete lagoas12720239.31993675232159.36Standard Market
GOanapolis11120029.820016860962180.45Standard Market
RJcabo frio11419568.42012345791171.65Standard Market
SParacatuba14019530.12003302574139.50Standard Market
MGpocos de caldas11819415.190086126328164.54Standard Market
SCbalneario camboriu9919358.679819107056195.54Standard Market
MGpatos de minas10019349.550039052963193.50Standard Market
RSrio grande12319338.510021209717157.22Standard Market
BAvitoria da conquista8819252.20994949341218.78Standard Market
RJararuama7719196.049936294556249.30Standard Market
EScachoeiro de itapemirim12119175.44999051094158.47Standard Market
SPbotucatu10119053.070067703724188.64Standard Market
SPpaulinia12918790.840075671673145.67Standard Market
SPhortolandia14518748.569893836975129.30Standard Market
RJmage11018679.999910354614169.82Standard Market
MGvicosa8918649.7098236084209.55Standard Market
RJmarica13718511.419981002808135.12Standard Market
RSpasso fundo11218088.809967041016161.51Standard Market
SCcriciuma9217812.22989654541193.61Standard Market
PRsao jose dos pinhais13117763.43998658657135.60Standard Market
SCchapeco10217607.440014123917172.62Standard Market
SPferraz de vasconcelos11017471.239877700806158.83Standard Market
GOaparecida de goiania10916962.150075912476155.62Standard Market
PEpetrolina6516959.21994781494260.91Standard Market
ACrio branco7016928.469888687134241.84Standard Market
SPbirigui7816626.830087661743213.16Standard Market
SPguaratingueta9616593.199988663197172.85Standard Market
MGitajuba9416505.20000600815175.59Standard Market
RJresende11716496.71998155117141.00Standard Market
RJbelford roxo11016281.119910791516148.01Standard Market
MGvarginha8816022.510070800781182.07Standard Market
RJnilopolis10615881.68000793457149.83Standard Market
PEolinda9315650.040005683899168.28Standard Market
SCjaragua do sul8915357.909955978394172.56Standard Market
MGlavras8315307.959968531504184.43Standard Market
SPararas9415187.249944269657161.57Standard Market
SPcaraguatatuba8615082.739949911833175.38Standard Market
MGaraxa7515041.850039482117200.56Standard Market
SPsanta barbara d’oeste11814980.129915863276126.95Standard Market
MGribeirao das neves9914890.069915771484150.40Standard Market
SPcaieiras9814729.870056152344150.30Standard Market
SPitu13214614.819989204407110.72Standard Market
SPpindamonhangaba10014375.700001001358143.76Standard Market
SPourinhos7714325.329902648926186.04Standard Market
MGbarbacena8014321.699979782104179.02Standard Market
MTrondonopolis6514105.620091438293217.01Standard Market
SPitapetininga7714048.460102081299182.45Standard Market
ESlinhares6413987.340067327023218.55Standard Market
SPmogi-guacu9113972.309995651245153.54Standard Market
PRfoz do iguacu8713902.670015335083159.80Standard Market
SPassis8413819.309928894043164.52Standard Market
APmacapa5413666.030033111572253.07Standard Market
SPitatiba9713634.020051956177140.56Standard Market
SClages8313616.230019569397164.05Standard Market
SPsao joao da boa vista8113608.180032730103168.00Standard Market
MGsanta luzia9113575.690139114857149.18Standard Market
CEsobral3513295.730205535889379.88Standard Market
RSsao leopoldo10413291.570051595569127.80Standard Market
SCpalhoca9313273.769969940186142.73Standard Market
SPamparo6613001.819974184036197.00Standard Market
BAbarreiras5212952.729969024658249.09Standard Market
SPtatui7312815.720006942749175.56Standard Market
MGitabira7912748.300029754639161.37Standard Market
SPvarzea paulista7812659.479978561401162.30Standard Market
PEcaruaru6412579.229939937592196.55Standard Market
MGmuriae6312532.070067167282198.92Standard Market
SPitanhaem7912338.11986541748156.18Standard Market
SPsalto9312137.629947304726130.51Standard Market
SCtubarao6012130.980094909668202.18Standard Market
SPvotorantim8412085.639886379242143.88Standard Market
MGaraguari5412048.940017700195223.13Standard Market
ESguarapari6712021.139976501465179.42Standard Market
RSsanta cruz do sul6211974.649991989136193.14Standard Market
SPfranco da rocha10911893.329976558685109.11Standard Market
GOrio verde6411847.880012512207185.12Standard Market
SPribeirao pires10711780.649990081787110.10Standard Market
PRtoledo7011688.66003036499166.98Standard Market
BAporto seguro3911671.599877357483299.27Standard Market
SCgaspar2911602.949851989746400.10Standard Market
SPcacapava7311520.139964461327157.81Standard Market
SPcajamar9811515.510002374649117.51Standard Market
GOjatai5411337.930023908615209.96Standard Market
BAlauro de freitas7511278.609934806824150.38Standard Market
MGcaratinga6211211.449956178665180.83Standard Market
MGconselheiro lafaiete8411192.38003563881133.24Standard Market
SPcatanduva7411183.029972076416151.12Standard Market

Insights:

  • The Dominance of São Paulo (SP): São Paulo completely crushes the competition. It acts as our absolute major economic hub, pulling in $2.15M in revenue from over 15,000 orders. This single city brings in almost double the revenue of the second-place city, Rio de Janeiro.
  • Top 5 Revenue Centers: The top 5 regions driving the highest total revenue are all major state capitals: São Paulo (SP), Rio de Janeiro (RJ), Belo Horizonte (MG), Brasília (DF), and Curitiba (PR). Together, they represent the core market for Olist.
  • State Performance (The “SP” Factor): Even outside of the main capital, the state of São Paulo (SP) is dominant. Cities like Campinas, Guarulhos, and Santos pull in massive volumes, showing that the platform’s logistics and marketing are incredibly effective throughout this entire state.
  • High Volume vs. High Revenue Outliers:
    • High Average Order Value (AOV) in Remote Regions: If you look at standard markets like Belém (PA), João Pessoa (PB), or Macapá (AP), their order volumes are lower (under 500 orders), but their Average Order Values are huge (216,216, 247, and $253 respectively). Customers in these regions are likely buying heavier, more expensive items because local retail options are limited.
    • Volume Does Not Always Equal Value: Take Carapicuiba (SP) as an example. It generated 324 orders (higher than many standard markets), but only pulled in 37KwithaverylowAOVof37K with a very low AOV of 114.80. This tells us the city generates high engagement but mostly for cheaper items.

Average Order Value and Basket Size Analysis

10. AOV and Busket Size Analysis: What is the average order value (AOV) and average basket size (number of items per order) for Olist? How do these metrics vary across year and months?

Solution:

  1. Firstly, we isolate order-level totals (order_item_totals and order_payment_totals CTEs): This is crucial to prevent data duplication caused by 1-to-many relationships (multiple items or payment methods per order). We calculate the total number of items (items_in_basket) and the total payment value (total_order_payment) grouped strictly by unique order_id.

  2. Secondly, we aggregate these metrics by month (monthly_metrics CTE): We join our safe order-level totals with the orders and Calendar tables, ensuring we filter out ‘canceled’ or ‘unavailable’ orders. This gives us the absolute total_orders, monthly_revenue, and monthly_items_count for each specific year and month.

  3. Finally, we calculate the key performance indicators (Final SELECT): By dividing the monthly revenue by total orders, we get the Average Order Value (AOV). Dividing the total items by total orders gives us the Average Basket Size. We also calculate an avg_price_per_item proxy (AOV divided by basket size) to understand the value density of the baskets over time.

Show SQL Code
with order_item_totals as (
    -- step 1: calculate total items and total price per order
    -- this prevents duplication before joining with payments
    select 
        order_id,
        count(order_item_id) as items_in_basket,
        sum(price) as total_items_price
    from 
        order_items
    group by 
        order_id
),

order_payment_totals as (
    -- step 2: calculate total payment value per order 
    -- (accounts for multi-payment methods/installments)
    select 
        order_id,
        sum(payment_value) as total_order_payment
    from 
        payments
    group by 
        order_id
),

monthly_metrics as (
    -- step 3: aggregate metrics by month
    select 
        c.year,
        c.month_number,
        c.month_name,
        count(o.order_id) as total_orders,
        sum(p.total_order_payment) as monthly_revenue,
        sum(i.items_in_basket) as monthly_items_count
    from 
        orders o
    inner join 
        Calendar c on o.purchase_datekey = c.datekey
    inner join 
        order_item_totals i on o.order_id = i.order_id
    inner join 
        order_payment_totals p on o.order_id = p.order_id
    where 
        o.order_status not in ('canceled', 'unavailable')
    group by 
        c.year, c.month_number, c.month_name
)

-- final output: calculating aov and avg basket size trends
select 
    year,
    month_name,
    total_orders,
    -- aov: total revenue / total orders
    cast(monthly_revenue / total_orders as decimal(10,2)) as aov,
    -- avg basket size: total items / total orders
    cast(monthly_items_count * 1.0 / total_orders as decimal(10,2)) as avg_basket_size,
    -- item value proxy: aov / avg basket size (average price per item)
    cast((monthly_revenue / total_orders) / (monthly_items_count * 1.0 / total_orders) as decimal(10,2)) as avg_price_per_item
from 
    monthly_metrics
order by 
    year, month_number;

Results:

yearmonth_nametotal_ordersaovavg_basket_sizeavg_price_per_item
2016September1136.232.0068.11
2016October290177.351.18150.38
2016December119.621.0019.62
2017January787174.091.21143.76
2017February1718165.091.13146.50
2017March2617162.651.14143.08
2017April2377170.801.12152.63
2017May3640160.141.13141.97
2017June3205155.951.11139.97
2017July3946146.701.14128.84
2017August4272154.981.14135.61
2017September4227169.671.14148.95
2017October4547168.201.17144.30
2017November7421157.981.16135.91
2017December5618153.371.12136.77
2018January7187153.421.14134.91
2018February6624147.921.15128.98
2018March7168160.821.14140.66
2018April6919167.121.15145.32
2018May6833167.681.16145.07
2018June6145166.071.15144.55
2018July6233166.831.13147.73
2018August6421155.261.12138.15
2018September1166.461.00166.46

İnsights:

  • Single-Item Shoppers: The most striking takeaway is the average basket size. It stays completely flat, hovering between 1.11 and 1.17 items per order across all active months. Olist customers almost exclusively buy just one item at a time.
  • Stable Average Order Value (AOV): AOV is remarkably consistent, generally floating between 150and150 and 170. There are slight dips (like 146.70inJuly2017or146.70 in July 2017 or 147.92 in Feb 2018), but it always bounces back into the $160s quickly.
  • The Black Friday Shopping Behavior: During the massive sales peak in November 2017, the AOV ($157.98) and basket size (1.16) were completely average. This proves the Black Friday revenue boom came entirely from a massive flood of new orders, not from customers stuffing their carts with multiple discounted items.
  • Price Per Item Controls AOV: Because the basket size is practically stuck at 1, the AOV is directly controlled by the avg_price_per_item. When customers buy slightly cheaper goods overall (like in early 2018), the AOV drops right alongside it.

Conclusion

What We Did:

  • Analyzed overall order and revenue trends using both raw monthly groupings and a 3-month rolling average to smooth out volatility.
  • Segmented product categories to compare raw sales volume against actual revenue generated.
  • Mapped out the geographic distribution of sales to find high-performing states and cities.
  • Tracked Average Order Value (AOV) and average basket size across the entire timeline.

What We Obtained (Key Findings):

  • Growth & Plateau: Olist experienced explosive growth in 2017 and matured into a stable business in 2018, consistently generating between 1Mand1M and 1.15M monthly.
  • Single-Item Baskets: Customers are overwhelmingly single-item shoppers. The average basket size is practically locked at ~1.14 items across all months.
  • Volume Over Cart Size: The massive Black Friday peak in November 2017 was driven entirely by a flood of new orders, not by customers stuffing more items into their carts.
  • The São Paulo Anchor: São Paulo (SP) is the undisputed core market for volume and revenue. However, remote regions consistently show a higher AOV, likely due to a lack of local retail alternatives.
  • Category Efficiency: Categories like health/beauty and watches are highly efficient (high volume + high revenue). In contrast, electronics are a “trap” category, moving lots of units but yielding relatively low revenue.