Time Series Analysis¶
MoM and Seasonality Analysis¶
29. What is the Month-over-Month (MoM) growth rate for total revenue across the entire dataset? Can we identify specific months where the business experienced 'Hyper-Growth' (>20% increase) versus months of stagnation or contraction?
Solution:
-
To calculate the month-over-month growth rate, the core metrics must first be aggregated by month. This should be done in the first CTE by joining the orders, payments, and calendar tables. The total number of orders and the total revenue must be calculated for each specific year and month. Canceled or unavailable orders must also be filtered out to ensure the revenue figures are perfectly accurate.
-
Next, the historical data must be retrieved to allow for comparison. The lag window function should be used here to fetch the revenue and order volume from both the previous month and the previous year. The data must be ordered chronologically by year and month number to ensure the lag function pulls the correct prior periods for every single row.
-
Once the current and previous metrics are aligned, the actual percentage growth must be calculated. The mathematical formula for percentage change will be applied to both month-over-month and year-over-year revenue. A nullif function must be included during the division step to prevent any mathematical errors just in case the previous month had zero recorded revenue.
-
Finally, the calculated growth rates must be categorized to clearly highlight business performance. A case statement will be used in the main query to assign a growth performance tag, easily identifying periods of hyper-growth above twenty percent, standard positive growth, or stagnation. An additional case statement should also be applied to flag key seasonal periods like Black Friday and the holiday season to provide necessary business context for any major revenue fluctuations.
Show SQL Code
/* growth and seasonality analysis
objective: calculate mom and yoy growth rates for revenue and order volume
logic:
1. aggregate revenue and order counts by month.
2. use lag() window function to compare current month vs prior month and prior year.
3. calculate percentage growth rates and classify performance tiers.
granularity: year_month
*/
with monthly_sales as (
-- step 1: aggregate core metrics by month
select
c.year,
c.month_number,
c.month_name,
count(distinct o.order_id) as total_orders,
sum(p.payment_value) as total_revenue
from
orders o
inner join
payments p on o.order_id = p.order_id
inner join
Calendar c on o.purchase_datekey = c.datekey
where
o.order_status not in ('canceled', 'unavailable')
group by
c.year, c.month_number, c.month_name
),
growth_metrics as (
-- step 2: use lag() to fetch values from the previous month and previous year
select
*,
-- previous month values
prev_month_orders = lag(total_orders) over (order by year, month_number),
prev_month_revenue = lag(total_revenue) over (order by year, month_number),
-- previous year values (offset of 12 rows)
prev_year_orders = lag(total_orders, 12) over (order by year, month_number),
prev_year_revenue = lag(total_revenue, 12) over (order by year, month_number)
from
monthly_sales
),
final_calculations as (
-- step 3: calculate percentage growth
select
*,
mom_revenue_growth_pct = cast(100.0 * (total_revenue - prev_month_revenue) / nullif(prev_month_revenue, 0) as decimal(10,2)),
yoy_revenue_growth_pct = cast(100.0 * (total_revenue - prev_year_revenue) / nullif(prev_year_revenue, 0) as decimal(10,2))
from
growth_metrics
)
-- final output: categorizing growth performance levels and seasonality
select
year,
month_name,
total_orders,
total_revenue,
mom_revenue_growth_pct,
-- identifying growth tiers as requested
case
when mom_revenue_growth_pct > 20 then 'Hyper-Growth (>20%)'
when mom_revenue_growth_pct > 0 then 'Positive Growth'
when mom_revenue_growth_pct <= 0 then 'Stagnation / Contraction'
else 'Baseline (No Prior Month)'
end as growth_performance_tag,
-- identifying seasonal spikes
case
when month_number = 11 then 'Black Friday Period'
when month_number = 12 then 'Holiday Season'
when month_number = 1 then 'New Year Peak'
else 'Standard Period'
end as seasonality_tag
from
final_calculations
order by
year, month_number;
Results:
| year | month_name | total_orders | total_revenue | mom_revenue_growth_pct | growth_performance_tag | seasonality_tag |
|---|---|---|---|---|---|---|
| 2016 | September | 1 | 136.22999572753906 | Baseline (No Prior Month) | Standard Period | |
| 2016 | October | 293 | 51657.52999472618 | 37819.35 | Hyper-Growth (>20%) | Standard Period |
| 2016 | December | 1 | 19.6200008392334 | -99.96 | Stagnation / Contraction | Holiday Season |
| 2017 | January | 787 | 137006.76003867388 | 698201.50 | Hyper-Growth (>20%) | New Year Peak |
| 2017 | February | 1718 | 283621.93943831325 | 107.01 | Hyper-Growth (>20%) | Standard Period |
| 2017 | March | 2617 | 425656.3995541632 | 50.08 | Hyper-Growth (>20%) | Standard Period |
| 2017 | April | 2377 | 405988.3795110285 | -4.62 | Stagnation / Contraction | Standard Period |
| 2017 | May | 3640 | 582926.160167003 | 43.58 | Hyper-Growth (>20%) | Standard Period |
| 2017 | June | 3205 | 499827.47035080194 | -14.26 | Stagnation / Contraction | Standard Period |
| 2017 | July | 3946 | 578858.5795211494 | 15.81 | Positive Growth | Standard Period |
| 2017 | August | 4272 | 662071.7698084246 | 14.38 | Positive Growth | Standard Period |
| 2017 | September | 4227 | 717195.2203616947 | 8.33 | Positive Growth | Standard Period |
| 2017 | October | 4547 | 764785.2796836384 | 6.64 | Positive Growth | Standard Period |
| 2017 | November | 7423 | 1172639.2297811043 | 53.33 | Hyper-Growth (>20%) | Black Friday Period |
| 2017 | December | 5620 | 861914.4599985629 | -26.50 | Stagnation / Contraction | Holiday Season |
| 2018 | January | 7187 | 1102639.4091129452 | 27.93 | Hyper-Growth (>20%) | New Year Peak |
| 2018 | February | 6625 | 979966.2302345932 | -11.13 | Stagnation / Contraction | Standard Period |
| 2018 | March | 7168 | 1152736.7396906763 | 17.63 | Positive Growth | Standard Period |
| 2018 | April | 6919 | 1156303.9107048158 | 0.31 | Positive Growth | Standard Period |
| 2018 | May | 6833 | 1145748.6310662013 | -0.91 | Stagnation / Contraction | Standard Period |
| 2018 | June | 6145 | 1020494.2907593064 | -10.93 | Stagnation / Contraction | Standard Period |
| 2018 | July | 6233 | 1039880.1590387616 | 1.90 | Positive Growth | Standard Period |
| 2018 | August | 6421 | 996896.1512343884 | -4.13 | Stagnation / Contraction | Standard Period |
| 2018 | September | 1 | 166.4600067138672 | -99.98 | Stagnation / Contraction | Standard Period |
Insights:
- The 2017 Startup Boom: The first half of 2017 is defined by aggressive "Hyper-Growth." Months like February (+107%), March (+50%), and May (+43%) show huge revenue jumps, proving this was the phase where Olist rapidly acquired its early market share.
- The Black Friday Rollercoaster: November 2017 was an absolute game-changer. Revenue spiked by 53.3% due to Black Friday sales. However, this created a predictable hangover in December, resulting in a 26.5% contraction as the holiday shopping rush ended.
- The 2018 Plateau: After recovering with a strong January 2018 (+27.9%), the platform's growth essentially flattened out. The rest of 2018 is filled with slight contractions (like -11% in February and -10% in June) and very mild growth (like +0.3% in April). The business stopped doubling its size and stabilized at around $1M to $1.15M per month.
- Ignoring the Noise: As analysts, we need to ignore the massive percentages at the very start (late 2016) and very end (Sept 2018) of the table. These months only have 1 recorded order, meaning they represent the setup phase and the cutoff date of our dataset, not actual business performance.
Seasonality Effect¶
30. How does the revenue of specific months compare across different years (e.g., Jan 2017 vs. Jan 2018)? Can we observe a consistent 'Seasonality Effect' where certain months (like November) consistently outperform others regardless of the year?
Solution:
-
To analyze the seasonality effect, revenue must first be aggregated by year and month. This should be done in the first CTE by joining the orders, payments, and calendar tables. The total revenue for each month of every year must be calculated while filtering out canceled or unavailable orders to ensure accuracy.
-
Next, the lag window function should be used to compare the revenue of the same month across different years. This will allow for a direct year-over-year comparison for each month, such as January 2017 vs January 2018. Additionally, the average monthly revenue for each year should be calculated to serve as a benchmark for identifying seasonal trends.
-
Once the current month revenue, previous year same month revenue, and yearly average are all aligned, a seasonality index can be calculated. This index will be derived by dividing the current month's revenue by the yearly average. An index greater than 1.0 will indicate that the month outperformed the year's average, suggesting a strong seasonal effect.
-
Finally, the seasonality index should be categorized to clearly identify which months are peak performers, standard performers, or low season periods. A case statement will be used to classify months with a seasonality index above 1.2 as 'Peak Months', those between 0.9 and 1.2 as 'Standard Performance', and those below 0.9 as 'Low Season / Slack Period'. This final output will help determine if certain months consistently outperform others across different years, confirming the presence of a seasonality effect.
Show SQL Code
/* growth and seasonality analysis
objective:
1. compare specific months across years (e.g., Jan 2017 vs Jan 2018).
2. calculate a 'seasonality index' to identify recurring monthly trends.
logic:
- aggregate revenue by year and month.
- use window functions to compare same-month performance across years.
- calculate the deviation of each month from its year's average to find the 'seasonal push'.
*/
with monthly_revenue_base as (
-- step 1: aggregate total revenue by year and month
select
c.year,
c.month_number,
c.month_name,
sum(p.payment_value) as monthly_revenue
from
orders o
inner join
payments p on o.order_id = p.order_id
inner join
Calendar c on o.purchase_datekey = c.datekey
where
o.order_status not in ('canceled', 'unavailable')
group by
c.year, c.month_number, c.month_name
),
seasonal_indexing as (
-- step 2: calculate year-over-year same-month comparisons and yearly averages
select
*,
-- direct comparison: revenue of the same month in the previous year
prev_year_same_month_rev = lag(monthly_revenue) over (
partition by month_number
order by year
),
-- average monthly revenue for that specific year (benchmark)
yearly_avg_monthly_rev = avg(monthly_revenue) over (
partition by year
)
from
monthly_revenue_base
)
-- final output: quantifying the seasonality effect
select
year,
month_name,
cast(monthly_revenue as decimal(15,2)) as current_month_revenue,
-- yoy comparison for the specific month
cast(100.0 * (monthly_revenue - prev_year_same_month_rev) / nullif(prev_year_same_month_rev, 0) as decimal(10,2)) as month_specific_yoy_growth_pct,
-- seasonality index: revenue / yearly average
-- index > 1.0 means the month outperforms the year's average
cast(monthly_revenue / nullif(yearly_avg_monthly_rev, 0) as decimal(10,2)) as seasonality_index,
-- classification of the month's role in the business cycle
case
when (monthly_revenue / yearly_avg_monthly_rev) >= 1.2 then 'Peak Month (Strong Seasonality)'
when (monthly_revenue / yearly_avg_monthly_rev) between 0.9 and 1.2 then 'Standard Performance'
when (monthly_revenue / yearly_avg_monthly_rev) < 0.9 then 'Low Season / Slack Period'
else 'Baseline'
end as seasonal_classification
from
seasonal_indexing
order by
year, month_number;
Show SQL Code
/* growth and seasonality analysis
objective:
1. compare specific months across years (e.g., Jan 2017 vs Jan 2018).
2. calculate a 'seasonality index' to identify recurring monthly trends.
3. analyze order distribution by Day of the Week and Hour of Day.
logic:
- aggregate revenue by year and month for high-level trends.
- use window functions for year-over-year same-month comparisons.
- extract time-of-day and day-of-week metrics to find peak engagement windows.
*/
with monthly_revenue_base as (
-- step 1: aggregate total revenue by year and month
select
c.year,
c.month_number,
c.month_name,
sum(p.payment_value) as monthly_revenue
from
orders o
inner join
payments p on o.order_id = p.order_id
inner join
Calendar c on o.purchase_datekey = c.datekey
where
o.order_status not in ('canceled', 'unavailable')
group by
c.year, c.month_number, c.month_name
),
seasonal_indexing as (
-- step 2: calculate year-over-year same-month comparisons and yearly averages
select
*,
-- direct comparison: revenue of the same month in the previous year
prev_year_same_month_rev = lag(monthly_revenue) over (
partition by month_number
order by year
),
-- average monthly revenue for that specific year (benchmark)
yearly_avg_monthly_rev = avg(monthly_revenue) over (
partition by year
)
from
monthly_revenue_base
),
-- step 3: analyzing hourly and weekly volume distribution
time_of_day_stats as (
select
datename(dw, order_purchase_timestamp) as day_of_week,
datepart(dw, order_purchase_timestamp) as day_num, -- used for sorting
datepart(hour, order_purchase_timestamp) as hour_of_day,
count(order_id) as total_orders
from
orders
where
order_status not in ('canceled', 'unavailable')
group by
datename(dw, order_purchase_timestamp),
datepart(dw, order_purchase_timestamp),
datepart(hour, order_purchase_timestamp)
)
-- final output: combining the original seasonality analysis with the new time-of-day insights
-- part 1: monthly seasonality (existing)
select
year,
month_name,
cast(monthly_revenue as decimal(15,2)) as current_month_revenue,
-- yoy comparison for the specific month
cast(100.0 * (monthly_revenue - prev_year_same_month_rev) / nullif(prev_year_same_month_rev, 0) as decimal(10,2)) as month_specific_yoy_growth_pct,
-- seasonality index
cast(monthly_revenue / nullif(yearly_avg_monthly_rev, 0) as decimal(10,2)) as seasonality_index,
case
when (monthly_revenue / yearly_avg_monthly_rev) >= 1.2 then 'Peak Month (Strong Seasonality)'
when (monthly_revenue / yearly_avg_monthly_rev) between 0.9 and 1.2 then 'Standard Performance'
when (monthly_revenue / yearly_avg_monthly_rev) < 0.9 then 'Low Season / Slack Period'
else 'Baseline'
end as seasonal_classification
from
seasonal_indexing
union all
-- part 2: placeholder/separator row for clear data interpretation if exported
select
null, '--- INTRA-DAY ANALYSIS ---', null, null, null, null
union all
-- part 3: day/hour distribution
select
day_num as year, -- repurposing columns for the union's schema
concat(day_of_week, ' at ', hour_of_day, ':00') as month_name,
total_orders as current_month_revenue,
null as month_specific_yoy_growth_pct,
null as seasonality_index,
case
when hour_of_day between 12 and 14 then 'Lunchtime Spike Window'
when hour_of_day between 20 and 22 then 'Evening Spike Window'
else 'Standard Time'
end as seasonal_classification
from
time_of_day_stats
order by
year, month_name;
Show SQL Code
/* growth and seasonality analysis
objective:
1. compare specific months across years (e.g., Jan 2017 vs Jan 2018).
2. calculate a 'seasonality index' to identify recurring monthly trends.
3. analyze order distribution by Day of the Week and Hour of Day.
4. calculate cumulative revenue to track "The Hockey Stick" growth curve.
logic:
- aggregate revenue by year and month for high-level trends.
- use window functions for year-over-year and running total calculations.
- extract time-of-day and day-of-week metrics to find peak engagement windows.
*/
with monthly_revenue_base as (
-- step 1: aggregate total revenue by year and month
select
c.year,
c.month_number,
c.month_name,
sum(p.payment_value) as monthly_revenue
from
orders o
inner join
payments p on o.order_id = p.order_id
inner join
Calendar c on o.purchase_datekey = c.datekey
where
o.order_status not in ('canceled', 'unavailable')
group by
c.year, c.month_number, c.month_name
),
seasonal_indexing as (
-- step 2: calculate YoY comparisons, yearly averages, and cumulative running total
select
*,
-- direct comparison: revenue of the same month in the previous year
prev_year_same_month_rev = lag(monthly_revenue) over (
partition by month_number
order by year
),
-- average monthly revenue for that specific year (benchmark)
yearly_avg_monthly_rev = avg(monthly_revenue) over (
partition by year
),
-- cumulative revenue: running total since the platform inception
running_total_revenue = sum(monthly_revenue) over (
order by year, month_number
)
from
monthly_revenue_base
),
-- step 3: analyzing hourly and weekly volume distribution
time_of_day_stats as (
select
datename(dw, order_purchase_timestamp) as day_of_week,
datepart(dw, order_purchase_timestamp) as day_num, -- used for sorting
datepart(hour, order_purchase_timestamp) as hour_of_day,
count(order_id) as total_orders
from
orders
where
order_status not in ('canceled', 'unavailable')
group by
datename(dw, order_purchase_timestamp),
datepart(dw, order_purchase_timestamp),
datepart(hour, order_purchase_timestamp)
)
-- final output: combining the original seasonality analysis with the new running total and time-of-day insights
-- part 1: monthly seasonality and cumulative growth
select
year,
month_name,
cast(monthly_revenue as decimal(15,2)) as current_month_revenue,
cast(running_total_revenue as decimal(15,2)) as cumulative_revenue,
-- yoy comparison for the specific month
cast(100.0 * (monthly_revenue - prev_year_same_month_rev) / nullif(prev_year_same_month_rev, 0) as decimal(10,2)) as month_specific_yoy_growth_pct,
-- seasonality index
cast(monthly_revenue / nullif(yearly_avg_monthly_rev, 0) as decimal(10,2)) as seasonality_index,
case
when (monthly_revenue / yearly_avg_monthly_rev) >= 1.2 then 'Peak Month (Strong Seasonality)'
when (monthly_revenue / yearly_avg_monthly_rev) between 0.9 and 1.2 then 'Standard Performance'
when (monthly_revenue / yearly_avg_monthly_rev) < 0.9 then 'Low Season / Slack Period'
else 'Baseline'
end as seasonal_classification
from
seasonal_indexing
union all
-- part 2: placeholder/separator row
select
null, '--- INTRA-DAY ANALYSIS ---', null, null, null, null, null
union all
-- part 3: day/hour distribution
select
day_num as year,
concat(day_of_week, ' at ', hour_of_day, ':00') as month_name,
total_orders as current_month_revenue,
null as cumulative_revenue,
null as month_specific_yoy_growth_pct,
null as seasonality_index,
case
when hour_of_day between 12 and 14 then 'Lunchtime Spike Window'
when hour_of_day between 20 and 22 then 'Evening Spike Window'
else 'Standard Time'
end as seasonal_classification
from
time_of_day_stats
order by
year, month_name;