Payments and Risk Analysis

This section will explore the payment distribution across different methods, analyze the correlation between installment counts and order values, and identify regional payment preferences to understand the risk and revenue implications for Olist's e-commerce platform.

8Project Roadmap: Step-by-Step Navigation
1
Problem Description - Olist E-Commerce Data AnalysisThis article provides a detailed problem description for the Olist E-Commerce Data Analysis project, outlining the tasks assigned by the company, the structure of the analysis, the tools and skills used, acknowledgements, and references.
2
Data NormalizationThis section willl cover the data normalization process for the Olist E-Commerce dataset, including data ingestion, cleaning, star schema modeling, and the creation of an explicit date table to support time-based analysis.
3
Business QuestionsIn this section, business questions will be defined to guide the data analysis process for the Olist E-Commerce dataset. These questions will be categorized into different domains such as customer behavior, sales performance, delivery logistics, seller performance, payments, and customer experience to ensure a comprehensive analysis of the data.
4
Customer Retention AnalysisThis section focuses on analyzing customer retention patterns using the Olist e-commerce dataset. The analysis includes segmenting customers based on their purchasing behavior, understanding trends in customer retention, and exploring the factors that influence customer loyalty.
5
Order and Sales Performance AnalysisThis section will explore the order and sales performance metrics for Olist.
6
Delivery Performance and SLA AnalysisThis section will explore the delivery performance and logistics efficiency of Olist's e-commerce operations, analyzing on-time delivery rates, freight costs, and the decomposition of delivery timelines to identify bottlenecks and regional disparities.
7
Order Fullfilment Efficiency AnalysisThis section will explore how efficiently sellers are fulfilling orders on the Olist platform by analyzing the time gap between order approval and handover to logistics partners, identifying the 'Bottom 10%' of sellers in terms of dispatch efficiency, and providing insights into potential areas for improvement in the fulfillment process.
8
Payments and Risk AnalysisThis section will explore the payment distribution across different methods, analyze the correlation between installment counts and order values, and identify regional payment preferences to understand the risk and revenue implications for Olist's e-commerce platform.
9
Customer Review and Satisfaction AnalysisThis section will explore the effect of delivery delays on customer satisfaction, identify product categories with mismatched delivery speed and review scores, and analyze the distribution of review scores across different product categories to uncover common issues in negative feedback.
10
Times Series Analysis of Orders and Sales PerformanceThis section will explore the time series analysis of orders and sales performance for Olist, including month-over-month growth rates, seasonality effects, and the distribution of order volume by day and hour to identify key trends and patterns in customer behavior.

Payment Distribution and Risk Analysis

20. What is the distribution of total revenue across different payment methods (Credit Card, Boleto, Voucher, Debit Card)? Specifically, does the ‘Average Transaction Value’ differ significantly between Credit Card users and Boleto users?

Solution:

Solution Steps for Payment Method Distribution and ATV

  1. Firstly, we aggregate metrics by payment type (payment_distribution CTE): We query the payments table and group by payment_type to calculate the total transaction_count, the total_revenue (sum(payment_value)), and the average transaction value (avg(payment_value)).

  2. Secondly, we calculate the platform-wide revenue baseline (global_metrics CTE): We aggregate the sum of all revenue from the previous CTE to establish the grand_total_revenue. This gives us the necessary denominator for calculating market share percentages.

  3. Finally, we compare payment methods and value differences (Final SELECT): We CROSS JOIN the aggregated payment data with the global baseline. By dividing the payment type’s revenue by the grand total, we compute the revenue_share_pct. Additionally, we format the avg_transaction_value (ATV) and use a CASE statement to classify the transaction profiles, specifically highlighting the differing use cases of ‘Credit Card’ (High Convenience / Installment Potential) versus ‘Boleto’ (Cash-based / Single Payment).

Show SQL Code
with payment_distribution as (
    -- step 1: aggregate metrics at the payment type level
    select 
        payment_type,
        count(*) as transaction_count,
        sum(payment_value) as total_revenue,
        avg(payment_value) as avg_transaction_value
    from 
        payments
    group by 
        payment_type
),

global_metrics as (
    -- step 2: get platform-wide totals for percentage calculations
    select 
        sum(total_revenue) as grand_total_revenue
    from 
        payment_distribution
)

-- final output: comparing payment methods and identifying value differences
select 
    pd.payment_type,
    pd.transaction_count,
    cast(pd.total_revenue as decimal(15,2)) as revenue_contribution,
    -- revenue share percentage
    cast(100.0 * pd.total_revenue / gm.grand_total_revenue as decimal(5,2)) as revenue_share_pct,
    -- average transaction value (atv)
    cast(pd.avg_transaction_value as decimal(10,2)) as atv,
    -- comparison logic specifically for credit card vs boleto
    case 
        when pd.payment_type = 'credit_card' then 'High Convenience / Installment Potential'
        when pd.payment_type = 'boleto' then 'Cash-based / Single Payment'
        else 'Alternative Method'
    end as payment_profile
from 
    payment_distribution pd
cross join 
    global_metrics gm
order by 
    revenue_contribution desc;

Results:

payment_typetransaction_countrevenue_contributionrevenue_share_pctatvpayment_profile
credit_card7679512542084.1978.34163.32High Convenience / Installment Potential
boleto197842869361.2717.92145.03Cash-based / Single Payment
voucher5775379436.872.3765.70Alternative Method
debit_card1529217989.791.36142.57Alternative Method
not_defined30.000.000.00Alternative Method

Insights:

  • Credit Cards Dominate the Market: Credit cards are by far the most important payment method for Olist, bringing in an overwhelming 78.3% of the total revenue ($12.5M).
  • Boleto is Still Essential: The Brazilian cash-based payment method, Boleto, is the second largest driver, securing almost 18% of the revenue. This shows that a solid portion of our customer base relies on cash or doesn’t use traditional credit systems.
  • Credit Drives Bigger Baskets: There is a clear gap in the Average Transaction Value (ATV). Credit card users spend an average of 163.32,comparedto163.32, compared to 145.03 for Boleto users. Customers are obviously more willing to make high-ticket purchases when they can take advantage of credit card installments.
  • Vouchers Mean Small Spends: Vouchers have the lowest ATV at just $65.70. This makes sense, as they are typically used as gift cards, promotional discounts, or partial payments for smaller items rather than for massive purchases.
  • Debit Cards are Rare: Surprisingly, standard debit cards are barely used on the platform, accounting for just 1.36% of total revenue. Customers overwhelmingly prefer the installment options of a credit card or the cash accessibility of a Boleto.

Installment Correlation and High-Ticket Analysis

21. Is there a positive correlation between the number of installments chosen (payment_installments) and the total order value? Specifically, what percentage of high-ticket orders (>R$500) are purchased using 5+ installments?”

Solution:

Solution Steps for Payment Method and Installment Analysis

  1. Firstly, we compare payment method performance (Part 1): We aggregate the transaction_count, total_revenue, and avg_transaction_value grouped by payment_type. By cross-joining this with a global revenue total, we calculate the revenue_share_pct, allowing us to compare the financial weight of credit cards versus cash-based methods like Boleto.

  2. Secondly, we correlate installment counts with order value (Part 2 - installment_stats CTE): We isolate ‘credit_card’ transactions—since they are the primary source of installments—and group them by the number of payment_installments. We calculate the average payment value for each count to see if customers purchasing expensive items tend to choose higher installment numbers.

  3. Thirdly, we isolate the high-ticket segment (Part 2 - high_ticket_segment CTE): We filter the payments table for orders with a value greater than R$500. We then count the total number of these orders and use a conditional sum to identify how many of them utilized 5 or more installments, effectively identifying “financing” behavior.

  4. Finally, we calculate the financing penetration (Part 2 Final SELECT): We compute the high_ticket_financing_pct by dividing our high-installment count by the total high-ticket count. To provide a clear correlation proof, we also use subqueries to fetch and compare the average values of 1-installment versus 10-installment purchases, highlighting the trend toward higher installment counts for more expensive goods.

Show SQL Code
with payment_distribution as (
    -- step 1: aggregate metrics at the payment type level
    select 
        payment_type,
        count(*) as transaction_count,
        sum(payment_value) as total_revenue,
        avg(payment_value) as avg_transaction_value
    from 
        payments
    group by 
        payment_type
),

global_metrics as (
    -- step 2: get platform-wide totals for percentage calculations
    select 
        sum(total_revenue) as grand_total_revenue
    from 
        payment_distribution
)

-- output 1: comparing payment methods and identifying value differences
select 
    pd.payment_type,
    pd.transaction_count,
    cast(pd.total_revenue as decimal(15,2)) as revenue_contribution,
    cast(100.0 * pd.total_revenue / gm.grand_total_revenue as decimal(5,2)) as revenue_share_pct,
    cast(pd.avg_transaction_value as decimal(10,2)) as atv
from 
    payment_distribution pd
cross join 
    global_metrics gm
order by 
    revenue_contribution desc;

/* PART 2: Installment Correlation & High-Ticket Analysis
    Objective: 
        1. Analyze if higher installment counts correlate with higher order values.
        2. Calculate the % of high-ticket orders (>R$500) using 5+ installments.
*/

with installment_stats as (
    -- step 1: calculate average value per installment count
    select 
        payment_installments,
        count(*) as order_count,
        avg(payment_value) as avg_payment_value
    from 
        payments
    where 
        payment_type = 'credit_card' -- installments primarily apply to credit cards
        and payment_installments > 0
    group by 
        payment_installments
),

high_ticket_segment as (
    -- step 2: isolate high-ticket orders and check installment counts
    select 
        count(*) as total_high_ticket_orders,
        sum(case when payment_installments >= 5 then 1 else 0 end) as high_installment_count
    from 
        payments
    where 
        payment_value > 500
        and payment_type = 'credit_card'
)

-- output 2: identifying the "financing" behavior of high-value customers
select 
    h.total_high_ticket_orders,
    h.high_installment_count,
    -- percentage of high-ticket users who "finance" via 5+ installments
    cast(100.0 * h.high_installment_count / h.total_high_ticket_orders as decimal(5,2)) as high_ticket_financing_pct,
    -- referencing the average value for 1 installment vs 10 installments for context
    (select cast(avg_payment_value as decimal(10,2)) from installment_stats where payment_installments = 1) as avg_val_1_inst,
    (select cast(avg_payment_value as decimal(10,2)) from installment_stats where payment_installments = 10) as avg_val_10_inst
from 
    high_ticket_segment h;

Results 1:

payment_typetransaction_countrevenue_contributionrevenue_share_pctatv
credit_card7679512542084.1978.34163.32
boleto197842869361.2717.92145.03
voucher5775379436.872.3765.70
debit_card1529217989.791.36142.57
not_defined30.000.000.00

Results 2:

total_high_ticket_ordershigh_installment_counthigh_ticket_financing_pctavg_val_1_instavg_val_10_inst
3488263775.6095.87415.09

Insights:

  • Clear Positive Correlation: There is a massive link between how many installments a customer chooses and how much they spend. Orders paid in a single payment average just 95.87.However,whencustomersstretchtheirpaymentacross10installments,theaverageordervalueskyrocketsto95.87. However, when customers stretch their payment across 10 installments, the average order value skyrockets to 415.09.
  • Financing is Mandatory for Big Purchases: For high-ticket items (orders over $500), offering installment plans is absolutely critical. Out of 3,488 high-ticket orders on the platform, a huge 75.6% were purchased using 5 or more installments.
  • Credit Unlocks Revenue: The data proves that Olist’s ability to sell expensive goods is directly tied to its credit card installment options. Without the flexibility to pay over several months, the vast majority of those premium, $500+ sales simply would not happen.

Regional Payment Preferences & Boleto Penetration

22. How does payment preference vary by region (customer_state)? Are there specific states where “Boleto’ usage is isproportionately high compared to the national average, indicating a lower penetration of credit cards?

Solution:

Solution Steps for Payment Method, Installment, and Regional Analysis

  1. Firstly, we compare payment method performance (Part 1): We aggregate the transaction_count, total_revenue, and avg_transaction_value grouped by payment_type. By cross-joining this with a global revenue total, we calculate the revenue_share_pct, allowing us to compare the financial weight of credit cards versus cash-based methods like Boleto.

  2. Secondly, we correlate installment counts with order value (Part 2 - installment_stats CTE): We isolate ‘credit_card’ transactions—since they are the primary source of installments—and group them by the number of payment_installments. We calculate the average payment value for each count to see if customers purchasing expensive items tend to choose higher installment numbers.

  3. Thirdly, we isolate the high-ticket segment (Part 2 - high_ticket_segment CTE): We filter the payments table for orders with a value greater than R$500. We then count the total number of these orders and use a conditional sum to identify how many of them utilized 5 or more installments, effectively identifying “financing” behavior.

  4. Fourthly, we calculate the financing penetration (Part 2 Final SELECT): We compute the high_ticket_financing_pct by dividing our high-installment count by the total high-ticket count. To provide a clear correlation proof, we also use subqueries to fetch and compare the average values of 1-installment versus 10-installment purchases.

  5. Fifthly, we aggregate payment preferences by state (Part 3 - state_payment_counts CTE): We join the payments, orders, and customers tables to count the volume of each payment type within every customer_state. We specifically isolate the counts for ‘boleto’ and ‘credit_card’ to prepare for regional penetration analysis.

  6. Finally, we identify regional outliers and Boleto penetration (Part 3 Final SELECT): We calculate the state_boleto_pct for each state and compare it to a national_avg_boleto_pct generated via a window function. By calculating the percentage_point_diff, we categorize states into profiles like ‘High Boleto Usage (Low Credit Penetration)’ or ‘High Credit Usage (Digital/Banked Hub)’ based on their deviation from the national norm.

Show SQL Code

/* PART 1: Payment Method & Transaction Value Analysis
    Objective: 
        1. Analyze the distribution of revenue across payment types.
        2. Compare average transaction value (atv) between credit card and boleto users.
*/

with payment_distribution as (
    -- step 1: aggregate metrics at the payment type level
    select 
        payment_type,
        count(*) as transaction_count,
        sum(payment_value) as total_revenue,
        avg(payment_value) as avg_transaction_value
    from 
        payments
    group by 
        payment_type
),

global_metrics as (
    -- step 2: get platform-wide totals for percentage calculations
    select 
        sum(total_revenue) as grand_total_revenue
    from 
        payment_distribution
)

-- output 1: comparing payment methods and identifying value differences
select 
    pd.payment_type,
    pd.transaction_count,
    cast(pd.total_revenue as decimal(15,2)) as revenue_contribution,
    cast(100.0 * pd.total_revenue / gm.grand_total_revenue as decimal(5,2)) as revenue_share_pct,
    cast(pd.avg_transaction_value as decimal(10,2)) as atv
from 
    payment_distribution pd
cross join 
    global_metrics gm
order by 
    revenue_contribution desc;


/* PART 2: Installment Correlation & High-Ticket Analysis
    Objective: 
        1. Analyze if higher installment counts correlate with higher order values.
        2. Calculate the % of high-ticket orders (>R$500) using 5+ installments.
*/

with installment_stats as (
    -- step 1: calculate average value per installment count
    select 
        payment_installments,
        count(*) as order_count,
        avg(payment_value) as avg_payment_value
    from 
        payments
    where 
        payment_type = 'credit_card' -- installments primarily apply to credit cards
        and payment_installments > 0
    group by 
        payment_installments
),

high_ticket_segment as (
    -- step 2: isolate high-ticket orders and check installment counts
    select 
        count(*) as total_high_ticket_orders,
        sum(case when payment_installments >= 5 then 1 else 0 end) as high_installment_count
    from 
        payments
    where 
        payment_value > 500
        and payment_type = 'credit_card'
)

-- output 2: identifying the "financing" behavior of high-value customers
select 
    h.total_high_ticket_orders,
    h.high_installment_count,
    -- percentage of high-ticket users who "finance" via 5+ installments
    cast(100.0 * h.high_installment_count / h.total_high_ticket_orders as decimal(5,2)) as high_ticket_financing_pct,
    -- referencing the average value for 1 installment vs 10 installments for context
    (select cast(avg_payment_value as decimal(10,2)) from installment_stats where payment_installments = 1) as avg_val_1_inst,
    (select cast(avg_payment_value as decimal(10,2)) from installment_stats where payment_installments = 10) as avg_val_10_inst
from 
    high_ticket_segment h;


/* PART 3: Regional Payment Preferences & Boleto Penetration
    Objective: 
        1. Determine how payment preferences vary by customer state.
        2. Identify states where Boleto usage is significantly higher than the national average.
*/

with state_payment_counts as (
    -- step 1: count payment types per state
    select 
        c.customer_state,
        count(p.payment_type) as total_payments,
        sum(case when p.payment_type = 'boleto' then 1 else 0 end) as boleto_count,
        sum(case when p.payment_type = 'credit_card' then 1 else 0 end) as credit_card_count
    from 
        payments p
    inner join 
        orders o on p.order_id = o.order_id
    inner join 
        customers c on o.customer_id = c.customer_id
    group by 
        c.customer_state
),

state_boleto_share as (
    -- step 2: calculate state-level boleto share and the national benchmark
    select 
        customer_state,
        total_payments,
        cast(100.0 * boleto_count / total_payments as decimal(5,2)) as state_boleto_pct,
        -- national benchmark using window function
        avg(100.0 * boleto_count / total_payments) over () as national_avg_boleto_pct
    from 
        state_payment_counts
)

-- output 3: identifying regional outliers in payment behavior
select 
    customer_state,
    total_payments,
    state_boleto_pct,
    cast(national_avg_boleto_pct as decimal(5,2)) as national_avg_pct,
    -- deviation from national norm
    cast(state_boleto_pct - national_avg_boleto_pct as decimal(5,2)) as percentage_point_diff,
    case 
        when state_boleto_pct > national_avg_boleto_pct + 5 then 'High Boleto Usage (Low Credit Penetration)'
        when state_boleto_pct < national_avg_boleto_pct - 5 then 'High Credit Usage (Digital/Banked Hub)'
        else 'Standard Regional Mix'
    end as regional_payment_profile
from 
    state_boleto_share
order by 
    state_boleto_pct desc;

Results 1:

payment_typetransaction_countrevenue_contributionrevenue_share_pctatv
credit_card7679512542084.1978.34163.32
boleto197842869361.2717.92145.03
voucher5775379436.872.3765.70
debit_card1529217989.791.36142.57
not_defined30.000.000.00

Results 2:

total_high_ticket_ordershigh_installment_counthigh_ticket_financing_pctavg_val_1_instavg_val_10_inst
3488263775.6095.87415.09

Results 3:

customer_statetotal_paymentsstate_boleto_pctnational_avg_pctpercentage_point_diffregional_payment_profile
AP7028.5720.358.22High Boleto Usage (Low Credit Penetration)
RR4628.2620.357.91High Boleto Usage (Low Credit Penetration)
MA76726.4720.356.12High Boleto Usage (Low Credit Penetration)
TO30125.2520.354.90Standard Regional Mix
MT95824.8420.354.49Standard Regional Mix
RO26124.5220.354.17Standard Regional Mix
RS566823.9820.353.63Standard Regional Mix
MS73623.7820.353.43Standard Regional Mix
SC375422.3820.352.03Standard Regional Mix
PA101121.2720.350.92Standard Regional Mix
GO211221.2620.350.91Standard Regional Mix
PR526221.2520.350.90Standard Regional Mix
SE36120.7820.350.43Standard Regional Mix
ES210719.1320.35-1.22Standard Regional Mix
AC8419.0520.35-1.30Standard Regional Mix
MG1210219.0420.35-1.31Standard Regional Mix
SP4362218.8120.35-1.54Standard Regional Mix
DF220417.9720.35-2.38Standard Regional Mix
PI52417.9420.35-2.41Standard Regional Mix
BA361017.0120.35-3.34Standard Regional Mix
PB57016.3220.35-4.03Standard Regional Mix
PE172816.0320.35-4.32Standard Regional Mix
RJ1352715.9920.35-4.36Standard Regional Mix
AL42715.9320.35-4.42Standard Regional Mix
RN52215.3320.35-5.02High Credit Usage (Digital/Banked Hub)
CE139814.6620.35-5.69High Credit Usage (Digital/Banked Hub)
AM15413.6420.35-6.71High Credit Usage (Digital/Banked Hub)

Insights:

  • The National Baseline: Across the entire country, the average Boleto usage sits at 20.35%. Most states stay relatively close to this baseline, representing a standard mix of cash and credit.
  • Low Credit Penetration in the North: States like Amapá (AP), Roraima (RR), and Maranhão (MA) show disproportionately high Boleto usage, ranging from 26% to over 28%. This strongly indicates that customers in these regions have lower access to traditional banking and credit cards, making cash-based payments essential for them to shop online.
  • Highly Banked Regions: On the other end of the spectrum, Amazonas (AM), Ceará (CE), and Rio Grande do Norte (RN) have the lowest Boleto usage on the platform (13% to 15%). Customers here overwhelmingly prefer to use credit cards.
  • The Major Economic Hubs: Our biggest revenue-driving states, São Paulo (SP) and Rio de Janeiro (RJ), actually fall below the national average for Boleto usage (18.8% and 15.9% respectively). Because these are major economic centers, their populations are highly banked and comfortably rely on credit for e-commerce.
  • Actionable Takeaway: If Olist wants to run marketing campaigns in states like AP or MA, advertisements should clearly highlight that Boleto is accepted, as it is a major deciding factor for those specific buyers.

Conclusion

What We Did:

  • Calculated the revenue share and Average Transaction Value (ATV) for each payment method (Credit Card, Boleto, Voucher, Debit Card).
  • Analyzed the relationship between the number of installments chosen and the success of high-ticket (>$500) sales.
  • Mapped out regional payment preferences to find states with high cash (Boleto) dependency versus those with high credit card penetration.

What We Obtained (Key Findings):

  • Credit is King: Credit cards drive the platform, generating over 78% of total revenue. They also encourage customers to spend more, with a higher ATV (163)comparedtoBoletousers(163) compared to Boleto users (145).
  • Installments Unlock Big Sales: Offering payment plans is absolutely mandatory for expensive items. Over 75% of all high-ticket orders (above $500) were purchased using 5 or more installments. Without credit, these sales likely wouldn’t happen.
  • Boleto is Essential for Accessibility: While credit dominates, Boleto still secures nearly 18% of revenue. It is especially critical in Northern states like Amapá (AP) and Roraima (RR), where Boleto usage jumps to 28% due to lower local credit card access.
  • Economic Hubs Prefer Credit: Major markets like São Paulo and Rio de Janeiro fall below the national Boleto average, confirming that their highly banked populations comfortably rely on credit for e-commerce.