The Problem
This project reconstructs the full purchase funnel from raw event logs to identify exactly where users abandon, and which segments are most affected.
The analysis focuses on three questions: Where do users drop off in the funnel? Which traffic sources bring high-intent visitors? How do device usage and engagement patterns relate to conversion?
Google Merchandise Store via BigQuery
The data comes from the Google Analytics Sample E-commerce dataset, publicly available through Google BigQuery. It contains anonymized interaction data from the real Google Merchandise Store, a retail site selling Google-branded products like apparel, accessories, and stationery.
The dataset records activity at the hit level: each hit is a single user interaction, such as a pageview, a product click, an add-to-cart event, or a transaction. Hits belong to sessions, and sessions belong to users. Key fields include:
- Traffic acquisition (channel, source, campaign)
- Device information (desktop, mobile, tablet)
- Navigation behavior (pageviews, time on site)
- Product interactions (impressions, clicks, add-to-cart)
- Transactions and revenue
Because the raw data contains millions of hit-level events, BigQuery was used to query and aggregate the data before bringing it into Python for analysis.
Since the raw data is at the hit level, the first step was reconstructing session-level behavior. The workflow involved:
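As a rough illustration of what a session-level rollup can look like, here is a minimal pandas sketch. It is not the project's actual pipeline: the column names (`session_id`, `event`), the event labels, and the "No Product View" category are all assumptions made for the example, and the write-up's own Bounce definition may differ.

```python
import pandas as pd

# Hypothetical event names mapped to funnel depth (assumed, not the dataset's schema).
STAGE_DEPTH = {
    "pageview": 0,
    "product_view": 1,
    "add_to_cart": 2,
    "checkout": 3,
    "purchase": 4,
}

# Session labels keyed by the deepest stage a session reached.
DEPTH_LABEL = {
    0: "No Product View",
    1: "Product Viewer",
    2: "Cart Abandoner",
    3: "Checkout Abandoner",
    4: "Purchased",
}


def sessions_from_hits(hits: pd.DataFrame) -> pd.DataFrame:
    """Collapse hit-level rows into one row per session."""
    hits = hits.assign(depth=hits["event"].map(STAGE_DEPTH))
    sessions = hits.groupby("session_id").agg(
        hit_count=("event", "size"),
        pageviews=("event", lambda s: int((s == "pageview").sum())),
        deepest_depth=("depth", "max"),
    )
    sessions["session_type"] = sessions["deepest_depth"].map(DEPTH_LABEL)
    return sessions.reset_index()


# Toy hit log for three sessions (illustrative rows only).
hits = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "c", "c", "c", "c", "c"],
    "event": [
        "pageview", "product_view", "add_to_cart",
        "pageview",
        "pageview", "product_view", "add_to_cart", "checkout", "purchase",
    ],
})
sessions = sessions_from_hits(hits)
print(sessions)
```

Labeling each session by its deepest stage keeps the categories mutually exclusive, which is what lets the stage counts be summed back into a funnel later.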
Where Users Abandon
The first step of the analysis is to reconstruct the purchasing funnel and observe how users move through the main stages of the buying process. The most critical drop occurs right at the top: only 14.8% of sessions ever reach a product page. Most users leave before any meaningful product interaction happens. Once users do engage with products, the funnel tightens progressively, and about 50% of checkout sessions convert into a purchase, suggesting the checkout experience itself is not the main problem.
This shifts the main priority: the biggest lever is not optimizing checkout but getting more users to interact with products in the first place.
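The checkout figure can be sanity-checked from the approximate session counts in the session-type table under Engagement Predicts Purchase. The sketch below assumes each session is counted once, at its deepest stage, so the number of sessions that reached a stage is the count that stopped there plus every count further down. The stage keys are hypothetical names chosen for the example. The top-of-funnel rate is left out because it depends on the total session count, which the table does not state.

```python
# Approximate counts of sessions by deepest stage, taken from the write-up's table.
stopped_at = {
    "product_page": 6_900,   # Product Viewer
    "cart": 3_000,           # Cart Abandoner
    "checkout": 1_100,       # Checkout Abandoner
    "purchase": 1_070,       # Purchased
}


def reached_counts(stopped: dict) -> dict:
    """Sessions that reached each stage = stopped there + went deeper."""
    reached, running = {}, 0
    for stage in reversed(list(stopped)):
        running += stopped[stage]
        reached[stage] = running
    # Restore top-to-bottom funnel order.
    return dict(reversed(list(reached.items())))


def step_rates(reached: dict) -> dict:
    """Share of sessions at each stage that continue to the next one."""
    stages = list(reached)
    return {
        f"{a} -> {b}": reached[b] / reached[a]
        for a, b in zip(stages, stages[1:])
    }


reached = reached_counts(stopped_at)
rates = step_rates(reached)
for step, rate in rates.items():
    print(f"{step}: {rate:.1%}")
```

With these rounded inputs, about 2,170 sessions reach checkout and 1,070 of them purchase, which is consistent with the roughly 50% checkout conversion quoted above.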
Engagement Predicts Purchase
Classifying sessions by their deepest funnel stage reveals a heavy concentration at the top: over half bounce immediately. But the engagement data tells a clear story: users who purchase spend ~19× more time on site than bouncers and interact with over 26 pages on average.
| Session Type | Sessions | Avg Pageviews | Avg Time on Site |
|---|---|---|---|
| Bounce | ~37,600 | ~1 | minimal |
| Product Viewer | ~6,900 | ~9 | ~340 sec |
| Cart Abandoner | ~3,000 | ~13 | ~613 sec |
| Checkout Abandoner | ~1,100 | ~19 | ~852 sec |
| Purchased | ~1,070 | ~26 | ~1,146 sec |
Pageviews and time on site act as strong proxies for purchase intent, suggesting these metrics could power engagement-based targeting or personalization triggers.
Desktop Converts 5× More Than Mobile
Desktop users convert at 2.05% while mobile and tablet sit below 0.41%. A gap this large is unlikely to be explained by intent differences alone; mobile usability friction is very likely a contributing factor.
- Mobile and tablet: both convert below 0.41%. Combined, they represent a large share of total sessions but a negligible share of revenue.
- Desktop: converts at 2.05%, more than 5× higher. Desktop users also reach deeper funnel stages at significantly higher rates.
Not All Traffic Is Equal
Organic Search drives the most visits by volume, but it's far from the most valuable source. Referral converts at 5.57%, nearly 4× the next best channel and 185× better than Social. Social traffic generates volume with almost zero commercial return.
The bounce rate chart reinforces the same ranking: Social bounces at 65.1% and barely converts, while Referral has the lowest bounce rate (29.9%) and the highest conversion, consistent with high-intent users arriving from trusted external sources.
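Conversion and bounce rates per segment come down to a grouped mean over session-level flags. The following is a minimal sketch under assumed column names (`channel`, `converted`, `bounced`); the rows are toy values for illustration, not figures from the dataset.

```python
import pandas as pd


def rates_by_segment(sessions: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Session count, conversion rate, and bounce rate for each segment value."""
    grouped = sessions.groupby(segment_col).agg(
        sessions=("converted", "size"),
        conversion_rate=("converted", "mean"),  # mean of booleans = share of True
        bounce_rate=("bounced", "mean"),
    )
    return grouped.sort_values("conversion_rate", ascending=False)


# Toy session table (made-up rows, one per session).
toy_sessions = pd.DataFrame({
    "channel": ["Referral"] * 4 + ["Social"] * 4,
    "converted": [True, False, False, False, False, False, False, False],
    "bounced": [False, True, False, False, True, True, True, False],
})

by_channel = rates_by_segment(toy_sessions, "channel")
print(by_channel)
```

The same function applies to any session-level attribute, so passing a device column instead of `channel` would give the desktop, mobile, and tablet comparison.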
Three Concrete Opportunities
Increase product discovery at the top of the funnel
With about 85% of sessions never reaching a product page, the biggest lever is improving how users engage with the catalog, through better landing pages, navigation structure, or personalized entry points.
Fix mobile conversion performance
The 5× gap between desktop and mobile warrants a dedicated UX audit. Simplified checkout flows, faster load times, and mobile-specific A/B tests are the highest-priority levers.
Reallocate acquisition budget toward high-intent channels
Shifting investment from Social (0.03% conversion, 65% bounce) toward Referral and Paid Search could improve overall conversion rates without requiring any product changes.
Tools Used
Google BigQuery was used to query the Google Analytics Sample dataset at scale. Data was then processed in Python with Pandas and NumPy, and visualized with Matplotlib.