Problem:
The marketing landscape is dynamic, and success depends on finding the most impactful messaging and creatives. Traditional methods often rely on guesswork or past experiences, leading to suboptimal campaign performance and wasted resources.
Companies running a campaign want to answer two questions:
- Would the campaign be successful?
- If the campaign was successful, how much of that success could be attributed to the ads?
Solution:
A/B testing offers a data-driven approach to campaign optimization. It allows for the simultaneous testing of different campaign variations (web page elements, banners, etc.) with different audience segments. This allows us to identify which version resonates best and drives the most significant impact on key business metrics.
Methodology:
- Exploratory Data Analysis
- Conduct A/B Testing using Chi-square Test of Independence
Conclusion: The exploratory data analysis reveals the ad group generated a significant increase in conversions (43% more) compared to the PSA group. Additionally, the chi-square test statistic and its associated p-value (4.51e-11) provide strong evidence to reject the null hypothesis. This indicates a statistically significant relationship between the test group and conversion rates. In other words, the observed uplift in conversions can be attributed to the ad campaign, and not simply due to chance.
Recommendations: Ads need a sweet spot for exposure. Too few and viewers miss them, too many and they get annoyed. Data suggests 64 total ads is ideal. Consider showing them less often, but at better times (like 11am-3pm) to save cost without sacrificing results.
- Python
- Pandas
- Numpy
- Matplotlib
- Seaborn
- Scikit-Learn
The dataset and business case are from Kaggle. Most people are exposed to ads (the experimental group), while a small portion of people (the control group) instead see a Public Service Announcement (PSA), or nothing, in the exact size and place the ad would normally appear.
Since no success criteria were provided, one was assumed: the campaign is considered successful if the conversion rate in the ad group is at least 20% higher than in the psa group.
- dataset has a total of 588101 rows and 6 columns
- each row represents a viewer
- each column displays the viewer's ad experience
- 3 Categorical Variables: test group, converted, most ads day
- 3 Numerical Variables: user id, total ads, most ads hour
- There are no null values
User Id
- There are a total of 588,101 users included in this data set.
Test Group
- The data consists of users from two different test groups.
- The majority of users (564,577) belong to the ad group.
Converted
- There are two conversion categories: "True" for those who purchased the product and "False" for those who didn't.
- The majority of users (approximately 97%) fall under the "False" category, meaning they haven't converted yet.
- This suggests a low conversion rate overall.
Total Ads (watched by a user)
- On average, users watch nearly 25 ads.
- However, the median is only 13 ads, indicating that a small number of users watch a very high number of ads, skewing the average.
- The maximum number of ads watched by a single user is 2,065, further supporting the rightward skew in the data.
Most Ads (watched per) Day
- Data for seven days of the week is included (presumably Monday through Sunday).
- Friday is the day when users watch the most ads on average.
Most Ads (watched per) Hour
- This column records the hour of the day in which a user saw the most ads. Its mean (14.5) is close to its median (14), suggesting a fairly symmetric distribution of peak viewing hours centered on mid-afternoon.
total_ads
- LL: -30.5 | UL: 61.5
- Rows of outliers : 52057
- Percent outliers : 8.85%
most_ads_hour
- LL: 0.5 | UL: 28.5
- Rows of outliers : 5536
- Percent outliers : 0.94%
- There is a relatively high number of outliers for the total_ads feature. These outliers represent approximately 9% of the total data, which is quite substantial.
- In contrast, most_ads_hour has a much lower proportion of outliers, only around 0.94% of the total data.
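The LL/UL values above look like the lower and upper limits of the common 1.5 × IQR rule; that is an assumption, since the source does not name the rule. A minimal sketch of that rule using only the standard library follows. The function names and the toy data are hypothetical and are not taken from the project.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower_limit, upper_limit) under the k * IQR rule."""
    # method="inclusive" interpolates linearly between data points,
    # treating the sample as the full population of interest.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_summary(values, k=1.5):
    """Count the rows falling outside the IQR limits."""
    ll, ul = iqr_bounds(values, k)
    n_out = sum(1 for v in values if v < ll or v > ul)
    return {"LL": ll, "UL": ul, "rows": n_out,
            "percent": 100 * n_out / len(values)}


# Toy data only (not the Kaggle dataset): one extreme value among small ones.
toy_total_ads = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = outlier_summary(toy_total_ads)
```

Because total_ads cannot be negative, a negative lower limit such as -30.5 can never flag anything, so every outlier in that column lies above the upper limit.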
- the 2 test groups are: ad and psa
- 96% of viewers belong to the Ad group.
- Around 14,000 viewers purchased the product after seeing the Ad.
- Only 420 viewers purchased from the PSA group.
Calculate the conversion rate for each group
- Conversion rate in ad group: 2.55%
- Conversion rate in psa group: 1.79%
- % Difference between the conversion rates: 43.1%
Our analysis revealed that viewers who saw the advertisement (Ad group) converted at a rate 43% higher than viewers who saw the Public Service Announcement (PSA group).
Success Criteria: We previously defined a success criterion for this campaign: the Ad group's conversion rate should be at least 20% higher than the PSA group's.
Result: The Ad group's conversion rate is 43% higher than the PSA group's, well above the 20% threshold, so we conclude that this campaign is a success.
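The relative difference and the success check can be sketched as below. The helper names are hypothetical, and the inputs are the rounded rates reported above, so the computed lift (about 42.5%) differs slightly from the reported 43.1%, which was presumably calculated from unrounded rates.

```python
def conversion_rate(conversions, users):
    """Share of users who converted."""
    return conversions / users


def relative_lift(treatment_rate, control_rate):
    """Relative difference of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


def is_successful(treatment_rate, control_rate, min_lift=0.20):
    """Apply the assumed success criterion: lift of at least 20%."""
    return relative_lift(treatment_rate, control_rate) >= min_lift


# Rounded rates as reported in this README.
ad_rate, psa_rate = 0.0255, 0.0179
lift = relative_lift(ad_rate, psa_rate)
success = is_successful(ad_rate, psa_rate)
```

Note that the 43% figure is a relative lift; in absolute terms the gap between the two groups is under one percentage point.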
- False indicates the viewer did not purchase the product
- Most users (93.5%) saw the ads without purchasing, so the actual purchase rate was low.
- Only 2.5% of viewers converted into paying customers.
The plot for total_ads shows a right-skewed, roughly log-normal distribution, with most viewers watching a total of 1-5 ads.
The violin plot shows the distribution of total ads watched by viewers in each test group (ad/psa), split by conversion status (True/False).
- On average, people who made a purchase (converted) after seeing ads watched more ads overall compared to those who didn't buy anything.
- Among viewers who converted, those in the ad group watched an average of 64 ads, while those in the psa group watched 55 ads on average.
- In both groups, those who did not convert watched fewer ads in total (11-13 ads).
- In both the ad and psa groups, 75% of viewers who converted watched fewer than 112 ads in total.
- Additionally, the conversion rate gradually decreased once viewers had watched more than 100 ads in total.
In other words, there seems to be a sweet spot in terms of ad exposure for achieving conversions. Viewers who see too few ads might not be sufficiently informed about the product, while those who are exposed to too many ads might become overwhelmed or tune out altogether. This suggests that it's important to find the right balance between ad exposure and viewer engagement to optimize conversion rates.
- Ad engagement is fairly consistent across the days of the week, with Fridays seeing a slight bump (15.7%) and Tuesdays seeing a slight dip (13.2%).
- Overall, ad viewership falls within a narrow range of 13% to 15.7% across all days.
Conversion rate per day
- Conversion rates remain consistently low across all days of the week
- Monday has the highest rate at a modest 3.3%
- Saturdays see the lowest conversion rate at just 2.1%
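A per-day conversion rate like the one summarized above is a grouped mean of the converted flag. The sketch below spells this out with the standard library. The function name and the toy rows are hypothetical; the column names follow the variable list in this README.

```python
from collections import defaultdict


def conversion_rate_by(rows, key):
    """Return {category: share of rows with converted == True}."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        converted[row[key]] += bool(row["converted"])
    return {k: converted[k] / totals[k] for k in totals}


# Toy rows only (not the Kaggle dataset).
toy_rows = [
    {"most ads day": "Monday", "converted": True},
    {"most ads day": "Monday", "converted": False},
    {"most ads day": "Saturday", "converted": False},
    {"most ads day": "Saturday", "converted": False},
]
rates = conversion_rate_by(toy_rows, "most ads day")
```

With the data loaded in a pandas DataFrame, the same result would typically come from a `groupby` on the day column followed by the mean of `converted`.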
- Ad viewership is highest around midday (11AM-3PM), potentially coinciding with lunch breaks.
- Engagement remains low throughout the overnight hours (12:00 AM to 7:00 AM) as viewers are probably asleep.
The ad group generated a substantial increase in conversions (43% more) compared to the PSA group. However, is the increase statistically significant or due to chance? To examine whether the test group and converted categories are independent (no association) or whether there is a statistically significant relationship between them, let's conduct a hypothesis test.
The chi-square test of independence is ideal because we want to assess the association between two categorical variables: test group (whether the viewer was exposed to ads) and conversion (whether the viewer purchased the product).
- Null: There IS NO association between test group and conversion
- Alternative: There IS an association between test group and conversion
The chi-square test statistic and its associated p-value (4.51e-11) provide strong evidence to reject the null hypothesis. This indicates a statistically significant relationship between the test group and conversion rates. In other words, the observed uplift in conversions can be attributed to the ad campaign, and not simply due to chance.
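To make the test concrete, here is a minimal sketch of the Pearson chi-square computation for a 2x2 table, written with the standard library only. It is not the project's code: in practice one would likely call `scipy.stats.chi2_contingency`, which by default also applies a continuity correction to 2x2 tables, so its output can differ slightly from this version. The function name and the toy counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` holds observed counts, e.g. rows = test group,
    columns = (converted, not converted). No continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and conversion were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts only (not the Kaggle dataset):
# ad group: 30 converted, 70 not; psa group: 10 converted, 90 not.
toy_table = [[30, 70], [10, 90]]
chi2, p_value = chi_square_2x2(toy_table)
reject_null = p_value < 0.05
```

A small p-value says the association is unlikely to be due to chance alone; it does not measure how large the uplift is, which is why the conversion-rate comparison is reported alongside it.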
The data suggests there is an optimal range of ad exposure for achieving conversions. Seeing too few ads might leave viewers uninformed, while ad overload can lead to disengagement. Ad spending is high, and the data shows viewers are most receptive around the 64-ad mark. To optimize the budget, consider reducing ad frequency, which could achieve similar results at a lower cost. Additionally, tailoring ad frequency to viewer behavior, such as airing more ads during 11AM-3PM and fewer in the early morning, could be beneficial.