<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Marketing Forward &#187; Adam Sugano</title>
	<atom:link href="http://www.experian.com/blogs/marketing-forward/author/adam-sugano/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.experian.com/blogs/marketing-forward</link>
	<description>Marketing insight and consumer trends from Experian Marketing Services</description>
	<lastBuildDate>Thu, 03 Jan 2013 23:14:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Statistical significance in a testing world</title>
		<link>http://www.experian.com/blogs/marketing-forward/2012/12/27/statistical-significance-in-a-testing-world/</link>
		<comments>http://www.experian.com/blogs/marketing-forward/2012/12/27/statistical-significance-in-a-testing-world/#comments</comments>
		<pubDate>Thu, 27 Dec 2012 06:00:43 +0000</pubDate>
		<dc:creator>Adam Sugano</dc:creator>
				<category><![CDATA[Digital Marketing]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[email marketing]]></category>
		<category><![CDATA[Experian CheetahMail]]></category>

		<guid isPermaLink="false">http://www.experian.com/blogs/marketing-forward/?p=6901</guid>
		<description><![CDATA[Here we examine why understanding the mechanics that lie beneath the phrase ‘statistically significant’ is essential when conducting tests or defending a hypothesis.]]></description>
			<content:encoded><![CDATA[<p>‘Statistically significant’ is a phrase that many people throw around to add a little gravitas to their arguments, but I often wonder how many of them actually understand what it means. Some of the more frequent explanations I hear are that it means the p-value is small, that it signifies an important outcome worth noting, or that it describes an outcome most likely not occurring by chance. All of these responses are true, but a fundamental understanding of the mechanics that lie beneath this phrase is essential for anyone responsible for conducting A/B tests, multivariate tests or pretty much any situation where analytics are being leveraged to defend an idea or hypothesis.</p>
<p>Last month, my blog post focused on the concept of <a href="http://www.experian.com/blogs/marketing-forward/2012/11/19/cmstatistical-hypothesis-testing/">hypothesis testing in statistics</a>: namely, defining what is meant by the null and alternative hypotheses, and why it is important to define them before any data have been collected. This month I will focus on how to interpret the results of your study after your hypotheses have been defined, the data collected and the results analyzed.</p>
<p>Let’s consider the following situation:</p>
<p>You have a friend who claims he has the ability to correctly predict the outcome of a coin toss more than 50% of the time. As a rational person you, of course, are skeptical, but you also realize that the only way to settle this assertion is to put your friend to the test and begin flipping coins. Wisely, however, you remember reading my last blog post on Marketing Forward about the importance of defining your hypotheses upfront, before any data have been collected. So, you write the following on a piece of paper:</p>
<p><em>Null Hypothesis: My friend can only correctly predict the outcome of a coin flip 50% of the time</em></p>
<p><em>Alternative Hypothesis: My friend can correctly predict the outcome of a coin flip more than 50% of the time</em></p>
<p>Your friend agrees with your hypotheses and then you two begin a discussion around choosing an alpha level. The alpha level (α) is very important when it comes to statistical significance because it is the dividing line between what will be deemed statistically significant and what will not. In a lot of ways this can make the phrase ‘statistically significant’ seem rather arbitrary, which is why it is important to always choose your alpha level before any statistics are calculated. A typical alpha level is 5% or 0.05. What this means is that if the result of your experiment could only happen by chance less than 5% of the time, then the result is defined as being statistically significant. Similarly, if an alpha level of 1% or 0.01 is chosen (a stricter test), then statistical significance can only be claimed if the result should happen by chance alone less than 1% of the time. But because your friend is so confident in his supernatural ability to correctly predict the outcome of a coin flip, he is willing to meet that higher burden of proof and allows you to set the alpha level at 1%.</p>
<p>A visual way to understand the process described above would be as follows:</p>
<p><img class="alignnone size-full wp-image-6902" title="significance level" src="http://www.experian.com/blogs/marketing-forward/wp-content/uploads/2012/12/significance-level.jpg" alt="significance level" width="500" height="264" /></p>
<p>We all know there is natural variability involved with flipping a coin, but done repeatedly, we expect to arrive at a proportion of heads or tails that is very close to 50%. However, if your friend can truly predict the outcome of a coin flip at a rate <strong>significantly </strong>better than 50%, we are willing to reject the null hypothesis and conclude he does have some sort of supernatural gift or ability. But we are only willing to give him this statistically significant designation if he lands in the top 1% (because we chose an α-level of 0.01) of the distribution shown above.</p>
<p>The exact value of this top 1% (rejection region) is dependent upon the number of coin flips your friend must try to predict. Let’s assume you both agree on 100 coin flips. In this case, the above graph can be updated with the following numbers.</p>
<p><img class="alignnone size-full wp-image-6903" title="significance level 100 flips" src="http://www.experian.com/blogs/marketing-forward/wp-content/uploads/2012/12/significance-level-100-flips.jpg" alt="significance level 100 flips" width="500" height="265" /></p>
<p>This graph adds two new points of reference. The first is at the center of the distribution, where P = 50% (‘P’ here is notation for proportion). This corresponds to what is assumed under the null hypothesis. Remember that in the null hypothesis we are stating that our friend has no predictive ability, so his chances of predicting correctly should be centered at 50%. The other reference point is P = 62.9%; this percentage represents the 99<sup>th</sup> percentile of the distribution and is the dividing line between a statistically significant result and one that is not.</p>
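<p>If you want to verify that dividing line yourself, here is a minimal sketch in Python (assuming the <code>scipy</code> library; the original calculation behind the graph is not shown, so treat this as an illustration rather than a reproduction). The cutoff depends on whether you use the exact binomial distribution or the normal approximation, which is why the printed values land near, rather than exactly on, the 62.9% shown above.</p>
<pre><code>from scipy.stats import binom, norm

n, p_null, alpha = 100, 0.5, 0.01

# Exact binomial: smallest number of correct calls whose upper-tail
# probability under the null falls at or below alpha.
k_cutoff = int(binom.isf(alpha, n, p_null)) + 1
print(f"exact binomial cutoff: {k_cutoff} of {n} correct ({k_cutoff / n:.1%})")

# Normal approximation to the sampling distribution of the proportion.
se = (p_null * (1 - p_null) / n) ** 0.5   # standard error = 0.05
print(f"normal-approximation cutoff: {norm.ppf(1 - alpha, loc=p_null, scale=se):.1%}")
</code></pre>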
<p>Well, after 100 flips of the coin your friend records an impressive (but not statistically significant) result of a 60% success rate. In this situation a 60% success rate corresponds to a p-value of .0228 or 2.28%. <strong>The p-value is always defined as the probability of getting a result as extreme as or more extreme than what is observed, assuming the null hypothesis is true.</strong> In other words, if your friend’s ability to correctly predict the outcome of a coin toss is only 50%, there is still a 2.28% chance of observing a success rate of 60% or better purely by chance. And since this percentage is higher than our alpha level, he does not fall into the rejection region and we fail to reject the null hypothesis. Put simply, we do not see enough statistical evidence to believe his claim of supernatural coin-flip prediction ability.</p>
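<p>That 2.28% figure is easy to reproduce. Below is a minimal sketch (again assuming <code>scipy</code>) that computes the one-sided p-value for a 60% success rate over 100 flips, using the normal approximation under the null hypothesis.</p>
<pre><code>from scipy.stats import norm

n, p_null, p_hat = 100, 0.5, 0.60
se = (p_null * (1 - p_null) / n) ** 0.5   # standard error = 0.05

# Probability of a result as extreme as or more extreme than 60%,
# assuming the null hypothesis (P = 50%) is true.
p_value = norm.sf(p_hat, loc=p_null, scale=se)
print(f"p-value = {p_value:.4f}")   # 0.0228, above the 0.01 alpha level
</code></pre>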
<p>It is worth noting, however, that if the α-level had been set at 0.05 or 5%, the p-value of 2.28% would fall below the threshold required for statistical significance (5%), and we would have concluded that our friend does possess special predictive powers. This highlights the need to always specify an alpha level before proceeding with any statistical calculations, as practitioners may be tempted to adjust their original alpha level after the results are in, just to reach a statistically significant result.</p>
<p>To summarize, ‘statistically significant’ is a phrase with an inherent meaning and interpretation. But those without a firm understanding of what is meant by the terms alpha level, p-value, and null and alternative hypotheses should use it a little less liberally until that background is properly understood. Gaining this understanding will aid you when interpreting and defending the results of any statistical tests you have performed.</p>
<div style="height:33px;" class="really_simple_share robots-nocontent snap_nopreview"></div>
		<div style="clear:both;"></div>]]></content:encoded>
			<wfw:commentRss>http://www.experian.com/blogs/marketing-forward/2012/12/27/statistical-significance-in-a-testing-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Statistical hypothesis testing</title>
		<link>http://www.experian.com/blogs/marketing-forward/2012/11/19/cmstatistical-hypothesis-testing/</link>
		<comments>http://www.experian.com/blogs/marketing-forward/2012/11/19/cmstatistical-hypothesis-testing/#comments</comments>
		<pubDate>Mon, 19 Nov 2012 06:00:22 +0000</pubDate>
		<dc:creator>Adam Sugano</dc:creator>
				<category><![CDATA[Digital Marketing]]></category>
		<category><![CDATA[email marketing]]></category>
		<category><![CDATA[Experian CheetahMail]]></category>

		<guid isPermaLink="false">http://www.experian.com/blogs/marketing-forward/?p=6469</guid>
		<description><![CDATA[Understanding the mechanics of hypothesis testing and how they translate to A/B or multivariate testing will help with study design and the interpretation of results.]]></description>
			<content:encoded><![CDATA[<p>In a couple of recent blog posts we highlighted some of the <a href="http://www.experian.com/blogs/marketing-forward/2012/09/05/cm-top-ten-email-marketing-ab-testing-rules/">best practices to keep in mind when implementing an A/B test</a>, as well as a <a href="http://www.experian.com/blogs/marketing-forward/2012/10/16/cm-a-common-testing-mistake-ab-testing-for-multiple-factors/">common misuse of A/B testing</a> that too many practitioners unfortunately employ when attempting to research a hypothesis.</p>
<p>In this post we will take a few steps back and discuss at a higher level the reasoning behind statistical hypothesis testing. Having a better understanding of the mechanics of what hypothesis testing is and how it translates directly to your A/B or multivariate testing endeavors will aid you in both the design of your study and the interpretation of your results.</p>
<p>Suppose you currently send out a weekly newsletter to your email subscribers every Friday at 9AM. Your manager is curious if open rates from this newsletter would increase significantly from your current level of 22% if you changed the delivery time to every Friday at 1PM. To address your manager’s question you randomly select 2000 subscribers from your list and email them this week’s newsletter at the adjusted time. After waiting the necessary amount of time for the data to accumulate, you see that the subscribers who received the newsletter at 1PM had an open rate of 24%.</p>
<p>Can you now conclude that emailing all newsletter subscribers at 1PM would significantly increase open rates from the historical rate of 22%? Because the result is based on a sample, there is a possibility that the observed open rate (24%) may have occurred just by the luck of the draw. If your entire subscriber population were actually sent the newsletter at 1PM, how likely is it that your open rate would still be 24% or better?</p>
<p><strong>Hypothesis testing uses data from a sample</strong> (2000 subscribers) <strong>to judge whether or not a statement about a population </strong>(your entire newsletter subscriber list) <strong>may be true</strong>. Many of the questions that researchers ask can be expressed as questions about which of two statements might be true for a population. In the example just given, we are essentially asking, ‘Does mailing my newsletter on Friday at 1PM significantly increase my open rate from its current level of 22%?’ This question can be answered with either a ‘no’ or a ‘yes’, and each possible answer makes a specific statement about the situation.</p>
<p>Hypothesis 1: Mailing at 1PM does not change my current open rate of 22%</p>
<p>Hypothesis 2: Mailing at 1PM does significantly increase my current open rate of 22%</p>
<p>In the language of statistics, the two possible answers to the question we just encountered are called the null hypothesis (Hypothesis 1) and the alternative hypothesis (Hypothesis 2). The null hypothesis is a statement that there is nothing happening, or that the status quo is intact; in most situations, the researcher hopes to disprove or reject it. The alternative hypothesis is a statement that something is happening; in most situations, this hypothesis is what the researcher hopes to prove.</p>
<p>The logic of statistical hypothesis testing is similar to the ‘innocent until proven guilty’ principle of the U.S. judicial system. In hypothesis testing, we assume that the null hypothesis is a possible truth until the sample data conclusively demonstrate otherwise. The ‘something is happening’ hypothesis (alternative hypothesis) is chosen only when the data show us that we can reject the ‘nothing is happening’ hypothesis (null hypothesis).</p>
<p>The hypothesis testing method is a somewhat indirect strategy for making a decision. We are not able to determine the chance that a hypothesis statement is either true or false. We can only assess whether or not the observed data are consistent with an assumption that the null hypothesis is true about the population, within the reasonable bounds of sampling variability. <strong>If the sample data collected were unlikely to materialize just by chance when the null hypothesis is true, we reject the statement made in the null hypothesis.</strong></p>
<p>When we do a hypothesis test, the objective is to decide if the null hypothesis should be rejected in favor of the alternative hypothesis. The process is as follows:</p>
<ul>
<li>Define your null and alternative hypotheses.</li>
<li>Compute the data summary that is used to evaluate the two hypotheses, called the test statistic.</li>
<li>Compute the likelihood that we would have observed a test statistic as extreme as the one we did, or something more extreme, if the null hypothesis is true, called the p-value.</li>
<li>Decide to accept the alternative hypothesis if the p-value is smaller than a designated level of significance, denoted by the Greek letter alpha (α) and usually set by researchers at .05, less commonly at .10 or .01. If the p-value &lt; alpha, then we have achieved <em>statistical significance</em> (more on this next time). The sketch just after this list applies these steps to the newsletter example.</li>
</ul>
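<p>Here is a minimal sketch of those four steps in Python, applied to the newsletter example from earlier in this post. It assumes the <code>scipy</code> library and uses a one-proportion z-test, which is one reasonable choice of test statistic for comparing an observed open rate against a historical one.</p>
<pre><code>from scipy.stats import norm

# Step 1: the null says the open rate is still 22%; the alternative says
# mailing at 1PM increases it. The figures below come from the example.
p_null, p_hat, n, alpha = 0.22, 0.24, 2000, 0.05

# Step 2: the test statistic, standardizing the observed proportion
# under the assumption that the null hypothesis is true.
se = (p_null * (1 - p_null) / n) ** 0.5
z = (p_hat - p_null) / se

# Step 3: the p-value for the one-sided (rate increased) alternative.
p_value = norm.sf(z)

# Step 4: compare the p-value against alpha.
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject the null" if p_value &lt; alpha else "fail to reject the null")
</code></pre>
<p>With these numbers the p-value comes out to roughly .015: below an alpha of .05, so we would reject the null hypothesis, but above a stricter alpha of .01, which would lead to the opposite decision.</p>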
<p>All of this may sound rather academic and unnecessary, but it is always important to formalize what it is that you are testing. If your hypotheses are unclear or undefined, then you are essentially saying you don’t know what question you are seeking to answer, and any data or results that flow from your study become devoid of meaning.</p>
<div style="height:33px;" class="really_simple_share robots-nocontent snap_nopreview"></div>
		<div style="clear:both;"></div>]]></content:encoded>
			<wfw:commentRss>http://www.experian.com/blogs/marketing-forward/2012/11/19/cmstatistical-hypothesis-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A common testing mistake: A/B testing for multiple factors</title>
		<link>http://www.experian.com/blogs/marketing-forward/2012/10/16/cm-a-common-testing-mistake-ab-testing-for-multiple-factors/</link>
		<comments>http://www.experian.com/blogs/marketing-forward/2012/10/16/cm-a-common-testing-mistake-ab-testing-for-multiple-factors/#comments</comments>
		<pubDate>Tue, 16 Oct 2012 06:00:18 +0000</pubDate>
		<dc:creator>Adam Sugano</dc:creator>
				<category><![CDATA[Digital Marketing]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[email marketing]]></category>
		<category><![CDATA[Experian CheetahMail]]></category>

		<guid isPermaLink="false">http://www.experian.com/blogs/marketing-forward/?p=6227</guid>
		<description><![CDATA[Here, we’ll demonstrate one of the more common misuses of A/B testing and introduce a better option when testing with multiple factors. ]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://www.experian.com/blogs/marketing-forward/2012/09/05/cm-top-ten-email-marketing-ab-testing-rules/">previous blog post</a>, we highlighted some key rules to consider when implementing a valid A/B test. The strength of A/B testing lies in its simplicity and in its ability to identify changes in performance for a single factor. However, when a testing design becomes complicated by more than one factor, new methods need to be applied.</p>
<p>Marketers often still try to fit the A/B testing mold into situations where it is not well suited. Here, we will demonstrate one of the more common misuses of A/B testing and introduce a better option when testing with multiple factors.</p>
<p><strong>Sequential factor testing</strong></p>
<p>A common mistake is to perform sequential A/B tests in an effort to arrive at optimal levels for multiple factors. These experiments often start with the standard or status quo settings of the key factors to be tested. The levels of the factor believed to be most responsible for performance are tested first, while the other factor levels remain constant. After the responses have been gathered and the optimal level for the first factor is determined, the factor regarded as the second most influential is tested next, with the first factor fixed at its ‘optimal’ level for the rest of the experiment. This process repeats until each factor has been tested individually.</p>
<p>To better illustrate this flawed process, consider the following example with 2 factors, each with just 2 levels (the simplest case possible). Suppose an organization wants to determine the best image and ad copy to put in an email with the click-to-open ratio being the metric to maximize. We will denote the different images as I<sub>1</sub> and I<sub>2</sub> and the different ad copy as C<sub>1</sub> and C<sub>2</sub>. In their first email blast to 20,000 customers, the company decides to send half of these customers an email with the combination (I<sub>1</sub>, C<sub>1</sub>) and the other half with the combination of (I<sub>1</sub>, C<sub>2</sub>). The results are as follows:</p>
<p style="padding-left: 30px;">Click to Open %</p>
<p style="padding-left: 30px;">(I<sub>1</sub>, C<sub>1</sub>) = 7.5%</p>
<p style="padding-left: 30px;">(I<sub>1</sub>, C<sub>2</sub>) = 8.5%</p>
<p>Based on these results, the company believes that Copy 2 is the preferred ad copy and fixes the next email blast to 10,000 more customers at this level, so that the next test in the sequence varies only the image level not already tested.</p>
<p style="padding-left: 30px;">(I<sub>2</sub>, C<sub>2</sub>) = 9.5%</p>
<p>Seeing these results, the company decides that (I<sub>2</sub>, C<sub>2</sub>) is the optimal combination for generating the highest click-to-open ratio.</p>
<p><strong>The problem is that the company may have missed the global optimum by not testing the fourth combination, (I<sub>2</sub>, C<sub>1</sub>), which may have yielded an even greater click-to-open ratio than 9.5%.</strong></p>
<blockquote><p>A/B tests assess one level of one factor versus the control group, but cannot measure the interaction effect across factors.</p></blockquote>
<p>The reason this method of sequentially testing one factor at a time can fail to find the optimal factor levels is that an interaction effect may exist between factors 1 and 2. That is, the factor effects are not additive; combinations of different factors and their levels produce an additional effect (an interaction) when used simultaneously. Because it cannot capture interaction effects, this sequential approach may miss the optimum altogether.</p>
<p>In situations such as these, it is more appropriate to perform a multivariate test (a factorial test, to be exact), where all factors are changed together and all combinations are accounted for. There is a wide array of multivariate tests available, but when the number of factors and factor levels to be tested is limited, the full factorial approach is the best option, as it retains the most information about the factors.</p>
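<p>To make this concrete, here is a minimal sketch in Python of the full 2x2 factorial layout for the image and ad copy example. Three of the click-to-open rates come from the example above; the value for the untested (I<sub>2</sub>, C<sub>1</sub>) cell is purely hypothetical, included only to show how the interaction effect would be calculated once all four combinations have been measured.</p>
<pre><code>from itertools import product

images, copies = ["I1", "I2"], ["C1", "C2"]

# Click-to-open rate per combination. Three values are from the example
# in this post; the 8.0% for (I2, C1) is a hypothetical placeholder.
cto = {("I1", "C1"): 0.075, ("I1", "C2"): 0.085,
       ("I2", "C1"): 0.080, ("I2", "C2"): 0.095}

# A full factorial design measures every combination, so the best cell
# is read off directly rather than inferred one factor at a time.
best = max(product(images, copies), key=lambda cell: cto[cell])
print(f"best combination: {best}, click-to-open = {cto[best]:.1%}")

# Interaction effect: how much the copy effect changes depending on the
# image. A nonzero value means the factor effects are not additive.
interaction = (cto[("I2", "C2")] - cto[("I2", "C1")]) \
            - (cto[("I1", "C2")] - cto[("I1", "C1")])
print(f"interaction (copy effect under I2 minus under I1): {interaction:+.1%}")
</code></pre>
<p>With these illustrative numbers the interaction is positive: Copy 2 helps more when paired with Image 2 than with Image 1, something a sequential test that never measures the fourth cell cannot detect.</p>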
<p>The example here of 2 factors, each with 2 levels, is the simplest case, but the same reasoning applies when testing with more than 2 factors or more than 2 levels, so think ahead and plan your testing strategies with the care they deserve. If you don’t, you may end up drawing the wrong conclusion about which approaches work best for your company’s marketing efforts.</p>
<p>For more information on testing, <a href="http://www.experian.com/cheetahmail/strategic-services.html?intcmp=emsblog">Experian CheetahMail’s strategic services team</a> can assist you in choosing the best testing approach given your organization’s unique marketing challenges.</p>
<div style="height:33px;" class="really_simple_share robots-nocontent snap_nopreview"></div>
		<div style="clear:both;"></div>]]></content:encoded>
			<wfw:commentRss>http://www.experian.com/blogs/marketing-forward/2012/10/16/cm-a-common-testing-mistake-ab-testing-for-multiple-factors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>