Discrete
choice analysis has become the most fundamental way of predicting consumer
behaviour. A recent analysis of the typical sample size used by marketing
research firms reveals that the results from most marketing research discrete
choice analyses are worthless, because they are inaccurate.
Discrete choice
analysis has surpassed its predecessor, conjoint analysis, as the main
tool for modeling consumer choice. Unfortunately, conducting a discrete
choice experiment is expensive. Participants must answer many questions,
sometimes repetitive in nature. In some cases, participants may be asked
up to 64 or more questions in order to assess how they will react to future
changes in the consumer market. The more questions that are asked, the
higher the cost, and the less able participants are to provide accurate
answers, which negatively impacts the results.
In order to overcome
this problem, market researchers use a block design, in which the list
of, say, 64 questions is broken down into blocks, so that Group A
gets one portion of the questions, Group B gets another portion, and
Group C gets the final portion. In some cases, some of the questions
asked of the three groups overlap, but in others they do not. The
assumption is that answers to questions asked of Groups A and B but not
of Group C will be identical to the answers that Group C would have
provided had they been asked. Similarly, answers to questions asked of
Groups B and C but not of Group A are assumed to be similar to what Group
A would have answered. As is evident, using a block design reduces the
number of questions each participant is subjected to, so participants
should be able to provide more accurate answers than they would if
subjected to the tiring ordeal of answering all of them.
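To make the block idea concrete, here is a minimal sketch of how 64 questions might be split across Groups A, B, and C. The function name `make_blocks`, the random assignment, and the choice to give a few shared questions to every group are illustrative assumptions, not a description of how any particular firm builds its blocks.

```python
import random

def make_blocks(n_questions=64, n_blocks=3, n_shared=0, seed=0):
    """Split question numbers 1..n_questions into n_blocks groups.

    n_shared questions (an assumption for illustration) are asked of
    every group; the rest are dealt out so each is asked of one group.
    """
    rng = random.Random(seed)
    questions = list(range(1, n_questions + 1))
    rng.shuffle(questions)
    shared = questions[:n_shared]
    rest = questions[n_shared:]
    # Deal the remaining questions round-robin across the blocks.
    return [sorted(shared + rest[i::n_blocks]) for i in range(n_blocks)]

blocks = make_blocks(n_questions=64, n_blocks=3, n_shared=4)
for name, block in zip("ABC", blocks):
    print(f"Group {name}: {len(block)} questions")
```

With 4 shared questions, each group answers 24 questions instead of 64, and every question is still asked of at least one group.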
Since even using
a block design subjects participants to as many questions as the market
researcher deems acceptable, the cost of conducting such experiments is
high. In order to reduce the cost and make it more reasonable for the
client to bear, market research companies typically use rather small
sample sizes. In most cases these experiments use between 200 and 300
individuals. The assumption is that what is learned about how 200 to 300
individuals behave can be generalized to the entire population of
interest. Unfortunately, this assumption is wrong. In most cases,
discrete choice experiments using such small sample sizes are a waste of
time for the researcher and a waste of money for the client. What is
actually learned from such experiments is so fraught with potentially
poor estimates that in some cases a firm that follows the advice based on
such studies can do more harm than good. This is especially true for the
client who ends up paying between $100,000 and $200,000 for potentially
worthless information.
Given that it is
not advisable to use small samples, what sample size should one use? One
does not want too large a sample, because then the client is wasting
money. On the other hand, one does not want too small a sample, otherwise
the estimates will be biased. Here is where we come in. At BRG we can help
determine what the sample size should be so you don't pay for redundant
information, while also not paying for biased information.
In order to support the argument that
discrete choice experiments with small sample sizes and a block
design are practically useless, here is the result of a Monte Carlo
experiment. In this experiment, a sample of 50,000 respondents is
created using a mixed logit discrete choice model. A Monte Carlo
simulation is an ideal way to verify the biased nature of a small
sample design because, since we create the data, we know the real
answer. In real life one never knows what the real answer is when
conducting research. All we can do is try to get close enough to the
real answer, plus or minus some degree of error. In the following
Monte Carlo simulation, you will see that the margin of error
associated with a small sample design is huge. It would be even larger
if the number of independent variables were greater, and larger still
if a block design were used (this we will verify shortly with another
Monte Carlo simulation).
This Monte Carlo experiment entails
creating data using a mixed logit discrete choice model. This model
is similar to the hierarchical Bayes model used by most marketing
research firms. Using this model, we create data in which respondents
have 3 possible choices and there are 3 binary independent variables.
In essence this is a very simplistic design, far simpler than any
undertaken in marketing research. The data cover 50,000 individuals,
each of whom is asked to select between Products A, B, and C given
different combinations of the independent variables X1, X2, and X3.
Because three binary variables have 2 x 2 x 2 = 8 combinations, each
individual answers 8 questions in total.
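The data-creation step could be sketched roughly as follows. This is not the code behind the experiment. Several details are assumptions made only so the sketch runs: the attributes are taken to vary by product, Products B and C are given shifted copies of the 8 attribute combinations (the `shifts` values are hypothetical), each respondent's coefficients are drawn from a normal distribution with mean 0.5 and an assumed standard deviation of 1.0, and the random utility error is Gumbel, as in a standard logit model.

```python
import itertools
import numpy as np

def simulate_population(n_people=50_000, mean=0.5, sd=1.0, seed=0):
    """Simulate mixed logit choices for 8 questions and 3 products."""
    rng = np.random.default_rng(seed)
    # All 8 combinations of the binary variables X1, X2, X3.
    profiles = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
    # Hypothetical layout: product j in question q shows a shifted profile.
    shifts = (0, 3, 5)
    design = np.stack([np.roll(profiles, -s, axis=0) for s in shifts], axis=1)
    # design[q, j, k] = value of variable k for product j in question q.
    # Each respondent gets their own coefficients (the "mixed" part).
    betas = rng.normal(mean, sd, size=(n_people, 3))
    utility = np.einsum("qjk,nk->nqj", design, betas)
    utility += rng.gumbel(size=utility.shape)  # logit error term
    choices = utility.argmax(axis=2)  # 0 = Product A, 1 = B, 2 = C
    return design, betas, choices

design, betas, choices = simulate_population()
print(choices.shape)  # one row per respondent, one column per question
```

Because the coefficients are generated rather than observed, any estimate obtained later from a subsample can be compared directly against the values that produced the data.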
From this sample of 50,000 individuals,
which represents the total population, random samples are selected.
These random samples are of different sizes: 100 people, 250 people,
500 people, 1,000 people, 2,000 people, and finally 50,000 people.
In all, 10 random samples are selected for each size (10 samples of
100, 10 samples of 250, etc.). To each of these samples, both a mixed
logit and a hierarchical Bayes mixed logit model are fit. The true
parameter values are as follows: X1 = 0.5, X2 = 0.5, X3 = 0.5. It
should also be noted that X1, X2, and X3 are normally
distributed.
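The repeated-sampling step can be illustrated with a simplified sketch. To keep it short and self-contained, a plain conditional logit with fixed coefficients of 0.5 is swapped in for the mixed logit and hierarchical Bayes models named above, and it is fit with a hand-written Newton-Raphson routine. The question layout (`design`) and all function names are hypothetical. The sketch only shows how the spread of estimates across 10 samples changes with sample size. It does not reproduce the results reported here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
TRUE_BETA = np.array([0.5, 0.5, 0.5])

# Hypothetical layout: 8 questions x 3 products x 3 binary variables.
profiles = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
design = np.stack([np.roll(profiles, -s, axis=0) for s in (0, 3, 5)], axis=1)

def simulate_choices(n_people):
    """Simulate conditional logit choices; returns an (n_people, 8) array."""
    utility = design @ TRUE_BETA  # shape (8, 3)
    noise = rng.gumbel(size=(n_people, 8, 3))
    return (utility + noise).argmax(axis=2)

def fit_conditional_logit(choices, n_iter=25):
    """Maximum likelihood fit by Newton-Raphson (a stand-in estimator)."""
    n = len(choices)
    counts = np.zeros((8, 3))  # counts[q, j]: times product j chosen in q
    np.add.at(counts, (np.tile(np.arange(8), n), choices.ravel()), 1)
    beta = np.zeros(3)
    for _ in range(n_iter):
        u = design @ beta
        p = np.exp(u - u.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = np.einsum("qj,qjk->k", counts - n * p, design)
        xbar = np.einsum("qj,qjk->qk", p, design)
        centered = design - xbar[:, None, :]
        info = n * np.einsum("qj,qjk,qjl->kl", p, centered, centered)
        beta = beta + np.linalg.solve(info, grad)
    return beta

population = simulate_choices(50_000)
spread = {}
for size in (100, 250, 500, 1000, 2000):
    fits = []
    for _ in range(10):  # 10 random samples per sample size
        rows = rng.choice(len(population), size=size, replace=False)
        fits.append(fit_conditional_logit(population[rows]))
    spread[size] = np.std(fits, axis=0)
    print(size, np.mean(fits, axis=0).round(2), spread[size].round(3))
```

Each printed row gives the sample size, the average of the 10 estimates for each coefficient, and their standard deviation across samples, which is the quantity that shrinks as the sample grows.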
Let's note the results. There are three
graphs. The first corresponds to the first independent variable,
X1, the second to X2, and the third to X3. In each graph the Y axis
is the beta estimate and the X axis is the sample size. For each
sample size there are two beta estimates (therefore two dots on the
graph), one resulting from a mixed logit model and the other from a
hierarchical Bayes model. The results from the two models are close
enough together at every sample size that either method could be
used. For each sample size you will also see a high and a low
estimate. This is the range of the standard error, with the actual
estimate lying in the middle of the range.
In Figures 1 to 3, you will see that
samples of fewer than 500 are erratic in their ability to
find the true parameters. Only when the random sample size is 500
or greater do the estimates start to cluster around the true parameters.
What is not shown in these graphs is that we have conducted other
research showing that when the independent variables are not normally
distributed, the sample size needed is even larger. We are presently
working on a Monte Carlo experiment using a real-life marketing
research project, with many choices, many variables, and a block
design. The results from this study will provide further evidence
that, in most cases, small sample sizes should be abandoned.
Figure 1
Figure 2

Figure 3
