Q-YIELD FAQ
My results appear very sensitive to the
problem definition. Is something wrong?
There are a number of reasons why this could be happening:
- You have a data set containing highly correlated
  variables, one of which appears in the result.
If two variables are very highly
correlated, then our choice between them in formulating
a rule is essentially arbitrary. A small change in conditions
may cause one to be favored over the other. It is always
worthwhile to check the correlation listing in the formatted
report, or to use the linear correlation tool (menu item
Create/Linear Correlation), to see whether this is indeed
the case.
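If you want to run the same kind of check outside Q-YIELD, the idea can be sketched in a few lines of Python. This is only an illustration, not Q-YIELD's own correlation tool: the column names, the sample values and the 0.95 threshold are all assumptions made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical process data: oven_temp and heater_power move almost in lockstep.
data = {
    "oven_temp": [200.0, 205.0, 210.0, 215.0, 220.0, 225.0],
    "heater_power": [40.1, 41.0, 42.1, 42.9, 44.0, 45.1],
    "humidity": [31.0, 45.0, 28.0, 50.0, 39.0, 33.0],
}

for name_a, name_b, r in highly_correlated_pairs(data):
    print(f"{name_a} and {name_b} are nearly interchangeable (r = {r:.3f})")
```

Any pair reported here is a pair where a rule could just as easily have been written with the other variable, so a result that flips between the two is not a cause for alarm.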
- You have a small data set and a
  large number of variables.
Although Q-YIELD normally warns about
the possibility of an accidental correlation when importing
data sets with a small number of records compared to
the number of variables, this warning is invariably
disregarded. (And if that is all the data you have,
you can do little else.)
If the data set contains enough variables,
and these variables have random variations, then there
is a possibility of finding an accidental correlation
with the target problem. The chances of this increase
as the number of records decreases and the number of
variables grows. Although Q-YIELD performs some statistical
tests to try to eliminate accidental correlations, these
tests may fail when the data set size is small and the
number of variables is large. This effect may be exacerbated
by a poorly defined problem (the third reason in this list).
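To get a feel for how easily accidental correlations arise, here is a small simulation in Python. It is a sketch under stated assumptions and says nothing about the statistical tests Q-YIELD itself performs: every variable and the target are pure Gaussian noise, and the record and variable counts are arbitrary values chosen for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_records, n_variables, rng):
    """Largest |r| between a random target and purely random variables."""
    target = [rng.gauss(0.0, 1.0) for _ in range(n_records)]
    best = 0.0
    for _ in range(n_variables):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_records)]
        best = max(best, abs(pearson(noise, target)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
few_records = strongest_chance_correlation(n_records=5, n_variables=200, rng=rng)
many_records = strongest_chance_correlation(n_records=500, n_variables=5, rng=rng)

print(f"5 records, 200 variables: strongest chance |r| = {few_records:.2f}")
print(f"500 records, 5 variables: strongest chance |r| = {many_records:.2f}")
```

None of the variables has any real relationship with the target, yet with few records and many variables at least one of them will usually look strongly correlated. With many records and few variables the strongest chance correlation stays close to zero.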
- The problem is poorly defined.
A good problem definition discriminates
between exceptional conditions and normal conditions.
If the problem is set up so that there is no clear
distinction between these cases, then Q-YIELD may be
trying to explain away a normal process variation. Under
these circumstances its susceptibility to accidental
correlations (the second reason in this list) will increase.
What do you do if the abnormal
events are occurring at the edge of a normal process
variation? For example, suppose a graph of the Failures
distribution shows what appears to be a Poisson distribution
at around 50 Failures (which we would expect) with a second
distribution superimposed
upon it. The answer is to make use of filters to
exclude those records which are in a so-called gray
area, where they might belong to either the Poisson
distribution or the abnormal distribution centered around
280 Failures. We could therefore use a problem definition
of (say) Failures > 200 but at the same time
use a filter to exclude all the records where the number
of Failures is between 150 and 250. The problem definition
then clearly distinguishes between records in each of the
two distributions, and Q-YIELD should be able to achieve
a more accurate result.
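The effect of combining the filter with the problem definition can be sketched in Python. This is an illustration only, not Q-YIELD's filter syntax: the function names and the sample failure counts are hypothetical, and "between 150 and 250" is assumed to include both end points.

```python
def passes_filter(failures, gray_low=150, gray_high=250):
    """Keep a record only if it lies outside the gray area (bounds inclusive)."""
    return not (gray_low <= failures <= gray_high)


def is_problem(failures, threshold=200):
    """Problem definition from the text: Failures > 200."""
    return failures > threshold


# Hypothetical Failures values, one per record.
failure_counts = [42, 48, 51, 55, 63, 160, 190, 210, 240, 265, 280, 295]

# Apply the filter first, then split the remaining records by the problem definition.
kept = [f for f in failure_counts if passes_filter(f)]
normal = [f for f in kept if not is_problem(f)]
abnormal = [f for f in kept if is_problem(f)]

print("excluded (gray area):", [f for f in failure_counts if not passes_filter(f)])
print("normal records:      ", normal)
print("abnormal records:    ", abnormal)
```

After filtering, every remaining normal record has fewer than 150 Failures and every remaining abnormal record has more than 250, so no record sits close to the Failures > 200 boundary.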