Q-YIELD™ FAQ

My results appear very sensitive to the problem definition. Is something wrong?

There are a number of reasons why this could be happening:

  1. You have a data set containing highly correlated variables, one of which appears in the result.

    If two variables are very highly correlated, then the choice between them in formulating a rule is essentially arbitrary. A small change in conditions may cause one to be favored over the other. It is always worthwhile to check the correlation listing in the formatted report, or to use the linear correlation tool (menu item Create/Linear Correlation), to see whether this is indeed the case.

  2. You have a small data set and a large number of variables.

    Although Q-YIELD normally warns about the possibility of an accidental correlation when importing data sets with a small number of records compared to the number of variables, this warning is invariably disregarded. (And if that is all the data you have, you can do little else.)

    If the data set contains enough variables, and these variables have random variations, then there is a possibility of finding an accidental correlation with the target problem. The chance of this increases as the number of records decreases and the number of variables grows. Although Q-YIELD performs some statistical tests to try to eliminate accidental correlations, these tests may fail when the data set is small and the number of variables is large. This effect may be exacerbated by a poorly defined problem (see the next item).

  3. The problem is poorly defined.

    A good problem definition discriminates between exceptional conditions and normal conditions. If the problem is set up so that there is no clear distinction between these cases, then Q-YIELD may be trying to explain away a normal process variation. Under these circumstances its susceptibility to accidental correlations (the small-data-set effect described above) will increase.

    What do you do if the abnormal events occur at the edge of the normal process variation? For example, suppose you have a graph something like this:

    [Graph: distribution of Failures, with one peak at around 50 and a second peak at around 280]

    This appears to be a Poisson distribution centered at around 50 Failures (which we would expect) with a second distribution superimposed upon it. The answer is to use filters to exclude those records which are in a so-called gray area, where they might belong to either the Poisson distribution or the abnormal distribution centered around 280 Failures. We could therefore use a problem definition of (say) Failures > 200, but at the same time use a filter to exclude all the records where the number of Failures is between 150 and 250. The problem definition then clearly distinguishes between records in each of the two distributions, and Q-YIELD should be able to achieve a more accurate result.
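
The combination of a problem definition and a gray-area filter can be illustrated outside Q-YIELD with a short Python sketch. This is not Q-YIELD syntax: the function name label_records, its parameters, and the sample values are hypothetical, and treating the 150 and 250 bounds as inclusive is an assumption.

```python
# Hypothetical sketch: names, thresholds, and sample data are illustrative
# assumptions, not Q-YIELD settings or API calls.

def label_records(failure_counts, problem_threshold=200, gray_low=150, gray_high=250):
    """Drop gray-area records, then label the rest against the problem definition.

    Returns a list of (failures, is_problem) pairs for the records that are kept.
    """
    labeled = []
    for failures in failure_counts:
        # Filter: exclude records that could belong to either distribution
        # (bounds assumed inclusive).
        if gray_low <= failures <= gray_high:
            continue
        # Problem definition: Failures > problem_threshold.
        labeled.append((failures, failures > problem_threshold))
    return labeled


if __name__ == "__main__":
    sample = [42, 55, 61, 148, 150, 190, 230, 250, 251, 275, 300]
    for failures, is_problem in label_records(sample):
        print(failures, "problem" if is_problem else "normal")
```

After filtering, every record labeled as a problem lies above 250 and every record labeled as normal lies below 150, so no ambiguous record sits next to the threshold of 200.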
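
The check for highly correlated variables recommended in the first item can be approximated by hand when the data is available outside Q-YIELD. The Python sketch below is not the Q-YIELD linear correlation tool; it computes plain Pearson correlation coefficients and lists variable pairs whose absolute correlation exceeds a cutoff. The function names, the column names, and the 0.95 cutoff are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report none.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold.

    `columns` maps a variable name to its list of values (hypothetical layout).
    """
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


if __name__ == "__main__":
    data = {
        "temp": [1, 2, 3, 4, 5],
        "pressure": [2, 4, 6, 8, 10],  # exact multiple of temp
        "noise": [3, 1, 4, 1, 5],
    }
    for name_a, name_b, r in highly_correlated_pairs(data):
        print(f"{name_a} / {name_b}: r = {r:.3f}")
```

If a variable that appears in a rule shows up in one of the listed pairs, the other variable in the pair is an equally plausible candidate, which is why the rule may switch between them when the problem definition changes slightly.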


© Quadrillion Corporation 2008