I’m working on a project right now in which a consumer kitchen appliance is being life tested. To do so, the company has to compress the equivalent of 10-20 years of product usage into a short timeframe (say, four months). This is the same process car companies use when they have professional drivers run a car over difficult terrain 24/7 for several months, up to 100,000 miles or more. The goal is to quickly use the product for its design life and then see when failures start happening.
I’ve done software architecture on about 20 different life test projects in the last four years, with more than 15 different large OEMs, in five different industries (medical devices, cell phones, semiconductors, aerospace components, consumer electronics). While they are all different, certain elements are always the same.
The hardest thing is figuring out what specifically to measure, how to identify when something is a red flag, and when something should be called a FAILURE. Once issues are identified as failures in a life test, everything gets easy. You plot the failures against the simulated years of life at which each failure occurred, often using Weibull plots, and this information helps you predict your product’s quality. If you do this for 10 or 20 units of the same model, you can more accurately predict how any given product (with this design and production process) will perform over its life.
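To make the Weibull step concrete, here is a minimal sketch of the classic "Weibull plot" analysis: each failure time is given a median-rank plotting position, and a straight-line fit on the transformed axes yields the shape and scale parameters. The failure times below are invented for illustration; they are not data from the project described here.

```python
import math

# Hypothetical failure times (in simulated years) for 10 units of one model.
failure_years = sorted([8.2, 11.5, 12.1, 13.7, 14.0, 15.3, 16.8, 17.2, 18.9, 21.4])
n = len(failure_years)

# Benard's median-rank approximation gives each failure a plotting position.
# On a Weibull plot, x = ln(t) and y = ln(-ln(1 - F)) fall on a straight line
# whose slope is the shape parameter (beta).
xs, ys = [], []
for i, t in enumerate(failure_years, start=1):
    F = (i - 0.3) / (n + 0.4)            # estimated cumulative failure fraction
    xs.append(math.log(t))
    ys.append(math.log(-math.log(1 - F)))

# Ordinary least-squares line through the points.
xbar, ybar = sum(xs) / n, sum(ys) / n
beta = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
       sum((x - xbar) ** 2 for x in xs)
eta = math.exp(xbar - ybar / beta)       # scale: from intercept = -beta * ln(eta)

print(f"shape (beta) ~ {beta:.2f}, scale (eta) ~ {eta:.1f} simulated years")
```

A shape parameter above 1 indicates wear-out failures (the rate increases with age), which is typically what a life test is hunting for; the scale parameter is the life at which roughly 63% of units have failed.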
As I said, the easy part is plotting the data after you have a failure. The hard part is figuring out what to call a failure, especially for a radically new product design where you don’t have real-world failure data to draw on (BMW, for example, has many years of real data on how often their drivetrains have failed in customers’ hands). Once you have determined what constitutes a failure, you can figure out how to measure for it, and then much of the measurement and failure determination can be automated with software and instruments.
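As a sketch of what that automation might look like once a failure criterion exists: an instrument reading is checked against spec limits, and out-of-spec readings are flagged with the simulated life at which they occurred. The measurement name, values, and limits below are illustrative assumptions, not the project's actual spec.

```python
# Hypothetical automated failure check; the measurement and spec limits
# are made up for illustration.
def within_spec(value, low, high):
    """True if an instrument reading falls inside its spec limits."""
    return low <= value <= high

def scan_readings(readings, low, high):
    """Return the (simulated_years, value) pairs that fall out of spec."""
    return [(t, v) for t, v in readings if not within_spec(v, low, high)]

# e.g. a heating-element current draw (amps), sampled at points in simulated life
samples = [(2.1, 4.9), (7.4, 5.1), (12.8, 6.7), (15.0, 5.0)]
flags = scan_readings(samples, low=4.0, high=6.0)
print(flags)  # the 12.8-year reading of 6.7 A is out of spec
```

Each flagged reading becomes a candidate failure at a known point in simulated life, which is exactly the (time, failure) pair the Weibull analysis needs.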
Back to the root issue – what is a failure? For the consumer product I’m dealing with now, one idea we are implementing is a “20 questions” style diagnostic for operators. As operators monitor the products during the four-month test, they may notice certain things visually, and when that happens we want the system to track those issues. So if an operator sees the display go blank, we want them to be able not only to report that the screen went blank, but also to take certain steps to see if the screen will come back. We want to track the steps they took and whether each one worked. A design engineer can then review this data and decide whether the issue constitutes a part failure that will show up on a Weibull plot.
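The record-keeping behind that workflow can be sketched as a small data structure: one report per observed symptom, a list of diagnostic steps with their outcomes, and a flag for engineering review when nothing recovered the unit. The class names, fields, and example values here are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an operator diagnostic record; names and fields
# are illustrative, not the actual system's schema.
@dataclass
class DiagnosticStep:
    instruction: str       # what the operator was told to try
    resolved_issue: bool   # did the step bring the unit back?

@dataclass
class IssueReport:
    unit_id: str
    simulated_years: float          # accumulated equivalent life at observation
    symptom: str                    # what the operator saw
    steps: list = field(default_factory=list)

    def record_step(self, instruction, resolved):
        self.steps.append(DiagnosticStep(instruction, resolved))

    def unresolved(self):
        # Candidate part failure for engineering review if no step recovered it
        return not any(s.resolved_issue for s in self.steps)

# Example: a blank display observed at 12.3 simulated years of life
report = IssueReport("unit-07", 12.3, "display went blank")
report.record_step("Press any front-panel button", resolved=False)
report.record_step("Power-cycle the unit", resolved=True)
print(report.unresolved())  # prints False: recovered after a power cycle
```

Because every report carries the simulated years at observation, any report an engineer later rules a failure drops straight onto the Weibull plot.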
So, in these cases, trying to enumerate all of the possible issues (and the questions to ask when those issues arise) is a real challenge. When done right, the software and its corresponding workflow process can successfully find those “needles in the haystack” and feed the information back to R&D, improving product quality and customer satisfaction.