By Marlene Cardin, Kristin Wallace, and Alexander Nguyen, ProSensus
In the era of big data and digitalization, applying AI to product development is becoming essential for manufacturers to stay competitive in their industries. Gone are the days when a limited group of domain experts planned and executed designs of experiments (DOE) until a set of target properties was achieved. R&D teams across many industries are transforming their workflows by embracing AI initiatives that intelligently glean information from their historical, experimental, and product databases.
AI models built on past data can reduce experimentation through simulation and reach optimal properties much faster through numerical optimization. In this article, a high-performance coatings dataset is used along with ProSensus’ FormuSense software to examine the typical steps for making effective use of R&D datasets, including:
- Data preparation to ensure that the dataset is appropriate for model building. Common pitfalls such as missing data, insufficient variation, and data anomalies are investigated and resolved.
- Model building using multivariate analysis, whose intuitive plots help subject matter experts interpret their dataset by highlighting key correlations and trade-offs.
- Simulation using the model to predict the outcome of potential experiments without running the actual physical experiment.
- Constrained numerical optimization to calculate the ideal formulation and processing conditions required to achieve a targeted quality or set of qualities.
Introduction
Coatings are commonly used in a variety of industries to enhance the performance, durability, and finish of painted surfaces. Coatings manufacturers can address rising raw material costs, unwanted inventory costs, supply chain shortages, and process complexities by reformulating existing products with fewer, cheaper, and more readily available ingredients. However, coating formulations are complex: formulators must carefully select numerous ingredients to meet multiple performance targets robustly across application conditions, while balancing cost and ingredient availability and adhering to environmental regulations.
Product Development Data
Typically, a product development dataset includes three types of input (X) variables:
- Formulation ratios—how much of each ingredient is used (measured in percentage, fraction, or other quantity, such as kg)
- Ingredient properties—physical or chemical properties that characterize each ingredient, such as molecular weight or density
- Process conditions—the manufacturing conditions under which the ingredients are combined, such as temperature or mixing speed
The output (Y) variables in a product development dataset include any measured quality or performance properties, such as viscosity or hardness.
While a large number of X and Y variables may exist in a product development dataset, there are typically far fewer degrees of freedom because of the underlying chemistry. In other words, not every X variable can move independently of all other X variables, and not every Y variable can move independently of all other Y variables.
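One simple, purely arithmetic source of reduced degrees of freedom, alongside the chemical correlations described above, is that formulation ratios expressed as fractions must sum to 1. The sketch below uses hypothetical random data (20 formulations of 5 ingredients) and appends a derived ingredient-class total; the helper name `effective_dimensions` is illustrative only. Once the data are mean-centred, the six X columns span only four independent directions.

```python
import numpy as np


def effective_dimensions(X):
    """Number of linearly independent directions in the mean-centred data."""
    X = np.asarray(X, dtype=float)
    return int(np.linalg.matrix_rank(X - X.mean(axis=0)))


rng = np.random.default_rng(seed=0)
raw = rng.random((20, 5))                       # hypothetical: 20 formulations, 5 ingredients
ratios = raw / raw.sum(axis=1, keepdims=True)   # closure: each row sums to 1

# A derived "class use" column (ingredients 0 and 1 in one class) adds a variable,
# but it is a linear combination of existing columns, so it adds no new direction.
X = np.column_stack([ratios, ratios[:, 0] + ratios[:, 1]])
print(X.shape[1], effective_dimensions(X))
```

Real coatings data carry further chemical correlations on top of this closure effect, which is one reason latent-variable methods that model a small number of underlying directions suit this kind of data.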
Predictive Modeling
Because of this chemical complexity, product development that follows a conventional approach (trial and error, or DOEs requiring numerous physical experiments) tends to be iterative and resource-intensive. By contrast, building a predictive model on historical formulation data, interpreting the model, and performing simulations (applying the model to predict the outcome of new formulations) can drastically reduce the number of physical experiments required and, ultimately, shorten the time needed to develop a new or reformulated product.
PLS for Product Development
The AI framework for a predictive model should be carefully selected. PLS (partial least squares) is well suited to product development applications because its results can be visualized with intuitive plots (explainable AI) and its models can be inverted in a constrained optimization framework. In addition, PLS models can be built on both large and small datasets, and they handle missing data and correlated measurements.
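For readers who want to see the mechanics, the following is a minimal NumPy sketch of a PLS model with several Y variables, assuming complete (no missing) data and autoscaled variables. It is not FormuSense's implementation: each weight vector is taken as the leading left singular vector of X'Y for the deflated matrices, which is the direction the classic NIPALS iteration converges to. The function names `fit_pls` and `predict_pls` are hypothetical.

```python
import numpy as np


def fit_pls(X, Y, n_components):
    """Fit a PLS model on autoscaled X (ratios, properties, process) and Y (qualities).

    Zero-variance columns must be removed first (they would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean, y_std = Y.mean(axis=0), Y.std(axis=0, ddof=1)
    Xa = (X - x_mean) / x_std          # autoscale: mean-centre, unit variance
    Ya = (Y - y_mean) / y_std
    W, P, C = [], [], []
    for _ in range(n_components):
        u = np.linalg.svd(Xa.T @ Ya, full_matrices=False)[0]
        w = u[:, 0]                    # X weights (unit length)
        t = Xa @ w                     # scores: one value per formulation
        tt = t @ t
        p = Xa.T @ t / tt              # X loadings
        c = Ya.T @ t / tt              # Y loadings
        Xa = Xa - np.outer(t, p)       # deflate X and Y before the next component
        Ya = Ya - np.outer(t, c)
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    # Regression coefficients in autoscaled units: B = W (P'W)^-1 C'
    B = W @ np.linalg.solve(P.T @ W, C.T)
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "y_std": y_std, "B": B, "W": W, "P": P, "C": C}


def predict_pls(model, X_new):
    """Simulate: predict Y for candidate formulations, in original units."""
    Xs = (np.atleast_2d(np.asarray(X_new, dtype=float)) - model["x_mean"]) / model["x_std"]
    return (Xs @ model["B"]) * model["y_std"] + model["y_mean"]
```

Calling `predict_pls` on a candidate formulation corresponds to the simulation step, and the columns of `W` and `P` feed the loading-type plots formulators interpret. In practice the number of components would be chosen by cross-validation, and a validated library (for example scikit-learn's `PLSRegression`) or dedicated software would normally be preferred over a hand-rolled sketch.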
The ability to interpret PLS models is a significant advantage over black-box methods because formulators can:
- build trust in the model by verifying existing domain knowledge;
- uncover new learnings from the intuitive plots; and
- use the identified correlations to more intelligently and efficiently plan physical experiments.
Constrained Optimization
The ability to invert a model in a constrained optimization framework represents a further advantage of selecting PLS for product development applications. In this framework, the PLS model is used in conjunction with a relevant objective function as well as bounds and constraints to determine the required ingredient and process combinations that will achieve specified performance property targets.
Examples of typical constraints for a product development optimization problem include:
- availability of each ingredient
- upper and lower bounds for how much of each ingredient can be used
- limits on the number of ingredients used per formulation
The optimization objective function for a product development application typically includes terms to:
- minimize overall formulation cost
- minimize deviation from performance property targets
- minimize extrapolation from historical design space
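To make these constraints and objective terms concrete, the sketch below brute-forces a tiny, entirely hypothetical five-ingredient problem on a 5% grid. A simple linear predictor stands in for a fitted PLS model, all costs, bounds, coefficients, and weights are invented for illustration, and the extrapolation term is a crude distance-from-historical-mean penalty rather than the latent-space statistics (such as Hotelling's T² or SPE) a PLS-based optimizer might use. Exhaustive search is only feasible for a handful of ingredients; a real tool would use a proper (mixed-integer) nonlinear solver.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def grid_optimise(cost, lower, upper, available, max_ingredients, predict, target,
                  hist_mean, hist_std, n_units=20, w_target=1.0, w_extrap=0.1, eps=1e-9):
    """Exhaustive search over formulations whose fractions (multiples of 1/n_units) sum to 1."""
    best = None
    for units in compositions(n_units, len(cost)):
        x = [u / n_units for u in units]
        # Constraint: unavailable ingredients must not be used
        if any(xi > 0 and not ok for xi, ok in zip(x, available)):
            continue
        # Constraint: per-ingredient lower and upper bounds
        if any(xi < lo - eps or xi > hi + eps for xi, lo, hi in zip(x, lower, upper)):
            continue
        # Constraint: limit on the number of ingredients per formulation
        if sum(xi > 0 for xi in x) > max_ingredients:
            continue
        y_hat = predict(x)
        objective = (
            sum(c * xi for c, xi in zip(cost, x))              # formulation cost
            + w_target * (y_hat - target) ** 2                 # deviation from target
            + w_extrap * sum(((xi - m) / s) ** 2               # crude extrapolation penalty
                             for xi, m, s in zip(x, hist_mean, hist_std))
        )
        if best is None or objective < best[0]:
            best = (objective, x, y_hat)
    return best


# Hypothetical problem: the linear predictor stands in for a fitted PLS model
coef, intercept = [60.0, 90.0, -5.0, 0.0, 20.0], 10.0
cost = [4.0, 6.0, 1.5, 2.0, 3.0]
lower = [0.2, 0.0, 0.1, 0.0, 0.0]
upper = [0.7, 0.6, 0.6, 0.5, 0.3]
available = [True, True, True, False, True]

objective, x_best, y_best = grid_optimise(
    cost, lower, upper, available, max_ingredients=3,
    predict=lambda x: intercept + sum(b * xi for b, xi in zip(coef, x)),
    target=55.0,
    hist_mean=[0.45, 0.2, 0.25, 0.05, 0.05],
    hist_std=[0.1, 0.1, 0.1, 0.05, 0.05],
)
print(x_best, round(y_best, 2))
```

The relative weights on cost, target deviation, and extrapolation are a design choice: raising `w_extrap` keeps solutions closer to the historical design space at the expense of cost or target accuracy.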
Coatings Case Study
The goal of this project was to reformulate existing high-performance coatings to use fewer ingredients while maintaining their existing performance properties (i.e., quality). The available data included:
- 450 historical formulations
- 81 unique ingredients from 5 ingredient classes (resin, solvent, catalyst, additives, isocyanates)
- 116 ingredient properties (not all available for each ingredient)
- 6 process conditions (such as ambient conditions and film thickness)
- 4 performance (quality) properties, including viscosity, hardness, glass-transition temperature, and gel fraction
Assemble Dataset
An important first step is to assemble the dataset. The goal is to evaluate the suitability of the structured data for predictive modeling and to calculate advanced input features such as mixture properties.
Examples of data assembly tasks include:
- correcting common anomalies (such as outliers, inconsistent measurement units, and inconsistent nomenclature)
- calculating ingredient class use (summation of formulation ratios for all ingredients in a single ingredient class)
- assessing variation in ingredient properties, ingredient use, process conditions, and quality properties
- evaluating the amount of missing data
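Several of these assembly checks can be sketched in plain Python. The data layout (each formulation as a dictionary of ingredient ratios, plus a lookup from ingredient to class), the ingredient and property names, and the minimum-standard-deviation threshold below are hypothetical choices for illustration, not the structure used by FormuSense or the case-study data.

```python
from statistics import pstdev


def class_use(formulation, ingredient_class):
    """Ingredient class use: sum of formulation ratios over each ingredient class."""
    totals = dict.fromkeys(ingredient_class.values(), 0.0)  # absent class means zero use
    for ingredient, ratio in formulation.items():
        totals[ingredient_class[ingredient]] += ratio
    return totals


def missing_fraction(records, variable):
    """Fraction of records in which `variable` is absent or None."""
    return sum(r.get(variable) is None for r in records) / len(records)


def low_variation(records, variable, min_std):
    """True if the observed values of `variable` vary too little to support a model."""
    values = [r[variable] for r in records if r.get(variable) is not None]
    return len(values) < 2 or pstdev(values) < min_std


# Hypothetical example data (not from the case study)
ingredient_class = {"resin_A": "resin", "resin_B": "resin",
                    "solvent_A": "solvent", "catalyst_A": "catalyst"}
formulations = [
    {"resin_A": 0.40, "resin_B": 0.10, "solvent_A": 0.45, "catalyst_A": 0.05},
    {"resin_A": 0.30, "resin_B": 0.25, "solvent_A": 0.40, "catalyst_A": 0.05},
    {"resin_B": 0.50, "solvent_A": 0.45, "catalyst_A": 0.05},
]
uses = [class_use(f, ingredient_class) for f in formulations]
ingredient_props = [{"mw": 1200.0, "density": 1.10},
                    {"mw": None, "density": 1.05},
                    {"density": 0.98}]

print(uses[0])
print("catalyst use lacks variation:", low_variation(uses, "catalyst", min_std=1e-3))
print("fraction of ingredients missing mw:", missing_fraction(ingredient_props, "mw"))
```

A variable flagged by `low_variation` (here, catalyst use, which never changes) carries no information a model can learn from, while a high missing fraction may argue for computing mixture properties only for the ingredient classes where the property is well populated.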
Mixture Properties
Mixture properties are a key concept of the PLS model structure. Mixture properties are calculated by combining ingredient properties and formulation ratios using appropriate mixing rules.
When mixture properties are available and sufficiently characterize the ingredients, formulation ratios can be excluded from a model. This variation of the model structure is very powerful, as it allows for new ingredients to be considered in future formulations, so long as the ingredient properties are known.
Mixture properties can be calculated per ingredient class or globally (across all ingredient classes). This flexibility is helpful in cases where some ingredient classes have more missing data than others, when the ingredient property measurements differ between ingredient classes, or when the formulator only wishes to consider new ingredients in specific ingredient classes.
In this dataset, 4 of the 5 ingredient classes contain ingredient properties; therefore, mixture properties were calculated per class, using a linear weighted-average mixing rule. This resulted in 58 new input (X) variables that contained sufficient variation for use in the model.
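A linear weighted-average mixing rule for one ingredient class c computes m_c = Σ x_i p_i / Σ x_i, where x_i is the formulation ratio and p_i the property of ingredient i, and both sums run over the ingredients in class c. The sketch below implements that rule; the ingredient names and property values are hypothetical, and skipping ingredients with an unknown property (renormalizing the weights over the rest) is an assumed policy, not necessarily how FormuSense treats missing values.

```python
def mixture_property(formulation, ingredient_class, ingredient_props, cls, prop):
    """Linear weighted-average mixing rule for one ingredient class.

    m = sum(x_i * p_i) / sum(x_i), summed over ingredients i in class `cls`.
    Ingredients whose value of `prop` is unknown are skipped and the weights are
    renormalised over the rest (an assumption; other policies are possible).
    Returns None if no ingredient of the class has a known value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for ingredient, ratio in formulation.items():
        if ingredient_class.get(ingredient) != cls:
            continue
        value = ingredient_props.get(ingredient, {}).get(prop)
        if value is None:
            continue
        weighted_sum += ratio * value
        weight_total += ratio
    return weighted_sum / weight_total if weight_total > 0 else None


# Hypothetical ingredients and property values, for illustration only
ingredient_class = {"resin_A": "resin", "resin_B": "resin", "resin_C": "resin",
                    "solvent_A": "solvent"}
ingredient_props = {"resin_A": {"tg": 40.0}, "resin_B": {"tg": 10.0},
                    "resin_C": {}, "solvent_A": {"density": 0.87}}
formulation = {"resin_A": 0.30, "resin_B": 0.10, "resin_C": 0.05, "solvent_A": 0.55}

resin_tg = mixture_property(formulation, ingredient_class, ingredient_props, "resin", "tg")
print(resin_tg)  # (0.30*40 + 0.10*10) / (0.30 + 0.10); resin_C has no tg and is skipped
```

Because the result depends only on formulation ratios and ingredient properties, a new ingredient with known properties can be scored in the same way, which is what allows the formulation ratios themselves to be left out of the model.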
Build & Interpret PLS Model
With the assembled dataset, the next step is to build the PLS model. The PLS model structure for this dataset is shown in Figure 1.
FIGURE 1 PLS model structure.
Continue reading in the May-June digital issue of CoatingsTech