Using AI to Rapidly Develop New and Improved High-performance Coatings

By Marlene Cardin, Kristin Wallace, and Alexander Nguyen, ProSensus

In the era of big data and digitalization, applying AI in product development is becoming essential for manufacturers to stay competitive in their industries. Gone are the days when a limited group of domain experts planned and executed designs of experiments (DOE) until a set of target properties was achieved. R&D teams across many industries are transforming their workflows by embracing AI initiatives to intelligently glean information from their historical, experimental, and product databases.

AI models built on past data can be used to reduce experimentation through simulation and achieve optimal properties much faster using numerical optimization. In this article, a high-performance coating dataset will be used along with ProSensus’ FormuSense software. The article examines the typical steps for using R&D datasets effectively, including:

Data preparation to ensure that the dataset is appropriate for model building. Common pitfalls such as missing data, insufficient variation, and data anomalies are investigated and resolved.
Model building using multivariate analysis and intuitive plots that help subject matter experts interpret their dataset and identify key correlations and trade-offs.
Simulation using the model to predict the outcome of potential experiments without running the actual physical experiment.
Constrained numerical optimization to calculate the ideal formulation and processing conditions required to achieve a targeted quality or set of qualities.

Introduction

Coatings are commonly used in a variety of industries to enhance the performance, durability, and finish of painted surfaces. Coatings manufacturers can address rising raw material costs, unwanted inventory costs, supply chain shortages, and process complexities by reformulating existing products with fewer, cheaper, and more readily available ingredients. However, coating formulations are complex: formulators must carefully select numerous ingredients in hopes of achieving multiple performance targets that hold up across a range of application conditions, while balancing cost and ingredient availability and adhering to environmental regulations.

Product Development Data

Typically, a product development dataset includes three types of input (X) variables:

Formulation ratios—how much of each ingredient is used (measured in percentage, fraction, or other quantity, such as kg)
Ingredient properties—physical or chemical properties that characterize each ingredient, such as molecular weight or density
Process conditions—the manufacturing conditions under which the ingredients are combined, such as temperature or mixing speed

The output (Y) variables in a product development dataset include any measured quality or performance properties, such as viscosity or hardness.

While a large number of X and Y variables may exist for one product development dataset, there are typically far fewer degrees of freedom due to the underlying chemistry. In other words, not every X variable can move independently of all other X variables, and not every Y variable can move independently of all other Y variables.
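A small numerical illustration of reduced degrees of freedom, using hypothetical synthetic data rather than the case-study dataset: formulation ratios that must sum to 1, plus a mixture property that is a fixed weighted average of those ratios, give five X columns but only three independent directions of variation once the data are mean-centered (as PLS does before modeling).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical formulation ratios for 4 ingredients in 20 formulations.
# Rows are normalized to sum to 1, so only 3 ratios can vary independently.
raw = rng.random((20, 4))
ratios = raw / raw.sum(axis=1, keepdims=True)

# Hypothetical ingredient densities; the weighted-average mixture density is a
# linear combination of the ratios, so it adds a column but no new freedom.
densities = np.array([0.9, 1.1, 1.3, 0.8])
mix_density = ratios @ densities

X = np.column_stack([ratios, mix_density])   # 5 input columns
X_centered = X - X.mean(axis=0)              # PLS works on mean-centered data

# Count the singular values that are clearly non-zero.
singular_values = np.linalg.svd(X_centered, compute_uv=False)
effective_rank = np.linalg.matrix_rank(X_centered, tol=1e-10)
print(f"{X.shape[1]} columns, effective rank {effective_rank}")
print(np.round(singular_values, 4))
```

Because the centered ratios sum to zero in every row and the mixture density is a linear combination of them, the effective rank is 3 even though there are 5 columns; real formulation data behave similarly, only with noisier and less exact dependencies.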

Predictive Modeling

Owing to these chemical complexities, product development that follows a conventional approach (trial and error, or DOEs with numerous physical experiments) tends to become iterative and resource-intensive. By contrast, building a predictive model on historical formulation data, interpreting the model, and performing simulations (applying the model to predict the outcome of new formulations) can drastically reduce the number of physical experiments required and, ultimately, shorten the time needed to develop a new or reformulated product.

PLS for Product Development

The AI framework for a predictive model should be carefully selected. PLS (partial least squares) is well suited to product development applications because results are visualized with intuitive plots (explainable AI) and the models can be inverted in a constrained optimization framework. In addition, PLS models can be built on both large and small datasets, and they can handle missing data and correlated measurements.
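A minimal sketch of fitting a PLS model and then using it for simulation, written from scratch with the NIPALS algorithm in NumPy on hypothetical synthetic data. It is not FormuSense's implementation; the function names `pls_fit` and `pls_predict` are illustrative, and for brevity the sketch assumes complete data (NIPALS variants that skip missing entries exist but are omitted here).

```python
import numpy as np

def pls_fit(X, Y, n_components, max_iter=500, tol=1e-12):
    """Fit a PLS2 model with NIPALS on autoscaled (mean-centered, unit-variance) data."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean, y_std = Y.mean(axis=0), Y.std(axis=0, ddof=1)
    E = (X - x_mean) / x_std                    # scaled X residuals
    F = (Y - y_mean) / y_std                    # scaled Y residuals
    W = np.zeros((X.shape[1], n_components))    # X weights
    P = np.zeros((X.shape[1], n_components))    # X loadings
    Q = np.zeros((Y.shape[1], n_components))    # Y loadings
    for a in range(n_components):
        u = F[:, [np.argmax(F.var(axis=0))]]    # start from the most variable Y column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)
            t = E @ w                           # X scores
            q = F.T @ t / (t.T @ t)
            u_new = F @ q / (q.T @ q)           # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)
        E = E - t @ p.T                         # deflate X
        F = F - t @ q.T                         # deflate Y
        W[:, a], P[:, a], Q[:, a] = w[:, 0], p[:, 0], q[:, 0]
    B = W @ np.linalg.inv(P.T @ W) @ Q.T        # regression coefficients in scaled units
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "y_std": y_std, "B": B, "W": W, "P": P, "Q": Q}

def pls_predict(model, X_new):
    """Predict Y for new X rows (the 'simulation' step)."""
    Z = (X_new - model["x_mean"]) / model["x_std"]
    return Z @ model["B"] * model["y_std"] + model["y_mean"]

# Hypothetical data: 8 correlated X variables and 2 Y variables driven by 3 latent factors.
rng = np.random.default_rng(42)
T = rng.normal(size=(40, 3))
P_true = rng.normal(size=(3, 8))
C_true = rng.normal(size=(3, 2))
X = T @ P_true + 0.05 * rng.normal(size=(40, 8))
Y = T @ C_true + 0.05 * rng.normal(size=(40, 2))

model = pls_fit(X, Y, n_components=3)
Y_hat = pls_predict(model, X)
r2 = 1 - ((Y - Y_hat) ** 2).sum(axis=0) / ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
print("R2 per Y:", np.round(r2, 3))

# Simulate a new (hypothetical) formulation without running the physical experiment.
t_new = np.array([[0.5, -1.0, 0.2]])
x_new = t_new @ P_true
y_sim = pls_predict(model, x_new)
y_expected = t_new @ C_true
print("Simulated Y:", np.round(y_sim, 3), "noise-free value:", np.round(y_expected, 3))
```

The weights `W` and loadings `P` and `Q` returned here are the quantities that loading and weight plots display, which is what makes the model interpretable in the way the following paragraphs describe.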

The ability to interpret PLS models is a significant advantage over black-box methods because formulators can:

build trust in the model by verifying existing domain knowledge;
uncover new learnings from the intuitive plots; and
use the identified correlations to more intelligently and efficiently plan physical experiments.

Constrained Optimization

The ability to invert a model in a constrained optimization framework represents a further advantage of selecting PLS for product development applications. In this framework, the PLS model is used in conjunction with a relevant objective function as well as bounds and constraints to determine the required ingredient and process combinations that will achieve specified performance property targets.

Examples of typical constraints for a product development optimization problem include:

availability of each ingredient
upper and lower bounds for how much of each ingredient can be used
limits on the number of ingredients used per formulation

The optimization objective function for a product development application typically includes terms to:

minimize overall formulation cost
minimize deviation from performance property targets
minimize extrapolation from historical design space
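The optimization problem outlined above can be sketched with SciPy's `minimize` using the SLSQP method, which supports variable bounds and equality constraints. Everything here is hypothetical: a fixed linear model stands in for a fitted PLS model, the ingredient costs, targets, and weights are invented, and a Mahalanobis distance from historical formulations stands in for the latent-space extrapolation statistics (such as Hotelling's T²) that a PLS-based tool would more typically use. The limit on the number of ingredients is left out, because such cardinality limits usually require a mixed-integer formulation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_ing = 4

# Hypothetical historical formulations (ratios sum to 1) that define the design space.
hist = rng.dirichlet(np.full(n_ing, 5.0), size=50)
hist_mean = hist.mean(axis=0)
# Pseudo-inverse because the covariance is singular when ratios sum to 1.
cov_pinv = np.linalg.pinv(np.cov(hist, rowvar=False))

# Hypothetical linear stand-in for a fitted model: [viscosity, hardness] = x @ B
B = np.array([[120.0, 2.0],
              [80.0, 3.5],
              [200.0, 1.0],
              [60.0, 4.0]])
cost = np.array([3.0, 5.0, 2.0, 8.0])     # hypothetical cost per kg of each ingredient
target = np.array([110.0, 2.8])           # hypothetical property targets
scale = np.array([10.0, 0.2])             # acceptable deviation for each target
w_cost, w_target, w_extrap = 1.0, 10.0, 0.5

def objective(x):
    cost_term = cost @ x                                   # overall formulation cost
    target_term = np.sum(((x @ B - target) / scale) ** 2)  # deviation from property targets
    d = x - hist_mean
    extrap_term = d @ cov_pinv @ d                         # distance from past formulations
    return w_cost * cost_term + w_target * target_term + w_extrap * extrap_term

bounds = [(0.05, 0.60)] * n_ing                                   # per-ingredient use limits
constraints = [{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}]  # ratios sum to 1

res = minimize(objective, x0=hist_mean, method="SLSQP",
               bounds=bounds, constraints=constraints)
x_opt = res.x
print("converged:", res.success)
print("ratios:", np.round(x_opt, 3))
print("predicted properties:", np.round(x_opt @ B, 2))
print("cost:", round(float(cost @ x_opt), 2))
```

The three weights control the trade-off between the objective terms: raising `w_extrap`, for example, keeps the solution closer to formulations that have already been made, at the expense of cost or target accuracy. Ingredient availability maps naturally onto the bounds.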

Coatings Case Study

The goal of this project was to reformulate existing high-performance coatings to use fewer ingredients while maintaining existing performance properties (i.e., quality). The available data included:

450 historical formulations
81 unique ingredients from 5 ingredient classes (resin, solvent, catalyst, additives, isocyanates)
116 ingredient properties (not all available for each ingredient)
6 process conditions (such as ambient conditions and film thickness)
4 performance (quality) properties, including viscosity, hardness, glass-transition temperature, and gel fraction

Assemble Dataset

An important first step is to assemble the dataset. The goal is to evaluate the suitability of the structured data for predictive modeling and to calculate advanced input features such as mixture properties.

Examples of data assembly tasks include:

correcting common anomalies (such as outliers, inconsistent measurement units, and inconsistent nomenclature)
calculating ingredient class use (summation of formulation ratios for all ingredients in a single ingredient class)
assessing variation in ingredient properties, ingredient use, process conditions, and quality properties
evaluating the amount of missing data
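Several of these assembly tasks can be sketched in plain Python on a tiny hypothetical dataset: harmonizing ingredient names, converting viscosity to a single unit, summing formulation ratios into ingredient class use, and flagging missing data and ingredients whose use never varies. The ingredient names, units, and threshold are assumptions for illustration, and outlier detection is omitted.

```python
import statistics
from collections import defaultdict

# Hypothetical lookup of canonical ingredient names to ingredient classes.
INGREDIENT_CLASS = {"resin_a": "resin", "resin_b": "resin", "mek": "solvent",
                    "xylene": "solvent", "dbtdl": "catalyst"}
TO_MPA_S = {"mPa.s": 1.0, "Pa.s": 1000.0}   # viscosity unit conversions

formulations = [
    {"ratios": {"Resin A": 0.50, "resin_b": 0.10, "MEK": 0.39, "DBTDL": 0.01},
     "viscosity": 850.0, "unit": "mPa.s"},
    {"ratios": {"resin_a": 0.45, "resin_b": 0.15, "Xylene": 0.39, "dbtdl": 0.01},
     "viscosity": 0.92, "unit": "Pa.s"},
    {"ratios": {"resin_a": 0.55, "mek": 0.44, "dbtdl": 0.01},
     "viscosity": None, "unit": "mPa.s"},
]

def canonical(name):
    """Harmonize nomenclature, e.g. 'Resin A' -> 'resin_a'."""
    return name.strip().lower().replace(" ", "_")

cleaned = []
for f in formulations:
    ratios = {canonical(k): v for k, v in f["ratios"].items()}
    visc = None if f["viscosity"] is None else f["viscosity"] * TO_MPA_S[f["unit"]]
    class_use = defaultdict(float)
    for name, r in ratios.items():
        class_use[INGREDIENT_CLASS[name]] += r   # sum ratios within each ingredient class
    cleaned.append({"ratios": ratios, "class_use": dict(class_use), "viscosity_mpas": visc})

# Fraction of formulations with a missing viscosity measurement.
missing_visc = sum(c["viscosity_mpas"] is None for c in cleaned) / len(cleaned)

# Ingredients whose use never changes carry no information for the model.
no_variation = [
    name for name in INGREDIENT_CLASS
    if statistics.pstdev([c["ratios"].get(name, 0.0) for c in cleaned]) < 1e-9
]

for c in cleaned:
    print(c["class_use"], c["viscosity_mpas"])
print("missing viscosity:", round(missing_visc, 2), "| no variation:", no_variation)
```

In this toy data the catalyst is used at the same level in every formulation, so it is flagged as having no variation, and one of three viscosity values is missing.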

Mixture Properties

Mixture properties are a key concept of the PLS model structure. Mixture properties are calculated by combining ingredient properties and formulation ratios using appropriate mixing rules.
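One common way to write a linear weighted-average mixing rule for a single ingredient class is shown below, assuming the formulation ratios are renormalized within the class. Here r_i is the formulation ratio of ingredient i, p_i is that ingredient's value of property p, and c is the ingredient class; a global mixture property would sum over all ingredients instead, and properties that do not blend linearly would call for a different rule.

```latex
\bar{p}_{c} \;=\; \frac{\sum_{i \in c} r_i \, p_i}{\sum_{i \in c} r_i}
```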

When mixture properties are available and sufficiently characterize the ingredients, formulation ratios can be excluded from a model. This variation of the model structure is very powerful, as it allows for new ingredients to be considered in future formulations, so long as the ingredient properties are known.

Mixture properties can be calculated per ingredient class or globally (across all ingredient classes). This flexibility is helpful in cases where some ingredient classes have more missing data than others, when the ingredient property measurements differ between ingredient classes, or when the formulator only wishes to consider new ingredients in specific ingredient classes.

In this dataset, 4 of the 5 ingredient classes contain ingredient properties; therefore, mixture properties were calculated per class, using a linear weighted-average mixing rule. This resulted in 58 new input (X) variables that contained sufficient variation for use in the model.
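A minimal sketch of the per-class linear weighted-average calculation in Python. The ingredient names and property values are hypothetical, and the treatment of a missing ingredient property (averaging only over the ingredients in the class that have a value) is an assumption, not necessarily how FormuSense handles it.

```python
def mixture_properties(ratios, ingredient_class, properties):
    """Per-class linear weighted-average mixture properties.

    ratios:           {ingredient: formulation ratio}
    ingredient_class: {ingredient: class name}
    properties:       {ingredient: {property: value}}; unknown values are simply absent
    Returns {"class:property": mixture value}.
    """
    sums = {}  # (class, property) -> [weighted sum, total weight]
    for ing, r in ratios.items():
        cls = ingredient_class[ing]
        for prop, value in properties.get(ing, {}).items():
            acc = sums.setdefault((cls, prop), [0.0, 0.0])
            acc[0] += r * value
            acc[1] += r
    return {f"{cls}:{prop}": s / w for (cls, prop), (s, w) in sums.items() if w > 0}


# Hypothetical ingredients and property values (molecular weight, boiling point).
ingredient_class = {"resin_a": "resin", "resin_b": "resin",
                    "mek": "solvent", "xylene": "solvent"}
properties = {
    "resin_a": {"mw": 1200.0},
    "resin_b": {"mw": 2000.0},
    "mek": {"bp": 80.0},
    "xylene": {},          # boiling point not recorded in this hypothetical dataset
}
ratios = {"resin_a": 0.45, "resin_b": 0.15, "mek": 0.20, "xylene": 0.20}

mix = mixture_properties(ratios, ingredient_class, properties)
print(mix)
```

Here the resin molecular weight works out to (0.45 × 1200 + 0.15 × 2000) / 0.60 = 1400, and the solvent boiling point uses MEK alone because the xylene value is missing. Since the resulting features depend only on ingredient properties and ratios, a new ingredient with known properties can be scored the same way, and assigning every ingredient to a single class gives the global variant.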

Build & Interpret PLS Model

With the assembled dataset, the next step is to build the PLS model. The PLS model structure for this dataset is shown in Figure 1.

FIGURE 1 PLS model structure.

Continue reading in the May-June digital issue of CoatingsTech