
Optimization of drug solubility in the supercritical CO2 system through numerical simulation based on an artificial intelligence approach

Barnacle Mating Optimizer (BMO)

BMO draws inspiration from the mating behavior of barnacles and is used in this work to tune the models. The barnacles serve as candidate solutions in the algorithm (in this study, combinations of hyperparameters)21. The BMO process includes two main phases, namely selection and reproduction. In the selection phase, two parent barnacles are chosen according to their penis length (pl).

During the reproduction phase, the algorithm applies the Hardy-Weinberg principle to produce offspring. If the father barnacle's pl falls within the selection range, the offspring inherits p% of its characteristics from the father and (1 − p)% from the mother. If the father's pl lies outside the acceptable range, a new solution is generated by modifying only the mother's characteristics. In this way, the algorithm exploits when pl is within the range and explores when it is not22.

The formulation for producing offspring from the parents' mating process is expressed by the following equation21,22:

$$x_{i}^{N_{new}}=p\,x_{barnacle_{d}}^{N}+q\,x_{barnacle_{m}}^{N}$$

In this process, the generation of offspring relies on two random numbers, p and q (with q = 1 − p), both in the range [0, 1]. Here, barnacle_d represents the father's solution and barnacle_m represents the mother's solution. When the selection of the two parent barnacles exceeds the pl limit, the usual mating process is terminated.

Instead, the algorithm uses a method called “sperm casting,” a term coined in BMO, to create the offspring. This approach facilitates exploration during the mating process22:

$$x_{i}^{N_{new}}=\text{rand}()\times x_{barnacle_{m}}^{N}$$

The function rand() generates a random number within the interval [0, 1].
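To make the two reproduction rules concrete, the following Python sketch (a minimal illustration using NumPy; the function name generate_offspring and the parameter selection_distance are assumptions of this sketch, not terminology from the cited BMO papers) produces an offspring by normal mating when the parents fall within the pl range and by sperm casting otherwise:

```python
import numpy as np

def generate_offspring(dad, mum, pl, selection_distance, rng):
    """Produce one offspring from two parent barnacles (candidate solutions).

    dad, mum           : 1-D arrays holding the parents' decision variables
    pl                 : penis-length parameter controlling exploitation vs. exploration
    selection_distance : distance (in population index) between the chosen parents
    """
    if selection_distance <= pl:
        # Normal mating (exploitation): Hardy-Weinberg-style inheritance,
        # offspring = p * dad + q * mum with q = 1 - p
        p = rng.random()
        q = 1.0 - p
        return p * dad + q * mum
    # Sperm casting (exploration): only the mother's characteristics are used
    return rng.random() * mum

# Illustrative use with two 3-dimensional candidate solutions
rng = np.random.default_rng(0)
dad = np.array([0.2, 0.5, 0.9])
mum = np.array([0.7, 0.1, 0.4])
child = generate_offspring(dad, mum, pl=3, selection_distance=2, rng=rng)
print(child)
```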

LASSO regression

LASSO (Least Absolute Shrinkage and Selection Operator) is an advanced statistical technique used in linear regression applications. It was introduced to handle multicollinearity and perform feature selection by imposing a penalty on the absolute values of the regression coefficients. The model has gained popularity for its ability to handle high-dimensional datasets effectively and to produce interpretable, sparse models16,23.

The main goal of LASSO regression is to determine the best linear model by minimizing the sum of squared residuals while reducing the less meaningful coefficients to zero. This promotes the selection of the most relevant features and avoids overfitting, resulting in a more robust and generalizable model.

Consider a linear regression problem with n observations and p predictors. The model can be represented as follows:

$$y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\dots+\beta_{p}x_{p}+\epsilon$$

Where:

  • \(y\) represents the dependent variable,

  • \(\beta_{0}\) is the intercept,

  • \(x_{i}\) are the predictors,

  • \(\beta_{i}\) are the coefficients, and

  • \(\epsilon\) represents the error term.

LASSO regression optimizes the following objective function24:

$$\underset{\beta}{\text{arg min}}\left\{\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\sum_{j=1}^{p}\beta_{j}x_{ij}\right)^{2}+\lambda\sum_{j=1}^{p}\left|\beta_{j}\right|\right\}$$

The symbol \(\lambda\) represents the regularization factor. As \(\lambda\) increases, the penalty on non-zero coefficients grows, leading to greater shrinkage and stronger feature selection.
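As a brief illustration of this objective, the following Python sketch fits a sparse model with scikit-learn's Lasso estimator; the synthetic data and the value of alpha (corresponding to the regularization factor \(\lambda\) above) are illustrative assumptions, not settings from this study:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic illustrative data: 100 observations, 8 predictors,
# only the first two carry signal
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Standardize predictors so the L1 penalty treats them on an equal footing
X_std = StandardScaler().fit_transform(X)

# alpha plays the role of lambda in the objective above
model = Lasso(alpha=0.1).fit(X_std, y)

# Coefficients of irrelevant predictors are shrunk exactly to zero (sparse solution)
print(model.intercept_, model.coef_)
```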

Extreme gradient boosting (XGBoost)

XGBoost has received wide recognition for its high predictive performance and versatility across a variety of areas. As an ensemble learning technique, XGBoost combines the strengths of gradient boosting and regularization to deliver robust and accurate regression models19,25.

The main goal of XGBoost regression is to build an optimized regression model that can effectively predict continuous numerical values. By combining multiple weak learners (decision trees in this study), XGBoost gradually improves its predictive ability through iterative boosting. The aim is to minimize the overall prediction error and to outperform traditional gradient boosting algorithms. A flowchart of the overall XGBoost process is shown in Fig. 1 (ref. 26).

Fig. 1. Flowchart of the overall XGBoost process.
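A minimal Python sketch of XGBoost regression with the xgboost library is shown below; the synthetic data and hyperparameter values are illustrative placeholders (in this study the hyperparameters are tuned with the BMO algorithm described above):

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic illustrative regression data (not the solubility dataset from the study)
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
y = np.sin(6 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Placeholder hyperparameters; these would be tuned by BMO in the study
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, pred))
```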

Polynomial Regression (PR)

The PR method allows modeling nonlinear relationships between the inputs and outputs for complicated tasks. By introducing polynomial terms, this technique can capture complex patterns where linear models fail. In this model description, we outline the key concepts and benefits of polynomial regression27,28. The method fits a polynomial function to the data to approximate the underlying relationship between the variables, allowing it to capture curved and nonlinear trends29.

The PR model of order d is given by12:

$$y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\dots+\beta_{p}x_{p}+\beta_{p+1}x_{1}^{2}+\beta_{p+2}x_{1}x_{2}+\dots+\beta_{p+n}x_{p}^{d}+\epsilon$$

where d is the order of the PR model, \(x_{i}^{d}\) represents the d-th power of the i-th predictor, and \(\beta_{p+1}\) to \(\beta_{p+n}\) are the additional parameters to be estimated.
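The following Python sketch illustrates this formulation with scikit-learn by expanding the predictors to degree d = 2 (squared and interaction terms) and estimating the coefficients with ordinary least squares; the data are synthetic and purely illustrative:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic illustrative data with a curved (quadratic) relationship
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = (1.0 + 0.5 * X[:, 0] - 1.5 * X[:, 1] + 2.0 * X[:, 0] ** 2
     + rng.normal(scale=0.1, size=200))

# PolynomialFeatures adds the x^2 and interaction terms of the PR equation;
# LinearRegression then estimates the beta coefficients
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.named_steps["linearregression"].coef_)
```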

For the ML modeling and optimization tasks, Python was used along with machine learning, optimization, and plotting libraries.