A multidisciplinary team led by Princess Nourah Bint Abdulrahman University has developed an AI-driven ensemble framework that integrates Gaussian process regression (GPR), Bayesian ridge regression (BRR), and K-nearest neighbors (KNN) under AdaBoost to predict digitoxin solubility and solvent density in supercritical CO2. With hyperparameters tuned by the Sailfish Optimizer, the models achieve average absolute relative deviations (AARD%) below 10%, which supports the design of green pharmaceutical nanonization processes.

Key points

  • AdaBoost ensemble combines GPR, BRR, and KNN to predict digitoxin solubility with AARD% of 7.74 and CO2 density with AARD% of 2.76.
  • Sailfish Optimizer tunes hyperparameters automatically, optimizing learning rate, estimator count, and kernel settings for minimal prediction error.
  • Model uses temperature and pressure as inputs to predict both drug solubility and solvent density in supercritical CO2, supporting green pharmaceutical processing.
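The AARD% figures quoted above are average absolute relative deviations between predicted and measured values, expressed in percent. A minimal sketch of that metric follows, assuming the usual definition AARD% = (100/N) Σ |predicted − measured| / |measured|. The function name and the sample numbers are hypothetical and are not taken from the study.

```python
def aard_percent(y_true, y_pred):
    """Average absolute relative deviation between two sequences, in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured value.
    # A measured value of zero would raise ZeroDivisionError.
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical values for illustration only (not data from the study).
measured = [2.0, 4.0, 5.0]
predicted = [2.2, 3.8, 5.0]
print(round(aard_percent(measured, predicted), 2))
```

Because each error is divided by the measured value, the metric is scale-free, which makes it possible to compare a solubility model and a density model even though the two targets differ by orders of magnitude.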

Why it matters: This AI-powered predictive approach enables efficient, precise drug solubility estimation, accelerating green pharmaceutical manufacturing and reducing costly experimental trials.

Q&A

  • What is supercritical CO2 and why is it used?
  • How does AdaBoost ensemble learning work?
  • What role does the Sailfish Optimizer play?
  • What does AARD% indicate in model performance?

Supercritical CO2 in Pharmaceutical Processing

Supercritical carbon dioxide is the state CO2 enters when it is held above both its critical temperature (about 31 °C) and its critical pressure (about 7.38 MPa). In this state, CO2 combines gas-like transport properties (low viscosity, high diffusivity) with liquid-like solvent power. These characteristics make it an attractive green solvent for pharmaceutical applications, where it can dissolve certain drug substances without the need for harmful organic solvents. Researchers use supercritical CO2 in drug formulation to reduce particle size, improve solubility, and micronize active pharmaceutical ingredients.

In pharmaceutical processing, supercritical CO2 can penetrate porous solids and extract or precipitate drug molecules in a controlled manner. This process often leads to powders with uniform particle sizes and improved bioavailability. By adjusting temperature and pressure parameters, scientists can fine-tune solubility, density, and diffusion characteristics to achieve desired outcomes. The ability to precisely predict how drugs will dissolve under varying conditions is critical for scaling production and ensuring consistent performance. As a result, combining supercritical fluid technology with data-driven modeling approaches accelerates the design of efficient and sustainable manufacturing methods for next-generation therapeutics.

Ensemble Machine Learning Methods

Ensemble learning refers to a set of techniques in machine learning where multiple models, called base learners, are combined to improve overall prediction accuracy and robustness. Instead of relying on a single model, ensemble methods aggregate the outputs of different learners, which can have complementary strengths and weaknesses. Common ensemble approaches include bagging, boosting, and stacking. AdaBoost is a boosting algorithm that sequentially trains base learners, giving more weight to observations that previous models mispredicted. This iterative focus on difficult examples often results in substantial gains in predictive performance compared to individual models.

AdaBoost starts by assigning equal weights to all data points and fitting a simple base learner, such as a decision tree or regression model. After each iteration, the algorithm increases the relative weights of poorly predicted points, so that the next learner concentrates on those harder cases. The final prediction combines the outputs of all base learners, weighted by each learner's accuracy: a weighted vote for classification, or a weighted median in the AdaBoost.R2 variant commonly used for regression. By integrating diverse learners such as Gaussian process regression, Bayesian ridge regression, and k-nearest neighbors, ensemble frameworks can capture complex patterns in data that single models might overlook.
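The reweighting loop described above can be sketched in a few lines. The example below is a simplified, standard-library-only illustration of the AdaBoost.R2 regression variant, which combines learners through a weighted median. It is not the study's implementation: the k-nearest-neighbors base learner, the function names, and the synthetic (temperature, pressure) data are all assumptions made for illustration.

```python
import math
import random


def fit_knn(X, y, k=2):
    """Return a predictor that averages the k nearest training targets."""
    def predict(x):
        order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))
        nearest = order[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def adaboost_r2(X, y, n_rounds=10, seed=0):
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                      # start with equal weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        # Resample the training set in proportion to the current weights.
        idx = rng.choices(range(n), weights=w, k=n)
        model = fit_knn([X[i] for i in idx], [y[i] for i in idx])
        errors = [abs(model(X[i]) - y[i]) for i in range(n)]
        worst = max(errors)
        if worst == 0.0:                   # perfect learner: keep it and stop
            learners.append(model)
            alphas.append(1.0)
            break
        losses = [e / worst for e in errors]          # linear loss in [0, 1]
        avg_loss = sum(wi * li for wi, li in zip(w, losses))
        if avg_loss >= 0.5:                # learner too weak to be useful
            if not learners:               # but always keep at least one
                learners.append(model)
                alphas.append(1.0)
            break
        beta = avg_loss / (1.0 - avg_loss)
        learners.append(model)
        alphas.append(math.log(1.0 / beta))           # accurate learners count more
        # Well-predicted points shrink by up to beta; hard points keep their weight.
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        return weighted_median([m(x) for m in learners], alphas)
    return predict


# Synthetic data for illustration only: (temperature in K, pressure in MPa).
X = [(float(t), float(p)) for t in (308, 318, 328, 338) for p in (12, 18, 24, 30)]
y = [1e-5 * p * (1.0 + 0.01 * (t - 308.0)) for t, p in X]
model = adaboost_r2(X, y)
print(model((320.0, 20.0)))
```

Resampling by weight, rather than passing weights to the learner, means the base learner does not need to support weighted fitting, which is convenient for methods such as k-nearest neighbors.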

To optimize the performance of ensemble models, hyperparameters (settings that control model complexity and behavior) must be carefully selected. The Sailfish Optimizer is a nature-inspired metaheuristic algorithm that mimics the hunting strategies of sailfish. It iteratively adjusts a population of candidate parameter sets through simulated cooperative maneuvers, seeking to minimize an objective function such as prediction error. With this optimizer, researchers can automatically tune hyperparameters such as the number of base estimators, the learning rate, and kernel parameters, aiming for low prediction error without overfitting the training data. This combination of ensemble learning and metaheuristic optimization forms a powerful toolkit for predicting drug solubility and other critical properties in pharmaceutical research.
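The general idea of population-based hyperparameter tuning can be illustrated with a small sketch. The code below is a simplified leader-guided random search, not the published Sailfish Optimizer update rules. The objective is a hypothetical stand-in for a validation error such as cross-validated AARD%, and the bounds and optimum are invented for illustration.

```python
import random


def toy_validation_error(params):
    """Hypothetical stand-in for a validation error (lower is better)."""
    learning_rate, n_estimators = params
    n_estimators = round(n_estimators)     # estimator counts are integers
    return (learning_rate - 0.3) ** 2 + ((n_estimators - 40) / 100.0) ** 2


# Assumed search ranges: (learning rate, number of estimators).
BOUNDS = [(0.01, 1.0), (5.0, 100.0)]


def clip(value, low, high):
    return max(low, min(high, value))


def tune(objective, bounds, pop_size=12, n_iter=40, seed=0):
    rng = random.Random(seed)
    # Start from candidates spread uniformly over the search space.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(n_iter):
        leader = pop[scores.index(min(scores))]
        for i, cand in enumerate(pop):
            trial = []
            for (lo, hi), c, b in zip(bounds, cand, leader):
                pull = rng.random() * (b - c)             # move toward the leader
                noise = rng.gauss(0.0, 0.05 * (hi - lo))  # keep exploring
                trial.append(clip(c + pull + noise, lo, hi))
            score = objective(trial)
            if score < scores[i]:          # greedy: accept only improvements
                pop[i], scores[i] = trial, score
    best = scores.index(min(scores))
    return pop[best], scores[best]


best_params, best_score = tune(toy_validation_error, BOUNDS)
print(best_params, best_score)
```

In a real workflow the objective would train the ensemble with the candidate settings and return an error measured on held-out data, so that the search favors settings that generalize rather than ones that merely fit the training set.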

Computational machine learning estimation of digitoxin solubility in supercritical solvent at different temperatures utilizing ensemble methods