Conversation
Pull Request Overview
This PR replaces the abstract Hartmann function example with a realistic chemical optimization scenario from the Shields et al. direct arylation study. The rewrite demonstrates transfer learning by showing how historical experimental data from chemical reactions at different temperatures can accelerate optimization at a new target temperature.
Key Changes:
- Replaced synthetic Hartmann function with real chemical synthesis data
- Introduced chemical parameters (solvents, bases, ligands, concentration, temperature)
- Expanded simulation to demonstrate transfer learning across three different target temperatures
please ensure this is stored as Optimized SVG (Inkscape) and is not a raw vanilla svg (size reasons). I can also do it if you like but it makes sense only after everything is finished
```python
# optimization experiments under certain reaction conditions can be transferred to
# accelerate optimization under different conditions. Specifically, we investigate a
# setting where:
# * we have **historical experimental data** from chemical reactions conducted at
```
This sounds a bit as if these were some arbitrary chemical reactions. It needs to be rephrased to say that it is literally the exact same reaction, just repeated at different temperatures.
```python
# Imagine you're a chemist in a pharmaceutical company, tasked with optimizing a
# direct arylation reaction to maximize product yield. This reaction involves
# combining different chemical components (solvents, bases, ligands) under varying
# temperature and concentration conditions. Each experiment is expensive and
```
Temperature is not optimized, so it should be taken out of the list here.
```python
# optimization campaigns at different temperatures in the past. The question arises:
# can you leverage this **historical data from related conditions** to accelerate
# optimization at your target temperature? This is where transfer learning becomes
# invaluable.
```
That's quite a strong word for this example. The unfortunate thing about the entire example is: we don't need TL here, as we can simply model the temperature as a normal parameter. TL has no advantage compared to that.
```python
]

result = simulate_scenarios(
    {f"{int(100 * fraction)}": campaign},
```
I would prefer if these labels read as percentages ("%"), to avoid confusing them with the number of points used as opposed to the fraction of available points used.
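A minimal sketch of the suggested relabeling (the `fraction` values here are purely illustrative, not the ones from the example):

```python
# Format scenario labels as percentages, e.g. "20%" rather than "20",
# so they cannot be mistaken for an absolute number of points.
fractions = [0.0, 0.1, 0.2, 0.5]
labels = [f"{int(100 * fraction)}%" for fraction in fractions]
print(labels)  # → ['0%', '10%', '20%', '50%']
```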
```python
    ax.legend().set_visible(False)
else:
    ax.legend(
        title="Training data used", bbox_to_anchor=(1.05, 1), loc="upper left"
```
This is not a good label, as it doesn't get across that it's the fraction of points. Do you need to set a title at all? Since you rename the hue from "Scenario" to "% of data used", that's already the perfect label. (I'm not sure if it's shown by default if you make this call without `title=`, but if NOT, then the renamings are also pointless, just saying.)
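For reference, a minimal matplotlib sketch of the point being made (the column name "% of data used" is taken from the surrounding diff; seaborn would use the hue variable's name as the legend title by default, which is why an explicit `title=` may be redundant there):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for label in ["0%", "10%", "50%"]:
    ax.plot([0, 1], [0, 1], label=label)

# Passing the hue/column name as the legend title makes the fraction
# interpretation explicit without any extra wording.
legend = ax.legend(title="% of data used", bbox_to_anchor=(1.05, 1), loc="upper left")
print(legend.get_title().get_text())  # → % of data used
```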
```python
final_results = results.groupby("% of data used")["yield_CumBest"].max()
baseline = final_results.loc["0"]
best_transfer = final_results.drop("0").max()
improvement = best_transfer - baseline
```
Why is this here? It's not used anywhere. Please do not add AI-generated code that you don't check.
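If the computed `improvement` were meant to stay, it would at least have to be surfaced somewhere. A hedged sketch with made-up data (column names follow the diff above; all values are purely illustrative):

```python
import pandas as pd

# Hypothetical stand-in for the simulation results referenced in the diff.
results = pd.DataFrame(
    {
        "% of data used": ["0", "0", "10", "10", "50"],
        "yield_CumBest": [62.0, 64.5, 70.1, 71.3, 78.9],
    }
)

final_results = results.groupby("% of data used")["yield_CumBest"].max()
baseline = final_results.loc["0"]
best_transfer = final_results.drop("0").max()
improvement = best_transfer - baseline

# Either report the value ...
print(f"Best transfer improvement over baseline: {improvement:.1f}")
# ... or drop the whole computation if it is not surfaced anywhere.
```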
```python
# 2. The magnitude of improvement varies by condition, reflecting differences in how
#    well knowledge transfers between specific temperature pairs.
#
# 2. Even small amounts of training data can yield significant acceleration, making thi
```
Suggested change:
```diff
- # 2. Even small amounts of training data can yield significant acceleration, making thi
+ # 2. Even small amounts of training data can yield significant acceleration, making this
```
```python
# The results reveal several key insights:
#
# 1. Transfer learning provides substantial improvements across all temperature
```
Most comments here are minor, but one big topic concerns the choice of example:
You've chosen a use case where TL is not needed. We can simply model the temperature as an explicit numerical parameter, because that is the perfect quantification of the different "contexts" here.
Now don't get me wrong, a study like this still makes sense, especially to gauge the TL results against the explicit-parameter results (and baseline). But this is not done here in this example (and I'm not sure it should be, as it's an example and not a benchmark). If the results look OK, we could add them as a second row of results in the plot, though this would need more comments and explanation.
But without this context, many readers might read this example and wonder why we don't model with a normal, simple parameter, with no answers or explanations provided anywhere.
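The alternative described here, modeling temperature as an ordinary numerical parameter instead of a task context, amounts to a different data layout. A minimal pandas sketch of the two framings (all column names and values are illustrative, not the actual Shields data):

```python
import pandas as pd

# Hypothetical per-temperature campaign data (one frame per "context").
data_90C = pd.DataFrame({"base": ["KOAc", "CsOAc"], "yield": [55.0, 61.2]})
data_105C = pd.DataFrame({"base": ["KOAc", "CsOAc"], "yield": [63.4, 70.8]})

# Transfer-learning framing: temperature is a categorical task identifier.
tl_frame = pd.concat(
    [data_90C.assign(task="T90"), data_105C.assign(task="T105")],
    ignore_index=True,
)

# Alternative framing: temperature as an explicit numerical parameter, so a
# single surrogate model can interpolate across temperatures directly.
plain_frame = pd.concat(
    [data_90C.assign(temperature=90.0), data_105C.assign(temperature=105.0)],
    ignore_index=True,
)
print(plain_frame.columns.tolist())  # → ['base', 'yield', 'temperature']
```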
Co-authored-by: Martin Fitzner <martin.fitzner@merckgroup.com>
As discussed quite some time ago, I wanted to rewrite the transfer learning example so that we have a more meaningful one. This PR thus introduces the classical "Shields-Temperature-TL" example we have used in several other places.
I've tried to follow the same style as the other more recent examples (in particular Pareto and laser tuning).
Link to compiled version: https://avhopp.github.io/baybe_dev/latest/examples/Transfer_Learning/basic_transfer_learning.html
TODO: Replace temperature with labs to make it more natural