Initial website optimization & first post added (see: https://github.…#6
kamilpytlak wants to merge 8 commits into main from
❌ Deploy Preview for ttscienceblog failed.
…corrections in the post about t-SNE
salatak left a comment
I like the way the user is guided through the topic, but I would expect a clearer indication of the benefits that come from using the proposed solution.
Title: "Hi there 👋"
Content: "Welcome to the TT Science Development Blog! Here, you'll find insights, tutorials, and updates about our work in clinical data science, including R, Python, and optimization techniques. Explore our posts, check out our projects on GitHub, or learn more about our team."
Title: "Hey there, data enthusiasts! 👋"
Content: "Here at TT Science, we are a dynamic team of individuals — like trees in a [random forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) 🌳🌲 — each bringing our own unique strengths and perspectives to the table. While we share a common goal of advancing clinical data science, it’s our diversity that makes us truly special! <br><br> 🌈 Join us on this exciting journey 🚀 as we share insights, tutorials, and updates on our innovative projects involving R, Python, and biostatistics. 💻🔍 Explore our posts, dive into our GitHub projects 📂, and get to know the passionate minds behind our work. Together, we’re harnessing the power of data to create impactful solutions in healthcare. 💡💖 Welcome to our forest of knowledge! 🌳🌟"
The intro sounds pretty good, but we're overusing emoticons. It makes it look a bit childish and overwhelming at the same time. I think it's better to aim for a 'casual elegant' tone in writing. I like the second part of the intro the most, and I'd consider tweaking the first part, although it's not essential.
Also, we already have "About us" and "Authors" pages, so there are plenty of places where we talk about ourselves. My suggestion: let's revert to the old version, where this was just a friendly waving hand.
We’re writing here as practitioners in clinical statistics, data science, and machine learning. Our team consists of Biostatisticians, Data Scientists, Bioinformaticians and Developers. We’re not marketers—we’re the ones in the trenches, working with data every day to solve real-world problems. We support researchers, scientists, and hospitals in medical and life sciences research projects by delivering relevant documentation, software, and statistical reports, utilizing programming languages such as R and Python.

“Data for good” is in our DNA. We’re passionate about doing meaningful work, which includes engaging in open-source projects. We don’t just crunch numbers; we aim to make a positive impact through our expertise. Our main goals include supporting investigators in the planning, execution, and finalization of clinical trials—covering tasks such as sample size calculations, study design, developing clinical trial protocols in compliance with EMA and FDA guidelines, creating Statistical Analysis Plans (SAPs), validating data, and generating statistical reports — as well as developing dedicated software, including applications such as to facilitate the analysis and validation of medical data.

We believe in science and want innovative scientists to tackle the questions and challenges of today’s world. We want scientists to always be able to rely on data—because we all rely on them. Our partnership approach is rooted in the scientific method—we combine data, our clients’ expertise, and our competencies to achieve the best results.

Our mission is to share knowledge with the broader community engaged in medical IT software development, clinical research, and the analysis of medical and biological data, as well as those just beginning their careers in these fields. Through our published content, we aim to showcase solutions for software development and provide guidance to researchers, particularly in the areas of statistics, machine learning, and the regulatory frameworks of the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA).

If you want to check our official page, visit Transition Technologies Science—you’ll learn that we provide software, data science services, and offer analytical support at every stage of clinical trials.

Reach out to us to explore the world of data - our email is office@ttsi.com.pl
The "About us" section is usually a bit more concise. Maybe we could shorten it and add some photos from conferences or our team-building events? I think if people see a wall of text, they might get discouraged. We don’t want an overly serious blog, but rather a professional one with a relaxed tone.
The text itself is good, but I would move the third and fourth paragraphs into a welcome post that outlines our intentions. In the "About us" section, I would keep the focus on us and our work rather than on what’s on the blog.
I would leave the main page empty, without "About us", or with just one sentence. I agree with shortening; removing the fourth paragraph is fine.
...same-but-different-how-advanced-data-science-techniques-help-us-validate-drug-names/index.md
kamilsi left a comment
Great work on this article; it’s thorough and informative! I have a few suggestions to refine it further:

- Tone: Since we’re aiming for a 'serious friend' tone, I’d recommend avoiding casual language, such as in some headers (e.g., 'Pharmaceutical Gig'). Keeping a professional yet approachable tone would better suit our goals.
- Focus: To keep the article sharp and engaging, I’d suggest cutting out anything that isn’t directly about fixing typos in drug names. For example, the `introduce_variation` function for generating typos feels tangential and might distract readers from the main topic.
- Streamlining: There are a few opportunities to streamline the content:
  - Avoid repeating ideas already covered, like the challenges with inconsistent drug names.
  - Minimize code output, such as printing entire data frames or diagnostics, which can overwhelm readers. Summarizing key outputs or providing representative samples instead would keep the focus tight and improve readability.

These changes could make the article clearer, more concise, and more aligned with its purpose. Let me know what you think!
@@ -1,3 +1,4 @@
source("renv/activate.R")
I already can't work on my laptop: there are some C compilation issues :-/ renv was probably a good idea, but it's going to be complicated.
.post-content {
  color: var(--content);
  text-align: justify;
I don't like justified text aesthetically, especially on web pages (in documents it's OK). @salatak, what do you think?
categories:
- machine learning
- R
- statistics
- text analysis
tags:
- data visualization
- drug names
- eCRF
- data validation
- levenshtein distance
- NLP
- t-SNE
What is the difference between tags and categories? I think only one of them is displayed on the site.
- NLP
- t-SNE
slug: same-same-but-different-how-advanced-data-science-techniques-help-us-validate-drug-names
ShowToc: yes
It's a very long post if it needs a ToC :-)
...same-but-different-how-advanced-data-science-techniques-help-us-validate-drug-names/index.md
```
## [1] "acetylsalicylic acid and corticosteroids"
## [2] "aluminium preparations"
## [3] "aminophylline"
## [4] "amphotericin B"
## [5] "antazoline"
## [6] "artesunate and amodiaquine"
## [7] "azacitidine"
## [8] "benazepril and amlodipine"
## [9] "benzocaine"
## [10] "benzoyl peroxide"
## [11] "betaine hydrochloride"
## [12] "betamethasone"
## [13] "betaxolol, combinations"
## [14] "bexagliflozin"
## [15] "biperiden"
## [16] "bupivacaine and meloxicam"
## [17] "buspirone"
## [18] "calcium lactate"
## [19] "calcium lactate gluconate"
## [20] "captopril"
## [21] "carumonam"
## [22] "casopitant"
## [23] "cefapirin"
## [24] "ceftibuten"
## [25] "chymopapain"
## [26] "clotiazepam"
## [27] "cyanocobalamin"
## [28] "desonide and antiseptics"
## [29] "dexamethasone and antiinfectives"
## [30] "difluprednate"
## [31] "digitalis leaves"
## [32] "diisopromine"
## [33] "eosin"
## [34] "epinastine"
## [35] "eplontersen"
## [36] "eptifibatide"
## [37] "ferric acetyl transferrin"
## [38] "fluciclovine (18F)"
## [39] "flumetasone"
## [40] "fluorouracil, combinations"
## [41] "flutrimazole"
## [42] "folic acid"
## [43] "fostemsavir"
## [44] "gatifloxacin"
## [45] "gefarnate, combinations with psycholeptics"
## [46] "histapyrrodine, combinations"
## [47] "Hyperici herba"
## [48] "idrocilamide"
## [49] "indometacin, combinations"
## [50] "iodine iofetamine (123I)"
## [51] "isoprenaline"
## [52] "istradefylline"
## [53] "kanamycin"
## [54] "lactulose"
## [55] "levodopa"
## [56] "levonorgestrel"
## [57] "lincomycin"
## [58] "magnesium carbonate"
## [59] "mecasermin"
## [60] "megestrol and estrogen"
## [61] "menadione"
## [62] "methaqualone"
## [63] "micafungin"
## [64] "moexipril and diuretics"
## [65] "narcobarbital"
## [66] "nebivolol and amlodipine"
## [67] "nimetazepam"
## [68] "odevixibat"
## [69] "pegloticase"
## [70] "perphenazine"
## [71] "pethidine"
## [72] "phenylephrine"
## [73] "pipotiazine"
## [74] "pirprofen"
## [75] "plerixafor"
## [76] "potassium citrate"
## [77] "prazosin"
## [78] "prednisone"
## [79] "remoxipride"
## [80] "reteplase"
## [81] "rifamycin"
## [82] "rivastigmine"
## [83] "roxithromycin"
## [84] "salsalate"
## [85] "sorbitol"
## [86] "streptokinase"
## [87] "succinimide"
## [88] "taurolidine"
## [89] "technetium (99mTc) pertechnetate"
## [90] "teneligliptin"
## [91] "theophylline, combinations excl. psycholeptics"
## [92] "ticarcillin"
## [93] "tiemonium iodide and analgesics"
## [94] "timolol, thiazides and other diuretics"
## [95] "tolperisone"
## [96] "tramadol"
## [97] "tretoquinol"
## [98] "trypsin, combinations"
## [99] "ursodoxicoltaurine"
## [100] "zidovudine"
```
It feels a bit redundant, as the structure of the codebook has already been presented. Listing all the names in full doesn’t add much new information and might distract the reader.
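Following the suggestion to show a representative sample instead of the full list, a minimal sketch could report the vocabulary size and only the first few names in sorted order. `drug_names` is a hypothetical stand-in built from a handful of names in the listing above; in the post it would be the codebook's name column.

```r
# Hypothetical stand-in for the codebook's drug-name column:
# a handful of names taken from the full listing
drug_names <- c("folic acid", "azacitidine", "tramadol", "captopril",
                "zidovudine", "lactulose", "benzocaine", "prednisone",
                "folic acid")  # one duplicate on purpose

# Summarize instead of printing every entry:
# the number of unique names plus a short, sorted sample
n_unique <- length(unique(drug_names))
sample_names <- drug_names |> unique() |> sort() |> head(5)

cat(sprintf("%d unique drug names, e.g. %s\n",
            n_unique, paste(sample_names, collapse = ", ")))
```

This follows the same idea as the `unique() |> sort() |> head(10)` pipeline the post already uses for `complete_drug_names`, with one added line stating how many names there are in total.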
We're looking to add a bit of confusion to our drug names, so we've created a function called `introduce_variation`. It takes a name and returns a new version with a duplicate, deleted, or rearranged character.

``` r
introduce_variation <- function(name) {
  # Randomly choose a type of modification to introduce a typo
  modification <- sample(c("duplicate", "remove", "swap"), 1)
  name_chars <- unlist(strsplit(name, ""))

  if (modification == "duplicate") {
    # Duplicate a random character
    duplicate_pos <- sample(1:length(name_chars), 1)
    name_chars <- append(name_chars, name_chars[duplicate_pos], after = duplicate_pos)
  } else if (modification == "remove") {
    # Remove a random character
    remove_pos <- sample(1:length(name_chars), 1)
    name_chars <- name_chars[-remove_pos]
  } else if (modification == "swap") {
    # Swap two adjacent characters
    swap_pos <- sample(1:(length(name_chars) - 1), 1)
    temp <- name_chars[swap_pos]
    name_chars[swap_pos] <- name_chars[swap_pos + 1]
    name_chars[swap_pos + 1] <- temp
  }

  return(paste(name_chars, collapse = ""))
}
```
The `introduce_variation` function adds complexity that might not be necessary for this article. Since the focus is on fixing errors in drug names, introducing random typos as part of the workflow could confuse readers. Preparing a small, fixed set of intentional errors in advance and using them consistently would simplify the explanation and keep the focus on error correction rather than error creation.
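The fixed-typo alternative could look roughly like this minimal sketch: a small, hand-written set of misspellings (hypothetical examples, not taken from the post) is matched against a reference list with base R's `adist()`, which computes the generalized Levenshtein distance, and each typo is mapped to its closest reference name. It only illustrates the reviewer's suggestion; the post's own matching pipeline may differ.

```r
# Reference vocabulary: a few correct names from the codebook listing
reference <- c("folic acid", "magnesium carbonate", "tramadol",
               "captopril", "lactulose")

# Fixed, hand-written misspellings (hypothetical examples), reused
# throughout instead of generating random typos on every run
typos <- c("folik acid", "magnesium carbonat", "tramadlo",
           "captoprill", "lactulsoe")

# Generalized Levenshtein distances: one row per typo, one column per reference name
d <- adist(typos, reference, ignore.case = TRUE)

# For each typo, take the closest reference name and its distance
best <- apply(d, 1, which.min)
corrected <- data.frame(
  typo       = typos,
  suggestion = reference[best],
  distance   = d[cbind(seq_along(typos), best)],
  stringsAsFactors = FALSE
)

print(corrected)
```

Because the typos never change between runs, the output is reproducible and the text can discuss specific cases. Note that `adist()` has no transposition operation, so a swapped pair of letters such as `tramadlo` costs 2 edits rather than 1.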
``` r
# Add additional drug names in Polish
complete_drug_names <- c(complete_drug_names, c(
  "Kwas acetylosalicylowy i kortykosteroidy", # Acetylsalicylic acid and corticosteroids
  "Węglan magnezu",                           # Magnesium carbonate
  "Kwas foliowy"                              # Folic acid
))

unique(complete_drug_names) |> sort() |> head(10)
```
```
## Perplexity: 2 | KL Divergence: 1.365567
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 3 | KL Divergence: 1.467266
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 4 | KL Divergence: 1.605605
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 5 | KL Divergence: 1.612329
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 6 | KL Divergence: 1.841238
```
I would skip this; it doesn't add much.
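If the perplexity search should stay in the post but its log should not, knitr chunk options can hide the output while the code still runs. A minimal sketch with a hypothetical chunk label: `results = "hide"` suppresses printed text output and `message = FALSE` suppresses messages. Which of the two is needed depends on how the log is written, which the excerpt does not show.

```{r tsne-perplexity-search, results = "hide", message = FALSE}
# ...existing perplexity search code, unchanged...
```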
``` r
theme_bw()
```

<img src="{{< blogdown/postref >}}index_files/figure-html/unnamed-chunk-11-1.png" width="672" />
It's better for SEO to name the chunks.
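In knitr, a figure's file name is built from the chunk label, so an unlabeled chunk produces names like `unnamed-chunk-11-1.png`, while a labeled chunk would produce something like `index_files/figure-html/tsne-drug-names-plot-1.png`. A minimal sketch; the label is only a suggestion (knitr recommends letters, digits and dashes), and the body stands in for the existing plotting code:

```{r tsne-drug-names-plot}
# ...existing ggplot2 code for the t-SNE plot, ending with theme_bw()
```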



#2