
Initial website optimization & first post added (see: https://github.…#6

Open
kamilpytlak wants to merge 8 commits into main from initial-content


@kamilpytlak

#2

@kamilpytlak kamilpytlak requested a review from kamilsi October 1, 2024 11:40
@netlify

netlify bot commented Oct 1, 2024

Deploy Preview for ttscienceblog failed.

⚠️ Continuous deployment needs attention — organization-owned private repository detected.
Upgrade to Pro or change repository settings in order to continue deploying from this repository.
For more information, visit the deploy log and FAQ page.

| Name | Link |
|---|---|
| 🔨 Latest commit | |
| 🔍 Latest deploy log | https://app.netlify.com/sites/ttscienceblog/deploys/67407def3c5895917aff73f1 |

@kamilpytlak kamilpytlak requested review from salatak and removed request for kamilsi October 8, 2024 14:03

@salatak salatak left a comment


I like the way the user is guided through the topic, but I would expect a clearer indication of the benefits that come from using the proposed solution.

Title: "Hi there 👋"
Content: "Welcome to the TT Science Development Blog! Here, you'll find insights, tutorials, and updates about our work in clinical data science, including R, Python, and optimization techniques. Explore our posts, check out our projects on GitHub, or learn more about our team."
Title: "Hey there, data enthusiasts! 👋"
Content: "Here at TT Science, we are a dynamic team of individuals — like trees in a [random forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) 🌳🌲 — each bringing our own unique strengths and perspectives to the table. While we share a common goal of advancing clinical data science, it’s our diversity that makes us truly special! <br><br> 🌈 Join us on this exciting journey 🚀 as we share insights, tutorials, and updates on our innovative projects involving R, Python, and biostatistics. 💻🔍 Explore our posts, dive into our GitHub projects 📂, and get to know the passionate minds behind our work. Together, we’re harnessing the power of data to create impactful solutions in healthcare. 💡💖 Welcome to our forest of knowledge! 🌳🌟"

The intro sounds pretty good, but we're overusing emoticons. It makes it look a bit childish and overwhelming at the same time. I think it's better to aim for a 'casual elegant' tone in writing. I like the second part of the intro the most, and I'd consider tweaking the first part, although it's not essential.

Contributor

@kamilsi kamilsi Nov 22, 2024

Also, we already have "about us" and "authors" pages, so there are plenty of places where we talk about ourselves. My proposal: let's revert to the old version, where this was just a friendly waving hand.

Author

OK

Author

What do you think of this variation? (Without “Content” it looks a little blank.)
image

Comment on lines +7 to +17
We’re writing here as practitioners in clinical statistics, data science, and machine learning. Our team consists of Biostatisticians, Data Scientists, Bioinformaticians and Developers. We’re not marketers—we’re the ones in the trenches, working with data every day to solve real-world problems. We support researchers, scientists, and hospitals in medical and life sciences research projects by delivering relevant documentation, software, and statistical reports, utilizing programming languages such as R and Python.

“Data for good” is in our DNA. We’re passionate about doing meaningful work, which includes engaging in open-source projects. We don’t just crunch numbers; we aim to make a positive impact through our expertise. Our main goals include supporting investigators in the planning, execution, and finalization of clinical trials—covering tasks such as sample size calculations, study design, developing clinical trial protocols in compliance with EMA and FDA guidelines, creating Statistical Analysis Plans (SAPs), validating data, and generating statistical reports — as well as developing dedicated software, including applications such as to facilitate the analysis and validation of medical data.

We believe in science and want innovative scientists to tackle the questions and challenges of today’s world. We want scientists to always be able to rely on data—because we all rely on them. Our partnership approach is rooted in the scientific method—we combine data, our clients’ expertise, and our competencies to achieve the best results.

Our mission is to share knowledge with the broader community engaged in medical IT software development, clinical research, and the analysis of medical and biological data, as well as those just beginning their careers in these fields. Through our published content, we aim to showcase solutions for software development and provide guidance to researchers, particularly in the areas of statistics, machine learning, and the regulatory frameworks of the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA).

If you want to check our official page, visit Transition Technologies Science—you’ll learn that we provide software, data science services, and offer analytical support at every stage of clinical trials.

Reach out to us to explore the world of data - our email is office@ttsi.com.pl

The about me section is usually a bit more concise. Maybe we could shorten it and add some photos from conferences or our team-building events? I think if people see a wall of text, they might get discouraged. We don’t want an overly serious blog, but rather a professional one with a relaxed tone.


The text itself is good, but I would place the third and fourth paragraphs as a welcome post that outlines our intentions. In the "about me" section, I would keep the focus on us and our work, rather than what’s on the blog.

Contributor

I would leave the main page empty, without "about us", or with just one sentence. I agree with shortening; removing the fourth paragraph is fine.

Author

Resolved.

Contributor

@kamilsi kamilsi left a comment

Great work on this article—it’s thorough and informative! I have a few suggestions to refine it further:

  1. Tone: Since we’re aiming for a 'serious friend' tone, I’d recommend avoiding casual language, such as in some headers (e.g., 'Pharmaceutical Gig'). Keeping a professional yet approachable tone would better suit our goals.

  2. Focus: To keep the article sharp and engaging, I’d suggest cutting out anything that isn’t directly about fixing typos in drug names. For example, the introduce_variation function for generating typos feels tangential and might distract readers from the main topic.

  3. Streamlining: There are a few opportunities to streamline the content:

    • Avoid repetitions of ideas already covered, like the challenges with inconsistent drug names.
    • Minimize code output, such as printing entire data frames or diagnostics, which can overwhelm readers. Summarizing key outputs or providing representative samples instead would keep the focus tight and improve readability.

These changes could make the article clearer, more concise, and more aligned with its purpose. Let me know what you think!

@@ -1,3 +1,4 @@
source("renv/activate.R")
Contributor

I already cannot work on this on my laptop because of some C compilation issues :-/ renv was probably a good idea, but it's going to be complicated.


.post-content {
color: var(--content);
text-align: justify;
Contributor

I wouldn't say I like justified text aesthetically (especially on webpages, docs are OK). @salatak what do you think?

Author

Comparison:

image

vs.

image

Please decide, it's all the same to me.

Comment on lines +6 to +18
categories:
- machine learning
- R
- statistics
- text analysis
tags:
- data visualization
- drug names
- eCRF
- data validation
- levenshtein distance
- NLP
- t-SNE
Contributor

What is the difference between tags and categories? I think only one of these is presented.

- NLP
- t-SNE
slug: same-same-but-different-how-advanced-data-science-techniques-help-us-validate-drug-names
ShowToc: yes
Contributor

It's a very long post if you need a ToC :-)

Comment on lines +180 to +281
```
## [1] "acetylsalicylic acid and corticosteroids"
## [2] "aluminium preparations"
## [3] "aminophylline"
## [4] "amphotericin B"
## [5] "antazoline"
## [6] "artesunate and amodiaquine"
## [7] "azacitidine"
## [8] "benazepril and amlodipine"
## [9] "benzocaine"
## [10] "benzoyl peroxide"
## [11] "betaine hydrochloride"
## [12] "betamethasone"
## [13] "betaxolol, combinations"
## [14] "bexagliflozin"
## [15] "biperiden"
## [16] "bupivacaine and meloxicam"
## [17] "buspirone"
## [18] "calcium lactate"
## [19] "calcium lactate gluconate"
## [20] "captopril"
## [21] "carumonam"
## [22] "casopitant"
## [23] "cefapirin"
## [24] "ceftibuten"
## [25] "chymopapain"
## [26] "clotiazepam"
## [27] "cyanocobalamin"
## [28] "desonide and antiseptics"
## [29] "dexamethasone and antiinfectives"
## [30] "difluprednate"
## [31] "digitalis leaves"
## [32] "diisopromine"
## [33] "eosin"
## [34] "epinastine"
## [35] "eplontersen"
## [36] "eptifibatide"
## [37] "ferric acetyl transferrin"
## [38] "fluciclovine (18F)"
## [39] "flumetasone"
## [40] "fluorouracil, combinations"
## [41] "flutrimazole"
## [42] "folic acid"
## [43] "fostemsavir"
## [44] "gatifloxacin"
## [45] "gefarnate, combinations with psycholeptics"
## [46] "histapyrrodine, combinations"
## [47] "Hyperici herba"
## [48] "idrocilamide"
## [49] "indometacin, combinations"
## [50] "iodine iofetamine (123I)"
## [51] "isoprenaline"
## [52] "istradefylline"
## [53] "kanamycin"
## [54] "lactulose"
## [55] "levodopa"
## [56] "levonorgestrel"
## [57] "lincomycin"
## [58] "magnesium carbonate"
## [59] "mecasermin"
## [60] "megestrol and estrogen"
## [61] "menadione"
## [62] "methaqualone"
## [63] "micafungin"
## [64] "moexipril and diuretics"
## [65] "narcobarbital"
## [66] "nebivolol and amlodipine"
## [67] "nimetazepam"
## [68] "odevixibat"
## [69] "pegloticase"
## [70] "perphenazine"
## [71] "pethidine"
## [72] "phenylephrine"
## [73] "pipotiazine"
## [74] "pirprofen"
## [75] "plerixafor"
## [76] "potassium citrate"
## [77] "prazosin"
## [78] "prednisone"
## [79] "remoxipride"
## [80] "reteplase"
## [81] "rifamycin"
## [82] "rivastigmine"
## [83] "roxithromycin"
## [84] "salsalate"
## [85] "sorbitol"
## [86] "streptokinase"
## [87] "succinimide"
## [88] "taurolidine"
## [89] "technetium (99mTc) pertechnetate"
## [90] "teneligliptin"
## [91] "theophylline, combinations excl. psycholeptics"
## [92] "ticarcillin"
## [93] "tiemonium iodide and analgesics"
## [94] "timolol, thiazides and other diuretics"
## [95] "tolperisone"
## [96] "tramadol"
## [97] "tretoquinol"
## [98] "trypsin, combinations"
## [99] "ursodoxicoltaurine"
## [100] "zidovudine"
```
Contributor

Feels a bit redundant, as the structure of the codebook has already been presented earlier. Listing all the names in full doesn’t seem to add much new information and might distract the reader.
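
A minimal sketch of one way to act on this: report the number of names once and print only a short, representative sample. The `drug_names` vector is a hypothetical stand-in built from a few names in the listing above, not the post's actual codebook object.

```r
# Hypothetical stand-in for the codebook vector; the names are taken
# from the listing above.
drug_names <- c(
  "aminophylline", "azacitidine", "benzocaine", "captopril",
  "folic acid", "levodopa", "prednisone", "tramadol", "zidovudine"
)

# Report the size once, then show only a short sample
n_names <- length(drug_names)
preview <- head(sort(drug_names), 5)

cat("Number of drug names:", n_names, "\n")
print(preview)
```

The sample size of 5 is arbitrary; the point is that the reader sees the shape of the data without scrolling past every entry.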

Comment on lines +283 to +310
We're looking to add a bit of confusion to our drug names, so we've created a function called `introduce_variation`. It takes a name and returns a new version with one character duplicated, removed, or swapped with its neighbor.


``` r
introduce_variation <- function(name) {
  name_chars <- unlist(strsplit(name, ""))
  n <- length(name_chars)

  # Names shorter than two characters are returned unchanged:
  # there is nothing to swap, and removal would leave an empty string
  if (n < 2) {
    return(name)
  }

  # Randomly choose a type of modification to introduce a typo
  modification <- sample(c("duplicate", "remove", "swap"), 1)

  if (modification == "duplicate") {
    # Duplicate a random character
    duplicate_pos <- sample.int(n, 1)
    name_chars <- append(name_chars, name_chars[duplicate_pos], after = duplicate_pos)
  } else if (modification == "remove") {
    # Remove a random character
    remove_pos <- sample.int(n, 1)
    name_chars <- name_chars[-remove_pos]
  } else {
    # Swap two adjacent characters
    swap_pos <- sample.int(n - 1, 1)
    name_chars[c(swap_pos, swap_pos + 1)] <- name_chars[c(swap_pos + 1, swap_pos)]
  }

  paste(name_chars, collapse = "")
}
```
Contributor

The introduce_variation function adds complexity that might not be necessary for this article. Since the focus is on fixing errors in drug names, introducing random typos as part of the workflow could confuse readers. Pre-preparing a small set of intentional errors and using them consistently would simplify the explanation and keep the focus on error correction rather than error creation.
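
To illustrate the suggestion, here is a rough sketch with a small, fixed set of typos. The reference names come from the codebook listing in this post; the typos are hand-picked assumptions (one duplicated, one removed, and one swapped character). Matching uses `utils::adist`, which computes a generalized Levenshtein distance. Whether the article itself uses `adist` is not visible in this diff, so treat that choice as an assumption too.

```r
# Reference names (taken from the codebook listing in this post)
reference <- c("prednisone", "tramadol", "levodopa", "folic acid")

# Hand-picked typos, assumed for illustration:
# duplicated character, removed character, swapped characters
typos <- c("preddnisone", "tramdol", "levodpoa")

# Levenshtein distance between every typo (rows) and reference name (columns)
distances <- utils::adist(typos, reference)

# For each typo, pick the closest reference name
corrected <- reference[apply(distances, 1, which.min)]

print(data.frame(typo = typos, corrected = corrected))
```

With a fixed set like this, every reader sees the same errors and the same corrections, so there is no random seed to explain.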

Comment on lines +328 to +335
# Add additional drug names in Polish
complete_drug_names <- c(
  complete_drug_names,
  "Kwas acetylosalicylowy i kortykosteroidy", # Acetylsalicylic acid and corticosteroids
  "Węglan magnezu",                           # Magnesium carbonate
  "Kwas foliowy"                              # Folic acid
)

unique(complete_drug_names) |> sort() |> head(10)
```
Contributor

Polish is confusing

Comment on lines +443 to +456
```
## Perplexity: 2 | KL Divergence: 1.365567
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 3 | KL Divergence: 1.467266
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 4 | KL Divergence: 1.605605
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 5 | KL Divergence: 1.612329
## Best Perplexity So Far: 2 | Best KL Divergence: 1.365567
##
## Perplexity: 6 | KL Divergence: 1.841238
Contributor

I would skip this, doesn't add much
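
If the result of the search is still worth mentioning, one option is to collect the diagnostics and print a single summary line instead of the full log. This is only a sketch: the `search_results` data frame is hypothetical and holds just the values visible in the excerpt above, which is cut off after perplexity 6.

```r
# KL divergence per perplexity, copied from the log excerpt above
search_results <- data.frame(
  perplexity    = c(2, 3, 4, 5, 6),
  kl_divergence = c(1.365567, 1.467266, 1.605605, 1.612329, 1.841238)
)

# Keep only the row with the lowest KL divergence
best <- search_results[which.min(search_results$kl_divergence), ]

cat(sprintf(
  "Best perplexity: %d (KL divergence: %.6f)\n",
  as.integer(best$perplexity), best$kl_divergence
))
```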

theme_bw()
```

<img src="{{< blogdown/postref >}}index_files/figure-html/unnamed-chunk-11-1.png" width="672" />
Contributor

better for SEO to name chunks
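
For context, a sketch of what a named chunk could look like in the R Markdown source. knitr builds figure file names from the chunk label, so with the hypothetical label `tsne-drug-names` the image above would be written as `tsne-drug-names-1.png` instead of `unnamed-chunk-11-1.png`. The label is an assumption for illustration only.

```{r tsne-drug-names}
# The plotting code from the post goes here, unchanged
```

Chunk labels must be unique within a document, so each plot needs its own descriptive name.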



4 participants