LogP, LogD, pKa and LogS: A Physicist’s guide to basic chemical properties

Michael Green
Jan 11, 2024

Introduction

Ever since I got into AI-based drug discovery, a lot of terminology has been thrown my way. The idea to write this came from Bhal (2021), who points out that it’s important to know the fundamentals of molecular properties when doing drug discovery and development. Now, as a theoretical physicist, I’m no stranger to mathematical concepts, but there are a lot of them in chemistry. In the beginning the most prevalent ones were LogP, LogD, pKa and LogS. I quickly grasped roughly what they were about, but I never really dug into what they actually mean and why they are so important. Hence this blog post. I hope it will be as useful to you reading it as it was for me writing it. I certainly got answers to many of my questions.

A Quick Primer on Acidity

As most of you probably remember from high school chemistry, knowing the acidity of a solution is quite useful. Or well, at least we were forced to measure it often enough. Anyway, to quantify this, the metric pH was introduced by a Danish chemist named Søren Peder Lauritz Sørensen, who worked at the Carlsberg Laboratory in Copenhagen at the time. It’s essentially a scale we use to quantify just how acidic or basic (alkaline) a water-based solution is. You may have noticed that I snuck in two terms for the same thing, namely alkaline and basic. I will treat them as equivalent for the remainder of the post.

The pH scale ranges from 0 to 14:

  • Acidic Solutions: These have a pH less than 7. The lower the pH, the more acidic the solution. For example, lemon juice and vinegar are acidic, with pH values typically around 2 to 3.
  • Neutral Solution: A pH of exactly 7 indicates a neutral solution. Pure water is a classic example of a neutral solution.
  • Basic Solutions: These have a pH greater than 7. The higher the pH, the more basic the solution. For instance, baking soda in water is basic, with a pH around 8 to 9.

Ok so now you know the interpretation of the pH value but we still haven’t defined what it is. So let me do that right now.

pH = -log₁₀ [H⁺]

Here you can see that pH is defined on a log scale of the concentration of hydrogen ions ([H⁺]) in the solution, where the concentration is given in moles per liter. One mole (“Mole (Unit) — Wikipedia — En.wikipedia.org” 2023) of any substance contains about 6.022 × 10²³ entities (atoms, molecules, ions, etc.). This number is what the scientific community refers to as Avogadro’s number. You might wonder why there is a “-” sign in front of the logarithm. Well, it’s simply because we want the pH expressed in positive numbers. The logarithm of a number less than one, which the hydrogen ion concentration in almost every solution you will come across is, is always negative. So the “-” sign converts that to a positive number instead. Ok dude, I hear your inquiring minds cry. Why did you say that the pH is always between 0 and 14 when it’s clear from your definition that we can in principle get a number below 0 or above 14? Well, it’s a technicality. While mathematically possible, in practice we rarely see values outside of the 0–14 range. This is because reaching pH levels below 0 or above 14 requires extraordinarily acidic or basic solutions, respectively (Lunawat 2023): a pH below 0 means a hydrogen ion concentration above 1 mol/L. An example of the pH scale can be seen in Figure 1, where everything from a really basic solution to a really acidic one is illustrated.

Figure 1: A pH scale with annotated examples of chemicals at each integer pH value. Illustration by sciencelearn.org.nz.

Ok fine, but so what? Quite a bit actually. pH is important in many areas of science but when it comes to drug discovery and development, for instance, the pH of different parts of the body can affect the absorption and efficacy of drugs. That’s why it’s crucial to know how our molecules react in different pH regimes.

My last point about the pH metric is that since it’s a base 10 logarithm, each step of one pH unit corresponds to a factor of 10 in hydrogen ion concentration. A solution at pH 3 is 10 times more acidic than one at pH 4 and 100 times more acidic than one at pH 5.
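To make the log scale a bit more tangible, here is a minimal Python sketch of the pH definition. The function names are my own for illustration, and the concentrations are idealized inputs rather than measurements.

```python
import math


def ph_from_concentration(h_conc: float) -> float:
    """Return pH = -log10([H+]) for a hydrogen ion concentration in mol/L."""
    if h_conc <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(h_conc)


def acidity_ratio(ph_a: float, ph_b: float) -> float:
    """Return how many times higher [H+] is at ph_a than at ph_b."""
    return 10.0 ** (ph_b - ph_a)


if __name__ == "__main__":
    print(ph_from_concentration(1e-7))  # [H+] of neutral water
    print(ph_from_concentration(10.0))  # [H+] above 1 mol/L gives a negative pH
    print(acidity_ratio(3, 4))          # pH 3 compared with pH 4
    print(acidity_ratio(3, 5))          # pH 3 compared with pH 5
```

The second call shows why the 0 to 14 range is a practical convention and not a mathematical limit: any hydrogen ion concentration above 1 mol/L lands below zero.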

Ready, Set, Get to the Point!

Armed with the primer on acidity we can start diving into the metrics this whole blog post is about. Let’s begin by talking a bit about what we want to know about molecules in order to distinguish possible drug candidates from the rest. One important aspect of a molecule is how much it likes water (hydrophilic) or fat (lipophilic). This matters because a potential drug needs to be able to pass lipid membranes and it cannot very well do that if it prefers the water outside.

LogP: A Measure of Lipophilicity

LogP is the logarithm of the ratio between the concentration of a compound in a non-polar solvent and its concentration in water. It is common practice to measure this using octanol (PubChem 2003) as the non-polar solvent. Mathematically, LogP is defined as

LogP = log₁₀ [Drug in non-polar] − log₁₀ [Drug in water]

where [Drug] represents the concentration of the drug candidate (molecule). Why does this tell you anything about whether the molecule likes fat or water? Well, to understand this it’s best to take a look at how we can measure this experimentally. Imagine you take a flask consisting of water in the bottom and an oily alcohol (octanol) on the top. Put the molecule inside and shake well. After shaking measure how much of the drug is left in the octanol vs the water. This ratio tells you about the preference of the molecule. If most of it is residing in the octanol it’s more lipophilic. If more of it is in the water it’s more hydrophilic.
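The shake-flask experiment translates directly into a few lines of Python. This is only a sketch: the function name and the concentration readings are made up for illustration and are not measured data.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """LogP from equilibrium concentrations (same units in both phases)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol) - math.log10(conc_water)


# Hypothetical shake-flask readings in mmol/L, not measured data.
lipophilic = log_p(conc_octanol=0.99, conc_water=0.0099)
hydrophilic = log_p(conc_octanol=0.01, conc_water=1.0)

print(f"mostly in the octanol: LogP = {lipophilic:.2f}")
print(f"mostly in the water:   LogP = {hydrophilic:.2f}")
```

A positive LogP means the molecule prefers the octanol, a negative one means it prefers the water, and a value of zero means it has no preference.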

Typical values that are considered good for a drug candidate lie somewhere between 2 and 5, which means the molecule is at least a factor of 100 more concentrated in octanol than in water. This might sound a bit strange if you’re new to this field. Why would a range be optimal? Why not all lipophilic or all hydrophilic? Well, it turns out that a drug, and I’m thinking of an oral drug here, needs to overcome two main challenges.

  1. It needs to be polar (hydrophilic) to a certain extent since blood mainly consists of water.
  2. It needs to be non-polar (lipophilic) since it needs to be able to cross lipid membranes.

Why does it matter for drug discovery? A drug’s absorption, distribution, metabolism, excretion, and toxicity (ADMET) are profoundly influenced by its LogP. Generally, a moderate LogP (not too high, not too low) is desirable for good bioavailability. Another term just entered the mix here: bioavailability. This is a bit fluffy in my opinion but Lin (2022) has a nice definition of it.

Bioavailability is the amount of the drug that enters the blood and produces a therapeutic effect, compared to the total administered amount. It is usually expressed as a fraction.

I will just leave it at that. Exploring the effect a given drug might have is way beyond the scope of this little writeup.

LogD: Because pH Just Ain’t the Same

As I might have hinted at above in my little pH primer, pH is of some significance to drug discovery, mainly because pH is not the same in all parts of our body. In fact it can differ quite wildly. We have a pH of around 7.4 in our blood while our intestines are slightly more acidic with a pH between 6 and 7.4. Our stomach sports a whoppin’ pH between 1.5 and 3.5, making it a strongly acidic environment. See Figure 1 for examples of the pH scale again.

Ok, so different areas of the body have rather different environments, as highlighted by the different levels of pH. But why does this matter to us when developing drugs? Well, because a molecule is not only one molecule. Molecules are susceptible to ionization, a process that adds a hydrogen ion to, or removes one from, a functional group. In Figure 2 we can see a molecule in three different states. It features two ionizable groups, which can be ionized individually or simultaneously depending on the pH of the solvent. Several ionization states of the molecule can exist side by side, in concentrations that depend on the pH of the solvent.

Figure 2: Illustration of the concentration of a molecule in three different states of ionization as a function of pH.

As it is, these ionizations also matter for the bioavailability, which means that pH can directly affect the properties a given molecule has. That’s why it’s a good idea to take into account the pH of the part of the body where we want the molecule to be soluble. I know I’m skipping a lot of important details here and am being a bit handwavy but I hope you’ll forgive me. So with apologies prepared, let’s move on to see how we can fix the shortcoming of our LogP metric, which only considers the unionized form of the molecule.

First let’s go back to the definition of LogP which was as given below.

LogP = log₁₀ [Drug in non-polar] − log₁₀ [Drug in water]

Thus LogP is just the logarithm of the partition coefficient P. Since this only looks at the unionized form of the molecule, we would like to extend the equation to also take into account the concentration of its ionized form in water. This can be accomplished like so.

LogD = log₁₀ [Drug in non-polar] − log₁₀ ([Drug in water] + [Ion in water])

Here D is known as the distribution coefficient. This is also the reason why LogD is said to take pH into account, since the amount of the molecule that is ionized depends heavily on the pH of the solvent. In Figure 3 I’m illustrating how LogD varies for Piroxicam (Multum 2022) at different pH levels of the solvent.
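To see how the extra term changes things, here is a small Python sketch of the LogD definition. The function name and all concentrations are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def log_d(conc_octanol: float, conc_water_neutral: float, conc_water_ion: float) -> float:
    """LogD from the neutral form in octanol and both forms in water."""
    if conc_octanol <= 0 or conc_water_neutral <= 0 or conc_water_ion < 0:
        raise ValueError("concentrations must be positive (the ion may be zero)")
    return math.log10(conc_octanol) - math.log10(conc_water_neutral + conc_water_ion)


# Hypothetical concentrations in mmol/L, not measured data.
no_ions = log_d(0.9, 0.009, 0.0)           # nothing ionized: same as LogP
mostly_ionized = log_d(0.9, 0.009, 0.081)  # nine ions per neutral molecule in the water

print(f"no ionization:   LogD = {no_ions:.2f}")
print(f"mostly ionized:  LogD = {mostly_ionized:.2f}")
```

With no ions in the water LogD collapses to LogP, and every bit of ionization in the water phase pulls LogD down from there.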

Figure 3: Illustration of how the experimentally determined LogD values of Piroxicam vary at different pH values. The data from this plot was taken from Ulrich, Goss, and Ebert (2021).

Here you can see something quite typical, which is that an acidic drug has the highest lipophilicity (high LogD) when the environment is acidic as well. In our case Piroxicam is indeed an acidic drug which is ionized at higher pH values, leading to a lower lipophilicity (lower LogD).

pKa: All Acids Are Not Equally Strong

Now let’s turn the attention to pKa which connects, in a way, LogP, LogD and pH. The metric pKa tells us how strong an acid is. The lower the value the stronger the acid. The pKa value tells us something about how easily an acid gives away a hydrogen ion. We define pKa as in the equation below.

pKa = −log₁₀ Kₐ = −log₁₀ ([H⁺][A⁻]) + log₁₀ [HA] = pH − log₁₀ ([A⁻]/[HA])

I guess you’re starting to see the trend with the logarithms entering all of our metrics. But, I digress. Now, Kₐ = [H⁺][A⁻]/[HA] is called the acid dissociation constant, which measures how easily an acid can dissociate in water. Then we just take the negative base 10 logarithm of that. That’s pKa for you! As you can see in the equation above it’s also related to the pH of the solvent. This relationship is known as the Henderson-Hasselbalch equation. So in plain language the pKa is the pH minus the base 10 logarithm of the concentration ratio between the conjugate base (A⁻) and the acid (HA). Extending this reasoning you can also see that pKa is equal to pH when precisely half of the acid has dissociated. This follows from the fact that half of the acid dissociating means [A⁻]/[HA] = 1, and log₁₀ 1 = 0. This means that we can interpret pKa as the pH at which half of the acid has dissociated.
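The half-dissociation reading of pKa is easy to check numerically. The sketch below rearranges the Henderson-Hasselbalch equation to get the dissociated fraction of a monoprotic acid, and then plugs the ratio [A⁻]/[HA] into the LogD definition from before. That last step assumes that only the neutral form HA enters the octanol, which is the same assumption the LogD equation makes. The pKa and LogP values are hypothetical and do not belong to any real drug.

```python
import math


def fraction_dissociated(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid present as A-, from Henderson-Hasselbalch."""
    ratio = 10.0 ** (ph - pka)  # [A-]/[HA]
    return ratio / (1.0 + ratio)


def log_d_monoprotic_acid(log_p: float, ph: float, pka: float) -> float:
    """LogD of a monoprotic acid, assuming only neutral HA enters the octanol."""
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))


PKA = 5.0    # hypothetical acid
LOG_P = 3.0  # hypothetical LogP of its neutral form

for ph in (2.0, 5.0, 7.4):
    frac = fraction_dissociated(ph, PKA)
    log_d = log_d_monoprotic_acid(LOG_P, ph, PKA)
    print(f"pH {ph:>4}: dissociated fraction = {frac:.3f}, LogD = {log_d:.2f}")
```

At pH equal to pKa exactly half of the acid has dissociated, and LogD sits log₁₀ 2 ≈ 0.3 below LogP. Going to higher pH the acid ionizes more and LogD keeps dropping, which is the same qualitative behaviour as in Figure 3.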

I realized I skipped something again as I was reading what I just wrote. The term “conjugate base” just found its way into the mix. Let me clarify what I mean by that by quoting “PH and pKa — Shiken.ai” (2024).

Conjugate acids are bases that gained a proton H⁺. Conversely, Conjugate bases are acids that lost a proton H⁺.

One important aspect of pKa that eluded me for the longest time was that it is intrinsically a local metric. What I mean by that is that it is a given functional group of the molecule which has a pKa value. If the pKa of a functional group is less than 7, that group has mostly given up its proton at pH=7. For an acidic group such as a carboxylic acid, this means it is negatively charged. A pKa larger than 7 means the group mostly keeps its proton at pH=7. For a basic group such as an amine, where the pKa refers to its protonated form, this means it is positively charged.

Figure 4: A pKa table showing the strength of functional groups of acids and their conjugate bases. Graphics taken from Ashenhurst (2010).

So how do we actually use the pKa value in our daily work, you say? I said at the beginning of this section that the lower the pKa the stronger the acid. This also means that the higher the pKa the weaker the acid. That makes sense. But what is also important to remember is that pKa is also useful for evaluating the strength of bases. Wait what? Yeah indeed, but a common trap here is to just invert the meaning pKa holds for acids. Unfortunately it’s not that simple. Almost, but not quite. The interpretation is: the higher the pKa of an acid, the stronger its conjugate base! Ashenhurst (2010) gives a nice explanation of this.

LogS: Water solubility is key

So far I have talked a bit about why we need to know how strong an acid or a base is and that we need to know the lipophilicity of a molecule. This is all well and good but we also need to know how soluble a given molecule is in water. This is where LogS comes in. LogS is defined as the base 10 logarithm of the solubility of a molecule measured in mol/L. Now to complicate matters a bit there are two kinds of LogS that you need to know about. The first is “intrinsic solubility”, which is the solubility measured after equilibrium between the dissolved and solid state at a pH where the compound is neutral (DrugBank 2023). The second is pH-dependent solubility, which refers to the solvation equilibrium being affected by the pH of the solution.
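Solubility is often reported as a mass per volume, so getting to LogS usually involves one unit conversion first. Here is a minimal sketch of that step. The function name and the example compound are hypothetical.

```python
import math


def log_s_from_mass_solubility(solubility_g_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert a solubility in g/L to LogS (log10 of the solubility in mol/L)."""
    if solubility_g_per_l <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("solubility and molar mass must be positive")
    molar_solubility = solubility_g_per_l / molar_mass_g_per_mol  # mol/L
    return math.log10(molar_solubility)


# Hypothetical compound: 0.03 g/L dissolves, molar mass 300 g/mol.
print(f"LogS = {log_s_from_mass_solubility(0.03, 300.0):.2f}")
```

Note that 1 mg/mL is the same as 1 g/L, so values reported in mg/mL can be passed in unchanged.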

Why does LogS matter? Simply because low solubility goes hand in hand with bad absorption and is also indicative of distribution problems. Therefore most drug hunters avoid molecules with low LogS values. More than 80% of all drugs on the market have a LogS > −4 (OrganicChemistryPortal). In Figure 5 you can see the LogS vs. pH for five different ionizable drugs. Solubility is the lowest at a pH of around 4. The highest solubility is achieved around pH 7. The species are colored in the original data. Please check out Shoghi et al. (2013) for a nicer visual. I was too lazy to create all the series. Apologies.

Figure 5: The solubility vs. pH profiles of five ionizable drugs of different nature (a monoprotic acid, a monoprotic base, a diprotic base and two amphoteric compounds showing a zwitterionic species each one). The data from this plot was taken from Shoghi et al. (2013).

The actual drug in this data is Sulfadimethoxine (DrugBank 2007), whose approval in the US was revoked. Its unionized form dominates at a pH of around 4, which makes it less polar and leads to the poor solubility in water. Making the solvent more acidic creates a positively charged ion which increases solubility in water. Conversely, making the solvent more basic creates a negatively charged ion which in turn also increases solubility.
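This U-shaped behaviour can be reproduced with a simple model. The sketch below assumes an ideal amphoteric compound with one basic and one acidic group, whose ionization follows the Henderson-Hasselbalch equation, and it ignores salt precipitation and other real-world effects. The intrinsic LogS and both pKa values are hypothetical, picked so that the minimum lands around pH 4. They are not the measured values for Sulfadimethoxine.

```python
import math


def log_s_ampholyte(ph: float, log_s0: float, pka_base: float, pka_acid: float) -> float:
    """LogS of an idealized ampholyte as a function of pH.

    log_s0:   intrinsic LogS of the neutral form.
    pka_base: pKa of the protonated basic group (cation <-> neutral).
    pka_acid: pKa of the acidic group (neutral <-> anion).
    """
    cation_term = 10.0 ** (pka_base - ph)  # [cation]/[neutral]
    anion_term = 10.0 ** (ph - pka_acid)   # [anion]/[neutral]
    return log_s0 + math.log10(1.0 + cation_term + anion_term)


# Hypothetical parameters, for illustration only.
LOG_S0, PKA_BASE, PKA_ACID = -4.0, 2.0, 6.0

for ph in range(1, 9):
    print(f"pH {ph}: LogS = {log_s_ampholyte(ph, LOG_S0, PKA_BASE, PKA_ACID):.2f}")
```

In this model the solubility bottoms out close to the intrinsic value midway between the two pKa values, where the neutral form dominates, and rises on both sides as the charged forms take over.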

Wrapping Up

Well, that was a lot to take in. At least for me. I hope you have gotten a better understanding of LogP, LogD, pKa and LogS and how they are affecting modern drug discovery. I will be the first to admit there’s still a lot more to understand, but this should be a good starting point for diving further into these metrics.

I would like to end this post with a quick cheat sheet in Table 1 for you regarding the four properties I have been talking about. But before checking that out I would like to draw your attention to the importance of understanding the uncertainty of all measurements you might do on molecules and proteins. There are very few precise values in the microscopic world and everything is constantly moving. As such any measurement we take will have a significant error attached to it. The size of this error is important to quantify, especially when we start to train AI models on this type of data. So whenever you see someone claim that a drug has a certain effect, like killing cancer cells in a petri dish, remember that the problem is never killing the cancer cells. It’s avoiding killing everything else in the process. Don’t hesitate to reach out if I have made any mistakes or blunders in my explanation.

With no further ado, here’s the table.

Happy hacking!

If you like this content give a like and check my other posts. Originally published at https://doktormike.gitlab.io/posts/navigating-logp-logd-pka-and-logs-a-physicists-guide/

References

Ashenhurst, James. 2010. “How to Use a pKa Table — Masterorganicchemistry.com.” https://www.masterorganicchemistry.com/2010/09/29/how-to-use-a-pka-table/.

Bhal, Sanji. 2021. “Partitioning (LogP or LogD) — Are You Using or Measuring the Right Descriptor? — ACD/Labs — Acdlabs.com.” https://www.acdlabs.com/blog/partitioning-logp-or-logd-are-you-using-measuring-the-right-descriptor/.

DrugBank. 2007. “Sulfadimethoxine: Uses, Interactions, Mechanism of Action | DrugBank Online — Go.drugbank.com.” https://go.drugbank.com/drugs/DB06150.

“Log S | DrugBank Help Center — Dev.drugbank.com.” https://dev.drugbank.com/guides/terms/log-s.

Lin, Sean. 2022. “Using Log P and Log D to Assess Drug Bioavailability — Ftloscience.com.” https://ftloscience.com/log-p-log-d-drug-bioavailability/.

Lunawat, Rajat. 2023. “Why Are pH Values Only In A Range Of 0–14? — Scienceabc.com.” https://www.scienceabc.com/pure-sciences/can-ph-have-values-out-of-the-0-14-range.html.

“Mole (Unit) — Wikipedia — En.wikipedia.org.” 2023. https://en.wikipedia.org/wiki/Mole_(unit).

Multum, Cerner. 2022. “Piroxicam Uses, Side Effects & Warnings — Drugs.com.” https://www.drugs.com/mtm/piroxicam.html.

OrganicChemistryPortal. “LogS Calculation — Osiris Property Explorer — Organic-Chemistry.org.” https://www.organic-chemistry.org/prog/peo/logS.html.

“PH and pKa — Shiken.ai.” 2024. https://shiken.ai/chemistry/ph-and-pka.

PubChem. 2003. “1-Octanol — Pubchem.ncbi.nlm.nih.gov.” https://pubchem.ncbi.nlm.nih.gov/compound/1-Octanol.

Shoghi, Elham, Elisabet Fuguet, Elisabeth Bosch, and Clara Ràfols. 2013. “Solubility–pH Profiles of Some Acidic, Basic and Amphoteric Drugs.” European Journal of Pharmaceutical Sciences 48 (1): 291–300. https://doi.org/10.1016/j.ejps.2012.10.028.

Ulrich, Nadin, Kai-Uwe Goss, and Andrea Ebert. 2021. “Exploring the Octanol–Water Partition Coefficient Dataset Using Deep Learning Techniques and Data Augmentation.” Communications Chemistry 4 (1): 90. https://doi.org/10.1038/s42004-021-00528-9.

Michael Green

A technology driven artificial intelligence evangelist and machine learning expert, trying to do my part in moving this world forward through science.