SURFACTANT QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIPS
Introduction
In principle, the structural formula of a surfactant should encode all of the information that will determine its physical properties. In practice, though, our knowledge of chemistry and intermolecular forces is incomplete and does not allow us to calculate all properties from first principles. The use of quantitative structure-property relationships (QSPR) lies between the two extremes in determining surfactant physical properties.
On the one hand, given accurate thermodynamic models, along with accurate calculations of molecular orbitals, molecular conformations, and the interactions among molecules, all physical properties should be precisely determinable. On the other hand, given no such knowledge, physical properties can be measured experimentally. With our QSPR approach we attempt to stake out a middle ground, combining databases of physical properties with quantum chemical and topological calculations based on the molecular structure, in order to find relationships that can be used to make property predictions for structures without such property measurements. This can be summed up by the following quote:
Correlations of surfactant properties to aspects of molecular structure have been in a much more primitive state than that seen in general physical chemistry, and mostly limited to correlations of the carbon number of surfactants to properties, within a limited subclass of compounds. I have taken the QSPR tools developed by others and successfully applied them to predicting a number of properties, creating predictive formulas with a much more general applicability. In the following sections I will cover more details of the prediction of nonionic surfactant critical micelle concentration (CMC) [Huibers et al., 1996b], anionic CMC [Huibers et al., 1996c], nonionic cloud point [Huibers et al., 1996d], anionic Krafft point, and the molecular volume of a variety of hydro- and fluorocarbon surfactant tails.
Previous work. Only recently has there been QSPR-like work related to surfactants, besides simple relationships between properties such as the CMC and chain length for surfactants with linear alkyl tails, as reviewed by Rosen [1989] and covered by Huibers et al. [1996b, 1996c, 1996d]. Two such QSPR works are known. Lindgren [1995], Lindgren and Sjöström [1994], and Lindgren et al. [1995] performed a multivariate QSPR study correlating nonionic surfactant properties to empirically derived parameters. The surfactants considered were commercial nonionics of the linear alkyl and alkylphenyl ethoxylate classes. Properties correlated were toxicity, detergency and CMC. Parameters used were constitutional descriptors based on the carbon number, HLB, and cloud point. Richter et al. [1996] correlated three detergency parameters to nonlinear functions of the Kier and Hall molecular connectivity indices calculated for the surfactant head and tail.
Nonionic Surfactant CMC
Introduction. Until the 1980s, the only structure-property relationship in the surfactant literature was Klevens' rule, named after Klevens [1953], who noted the relationship between the logarithm of the CMC and the carbon number for series of related surfactants. Some examples are given in Rosen [1989] for several different classes of surfactants, at different temperatures (see also Huibers et al. [1996b]), for example the following equation for CnE6 at 25 °C:
$$ \log_{10}\mathrm{CMC} = 1.8 - 0.49\,\mathrm{C\#} \qquad [1] $$
Becher [1984] first introduced a relationship similar to the Klevens rule, with a term for the number of ethylene oxide residues (EO#) in addition to the number of carbon atoms (C#). Ravey et al. [1988] achieved even better results by including a nonlinear term, the product between EO# and C#, arriving at the following equation for alkyl ethoxylates:
$$ \log_{10}\mathrm{CMC} = 1.77 - 0.52\,\mathrm{C\#} + 0.032\,\mathrm{EO\#} + 0.002\,\mathrm{C\#}\,\mathrm{EO\#} \qquad [2] $$
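As a quick numerical illustration, Eqs. 1 and 2 can be evaluated side by side for a homologous series. The sketch below is only an illustration: the function names are hypothetical, the coefficients are copied from the two equations, and the result is left as log10 CMC in whatever concentration units the original correlations used.

```python
def log10_cmc_klevens_cne6(carbon_number):
    """Eq. 1: Klevens-type rule for the CnE6 series at 25 C."""
    return 1.8 - 0.49 * carbon_number


def log10_cmc_ravey(carbon_number, eo_number):
    """Eq. 2: relationship of Ravey et al. [1988] for alkyl ethoxylates."""
    return (1.77 - 0.52 * carbon_number + 0.032 * eo_number
            + 0.002 * carbon_number * eo_number)


if __name__ == "__main__":
    # Compare the two equations for CnE6 (six ethylene oxide residues).
    for n_c in (8, 10, 12, 14):
        print(n_c,
              round(log10_cmc_klevens_cne6(n_c), 3),
              round(log10_cmc_ravey(n_c, 6), 3))
```

For EO# = 6 the cross term in Eq. 2 gives an effective slope of -0.52 + 0.002 × 6 = -0.508 per carbon, close to the -0.49 of Eq. 1, so the two equations track each other closely over this series.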
From the theoretical direction (as opposed to these strictly empirical relationships), both Puvvada and Blankschtein [1990] and Nagarajan and Ruckenstein [1991] proposed thermodynamic models that could predict surfactant physical properties such as the CMC. These models are somewhat limited by the lack of understanding of head group interactions, and thermodynamics of tail interactions (limiting the models to linear alkyl tails only), resulting in predictions of limited accuracy, for a limited set of surfactants.
The QSPR approach takes the middle road between these two extremes, developing empirical rules similar to the Klevens rule, but with much more complicated descriptors than the simple carbon number (C#), and also applying several descriptors in one multiple linear regression. The empirical rules developed here are valid for a broader range of structures than either the Klevens relationships or the thermodynamic models are capable of handling. The full details of this effort are described in Huibers et al. [1996b]. The best three-descriptor relationship developed for 77 diverse nonionic surfactants (Figure 2-1) at 25 °C is
$$ \log_{10}\mathrm{CMC} = -1.80 - 0.567\,\text{t-KH0} + 1.054\,\text{t-ASIC2} + 7.51\,\mathrm{RNNO} \qquad [3] $$
R² = 0.983, F = 1433, s² = 0.0313, N = 77
(Figure 2-1. Structures of the surfactants included in the nonionic CMC correlations.)
This formula uses two topological descriptors calculated from the hydrophobic tail, the Kier and Hall index of zero order (t-KH0) and the average structural information content index of second order (t-ASIC2), which together capture the influence of structural variation of the tail on CMC. The final descriptor, the relative number of nitrogen and oxygen atoms in the molecule (RNNO), captures the influence of the hydrophilic head group. The scatter plot of this correlation can be seen in Figure 2-2. The selection of these three descriptors over the hundreds of others available (with literally millions of combinations) was achieved by a heuristic search algorithm coded into the CODESSA program [Katritzky et al., 1994], which was also used for the generation of descriptors and the calculation of statistical parameters for the resulting regressions.
The two major accomplishments of this research effort were the first application of the QSPR techniques developed in other fields of chemistry to surfactant chemistry, and the successful application of descriptors calculated from molecular fragments to a QSPR problem. Surfactant physical properties were an excellent choice for the use of fragment descriptors, because what makes surfactants unique is the existence of two distinct domains, one hydrophobic and one hydrophilic, within the same molecule. Also, the changes that underlie a physical property, such as the CMC, affect the two fragments differently. The CMC is a perfect example: when surfactant molecules form micelles, the hydrophobic domain undergoes a major environmental change, from aqueous to micellar interior (hydrophobic), while the hydrophilic fragment stays in the aqueous environment. Thus, it was no surprise that the descriptors calculated from the two fragments performed better than those calculated in the traditional method from the whole molecule.
(Figure 2-2. Scatter plot for calculated vs. experimental nonionic CMC for 77 surfactants, using three descriptors.)
The success of this initial study of QSPR techniques applied to surfactants [Huibers et al. 1996b] led us to try CMC prediction for additional surfactant classes, as well as prediction of different surfactant properties. There is value in this analysis besides the regressions themselves. Ideally the choice of descriptors by CODESSA will give insight into what aspect of the molecule is influencing the property of interest, increasing the knowledge of surfactant structure effects on properties.
Anionic Surfactant CMC
The success with the nonionic surfactant CMC prediction led us immediately to try the other most important class of surfactants, the anionics. The details of the anionic CMC predictions can be found in Huibers et al. [1996c]. Together, the nonionics and anionics make up over 90% of the production (and use) of surfactants. A search of the literature resulted in 119 anionic surfactants with CMC values at or near 40 °C. Although many CMC values are available at both 25 and 40 °C, the higher temperature was used here because the solubility (Krafft point) is often limited for surfactants with larger hydrophobic domains at the lower temperature, and thus CMC is not reached before the surfactant precipitates out of the solution. As the head group for most of the anionics was either a sulfate or sulfonate group (many with additional oxygen or nitrogen atoms in the head group), the majority of the variation seen in the data set was in the hydrophobic domain. Again, we were able to come up with good three-descriptor relationships. The first relationship is for the overall data set of 119 structures (Figure 2-3),
$$ \log_{10}\mathrm{CMC} = (1.89 \pm 0.11) - (0.314 \pm 0.010)\,\text{t-sum-KH0} - (0.034 \pm 0.003)\,\mathrm{TDIP} - (1.45 \pm 0.18)\,\text{h-sum-RNC} \qquad [4] $$
R² = 0.940, F = 597, s² = 0.0472, N = 119
and the second is for a subset of 68 where the head groups were only sulfates and sulfonates.
$$ \log_{10}\mathrm{CMC} = (2.42 \pm 0.07) - (0.537 \pm 0.009)\,\mathrm{KH1} - (0.019 \pm 0.002)\,\mathrm{KS3} + (0.096 \pm 0.005)\,\mathrm{HGP} \qquad [5] $$
R² = 0.988, F = 1691, s² = 0.0068, N = 68
This second relationship (Eq. 5) had a much higher correlation coefficient and smaller standard error in the predicted CMC value compared to the literature value. Both of these relationships have just two descriptors calculated from the hydrophobic tail, and one for the hydrophilic head group. In Eq. 4 the head group contribution is represented by the dipole moment of the molecule (this accounts for the position of the charged head group relative to the center of mass of the molecule) and in Eq. 5 the contribution is captured in the head group position, which is the number of the carbon where the head group is attached. The scatter plot for Eq. 4 can be seen in Figure 2-4.
The occurrence of the Kier and Hall molecular connectivity indices (of order 0 and 1) in the regressions for both the nonionic and anionic surfactants is important. In all three regressions, this is the dominant descriptor. It has been noted [Kier and Hall, 1976] that this descriptor correlates highly to both molecular volume and molecular surface area. This is a satisfying result, as the qualitative picture for the solubilization of a hydrophobic entity such as the surfactant tail is that the water must form a cage around the solute. This cage results in water forming strained or broken hydrogen bonds, which are energetically unfavorable. These water molecules at the interface are also more restricted in their motion and orientational freedom, and thus are in an entropically unfavorable situation as well. Both of these unfavorable situations would scale with the surface area of the interaction. Thus for hydrophobic solutes it is expected that the solubility or CMC (basically the solubility of the tail) would scale with the surface area of the hydrophobic domain. The two K&H indices correlate with surface area (R² ≈ 0.97) and thus corroborate this qualitative picture of micellization.
(Figure 2-3. Structures of the surfactants included in the anionic CMC correlations.)
(Figure 2-4. Scatter plot for calculated vs. experimental anionic CMC for 119 surfactants, using three descriptors.)
Temperature Dependence of Anionic CMC
In order to allow the inclusion of more anionic surfactants in the QSPR study described in the previous section and in Huibers et al. [1996c], it was important to try to develop some scaling laws for CMC as a function of temperature. This would also be useful in applying the QSPR modeling results to other temperatures. There is experimental evidence from various sources for a parabolic relationship between the critical micelle concentration (CMC) and temperature for anionic surfactants. The temperature dependence is the result of entropic and enthalpic contributions from both the surfactant and the surrounding water. There is a minimum in CMC around 25 °C, although the temperature of the minimum appears to depend on the size of the surfactant molecule. A general model is proposed for the temperature dependence of anionic surfactant CMC in the 10 to 70 °C range, taking into account the size of the hydrophobic domain.
While not as temperature dependent as the nonionic surfactants, the CMC of anionic surfactants clearly has a temperature dependence. This dependence has been described as a complex competition between enthalpic and entropic effects, both of the surfactant molecules and the water surrounding them. Several thermodynamic models exist, though none are at the level where accurate temperature dependence can be predicted. Thus, it would be of value to define an empirical model for the temperature dependence of anionic surfactant CMC. This model should have certain features, such as a parabolic dependence of the CMC with temperature, and a shift in the minimum CMC with surfactant structure. It should also be simple to calculate and should be valid for a wide range of structural variation among the anionic surfactants.
Measurements of CMC for a large number of anionic surfactants have been performed by many research groups over the last several decades. The majority of these measurements were made at either 25 or 40 °C. It would be valuable to have general scaling laws to make CMC estimates at other temperatures. A general model can be used to estimate CMC in the range of 10 to 70 °C, given a single CMC value in that range. Care must be taken at the lower end of this range with respect to the Krafft point of the surfactant of interest. Below the Krafft point, the CMC may not be reached, as the surfactant will precipitate out of solution before sufficient concentration is achieved for micelles to be formed. For a means of predicting Krafft point from molecular structure, some models have been developed by Gu and Sjöblom [1991; 1992].
A parabolic model in the form of Eq. 6 has been proposed by van Os et al. [1987; 1988] for the sodium p-(x-decyl)benzenesulfonates and p-(3-alkyl)benzenesulfonates.
$$ \log \mathrm{CMC} = a + bT + cT^2 \qquad [6] $$
This functional form is an excellent fit to the data, but the three coefficients must be calculated for each surfactant structure.
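To make the per-surfactant fitting step concrete, the sketch below fits Eq. 6 by ordinary least squares to the sodium dodecyl sulfate (C12 SO4) values listed in Table 2-1. It is a minimal illustration, not the procedure of van Os et al.: the helper name `fit_parabola`, the use of Cramer's rule on the normal equations, and the choice of degrees Celsius for T are all assumptions made here.

```python
import math


def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(matrix)
    coeffs = []
    for k in range(3):  # Cramer's rule: replace column k with the right-hand side
        mk = [row[:] for row in matrix]
        for i in range(3):
            mk[i][k] = rhs[i]
        coeffs.append(det3(mk) / d)
    return tuple(coeffs)


# C12 SO4 entries of Table 2-1: temperature (C) and CMC (mM)
temps_c = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
cmc_mm = [8.66, 8.43, 8.25, 8.16, 8.24, 8.38, 8.56, 8.85, 9.18, 9.61]

a, b, c = fit_parabola(temps_c, [math.log10(v) for v in cmc_mm])
t_min = -b / (2 * c)  # vertex of the fitted parabola
print(f"a = {a:.4f}, b = {b:.5f}, c = {c:.7f}, minimum near {t_min:.1f} C")
```

The three coefficients returned here apply to this one surfactant only, which is the limitation noted above; a different structure requires a new fit.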
Data and Methodology
Sources of CMC data. The CMC values for the various anionic surfactants were taken from the compilation by van Os et al. [1993] and Mukerjee and Mysels [1971]. A total of 103 CMC measurements for 16 different structures are summarized in Table 2-1, along with the Krafft points and estimated carbon number. All alkylbenzenesulfonate structures are substituted at the para position of the ring.
In order to develop the most accurate data set, CMC values were only used from researchers who conducted measurements at several temperatures. It is difficult to compare measurements at different temperatures from different researchers, as the systematic error in CMC between labs is often greater than the expected change due to temperature. CMC data with at least four temperature values have been measured by Klevens [1948] (C10) for the linear alkyl sodium sulfonates, by Rouviere et al. [1983] (C7, C8) and van Os et al. [1988] (3-C9, 3-C12) and [1987] (5-C10) for the alkyl sodium benzenesulfonates, and by Moroi et al. [1975] (C8, C10, C12, C14) and Rassing et al. [1973] (C8) for the linear alkyl sodium sulfates.
| T (°C) | C7 BSO3 | C8 BSO3 | 3C9 BSO3 | 2C10 BSO3 | 3C10 BSO3 | 5C10 BSO3 | 3C12 BSO3 | 6C12 BSO3 |
| 10 | | | | | | | | |
| 15 | | | 10.43 | | | 8.31 | 1.99 | 2.10 |
| 20 | | | 10.40 | 4.35 | 6.03 | 8.14 | 2.16 | 2.29 |
| 25 | 22.8 | 11.4 | 10.27 | 4.63 | 6.02 | 8.01 | 2.22 | 2.38 |
| 30 | 24.0 | 12.0 | 10.41 | 4.86 | 6.04 | 8.13 | 2.35 | 2.50 |
| 35 | 25.0 | | | | | | | |
| 40 | 26.2 | 12.7 | 10.80 | 4.98 | 6.31 | 8.98 | 2.48 | 2.60 |
| 45 | 27.4 | | | | | | | |
| 50 | | 13.5 | 11.64 | 5.50 | 6.78 | 9.06 | 2.69 | |
| 55 | | | | | | | | |
| 60 | | 14.7 | 12.84 | 6.07 | 7.39 | 10.34 | 3.04 | |
| 65 | | | | | | | | |
| 70 | | 15.8 | 14.13 | 6.92 | 8.35 | 11.39 | 3.30 | |
| KP | 9 | 18.5 | <0 | 22 | 4 | <0 | 14 | |

| T (°C) | C8 SO3 | C10 SO3 | C8 SO4 | C10 SO4 | C12 SO4 | C14 SO4 | 2C14 SO4 | 4C14 SO4 |
| 10 | | | 141.6 | 35.0 | 8.66 | | | |
| 15 | | | 136.8 | 33.9 | 8.43 | | | |
| 20 | | | 133.3 | 33.3 | 8.25 | | | |
| 25 | 155 | 41 | 130.2 | 33.0 | 8.16 | 2.05 | 3.27 | 5.12 |
| 30 | | | 131.8 | 32.9 | 8.24 | 2.08 | | |
| 35 | | 42 | 133.7 | 33.3 | 8.38 | 2.14 | | |
| 40 | 162 | | 135.9 | 33.7 | 8.56 | 2.22 | | |
| 45 | | 45 | 138.6 | 34.5 | 8.85 | 2.31 | | |
| 50 | 177 | | 142.1 | 35.5 | 9.18 | 2.43 | | |
| 55 | | 49 | 146.6 | 36.9 | 9.61 | 2.59 | | |
| 60 | | | | | | | 4.04 | 5.85 |
| 65 | | 55 | | | | | | |
| 70 | | | | | | | | |
KP (six values listed for the eight columns of this group): 22, <0, <0, 8, 21, 11
Note: Abbreviations for the structures are BSO3 (benzenesulfonate),
SO4 (sulfate), Cn (linear alkyl fragment of n carbons) and mCn (branched
alkyl fragment, n carbons, substitution at m carbon).
Statistical analysis. The determination of the optimum descriptor coefficients and the calculation of statistical parameters for the multiple regression were done using the CODESSA program [Katritzky et al., 1994]. This program has been designed for discovering general quantitative structure-property relationships (QSPR) and allows quick comparison of several different proposed models.
Results and Discussion
The functional form of the proposed model (Eq. 7) takes the form of a partition function, with a multiplicative factor (G), on the order of one, used to scale a known CMC value to another temperature. The right-hand side of Eq. 8 is the logarithm (base 10) of the factor G. The difference of the logarithms is equivalent to the logarithm of the ratio of the CMC values.
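$$ \mathrm{CMC}_T = \mathrm{CMC}_{25}\, G(N_c, T) \qquad [7] $$
$$ \log \mathrm{CMC}_T - \log \mathrm{CMC}_{25} = a_0 + a_1 T + a_2 T^2 + a_3 N_c T + a_4 N_c T^2 \qquad [8] $$
or
$$ \log\left[\mathrm{CMC}_T / \mathrm{CMC}_{25}\right] = a_0 + (a_1 + a_3 N_c)\,T + (a_2 + a_4 N_c)\,T^2 $$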
Temperature values (T) have units of Kelvin, CMC has units of 10⁻³ mol/L (mM), and Nc is the alkane carbon number of the surfactant. Estimation of Nc for other than linear alkyl sodium sulf(on)ates will be discussed later.
The best fit and statistical parameters of the coefficients can be seen in the equations below. Models with and without alkyl chain length dependence are presented.
$$ \log G(N_c, T) = 6.549 - 4.401 \times 10^{-2}\,T + 7.400 \times 10^{-5}\,T^2 $$
r = 0.959, F = 420, s² = 0.0002, N = 76
Model One. No molecular size dependence.
$$ \log G(N_c, T) = 5.382 - 3.305 \times 10^{-2}\,T + 5.042 \times 10^{-5}\,T^2 - 2.923 \times 10^{-4}\,N_c T + 9.762 \times 10^{-7}\,N_c T^2 $$
r = 0.982, F = 470, s² = 0.0001, N = 76
Model Two. Molecular size dependence.
In these equations, r is the correlation coefficient, F is the F-statistic representing the quality of the model in fitting nonrandom changes in the data, s is the standard error, and N is the number of data points used in the calculation.
The quality of the fit of the model equations can be seen graphically in Figure 2-5. The calculated results clearly show both trends apparent in the data: the change in the parabolic fit, and a shift in the temperature of the minimum CMC value, with a change in the size of the hydrophobic domain of the surfactant. The model is a good fit for a wide variety of anionic surfactant structures, including linear, branched, and aromatic sulfates and sulfonates.
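The two fitted models can be applied directly to rescale a CMC measured at 25 °C. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the coefficients are copied from Models One and Two, and 273.15 is used to convert Celsius to the Kelvin scale that the model requires. The location of the CMC minimum is taken as the vertex of the parabola in T.

```python
KELVIN_OFFSET = 273.15  # Celsius to Kelvin; the model uses T in Kelvin


def log_g_model_one(t_kelvin):
    """Model One: log10 of the scaling factor G, no size dependence."""
    return 6.549 - 4.401e-2 * t_kelvin + 7.400e-5 * t_kelvin ** 2


def log_g_model_two(n_c, t_kelvin):
    """Model Two: log10 of G with dependence on the carbon number Nc."""
    return (5.382 - 3.305e-2 * t_kelvin + 5.042e-5 * t_kelvin ** 2
            - 2.923e-4 * n_c * t_kelvin + 9.762e-7 * n_c * t_kelvin ** 2)


def scale_cmc(cmc_25, n_c, t_celsius):
    """Eq. 7: scale a CMC known at 25 C to another temperature (same units)."""
    return cmc_25 * 10 ** log_g_model_two(n_c, t_celsius + KELVIN_OFFSET)


def t_min_celsius(n_c):
    """Temperature (C) at which Model Two predicts the minimum CMC."""
    b = -3.305e-2 - 2.923e-4 * n_c   # coefficient of T
    c = 5.042e-5 + 9.762e-7 * n_c    # coefficient of T**2
    return -b / (2 * c) - KELVIN_OFFSET


if __name__ == "__main__":
    # Example: C12 sulfate, 8.16 mM at 25 C in Table 2-1, rescaled to 55 C.
    print(scale_cmc(8.16, 12, 55.0))
    for n_c in (8, 10, 12, 14):
        print(n_c, round(t_min_celsius(n_c), 1))
```

Because the published coefficients are rounded, log G is close to but not exactly zero at 25 °C. The rescaled value for the C12 sulfate at 55 °C can be compared with the 9.61 mM listed in Table 2-1, and the printed minima move to lower temperature as Nc increases.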
Estimation of carbon number. The carbon number (Nc) is originally defined as the number of carbons in the linear alkyl portion of a surfactant. It has been demonstrated that the logarithm of CMC is proportional to Nc. For the linear alkyl sulfates and sulfonates, the following equation [Rosen, 1976] describes this relationship between CMC and Nc at 25 °C. Coefficient values for Eq. 9 at different temperatures are tabulated by Rosen [1989].
(Figure 2-5. Normalized CMC vs. temperature for: a) sulfates and b) sulfonates.)
$$ \log \mathrm{CMC}_{25} = 1.51 - 0.30\,N_c \qquad [9] $$
A problem arises when the hydrophobic structure is more complex than a linear carbon chain. For branched and aromatic structures, an 'effective' carbon number can be defined, which when used in Eq. 8 predicts the CMC correctly. It is well established that branching and aromatic ring structures have lower effective carbon numbers than the actual number of carbon atoms present in the molecule.
Two methods are proposed for the estimation of carbon number for use in Eq. 8. The first is simply using a known CMC value of the surfactant and Eq. 10 to estimate Nc. Equation 10 is simply Eq. 9 solved for Nc.
$$ N_c = 5.03 - 3.33\,\log \mathrm{CMC}_{25} \qquad [10] $$
A second approach uses a topological index, the zero-order valence connectivity index ${}^{0}\chi^{v}$ developed by Kier and Hall [1976]. This approach assigns a contribution to each carbon atom based on its valence and number of attached hydrogens and sums the contributions of all carbon atoms in the hydrophobic domain of the surfactant.
$$ {}^{0}\chi^{v} = \sum_i (\delta_i^{v})^{-1/2}, \quad \text{where } \delta_i^{v} = \frac{Z_i^{v} - H_i}{Z_i - Z_i^{v} - 1} \qquad [11] $$
$Z_i$ is the total number of electrons in the ith atom, $Z_i^{v}$ is the number of valence electrons, and $H_i$ is the number of hydrogens directly attached to the ith atom. For hydrocarbon chains, the contributions $(\delta_i^{v})^{-1/2}$ are 1.0 for CH3, 0.707 for CH2, 0.577 for CH, 0.500 for C, and 3.308 for a phenyl group with two substitutions. For fluorocarbons, values are 1.634 for a CF3 group and 1.256 for a CF2 group. Given the calculated value of ${}^{0}\chi^{v}$ for the hydrophobic domain, the carbon number can be estimated by the following equation.
$$ N_c = 1.414\;{}^{0}\chi^{v} - 0.414 \qquad [12] $$
The applicability of the molecular connectivity index for predicting the effect of the hydrophobic domain on CMC has been established by Huibers et al. [1996b]. The value of this approach is that it is independent of temperature and does not require a CMC value for determination, as it is calculated exclusively from molecular structure.
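Both estimation routes for Nc are short enough to sketch in code. The example below is an illustration only: the function and variable names are hypothetical, atoms are described as (element, attached hydrogens) pairs, and the electron counts for carbon and fluorine are the standard values (Z = 6 and 9, with 4 and 7 valence electrons). It reproduces the group values quoted in the text from Eq. 11 and then applies Eqs. 10 and 12.

```python
# (total electrons Z, valence electrons Zv) for the elements used in the text
ELECTRONS = {"C": (6, 4), "F": (9, 7)}


def valence_delta(element, n_hydrogens):
    """Valence delta of Eq. 11 for one non-hydrogen atom."""
    z, zv = ELECTRONS[element]
    return (zv - n_hydrogens) / (z - zv - 1)


def chi0v(atoms):
    """Eq. 11: sum of delta**(-1/2) over (element, attached H) pairs."""
    return sum(valence_delta(el, h) ** -0.5 for el, h in atoms)


def carbon_number_from_chi(chi):
    """Eq. 12: effective carbon number from the connectivity index."""
    return 1.414 * chi - 0.414


def carbon_number_from_log_cmc25(log_cmc25):
    """Eq. 10: effective carbon number from log CMC25 (units as in Eq. 9)."""
    return 5.03 - 3.33 * log_cmc25


# Hydrophobic fragments written atom by atom
dodecyl = [("C", 3)] + [("C", 2)] * 11          # CH3(CH2)11-
phenylene = [("C", 1)] * 4 + [("C", 0)] * 2     # disubstituted phenyl ring
cf3 = [("C", 0)] + [("F", 0)] * 3               # CF3 group

if __name__ == "__main__":
    print(round(chi0v(phenylene), 3), round(chi0v(cf3), 3))
    print(round(carbon_number_from_chi(chi0v(dodecyl)), 2))
```

For a linear alkyl tail the index route returns the actual carbon count to within rounding of the 1.414 and 0.414 constants, which is the behavior Eq. 12 is built to give; branched, aromatic, or fluorinated fragments then receive an effective Nc on the same scale.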
Conclusion. An empirical model has been developed for the temperature dependence of CMC for anionic surfactants. A high correlation was found for this model (r = 0.982) given a set of 103 CMC values for 16 surfactants, representing linear, branched, and aromatic sodium sulfates and sulfonates, at temperatures in the range of 10 to 70 °C.
Nonionic Surfactant Cloud Point
With the success of applying the QSPR methodology to the prediction of CMC, we attempted another surfactant physical property. For the nonionic surfactants, a phase transition is sometimes observed upon heating of the surfactant solution. A transition occurs from a single phase micellar solution to surfactant rich and surfactant poor phases. This has been attributed to the weakening of the hydrogen bonds necessary for solubilizing the hydrophilic domain of the molecule. Again, details of this effort are described by Huibers et al. [1996d], and a brief summary follows.
Similar to the Klevens rule for CMC prediction, the relationships among the cloud point, the carbon number, and the ethylene oxide number have long been known. For the data considered here, we recalculate the equation, including error terms and other statistical parameters, for 46 linear alkyl ethoxylates:
$$ \mathrm{CP} = (87.1 \pm 3.3)\,\log \mathrm{EO\#} - (5.78 \pm 0.38)\,\mathrm{C\#} - (40.7 \pm 5.2) \qquad [13] $$
R² = 0.943, F = 355, s² = 40.4, N = 46
Given the success of the topological descriptors in accounting for the variation of the hydrophobic domains for the prediction of CMC, we believed that the same method would be successful for the prediction of cloud point for a wide variety of hydrophobic structural features. For 62 structures (Figure 2-6), the best relationship for cloud point (°C) was as follows:
$$ \mathrm{CP} = (-264 \pm 17) + (86.1 \pm 3.0)\,\log \mathrm{EO\#} + (8.02 \pm 0.78)\;{}^{3}\kappa - (1284 \pm 86)\;{}^{0}\mathrm{ABIC} - (14.26 \pm 0.73)\;{}^{1}\mathrm{SIC} \qquad [14] $$
R² = 0.937, F = 211, s² = 42.3, N = 62
This regression was found to fit a range of linear alkyl, branched alkyl, cyclic alkyl, and alkylphenyl ethoxylates with a standard error of 6.5 °C. The scatter plot for this correlation can be found in Figure 2-7. Three descriptors account for the hydrophobic structure variation. In Figure 2-8, we show graphically the specific values of the descriptors for the different hydrophobic fragments, showing how they account for different structural features with different weightings. The hydrophilic variation is accounted for by the logarithm of the ethylene oxide number, which is the descriptor used in the Klevens-like relationships. Although different hydrophilic descriptors were tried, none were noticeably superior to this descriptor.
(Figure 2-6. Structures of the surfactants included in the nonionic
cloud point correlations.)
(Figure 2-7. Scatter plot for calculated vs. experimental nonionic
cloud point for 62 surfactants, using three descriptors.)
(Figure 2-8. Topological descriptor values vs. carbon number for
the different hydrophobic structures in the cloud point correlation.)
Anionic Surfactant Krafft Point
The Krafft point is an important physical property of ionic surfactants, establishing the minimum temperature at which a surfactant can be used. The solubility of ionic surfactants in water is influenced by temperature, and the Krafft point is the temperature at which a hydrated surfactant crystalline solid melts and forms micelles in solution [Krafft and Wiglow, 1895]. Below this temperature, there are surfactant monomers in solution, in equilibrium with the solid, but the concentration is below CMC so the solubility is limited. If a micellar solution is cooled below the Krafft point, it will precipitate out of solution and the detergency of the solution will be lost.
Several aspects of the influence of molecular structure on Krafft point are known. Gu and Sjöblom [1991, 1992] established that for a wide variety of anionic surfactants, the Krafft point increases on average 5.5 °C for each methylene (-CH2-) group in the hydrophobic tail. Also, for the one case available, there is an average decrease of 9 °C per ethylene oxide residue (-CH2CH2O-) in the hydrophilic head group. Finally, no systematic formula could be developed for the influence of the head group, and so they developed a table of constant terms for the contributions from different head groups. As can be seen in these constants and also noted by others [Matsuki et al., 1996], the counterion plays a significant role and can change the Krafft point by over 10 °C. It is interesting to note that, since there is also an established linear relationship between the logarithm of CMC and the carbon number, it follows that there is a linear relationship between the Krafft point and log(CMC) for a given homologous series of surfactants.
Results. In order to study the influence of the molecular structure on Krafft point, it is desirable to develop the largest possible set of surfactants with reliable Krafft point measurements. Due to the limited amount of Krafft point data in the literature, this effort focused on anionic surfactants that were sodium salts. Sodium is the most prevalent counterion, and if structure-property relationships can be generally developed for this subset, it provides a good base to expand the study to examine specific counterion effects. Besides the anionic surfactants, there are extremely few published Krafft point values for cationic or zwitterionic materials.
Included in this investigation are 44 linear alkyl sulfates and sulfonates, branched alkylbenzene sulfonates, perfluorinated alkyl sulfonates and acids, and sulfates and sulfonates with an ether or ester linkage to the hydrophobic tail (Figure 2-9). Sources for the Krafft point data include the texts of Rosen [1989] and van Os et al. [1993], and additional data from Pueschel and Todorov [1968] and others, as listed in Table 2-2. For each structure, descriptors were calculated for the entire molecule, as well as for the hydrophobic and hydrophilic fragments, as described in previous sections. In the regressions that were chosen as statistically the best, certain hydrophobic fragment descriptors appeared often, but no hydrophilic fragment descriptors were chosen for the final regressions. The influence of the head group was handled by whole molecule descriptors.
For the 44 surfactants in Table 2-2, a regression with a standard error of 5.3 °C has been developed using four descriptors (Eq. 15).
(Figure 2-9. Structures of the surfactants included in the Krafft point correlations.)
| Name | Krafft point (°C) |
| c10so4 | 8 |
| c11so4 | 7 |
| c12so4 | 16 |
| c13so4 | 20.8 |
| c14so4 | 30 |
| c15so4 | 31.5 |
| c16so4 | 45 |
| c18so4 | 56 |
| c10so3 | 22.5 |
| c12so3 | 38 |
| c14so3 | 48 |
| c16so3 | 57 |
| c17so3 | 62 |
| c18so3 | 70 |
| c07bso3 | 9 |
| c08bso3 | 18.5 |
| 2c10bso3 | 22 |
| 2c12bso3 | 31.5 |
| 2c14bso3 | 46 |
| 2c16bso3 | 54.2 |
| 2c18bso3 | 60.8 |
| 3c10bso3 | 4 |
| 3c12bso3 | 14 |
| 2c13cso4 | 11 |
| 2c15cso4 | 25 |
| 2c17cso4 | 30 |
| c16e1so4 | 36 |
| c16e2so4 | 24 |
| c16e3so4 | 19 |
| c18e3so4 | 32 |
| c18e4so4 | 18 |
| c08aeso3 | 0 |
| c10aeso3 | 8.1 |
| c12aeso3 | 24.2 |
| c14aeso3 | 36.2 |
| c08pso3 | 0 |
| c10pso3 | 12.5 |
| c12pso3 | 26.5 |
| c14pso3 | 39 |
| c16p2so4 | 19 |
| c18p2so4 | 31 |
| cf7so3 | 56.5 |
| cf8so3 | 75 |
| cf7coo | 8 |
Note: nCm is n-substituted Cm, B is phenyl [-C6H4-],
E# is ethoxylate [-(OC2H4)#-], A is [-C(O)-],
P is [-OC(O)C2H4-], P2 is [-(OCH2CH(CH3))2-].
All carbons are hydrogenated except fluorinated cases (CF).
$$ \mathrm{KP} = (231 \pm 15) + (3.02 \pm 0.23)\,\text{f-KS1} - (41.7 \pm 3.2)\,\text{f-AIC2} + (20.6 \pm 2.2)\,\mathrm{NDB} - (231 \pm 15)\,\mathrm{AVO} \qquad [15] $$
R² = 0.932, F = 134, s² = 28.1, N = 44
where f-KS1 is the Kier shape index of first order, f-AIC2 is the average information content of second order, NDB is the number of double bonds, and AVO is the average valence of an oxygen atom. The scatter plot of this correlation can be seen in Figure 2-10.
In an attempt to find better correlations for more limited sets of structures, the structures with linking ether and ester bonds between the head group and tail were eliminated, resulting in 28 linear, branched, aromatic, and perfluorinated sulfates and sulfonates. Using three descriptors (because of the limited number of structures), a regression of similar quality was achieved, with R² = 0.947, F = 143, s² = 26.1, N = 28, which was not a useful improvement on the regression of the 44 structures. As it is well established that the linear alkyl compounds have a strong linear relationship with chain length, a regression was developed for just the 16 linear alkyl and perfluorinated sulfates and sulfonates, resulting in the following:
$$ \mathrm{KP} = -(56.6 \pm 3.5) + (6.39 \pm 0.44)\,\text{f-R2} + (1.09 \pm 0.57)\,\text{f-KH2} \qquad [16] $$
R² = 0.986, F = 466, s² = 7.46, N = 16
where f-R2 is the Randic index of second order for the hydrophobic fragment, and f-KH2 is the Kier and Hall molecular connectivity index of second order for the hydrophobic fragment. This error of 2.7 °C is probably the best accuracy achievable, given the experimental error of the measurements, and the difference between researchers.
(Figure 2-10. Scatter plot for calculated vs. experimental Krafft
point for 44 surfactants, using four descriptors.)
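The standard errors quoted in the text for the cloud point and Krafft point regressions (6.5, 5.3, and 2.7 °C) follow from the reported s² values by taking the square root. The short check below illustrates this; the dictionary labels are descriptive names chosen here, and the s² values are those printed with Eqs. 14, 15, and 16.

```python
import math

# s^2 values reported with Eqs. 14, 15 and 16
variances = {
    "Eq. 14 (cloud point, N = 62)": 42.3,
    "Eq. 15 (Krafft point, N = 44)": 28.1,
    "Eq. 16 (Krafft point, N = 16)": 7.46,
}

# Standard error s is the square root of the reported s^2
standard_errors = {name: math.sqrt(s2) for name, s2 in variances.items()}

for name, s in standard_errors.items():
    print(f"{name}: s = {s:.1f} C")
```

The same conversion applies to the CMC regressions, where s is in log10 units rather than degrees.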
Conclusions. Kier and Hall [1976] noted the difficulty of achieving good correlations with melting point data, as compared to boiling point data, as the melting point transition involves ordering and more complex intermolecular interactions that must be accounted for, when compared to the boiling point transition. As the Krafft point is basically a melting phenomenon of hydrated surfactant crystals, it was expected that the correlations would not be as good as the CMC correlations. The results achieved, though, were of similar quality to the CMC correlations, although the data set was somewhat limited. From this work it appears that Krafft point can be estimated to within several degrees for sodium salts of anionic surfactants, given only the molecular structure. As the influence of the counterion is known to be significant, a systematic study of the counterion effect would make this a more complete predictive tool.
Molecular Volume of Alkanes and Surfactant Hydrophobic Tails
From the CMC investigations [Huibers et al., 1996b; 1996c], and from examining other works on aqueous solubility [Yalkowsky and Banerjee, 1992], it became clear that the knowledge of molecular volume and surface area of hydrophobic domains plays an important role in correlations of several molecular properties. For a hydrophobic molecule or hydrophobic domain of a surfactant, solubilization in water involves the formation of a ‘cage’ of hydrogen bonded water around the molecule with which water cannot form any specific interactions. Without these favorable specific interactions, such as polar-polar or hydrogen bonding interactions, the unfavorable interactions should be proportional to the surface area of contact. The molecular volume often appears to correlate well because it correlates highly to the molecular surface area.
Previous volume prediction methods. For theoretical studies of surfactant self assembly, a means of calculating the geometric aspects of the surfactant molecules is necessary. Geometrical parameters such as the head group surface area, tail length and tail volume have been applied to models that describe the optimal aggregation state, such as the Israelachvili and coworkers [1976, 1992] packing parameter. Tanford [1972] provided the following equation for his model of the optimum aggregation number of micelles to have a spherical shape, given the tail length and volume,
$$v = 27.4 + 26.9\,N_c \tag{17}$$
where $v$ is the tail volume in Å³ and $N_c$ is the number of carbons in the linear alkane tail. This equation has been referred to in many studies since, and most micellar models only consider surfactants with linear alkyl chains for the hydrophobic domain. In order to allow future models to be more general, volume calculation techniques need to be improved to allow for calculations on branched, aromatic, and fluorinated alkyl tails.
The most direct and simple means of estimating volume is the group contribution method, in which each molecular subgroup is assigned an additive contribution. Tanford’s [1972] method is just that, with 26.9 Å³ assigned to the CH2 group and 54.3 Å³ assigned to CH3. van Krevelen [1990] collected and compared many previous efforts at group contribution methods for predicting molar properties.
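As a minimal sketch of how the two views of Tanford's relation agree, the following Python snippet computes the tail volume both from the linear formula and from the group contributions quoted above. The function names are hypothetical and chosen for illustration only; the numerical constants are those given in the text.

```python
def tanford_tail_volume(n_carbons):
    """Tail volume in cubic angstroms from Tanford's linear formula."""
    return 27.4 + 26.9 * n_carbons


def group_contribution_volume(n_ch3, n_ch2):
    """Tail volume in cubic angstroms as a sum of CH3 and CH2 contributions."""
    return 54.3 * n_ch3 + 26.9 * n_ch2


# A linear tail with n carbons has one CH3 group and (n - 1) CH2 groups.
n = 12
print(tanford_tail_volume(n))
print(group_contribution_volume(1, n - 1))
```

Both calls give the same value for any chain length because 54.3 = 27.4 + 26.9, so the terminal methyl group simply absorbs the constant term of the linear formula.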
While the group contribution methods provide a good estimate, they cannot be exact, as molecular volume is not strictly an additive property but rather a constitutive property, which is defined as having a dependence on the arrangement of constituent atoms. To address this molecular connectivity dependence, several different approaches have been used, as summarized in Table 2-3.
| · group contribution |
| · sum of overlapping van der Waals spheres |
| · group contribution corrected for number of gauche conformations |
| · topological indices (molecular connectivity indices) |
The method of overlapping van der Waals spheres is a means of calculating volume given only the van der Waals radii of the constituent atoms and the bond lengths. The molecular volume is simply the sum of the atomic volumes, minus the overlap between the spheres, as the bond length is always less than the sum of the radii of the two bonded atoms. More precise methods have been developed assuming that no more than three spheres overlap at any one point in space.
One problem with the overlapping spheres method is that it does not accurately account for the void space between atoms, unreachable by solvent or neighboring molecules, that realistically makes up part of the molecular volume. Edward et al. [1978] developed a method for calculating the partial molal volumes of alkanes, accounting for these dead spaces by determining the number of gauche arrangements in the alkane. Their formula for the partial molal volumes of linear and branched alkanes in carbon tetrachloride was
$$V^{o} = 26.85\,\mathrm{CH_3} + 17.36\,\mathrm{CH_2} + 10.35\,\mathrm{CH} + 3.40\,\mathrm{C} + 11.61 - 2.5\,Z_g \tag{18}$$
where $V^{o}$ is the partial molal volume in mL/mol, each group symbol stands for the number of such groups in the molecule, and $Z_g$ is the Pitzer steric partition function. It is interesting to note that many of these group contribution methods have a constant term, in this case 11.61 mL/mol, which is called the covolume, initially recognized by Traube [1899]. There has been much debate as to the meaning of this covolume, and it compromises the group contribution methods, as it belongs to no group and cannot be eliminated without making the regressions for molecular volume worse.
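The formula of Edward et al. can be sketched in a few lines of Python. This is only an illustration: the function name and argument names are hypothetical, and the steric term must be supplied by the caller, since its evaluation is not described here. The value of zero used below is a placeholder, not a measured or calculated quantity.

```python
def edward_partial_molal_volume(n_ch3, n_ch2, n_ch=0, n_c=0, z_g=0.0):
    """Partial molal volume (mL/mol) of an alkane in carbon tetrachloride.

    n_ch3, n_ch2, n_ch, n_c are the counts of each group type;
    z_g is the steric (gauche) correction term, supplied by the caller.
    """
    covolume = 11.61  # constant term that belongs to no single group
    return (26.85 * n_ch3 + 17.36 * n_ch2 + 10.35 * n_ch + 3.40 * n_c
            + covolume - 2.5 * z_g)


# n-hexane skeleton: two CH3 and four CH2 groups, steric term left at the
# placeholder value of zero (an assumption made only for this example).
print(edward_partial_molal_volume(n_ch3=2, n_ch2=4))
```

With the steric term set to zero the function reduces to a plain group contribution estimate plus the covolume, which makes the role of the constant term easy to see.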
Application of topological indices. The method we consider here for calculation of molecular volume is that of Kier and Hall [1976], using their topological indices, the molecular connectivity indices, for predictions. This method also has the desirable property of accounting for the arrangement of the constituent atoms. As a data driven approach, where relationships between the topological indices and property values are developed, it allows direct prediction of molecular volume in the liquid phase.
In the initial text of Kier and Hall [1976] relationships related to the molecular volume were investigated, including liquid density and molar refraction. The molecular volume ($V^{o}$) can be derived from the molecular weight ($M$) and the liquid density ($\rho$) by $V^{o} = M/\rho$. The best relationship for 82 linear and branched alkanes was
$$\rho = 0.7348 - 0.2929\,({}^{1}\chi)^{-1} + 0.0030\,{}^{3}\chi_{P} \tag{19}$$
$R^2 = 0.979$, $s = 0.0046$, $N = 82$
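The conversion from density to molar volume is simple arithmetic, and a short Python sketch makes the units explicit. The function name is hypothetical; the molecular weights and densities are the values listed for three structures in Table 2-4 of this section.

```python
def molar_volume(molecular_weight, density):
    """Molar volume in cm3/mol from molecular weight (g/mol) and density (g/cm3)."""
    return molecular_weight / density


# (molecular weight in g/mol, liquid density in g/cm3), as listed in Table 2-4
samples = {
    "n-pentane": (72.15, 0.6262),
    "benzene": (78.12, 0.8765),
    "perfluoroheptane": (388.7, 1.7333),
}

for name, (mw, rho) in samples.items():
    print(name, round(molar_volume(mw, rho), 1))
```

Rounded to one decimal place, the results reproduce the tabulated molar volumes of 115.2, 89.1, and 224.3 cm3/mol.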
The authors developed a second relationship, using only the density numbers where they were assured of values with five or six significant figure accuracy, and arrived at a five descriptor model with R2 = 0.994, s = 0.0024, N = 46. Density relationships were also developed for 40 alcohols, 13 ethers, and 20 aliphatic acids. Molar refraction (Rm) relationships were developed for 70 alkylbenzenes, where Rm is defined as
$$R_m = V^{o}\,\frac{n^2 - 1}{n^2 + 2} \tag{20}$$
where $n$ is the index of refraction. Again, an excellent formula could be found with just two descriptors:
$$R_m = 6.265 - 7.548\,{}^{1}\chi^{v} + 2.501\,{}^{2}\chi \tag{21}$$
$R^2 = 0.997$, $s = 0.270$, $N = 70$
In their later text [Kier and Hall, 1986] they analyze the partial molal volume data from Edward et al. [1978], whose gauche arrangement correction methods were described in a previous section, to arrive at the following formula:
$$V^{o} = 39.79 + 24.87\,{}^{1}\chi + 11.86\,{}^{2}\chi + 2.844\,{}^{4}\chi_{pc} \tag{22}$$
$R^2 = 0.999$, $s = 1.17$, $F = 86615$, $N = 37$
The standard error of 1.17 is less than 1% of the molar volume, which ranges from 116 to 566 mL/mol for the data set. This method provides good estimates for alkanes with the exact same number and type of groups present, such as the dimethylpentanes and dimethylheptanes. In these classes there is a range of molar volumes, depending on the location of attachment of the two methyl groups, and different volumes for each unique structure are predicted, unlike the group contribution methods, which would predict the same volume for these different isomers.
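To make the connectivity index approach concrete, the sketch below computes the first and second order indices for a hydrogen-suppressed carbon skeleton and evaluates the partial molal volume formula above for a linear alkane. The index definitions are the standard Kier and Hall ones (each bond or two-bond path contributes the inverse square root of the product of the atom degrees), which are not spelled out in this section and are stated here as an assumption. The function names are hypothetical. The fourth order path/cluster term is taken as zero, which holds only for unbranched chains.

```python
from itertools import combinations
from math import sqrt


def neighbor_lists(n_atoms, bonds):
    """Adjacency lists for a hydrogen-suppressed skeleton given as atom index pairs."""
    nbrs = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def chi1(n_atoms, bonds):
    """First order connectivity index: sum over bonds of (d_i * d_j) ** -0.5."""
    deg = [len(n) for n in neighbor_lists(n_atoms, bonds)]
    return sum(1.0 / sqrt(deg[i] * deg[j]) for i, j in bonds)


def chi2(n_atoms, bonds):
    """Second order index: sum over two-bond paths in an acyclic skeleton."""
    nbrs = neighbor_lists(n_atoms, bonds)
    deg = [len(n) for n in nbrs]
    total = 0.0
    for center, around in enumerate(nbrs):
        # every pair of neighbors of the central atom defines one two-bond path
        for a, b in combinations(around, 2):
            total += 1.0 / sqrt(deg[a] * deg[center] * deg[b])
    return total


def linear_alkane_volume(n_carbons):
    """Partial molal volume (mL/mol) of a linear alkane; path/cluster term is zero."""
    bonds = [(k, k + 1) for k in range(n_carbons - 1)]
    return 39.79 + 24.87 * chi1(n_carbons, bonds) + 11.86 * chi2(n_carbons, bonds)


print(round(linear_alkane_volume(5), 1))
```

For n-pentane this gives roughly 115.9 mL/mol. Isomers with the same group counts produce different index values because the atom degrees differ, which is how the method separates structures that a pure group contribution scheme cannot.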
Results. For modeling the molecular volume of surfactant tails, it is necessary to include more structural variation than just the linear and branched alkanes. We add alkenes, alkylbenzenes, fluorobenzenes, and perfluorinated alkanes to the set of structures considered, for a total of 78 structures (Table 2-4). The CODESSA program [Katritzky et al., 1994] is used to obtain
$$V^{o} = (25.2 \pm 0.7) + (5.05 \pm 0.05)\,N_H + (0.0931 \pm 0.0008)\,G - (136 \pm 8)\,R_{NB} - (6.0 \pm 0.5)\,{}^{3}\chi \tag{23}$$
$R^2 = 0.999$, $s = 1.32$, $F = 21124$, $N = 78$
where $N_H$ is the number of hydrogen atoms, $G$ is the gravitation index (all bonds), $R_{NB}$ is the relative number of benzene rings, and ${}^{3}\chi$ is the Kier and Hall molecular connectivity index of third order. This is an excellent correlation (Figure 2-11), with a very small standard error, for a much more diverse data set than the alkanes previously studied. It was made possible by considering many more types of molecular descriptors than just the topological descriptors. Although the geometric, electrostatic, and quantum-chemical descriptors available in the CODESSA program did not appear in the best correlations, some simple constitutional descriptors ($N_H$ and $R_{NB}$) allowed for the development of an excellent regression equation. If only topological descriptors were employed, the best four descriptor correlation using Kier and Hall molecular connectivity indices along with Kier shape indices had statistical parameters of $R^2 = 0.991$, $s = 4.29$, $F = 1992$, $N = 78$.
Table 2-4.
| M.V. (cm³/mol) | M.W. (g/mol) | Density (g/cm³) | Structure Name |
| 116.4 | 72.15 | 0.6197 | 2-methylbutane |
| 115.2 | 72.15 | 0.6262 | n-pentane |
| 130.7 | 86.18 | 0.6594 | n-hexane |
| 146.6 | 100.21 | 0.6838 | n-heptane |
| 162.6 | 114.23 | 0.7025 | n-octane |
| 178.7 | 128.26 | 0.7176 | n-nonane |
| 194.9 | 142.29 | 0.7300 | n-decane |
| 227.5 | 170.34 | 0.7487 | n-dodecane |
| 114.6 | 104.14 | 0.9090 | styrene |
| 89.1 | 78.12 | 0.8765 | benzene |
| 106.3 | 92.13 | 0.8669 | toluene |
| 139.1 | 120.20 | 0.8642 | 1,3,5-trimethylbenzene |
| 149.0 | 134.21 | 0.9010 | 1,2,3,4-tetramethylbenzene |
| 94.0 | 96.11 | 1.0225 | fluorobenzene |
| 97.6 | 114.09 | 1.1688 | 1,4-difluorobenzene |
| 115.0 | 186.06 | 1.6184 | hexafluorobenzene |
| 143.5 | 100.21 | 0.6982 | 3-ethylpentane |
| 144.2 | 100.21 | 0.6951 | 2,3-dimethylpentane |
| 144.5 | 100.21 | 0.6933 | 3,3-dimethylpentane |
| 145.9 | 100.21 | 0.6868 | 3-methylhexane |
| 147.7 | 100.21 | 0.67869 | 2-methylhexane |
| 148.7 | 100.21 | 0.6739 | 2,2-dimethylpentane |
| 149.0 | 100.21 | 0.6727 | 2,4-dimethylpentane |
| 157.0 | 114.23 | 0.7274 | 3-ethyl-3-methylpentane |
| 157.3 | 114.23 | 0.7262 | 2,3,3-trimethylpentane |
| 158.8 | 114.23 | 0.7193 | 3-ethyl-2-methylpentane |
| 158.8 | 114.23 | 0.71923 | 3,4-dimethylhexane |
| 158.9 | 114.23 | 0.7191 | 2,3,4-trimethylpentane |
| 159.5 | 114.23 | 0.7161 | 2,2,3-trimethylpentane |
| 160.4 | 114.23 | 0.71214 | 2,3-dimethylhexane |
| 160.9 | 114.23 | 0.71 | 3,3-dimethylhexane |
| 161.8 | 114.23 | 0.70583 | 3-methylheptane |
| 162.1 | 114.23 | 0.70463 | 4-methylheptane |
| 163.1 | 114.23 | 0.70036 | 2,4-dimethylhexane |
| 163.7 | 114.23 | 0.6979 | 2-methylheptane |
| 164.3 | 114.23 | 0.69528 | 2,2-dimethylhexane |
| 164.7 | 114.23 | 0.69354 | 2,5-dimethylhexane |
| 165.1 | 114.23 | 0.6918 | 2,2,4-trimethylpentane |
| 175.6 | 128.25 | 0.7304 | 3,3-dimethylheptane |
| 176.4 | 128.25 | 0.727 | 2,3-dimethylheptane |
| 178.2 | 128.25 | 0.7198 | 2,5-dimethylheptane |
| 178.5 | 128.25 | 0.7185 | 2,2,4,4-tetramethylpentane |
| 179.2 | 128.25 | 0.7158 | 2,4-dimethylheptane |
| 178.2 | 128.26 | 0.7199 | 4-methyloctane |
| 180.5 | 128.26 | 0.7107 | 2-methyloctane |
| 188.4 | 140.27 | 0.7445 | cis-5-decene |
| 189.3 | 140.27 | 0.7408 | 1-decene |
| 189.5 | 140.27 | 0.7401 | trans-5-decene |
| 192.9 | 142.28 | 0.7377 | 2,3-dimethyloctane |
| 193.5 | 142.28 | 0.7354 | 3-methylnonane |
| 194.2 | 142.28 | 0.7326 | 5-methylnonane |
| 194.3 | 142.28 | 0.7323 | 4-methylnonane |
| 194.6 | 142.28 | 0.7313 | 2,6-dimethyloctane |
| 195.4 | 142.28 | 0.7281 | 2-methylnonane |
| 196.5 | 142.28 | 0.7240 | 2,7-dimethyloctane |
| 210.6 | 156.31 | 0.7422 | 3-methyldecane |
| 212.1 | 156.31 | 0.7368 | 2-methyldecane |
| 222.0 | 168.33 | 0.7584 | 1-dodecene |
| 243.7 | 184.37 | 0.7564 | tridecane |
| 260.1 | 198.40 | 0.7628 | tetradecane |
| 276.4 | 212.42 | 0.7685 | pentadecane |
| 292.8 | 226.45 | 0.7733 | hexadecane |
| 327.6 | 254.51 | 0.7768 | octadecane |
| 224.3 | 388.7 | 1.7333 | perfluoroheptane |
| 132.5 | 118.18 | 0.8920 | allylbenzene |
| 139.4 | 120.20 | 0.8620 | propylbenzene |
| 154.9 | 134.22 | 0.8665 | tertbutylbenzene |
| 155.7 | 134.22 | 0.8621 | secbutylbenzene |
| 156.1 | 134.22 | 0.8601 | butylbenzene |
| 157.3 | 134.22 | 0.8532 | isobutylbenzene |
| 170.2 | 148.25 | 0.8710 | 1-butyl-2-methylbenzene |
| 172.1 | 148.25 | 0.8612 | 1-tertbutyl-4-methylbenzene |
| 172.6 | 148.25 | 0.8590 | 1-butyl-3-methylbenzene |
| 172.7 | 148.25 | 0.8586 | 1-butyl-4-methylbenzene |
| 172.7 | 148.25 | 0.8585 | pentylbenzene |
| 205.8 | 176.30 | 0.8567 | heptylbenzene |
| 238.1 | 204.35 | 0.8584 | nonylbenzene |
| 288.2 | 246.43 | 0.8551 | dodecylbenzene |
(Figure 2-11. Scatter plot for the four descriptor regression of calculated to experimental molar volume of 78 diverse alkanes and alkenes.)
It is interesting to note that the best individual descriptor is the Kier and Hall molecular connectivity index of zero order, resulting in a good correlation of $R^2 = 0.97$, in accordance with the comments of Kier and Hall [1976] that the zero and first order indices correlate most highly with molecular volume and surface area (Figure 2-12).
$$V^{o} = (4.7 \pm 3.4) + (23.81 \pm 0.47)\,{}^{0}\chi \tag{24}$$
$R^2 = 0.971$, $s = 7.48$, $F = 2572$, $N = 78$
This result is quite reasonable when it is considered that ${}^{0}\chi$ is calculated from individual contributions of each atom, where the contributions are weighted by the number of valence electrons [Huibers et al., 1996c]. Thus, a methyl group (CH3) will have a larger group contribution than a methylene (CH2), which will have a larger contribution than a methine (CH), etc. Also, as ${}^{0}\chi$ is calculated from the (hydrogen free) carbon backbone, the fluorinated groups have much larger contributions than hydrogenated groups (a CF2 group has 1.7 times the contribution of a CH2 group), as fluorines are counted where hydrogens are not in the index contributions [Huibers et al., 1996c]. In addition, the error on the constant term is large relative to the term itself (4.7 ± 3.4), indicating that the molecular volume is practically directly proportional to ${}^{0}\chi$. By examining the residual volume plot (calculated minus experimental volume) for this descriptor, the structures which have volumes most influenced by steric factors are highlighted.
The ${}^{0}\chi$ correlation can be greatly improved by the addition of the Kier shape index of third order (${}^{3}\kappa$), which results in the best two descriptor correlation using topological descriptors:
$$V^{o} = (16.8 \pm 2.0) + (20.06 \pm 0.36)\,{}^{0}\chi + (2.63 \pm 0.19)\,{}^{3}\kappa \tag{25}$$
$R^2 = 0.992$, $s = 3.92$, $F = 4788$, $N = 78$
(Figure 2-12. Scatter plot for the one descriptor (0c)
regression of calculated to experimental molar volume of 78 diverse alkanes
and alkenes, showing structures with greatest steric influence on the molecular
volume.)
The Kier shape index [Kier, 1990] was developed specifically to handle aspects of the molecular shape, allowing a number to be calculated to quantify the extended vs. bulky nature of the molecule. Its ability to improve the ${}^{0}\chi$ correlation, which is essentially a group contribution method, shows that it serves this role well in handling steric influences.
Surfactant tail volume. Ideally, volume data would be available for a wide variety of fragments, and thus correlations could be developed directly for the fragment volume. Unfortunately, no such comprehensive measurements are known to the author. The best compromise is to apply the method developed for predicting alkane volumes to the hydrophobic surfactant tails.
In order to calculate the molecular volume of surfactant hydrocarbon (and fluorocarbon) tails, the same four descriptor relationship developed for the liquid alkanes can be used. The major issue with the direct application of the formula is the handling of the constant term. As with the covolume term of the group contribution techniques, this term is apparently a property of the whole molecule, and cannot be partitioned among the substituent atomic groups in the molecule. When considering the hydrophobic domain of the surfactant molecule, we are dealing with a fragment of the molecule rather than the whole. How, then, should this constant term be divided? It makes no sense to include the entire term, since if a molecule were split in half and the formula applied to each half, the sum of the resulting volumes could exceed the volume of the entire molecule. Also at issue is that some of the descriptors, such as the gravitation index ($G$), are normalized by whole molecule parameters, while others, such as the higher order molecular connectivity indices (e.g., ${}^{3}\chi$), include contributions from neighboring groups and thus pose a problem for atoms near the division between fragments. Another possible estimate would be to weight the constant term by the size, mass, or some other measure, so that if, for example, 75% of the mass of the molecule was in the hydrophobic domain, then 75% of the constant term would be applied to the volume estimate. The major problem with any weighting method is that it contradicts the basic nature of the covolume term, which by definition is a whole molecule property and not an additive property of the components of the molecule.
The best estimate offered here would be to use half of the constant term. It must be recognized that the sum of the volume estimates of two halves of any of the molecules used here, using one half of the constant term for each, will almost certainly not equal the volume calculated for the whole, as the descriptors used are not strictly additive. Assuming that the magnitude of the error is small, this leads to the best available estimate for the volume of surfactant hydrophobic tails. A compromise may be to use correlations that have poorer statistical results than the best correlation above, but that use descriptors that are more additive in nature, assuring that the constant term for the fragments is more closely related to the covolume. Another possible way to address this in the future would be to examine data from molar volumes of alcohols, where the volume of the hydroxyl group could be accounted for by group contribution estimates, allowing the hydrophobic fragment molecular volume to be studied. Possible sources of data are Kier and Hall [1976, 1986] and, more recently, Sakurai et al. [1994].
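The bookkeeping behind the choice of half the constant term can be written out as a short derivation. It is only a sketch under an idealizing assumption that the text itself flags as not strictly true: that every descriptor $D_k$ is additive over two fragments $A$ and $B$ of a molecule $M$. The symbols $c$, $a_k$, and $D_k$ are introduced here for illustration.

```latex
\begin{align*}
V(M) &= c + \sum_k a_k D_k(M), \qquad D_k(M) = D_k(A) + D_k(B) \\
\text{full constant on each fragment:}\quad
V(A) + V(B) &= 2c + \sum_k a_k \bigl[ D_k(A) + D_k(B) \bigr] = V(M) + c \\
\text{half constant on each fragment:}\quad
V_{1/2}(A) + V_{1/2}(B) &= 2 \cdot \tfrac{c}{2} + \sum_k a_k \bigl[ D_k(A) + D_k(B) \bigr] = V(M)
\end{align*}
```

Under this assumption, applying the full constant to both fragments overcounts the whole molecule by exactly $c$, which is 25.2 mL/mol for the four descriptor relationship, while the half constant restores the balance. Any remaining discrepancy in practice comes from the non-additive descriptors.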
Conclusion. An excellent correlation has been developed for estimating the molecular volume, allowing the prediction of volume to a standard error of 1.32 mL/mol for a set of 78 molecules of diverse structures, including linear and branched alkanes, alkenes, alkylbenzenes, fluorobenzenes, and perfluorinated alkanes. The possibility of applying this formula to the prediction of the molecular volume of surfactant hydrophobic tails is discussed, with the major issue being the handling of the constant (covolume) term in the fragment.
This QSPR approach has been successfully applied to a wide variety of chemical properties, such as boiling point, GC retention time, refractive index, and partition coefficient, and to a diverse set of activities, such as corrosion, anesthetic effect, pollutant spreading, and carcinogenicity. The topological descriptors have been applied for over twenty years in different areas, so it was just a matter of time before someone applied them to surfactant chemistry. This can be summed up by a quote from a decade ago:
The major conclusions that can be drawn from this research are as
follows:
2. Good correlations have been found for nonionic and anionic surfactant CMC, nonionic cloud point and anionic Krafft point, and molecular volume of hydrocarbons and fluorocarbons.
3. Better correlations are found if descriptors are considered that have been calculated for the head group and tail fragments of the molecule, rather than for the whole molecule.
4. Topological descriptors are the most useful class in predicting surfactant properties.
5. General QSPRs allow the estimation of molecular properties for compounds that have not yet been synthesized, as the properties can be calculated solely from the molecular structure. This also opens up the possibility of molecular design to achieve target values of properties.