What is the physical meaning of atomic charges?

Atomic charges, or atomic partial charges, are non-integer numbers quantifying the balance of positive (nuclear) charge and negative (electronic) charge associated with each atom. In the 3D space, atomic charges represent points placed at the position of the atomic nuclei, and may be termed atomic point charges. The molecular representation based on atomic point charges is thus a very basic abstraction of the molecular electron density.

Atomic charges are conceived to reflect the uneven distribution of electron density in the molecule. While atomic charges are merely concepts and not physical observables, they have been used heavily in theoretical and applied chemistry due to their highly intuitive character and correlation with measurable quantities such as the electrostatic potential, polarity, reactivity, etc. Atomic charges remain integral parts of many modeling applications, and are still used in reasoning about basic chemical processes.

When employing atomic charges, you must be aware of the limitations inherent to the atomic point charge model. A single number can give an idea about whether there is more electron density around some atoms compared to others, but it cannot characterize the actual distribution of electron density in the space between the atomic nuclei. Thus, all properties which flow from this distribution (such as multipole moments) are generally not well described using atomic charges.

There are more NMR states (models) in my file. Can I run ACC on all, or just a few selected states?

By default, only the first NMR model is loaded from files with multiple models annotated as such. The same holds true for multiple molecules in sdf format - only the first molecule will be loaded. However, ACC can run on any number of molecules at a time, as long as each molecule is uploaded in a separate file.

Therefore, if your input file contains multiple NMR states, you must first split it into multiple files, each containing a single NMR state of interest to you. Compress these files into a single .zip archive. Upload the .zip archive with all models into ACC, and you can compute atomic charges for all models in a single ACC run.

For example, say you have a .pdb file containing 15 NMR models, and you wish to run ACC for models 1-5. Copy the records belonging to model 1 into a file called model1.pdb. Then the records belonging to model 2 into a file called model2.pdb. Continue till model 5, either manually or via a script. Put these 5 files with unique names into a .zip archive, which you can then upload into ACC. Once you upload, you will see that ACC has detected each of the models separately.
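The splitting step described above can be scripted. The following is a minimal sketch, not part of ACC: it assumes a standard .pdb file in which each NMR state is delimited by MODEL and ENDMDL records, and the function names split_models and zip_models are hypothetical helpers introduced here for illustration. Records outside the MODEL blocks (headers, remarks) are dropped in this sketch.

```python
import io
import zipfile


def split_models(pdb_text, wanted):
    """Return {model_number: text} for the MODEL blocks whose number is in `wanted`."""
    models = {}
    current = None  # model number being read, or None outside MODEL/ENDMDL
    lines = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            lines = []
        elif record == "ENDMDL":
            if current in wanted:
                models[current] = "\n".join(lines + ["END"]) + "\n"
            current = None
        elif current is not None:
            lines.append(line)
    return models


def zip_models(models, prefix="model"):
    """Pack each model into its own uniquely named .pdb file; return the .zip as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in sorted(models.items()):
            archive.writestr(f"{prefix}{number}.pdb", text)
    return buffer.getvalue()
```

For models 1-5, you could read your file into a string, call split_models(text, range(1, 6)), pass the result to zip_models, and write the returned bytes to a file such as models.zip before uploading it.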

After uploading my molecule, I got a warning about "missing hydrogen atoms". How will it affect my calculation?

EEM, the empirical approach used by ACC to calculate atomic charges, produces atomic charges which respond to changes in conformation and chemical environment. In order to produce chemically relevant atomic charges using EEM, it is necessary that the structure of the molecule be complete. All protons should be present according to the relevant protonation state. Since ACC does not currently include functionality for editing the molecular structure, you must address these issues prior to uploading the molecule into ACC. For example, you may use a server like pdb2pqr to assign protonation states, add protons and subsequently estimate the total molecular charge.

ACC produces a missing H warning if no H are found in the input file (note that the warning does not appear if at least one H is present in the input file). Despite the missing H warning, ACC allows you to proceed with the charge calculation step, as it might not always be possible to obtain a perfect structure (e.g., when working with low resolution structures of extremely large complexes). The results from such calculations may not have chemical meaning in their absolute values, but they can be very useful when comparing sets of charges (open vs closed conformation, free vs bound state, etc.).

After uploading my molecule, I got a warning about "unknown chemical element names". How will it affect my calculation?

EEM, the empirical approach used by ACC to calculate atomic charges, operates with atom types based on chemical elements. When uploading the input file, ACC needs to establish the chemical element of each atom, so that it can assign a suitable atom type, and subsequently EEM parameters. ACC expects to find the chemical element at a pre-defined position in the input file, which depends on the formal guidelines established for each file format. For example, in .pdb files, ACC looks for the chemical element in the column after occupancy and temperature factor (positions 77-78).

ACC holds a predefined list of chemical elements from the periodic table of chemical elements. If ACC does not recognize the chemical element for an atom at the expected position in the input file, it will not include this atom in the atomic charge calculation because it cannot assign an atom type, and therefore EEM parameters. This means that the calculation will run only on the remaining atoms, and when you view the results the atomic charge value for the atoms with unknown chemical elements will be "NaN". If none of the chemical elements are recognized (e.g., the file format guidelines are not respected, or the input file comes from a modeling program which uses the element column to store its own atom types), you will get an error for the entire calculation. Finally, if the input file comes from a modeling program which uses the element column to store its own atom types, and these atom types overlap with known chemical elements, it is possible that ACC includes the atoms in the calculation, but it assigns wrong atom types. For example, if "Ca" appears in the element column, ACC will interpret it as calcium even if it was originally meant as C-alpha.
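Before uploading, you can inspect the element column of a .pdb file yourself. The sketch below is only an illustration of the check described above, not ACC's own code: the ELEMENTS set is a deliberately short, hypothetical stand-in for the full periodic table, and unknown_elements is a name invented here. It reads positions 77-78 of every ATOM and HETATM record and lists the atoms whose entry is empty or not a recognized symbol.

```python
# Illustrative subset only; ACC's real list covers the periodic table.
ELEMENTS = {"H", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I",
            "Na", "Mg", "K", "Ca", "Fe", "Zn"}


def unknown_elements(pdb_text, known=ELEMENTS):
    """Return (serial, atom_name, symbol) for atoms with an unrecognized element field."""
    problems = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Positions 77-78 (1-based) hold the right-justified element symbol.
            symbol = line[76:78].strip().capitalize()
            if symbol not in known:
                problems.append((line[6:11].strip(), line[12:16].strip(), symbol))
    return problems
```

Note that a check of this kind cannot catch the "Ca" ambiguity mentioned above, because "Ca" is a valid symbol; it only finds entries that are not chemical elements at all.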

You can circumvent these issues and make sure these atoms are included in the calculation correctly either by fixing the input file, or using a custom EEM parameter set. The first solution is to adjust the element column in the input file so that it really displays chemical elements. Additionally, make sure the input file format follows the formal guidelines. The second solution is to use an EEM parameter set in which you include EEM parameters for the atoms with unknown chemical elements. By doing so you actually define new atom types and corresponding EEM parameters. For example, say your input file contains hydrogen atoms identified in the element column as "H1" and "Ho", depending on their binding partner. ACC will report them as unknown chemical element names warnings. You may create a new EEM parameter set based on one of the built-in sets already available in ACC (see below how to do that). Copy/paste the parameter information for hydrogen (all text enclosed in the Element tags) twice more into this new parameter set. Then change the Element name tag in one case to "H1", and in the other to "Ho". Save the new set with a unique name and select it from the list. When you start your computation, the atoms with the previously unknown element names "H1" and "Ho" will now be included in the calculation, and treated by EEM parameters suitable for hydrogen.
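If you prefer to prepare the extra entries with a script rather than by hand, the copy/paste step can be sketched as follows. This assumes the parameter set has the XML layout shown in this FAQ (a ParameterSet root containing Parameters, which contains Element tags); add_alias is a hypothetical helper, and how the resulting XML is then entered into ACC is not covered here.

```python
import copy
import xml.etree.ElementTree as ET


def add_alias(parameter_set, source, alias):
    """Duplicate the <Element Name=source> block under the new name `alias`."""
    parameters = parameter_set.find("Parameters")
    template = parameters.find(f"Element[@Name='{source}']")
    if template is None:
        raise ValueError(f"no parameters for element {source!r} in this set")
    clone = copy.deepcopy(template)  # copies all Bond tags with their A and B values
    clone.set("Name", alias)
    parameters.append(clone)
    return clone
```

For the example above you would parse the built-in set with ET.fromstring, call add_alias(root, "H", "H1") and add_alias(root, "H", "Ho"), give the set a unique Name, and serialize it again with ET.tostring(root, encoding="unicode").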

After uploading my molecule, I got the warning "Atoms in the residue contain multiple names". How will it affect my calculation?

This generally occurs if the chain ID is not explicitly included in the input file, but the molecule contains multiple chains with overlapping residue serial numbers. Since no chain IDs are available, ACC assumes that everything belongs to one chain. When it reads atoms with a residue serial number that has already been loaded, it basically overwrites the composition of the residue with that serial number. Consequently, the computation will run, but the results will not be meaningful for the affected residues, and possibly even for neighboring residues.

ACC provides check chain ID warnings both before and after the computation if this problem is detected, so that the input file can be corrected. Depending on how you generated the input file, this problem may or may not occur. For example, if your structure has multiple chains, and you plan to use the pdb2pqr server to add H to your structure and save it in .pqr format, you must remember to tick the option Add/Keep chain IDs in the pqr file in order to produce correct output.
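You can also look for this problem before uploading. The sketch below is an assumption about how such a check might be done, not a description of ACC's internal logic: it flags every residue in which the same atom name occurs more than once under the same chain ID and residue number, which is what happens when two chains are merged. overlapping_residues is a hypothetical name, and the check is meant for files containing a single model.

```python
from collections import Counter


def overlapping_residues(pdb_text):
    """Return sorted (chain_id, residue_number) pairs in which an atom name repeats."""
    seen = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atom = line[12:16].strip()
            altloc = line[16:17]             # alternate locations are legitimate repeats
            chain = line[21:22].strip()      # empty string if no chain ID is given
            residue = line[22:27].strip()    # residue number plus insertion code
            seen[(chain, residue, atom, altloc)] += 1
    return sorted({(chain, residue)
                   for (chain, residue, _atom, _alt), count in seen.items()
                   if count > 1})
```

An empty result means no residue was found twice; any pair with an empty chain ID in the output points to residues that ACC would merge.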

All of the built-in EEM parameter sets report warnings about Missing Atoms. How does it affect my calculation?

EEM, the empirical approach used by ACC to calculate atomic charges, operates with atom types based on chemical elements. When uploading the input file, ACC needs to establish the chemical element of each atom, so that it can assign a suitable atom type, and subsequently EEM parameters. ACC expects to find the chemical element at a pre-defined position in the input file, which depends on the formal guidelines established for each file format.

It may happen that ACC does not recognize the chemical element (see above why), or there are no EEM parameters associated with that chemical element. In this case, the atom in question will not be included in the atomic charge calculation. This means that the calculation will run only on the remaining atoms, and when you view the results the atomic charge value for the atoms for which there were no EEM parameters will be "NaN". If none of the atoms can be assigned suitable EEM parameters, you will get an error for the entire calculation.

If none of the built-in EEM parameter sets contains parameters for all the atoms in your input molecule, you can circumvent this issue and make sure all atoms are included in the calculation by using a custom EEM parameter set in which you manually include EEM parameters for atoms listed as Missing Atoms. By doing so you actually define new atom types and corresponding EEM parameters. For example, say your input file contains phosphorus. ACC will report Missing Atoms. You may create a new EEM parameter set based on one of the built-in sets already available in ACC (see below how to do that). Copy/paste the parameter information for one of the atoms already present (all text enclosed in the Element tags) once more into this new parameter set. It is good to choose some atom that has similar chemical properties to phosphorus - especially electronegativity and hardness (S would probably be the best choice if available). Change the Element name tag to "P". Inside each Bond Type tag, you will find the values for parameters A and B. You may keep these values, or modify them (see below how to do that). Save the new EEM parameter set with a unique name and select it from the list. When you start your computation, the phosphorus atoms will now be included in the calculation, although the EEM parameters might not be optimal.

How do I choose a suitable EEM parameter set?

EEM, the empirical method used by ACC to calculate atomic charges, relies on empirical parameters. Many EEM parameter sets have been published in literature. They are available in ACC as built-in sets, each having a unique identifier. By default, ACC tries to suggest some suitable set of EEM parameters based on the type and atomic composition of the molecule you loaded. If ACC is unable to perform this default selection, or you feel the default choice is not optimal, you will need to make sense of the various sets available.

You will notice that the table with parameter sets is organized first according to the target. This is because the applicability domain of a given EEM parameter set is generally limited to the target molecules. Therefore, in general, one should prefer an EEM parameter set which is meant for the type of molecules of interest. This is not an absolute rule. In fact, we have observed that some EEM parameter sets developed for biomolecules perform very well for organic molecules as well. However, the opposite is not true. Therefore, it's better to choose an EEM parameter set according to the type of input molecule.

Target
Description: type of molecules that are likely to be well described using a specific set of EEM parameters
Possible values: organic molecules, drug-like molecules, proteins, etc.

Another requirement is that the set cover all atom types in the input file. This means that all chemical elements present in the input molecule should be on the list of Atoms, and nothing should be listed at Missing atoms. If all built-in sets report Missing atoms, you will probably have to Add a new EEM parameter set where all necessary EEM parameters are provided (see below how).

Atoms
Description: List of atom types covered by the EEM parameter set. Depends on the type of molecules used to produce reference data during the development of the EEM parameters.
Possible values: H, C, N, O, Cl, etc.

Missing Atoms
Description: List of atom types present in the input file but not covered by the EEM parameter set. These atoms will not be included in the atomic charge calculation using this EEM parameter set.

Further, one should consider the approach used during the development of the parameters. EEM parameters are generally developed based on reference quantum mechanical (QM) calculations. A QM calculation is characterized by the setup of the wave function calculation (theory level, basis set, environment), and the type of observables that will be calculated and interpreted. Most commonly, QM reference data used for the development of EEM parameters consists of atomic charges, which are derived from the observable electron density according to a specific charge definition, meaning a procedure used to partition the molecular electron density, or to deduce the electrostatic contribution of each atom. Because atomic charges are not physical observables and have only a conceptual character, there is no unique charge definition that is universally accepted. Rather, a score of charge definitions have been published and are in use, each with their own strengths and weaknesses.

We denote as approach any association of a QM calculation setup and charge definition.

Approach: QM Method, Basis Set, Population Analysis
Description: association of a QM calculation setup and charge definition. Gives the nature of the reference QM data used during the development of the EEM parameters. The applicability domain of an EEM parameter set is closely related to the applicability domain of the reference QM data.
Possible values: QM Method - level of theory used to solve Schrödinger's equation - HF, B3LYP, etc. Basis Set - set of basis functions used to solve Schrödinger's equation - 6-31G, STO-3G, etc. Population Analysis - charge definition used after solving Schrödinger's equation to partition the molecular electron density, or to deduce the electrostatic contribution of each atom - MPA (Mulliken population analysis), NPA (Natural population analysis), MK (Merz-Kollman scheme for fitting to electrostatic potentials), etc.

The applicability domain and maximum expected accuracy of an EEM parameter set are closely related to the corresponding QM charges obtained by that particular approach. The maximum accuracy for a particular application of any set of EEM parameters is given by the charge definition used during its development. Therefore, if available, pick an EEM parameter set with a higher level of theory, and most importantly a charge definition suitable for the subsequent application of the atomic charges (what you have in mind to do with the charges). For example, pick MK charges if you plan to run simulations, NPA charges if you plan to interpret reactivity, MPA charges if you plan to do QSPR, etc.

The performance of a given EEM parameter set is further influenced by the procedure used when fitting the EEM parameters to the reference data (size and nature of the QM reference dataset, fitting and optimization algorithms, etc.). Sets with higher training set size should theoretically be more robust. The data source should refer to molecules of the same type as your molecule of interest.

Training Set Size, Data Source
Description: Number and type of molecules used to produce reference data during the development of the EEM parameters.

Finally, to help you make decisions faster, we have included a very basic grading system in the form of the priority descriptor given for each EEM parameter set. When in doubt, pick a parameter set with a low value of the priority descriptor (1, 2, ...).

Priority
Description: Very basic grading system. Serves mainly to identify a suitable default setup. Currently curated manually.
Possible values: For EEM parameter sets focused on biomolecules, priorities are assigned based on their performance in the external validation stage of their development. For sets focused on organic molecules, priorities are assigned based on year of publication, level of theory of the QM reference data, and the results of a small in-house QM benchmark on paracetamol. Lower values are preferred.

All in all, it is always good to try several parameter sets, and draw conclusions based on the trends observed in most sets of results. The EEM implementations in ACC are all computationally efficient, so running calculations with multiple parameter sets is not a problem.

How do I read the XML file with EEM parameters?

EEM, the empirical method used by ACC to calculate atomic charges, relies on empirical parameters. Many EEM parameter sets have been published in literature, and are available in ACC as built-in sets. EEM parameter sets are stored in XML format, where the information is organized using tags. Each type of useful information is marked by a start and an end tag, may have attributes and may contain sub-elements or text.

Each EEM parameter set is marked by the ParameterSet tag and the attribute Name, which encodes a unique identifier. Further, each EEM parameter set is described by properties marked by the Properties tag, which provide the literature reference and basic information about the development of the EEM parameters. Please read the section on how to choose a suitable EEM parameter set in order to better understand the importance of the information given by Properties.

Next, the EEM parameters are given under the tag Parameters, which has three attributes. Target and Priority are important for choosing a suitable EEM parameter set. The attribute Kappa is actually a special EEM parameter which, conceptually, modulates the electrostatic interaction of each atom with the surrounding charges.

<ParameterSet Name="E-NPA_6-31Gd_gas">
 <Properties>
   <Property Name="Author">Ionescu, C. M., Geidl, S., Svobodova Varekova, R., Koca, J.</Property>
   ...
   <Property Name="Priority">2</Property>
 </Properties>
 <Parameters Target="Atoms" Priority="0" Kappa="0.00700000">
   <Element Name="C">
     <Bond Type="1" A="2.46311000" B="0.00876600" />
     <Bond Type="2" A="2.46311000" B="0.00876600" />
   </Element>
   <Element Name="H">
     <Bond Type="1" A="2.45929400" B="0.01921200" />
   </Element>
   ...
 </Parameters>
</ParameterSet>

The rest of the EEM parameters operate with atom types based on chemical elements, and are marked by the Element tags. However, some EEM parameter sets available in literature employ atom types which depend not only on chemical element, but also on the maximum bond multiplicity. In such EEM parameter sets there are, for example, different EEM parameters for carbon atoms with sp³ hybridization, than for carbon atoms with sp² hybridization. In order to keep a consistent scheme of storing and assigning parameters, ACC implements by default an EEM parameter scheme which supports bond information via the Bond tag. Thus, for each chemical element there will be one Element tag, and at least one Bond tag. EEM parameter sets which are based solely on chemical elements and no bond information contain multiple Bond tags as well, but the parameters associated with different bond multiplicities (the attribute Type) are actually merely copies, as seen in the example above.

The attribute Type encodes the maximum bond multiplicity. In general, sp³ hybridization is encoded as Type=1, sp² hybridization as Type=2, and sp hybridization as Type=3. These values might seem unintuitive, but they are based on connectivity information from the input file, or computed based on interatomic distances. Type=0 encodes a coordinated metal ion.

The actual EEM parameters are encoded in the attributes A and B, conceptually related to electronegativity and hardness, respectively.
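To illustrate how this layout can be read programmatically, here is a minimal sketch using Python's standard XML parser. It assumes a file structured exactly like the example above; read_parameter_set is a hypothetical helper, not an ACC function. It returns the set name, the Kappa value, and a table mapping each (element, bond type) pair to its A and B values.

```python
import xml.etree.ElementTree as ET


def read_parameter_set(xml_text):
    """Return (name, kappa, {(element, bond_type): (A, B)}) for one ParameterSet."""
    root = ET.fromstring(xml_text)
    parameters = root.find("Parameters")
    table = {}
    for element in parameters.findall("Element"):
        for bond in element.findall("Bond"):
            key = (element.get("Name"), int(bond.get("Type")))
            table[key] = (float(bond.get("A")), float(bond.get("B")))
    return root.get("Name"), float(parameters.get("Kappa")), table
```

With the example set above, table[("C", 1)] and table[("C", 2)] hold identical values, which is how you can recognize a set based solely on chemical elements.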

How do I add EEM parameters if they do not exist?

Explain how to approximate parameters (e.g., based on electronegativity). Add procedure, copy/paste.

Can I combine EEM parameter sets?

Say I have a biomacromolecule which binds a drug like molecule. Not really necessary (the parameters for biomolecules seem good enough). Explain how to use chemical elements efficiently.

The calculation ran, but I got a warning that some atoms were skipped.

EEM, the empirical approach used by ACC to calculate atomic charges, operates with atom types based on chemical elements. When uploading the input file, ACC needs to establish the chemical element of each atom, so that it can assign a suitable atom type, and subsequently EEM parameters. If ACC does not recognize the chemical element (see above why), or no EEM parameters are available for that chemical element, ACC cannot assign EEM parameters to that atom and it will thus be unable to include it in the EEM calculation.

In the final results, the atomic charge for such atoms that are skipped will appear with the value "NaN". These values will not contribute to any of the statistics computed by ACC.
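If you post-process the results yourself, remember to leave out the "NaN" entries in the same way. The following sketch shows one way to do this; the particular statistics (sum, minimum, maximum) are chosen for illustration only and are not a list of what ACC reports, and summarize_charges is a hypothetical name.

```python
import math


def summarize_charges(charges):
    """Summarize a list of atomic charges, ignoring NaN entries (skipped atoms)."""
    valid = [q for q in charges if not math.isnan(q)]
    summary = {"count": len(valid), "skipped": len(charges) - len(valid)}
    if valid:
        summary.update(sum=sum(valid), min=min(valid), max=max(valid))
    return summary
```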

The calculation ran, but I did not obtain any charges.

EEM, the empirical approach used by ACC to calculate atomic charges, operates with atom types based on chemical elements. When uploading the input file, ACC needs to establish the chemical element of each atom, so that it can assign a suitable atom type, and subsequently EEM parameters. If ACC does not recognize the chemical element (see above why), or no EEM parameters are available for that chemical element, ACC cannot assign EEM parameters to that atom and it will thus be unable to include it in the EEM calculation. If this happens with all the atoms in the input file, then no values of atomic charges will be available in the results. In such situations, ACC will report an error.

The calculation ran, but I got the warning "Missing parameters for symbol ... and multiplicity .... Using value for multiplicity ... instead."

EEM, the empirical approach used by ACC to calculate atomic charges, operates with atom types based on chemical elements. Furthermore, some EEM parameter sets available in literature employ atom types which depend not only on chemical element (or symbol), but also on the maximum bond multiplicity. This means that in such EEM parameter sets there are, for example, different EEM parameters for carbon atoms with sp³ hybridization, than for carbon atoms with sp² hybridization. In order to keep a consistent scheme of storing and assigning parameters, ACC implements by default an EEM parameter scheme which supports bond multiplicity information via the tag Bond and its attribute Type. EEM parameter sets which are based solely on chemical elements contain multiple Bond tags as well, but the parameters associated with different bond multiplicities (Type attribute) are actually merely copies.

Because the unified parameter scheme employs bond multiplicity information, ACC needs to establish the chemical element of each atom, as well as its maximum bond multiplicity, so that it can assign a suitable atom type, and subsequently EEM parameters. When uploading the input file, ACC expects to find the chemical element at a pre-defined position in the input file, which depends on the formal guidelines established for each file format. For example, in .pdb files, ACC looks for the chemical element in the column after occupancy and temperature factor (positions 77-78).

As soon as the calculation starts, ACC attempts to obtain bond information. First, it searches the input file for connectivity information. If this is not present, ACC attempts to establish the maximum bond multiplicity based on interatomic distances. This algorithm generally has trouble when interatomic distances vary significantly from the expected norms, or when handling coordinated atoms, and it may generate unexpected bond multiplicities for which no EEM parameters are available. To overcome such situations, ACC falls back to the EEM parameters for the nearest bond multiplicity available.

This fallback also happens when the input molecule contains an atom type with a maximum bond multiplicity which was indeed not covered during the development of the EEM parameters used in a given calculation. For instance, if the reference data used for the development of the particular EEM parameter set you chose did not contain any sp² nitrogen (Bond Type="2"), ACC will fall back to the EEM parameters for sp³ nitrogen (Bond Type="1"). ACC will produce a warning to inform you of this fact, and in the final results the atom type will still be N:2, as originally detected in the input file. It is up to you to decide if the values of atomic charges for these problematic atoms are acceptable.
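The fallback can be pictured with the short sketch below. It is an illustration under stated assumptions rather than ACC's actual code: the parameter table is assumed to map (symbol, multiplicity) pairs to (A, B) values, lookup_parameters is a hypothetical name, and resolving ties toward the lower multiplicity is an arbitrary choice made here.

```python
def lookup_parameters(table, symbol, multiplicity):
    """Return ((A, B), multiplicity_used), or (None, None) if the symbol is not covered."""
    if (symbol, multiplicity) in table:
        return table[(symbol, multiplicity)], multiplicity
    available = sorted(m for (s, m) in table if s == symbol)
    if not available:
        return None, None  # atom would be skipped and reported as "NaN"
    # Fall back to the nearest multiplicity that has parameters.
    nearest = min(available, key=lambda m: abs(m - multiplicity))
    return table[(symbol, nearest)], nearest
```

For the nitrogen example above, a table containing only ("N", 1) returns the sp³ parameters together with the value 1 when asked for ("N", 2), which is the situation the warning reports.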

Why are the charges from different sets so different, and how are these differences relevant?

Explain charge definitions, QM reference data and EEM parameters...

Approach: QM Method, Basis Set, Population Analysis
Description: The nature of the reference data used during the development of the parameters. Reference data generally comes from high level Quantum Mechanical (QM) calculations. The applicability domain of an EEM parameter set is closely related to the applicability domain of the reference QM data used during the development.
Possible values: QM Method - level of theory used to solve Schrödinger's equation - HF, B3LYP, etc. Basis Set - set of basis functions used to solve Schrödinger's equation - 6-31G, STO-3G, etc. Population Analysis - defines how the reference data (most commonly atomic charges) were obtained after solving Schrödinger's equation - MPA (Mulliken population analysis), NPA (Natural population analysis), MK (Merz-Kollman scheme for fitting to electrostatic potentials), etc.

Why do residues have non-integer charge?

EEM, the empirical approach used by ACC to calculate atomic charges, works at the atomic level, and does not see the electronic structure. Nonetheless, due to the principle of electronegativity equalization, EEM allows electron density to spread across the molecule in a manner which depends on the nature of the atoms and the chemical environment created by the surrounding atoms. The degree to which this happens also depends on the charge definition and fitting algorithms used during the development of the EEM parameters (see above).
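To make the principle concrete, the sketch below solves the EEM equations in the form commonly given in the literature: for every atom i, A_i + B_i*q_i + kappa * sum over j of q_j/R_ij equals one common (equalized) electronegativity, and the charges sum to the total molecular charge. This is an assumption about the general method, written in plain Python for readability; it is not ACC's implementation, the units and numerical conventions of the built-in parameter sets are not addressed, and eem_charges and solve_linear are hypothetical names.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - known) / rows[r][r]
    return x


def eem_charges(coords, a, b, kappa, total_charge=0.0):
    """Return (charges, equalized_electronegativity) for atoms at `coords`."""
    n = len(coords)
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = b[i]
            else:
                matrix[i][j] = kappa / math.dist(coords[i], coords[j])
        matrix[i][n] = -1.0  # coefficient of the unknown equalized electronegativity
        matrix[n][i] = 1.0   # last row: the charges must sum to total_charge
    rhs = [-a_i for a_i in a] + [total_charge]
    solution = solve_linear(matrix, rhs)
    return solution[:n], solution[n]
```

Because every atom appears in every equation through the distance term, changing the position or type of one atom shifts the charges on all the others, which is why the charges respond to conformation and chemical environment.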

This means that atomic charges in a residue depend on the conformation of the residue, as well as the nature and conformation of nearby residues. The total charge on each residue may differ from the expected formal charge (-1, 0, +1) due to charge transfer to the surrounding residues, ligands, ions, water, etc. While this behavior is realistic, it may not be desired for some applications (e.g., some modeling programs expect integer charge on each residue).
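The total charge of each residue is simply the sum of the charges of its atoms, so you can check how far each residue is from its formal charge. A minimal sketch, assuming you have already extracted one (chain, residue number, residue name, charge) tuple per atom from the ACC results; residue_charges is a hypothetical name, and skipping "NaN" entries is a choice made for this example.

```python
import math
from collections import defaultdict


def residue_charges(atoms):
    """Sum atomic charges per residue; `atoms` yields (chain, number, name, charge)."""
    totals = defaultdict(float)
    for chain, number, name, charge in atoms:
        if not math.isnan(charge):  # atoms without a charge do not contribute
            totals[(chain, number, name)] += charge
    return dict(totals)
```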

Can I get good electrostatic potentials?

Whether or not atomic charges can generate accurate electrostatic potentials depends on several factors. First, certain charge definitions (see above) are based on principles which relate atomic charges to electrostatic potentials. Therefore, if you expect to compute potentials based on the atomic charges, you should probably pick an EEM parameter set developed for a suitable charge definition (MK, CHELPG...). Second, the concept of atomic point charges is inherently limited with respect to describing charge gradients, therefore the resulting potentials in some areas of the 3D space around the molecule will be better described than in some other areas.

Further, one should keep in mind that EEM, the method used by ACC to calculate atomic charges, is an empirical approach. EEM parameters available in ACC were mostly fitted to reference data in the form of atomic charges from quantum mechanical (QM) calculations. EEM is an approximation meant to keep as much accuracy as possible (compared to reference data) while maximizing computational efficiency. Therefore, the maximum accuracy to be expected for EEM atomic charges cannot exceed the accuracy of the corresponding QM charges in reproducing electrostatic potentials.

Finally, note that some papers provide straightforward evaluations of the ability of their EEM parameter sets to reproduce electrostatic potentials from QM calculations. So follow the citation of the EEM parameter set you plan to use, and see if this information is available in the original paper. Note that the current implementation of ACC provides only the values of atomic charges, and you will have to compute the electrostatic potentials yourself (e.g., on the pdb2pqr server). In the future ACC might support such functionality.
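For a quick estimate, the electrostatic potential generated by a set of point charges at a given position is the Coulomb sum of q_i / r_i over all atoms. The sketch below evaluates this sum in vacuum, with no solvent or dielectric screening, assuming coordinates in ångströms and charges in units of the elementary charge; the result is then in kcal/(mol·e). electrostatic_potential is a hypothetical name, and this simple sum is not a substitute for a dedicated electrostatics program.

```python
import math

COULOMB_KCAL = 332.0637  # Coulomb constant in kcal·Å/(mol·e²)


def electrostatic_potential(point, coords, charges):
    """Vacuum Coulomb potential at `point` from point charges located at `coords`."""
    total = 0.0
    for position, q in zip(coords, charges):
        total += q / math.dist(point, position)  # undefined exactly at a nucleus
    return COULOMB_KCAL * total
```

Evaluating this function on a grid of points around the molecule, for example on a surface outside the van der Waals radii, gives a picture of where the point charge model predicts positive and negative potential.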

Can I get good dipole moments?

Atomic charges are non-integer numbers quantifying the balance of positive (nuclear) charge and negative (electronic) charge associated with each atom. In the 3D space, atomic charges represent points placed at the position of the atomic nuclei, and may be termed atomic point charges. The molecular representation based on atomic point charges is thus a very basic abstraction of the molecular electron density.

Atomic charges are conceived to reflect the uneven distribution of electron density in the molecule. When employing atomic charges, you must be aware of the limitations inherent to the atomic point charge model. A single number can give an idea about whether there is more electron density around some atoms compared to others, but it cannot characterize the actual distribution of electron density in the space between the atomic nuclei. Thus, all properties which flow from this distribution are generally not well described using atomic charges.

Dipoles and higher order multipoles are known to be poorly approximated by a point charge model. Dipole moments measure the degree of separation of positive and negative charge in the molecule (polarity), and are, by definition, very sensitive to small variations in the distribution of electron density. Even atomic charges computed at the quantum mechanical (QM) level have trouble reproducing dipole moments, though some charge definitions (see above) are less unsuccessful than others (MK charges can be satisfactory for estimating dipole moments for small molecules). It is thus clear that one cannot expect the accuracy of empirical models fitted to QM reference charges to exceed the accuracy of QM atomic charges in reproducing dipole moments.

Nonetheless, you may well obtain reasonable results for relative dipole moments in a series of derivatives of a certain chemical compound. In other words, the atomic charges obtained from ACC calculations may not provide accurate dipole moments for a single molecule, but it is possible to compare the polarities of many kinds of derivatives of this molecule. Note that the current implementation of ACC provides only the values of atomic charges, and you will have to compute the dipole moments yourself. In the future ACC might support such functionality.
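Within the point charge model, the dipole moment is the vector sum of q_i * r_i over all atoms. A minimal sketch of this calculation follows, assuming coordinates in ångströms and charges in units of the elementary charge; dipole_moment is a hypothetical name. Keep in mind that for a molecule with non-zero total charge the result depends on the choice of origin.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e·Å expressed in debye


def dipole_moment(coords, charges):
    """Return (vector in e·Å, magnitude in debye) for a set of atomic point charges."""
    vector = [sum(q * position[k] for position, q in zip(coords, charges))
              for k in range(3)]
    magnitude = math.sqrt(sum(component * component for component in vector))
    return vector, magnitude * E_ANGSTROM_TO_DEBYE
```

When comparing a series of derivatives, compute the dipole moment of each molecule with the same parameter set and compare the trends rather than the absolute values.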
