Semantics of the _chemical_formula.sum and related data items #533

@vaitkus

The problem

The _chemical_formula.sum data item is intended to report the summary formula. However, for some time now I have been wondering whether this formula is meant to be:

  1. Formula of the compound we expect to crystallise.
  2. Formula of what was actually modelled.

That is, quite often some hydrogen atoms or highly disordered solvent moieties cannot be resolved and are therefore not represented by atomic coordinates in the model. Should they still be included in the formula?

I always assumed that option 1 was the correct one, and I think that most structures published in IUCr journals follow(ed) this rule. If I understand correctly, checkCIF also assumes option 1?

However, structures coming from other journals, especially those dealing with large organometallic complexes, often tend to ignore the smaller solvent molecules or even the unmodelled parts of the main molecules and simply calculate the summary formula from the atomic coordinates.

@nautolycus, @jamesrhester maybe you know which interpretation is the proper one?

Potential contradictions in the definitions

I am also asking because, in the DDLm version of the dictionary, the definitions of other data items and their dREL code now seem to lean more towards interpretation 2.

My line of thinking:

A. The _chemical_formula.weight definition states that:

    Mass corresponding to the formulae _chemical_formula.structural,
    *_IUPAC, *_moiety or *_sum and, together with the Z value and cell
    parameters yield the density given as _exptl_crystal.density_diffrn.

B. Then if we look at the definition of _exptl_crystal.density_diffrn:

    Crystal density calculated from crystal unit cell and atomic content.

Looks good so far in support of option (1). However, if we look at the dREL code of this item:

    _exptl_crystal.density_diffrn = 1.6605 * _cell.atomic_mass / _cell.volume

It is calculated from the _cell.atomic_mass and not directly from the chemical formulae.
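To make the units in that expression explicit, here is a minimal Python sketch of the same calculation. The function and constant names are my own (hypothetical), and I am assuming that the mass is in daltons and the volume in cubic ångströms, which is what the factor 1.6605 implies (1 Da ≈ 1.66054e-24 g and 1 Å^3 = 1e-24 cm^3).

```python
# Assumed units: mass in Da, volume in Å^3, density in g/cm^3.
# 1 Da ≈ 1.66054e-24 g and 1 Å^3 = 1e-24 cm^3, hence the factor 1.6605
# that appears in the dREL expression.
DA_PER_A3_TO_G_PER_CM3 = 1.6605


def density_diffrn(cell_atomic_mass: float, cell_volume: float) -> float:
    """Density from the total atomic mass in the cell and the cell volume."""
    if cell_volume <= 0:
        raise ValueError("cell volume must be positive")
    return DA_PER_A3_TO_G_PER_CM3 * cell_atomic_mass / cell_volume


# Hypothetical cell: 1000 Da of atoms in 1000 Å^3.
example_density = density_diffrn(1000.0, 1000.0)
```

Note that nothing in this calculation refers to a chemical formula; the only inputs are the cell mass and the cell volume.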

C. The _cell.atomic_mass item seems to be calculated from the ATOM_TYPE loop.
Definition:

    Atomic mass of the contents of the unit cell. This is calculated
    from the atom sites present in the ATOM_TYPE list, rather than
    the ATOM_SITE lists of atoms in the refined model.

dREL:

    mass = 0.

    Loop t as atom_type {
        mass += t.number_in_cell * t.atomic_mass
    }
    _cell.atomic_mass = mass
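For readers less familiar with dREL, the loop above can be sketched in Python roughly as follows. The `AtomType` class and the example values are hypothetical stand-ins for an ATOM_TYPE loop, not part of any CIF library.

```python
from dataclasses import dataclass


@dataclass
class AtomType:
    """Hypothetical stand-in for one row of the ATOM_TYPE loop."""
    symbol: str
    number_in_cell: float
    atomic_mass: float


def cell_atomic_mass(atom_types: list[AtomType]) -> float:
    """Sum of number_in_cell * atomic_mass over all atom types."""
    return sum(t.number_in_cell * t.atomic_mass for t in atom_types)


# Hypothetical cell containing four water molecules.
water_cell = [
    AtomType("H", 8.0, 1.008),
    AtomType("O", 4.0, 15.999),
]
total_mass = cell_atomic_mass(water_cell)
```

The result depends entirely on what `number_in_cell` holds, which is the question raised in the next point.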

D. Here, again, I am unsure whether the _atom_type.number_in_cell should hold the number of atoms we managed to model, or the number of atoms we expected to see in the cell. Judging from the dREL, it is the former, since this number can be calculated from the ATOM_SITE loop (is this compatible with the definition of _cell.atomic_mass?):

    With t as atom_type

    cnt = 0.

    Loop a as atom_site {
        if ( a.type_symbol == t.symbol ) {
            cnt += a.occupancy * a.site_symmetry_multiplicity
        }
    }
    _atom_type.number_in_cell = cnt
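A rough Python equivalent of this dREL method, with a hypothetical `AtomSite` class and made-up site data, shows that only atoms present in the ATOM_SITE loop can contribute to the count:

```python
from dataclasses import dataclass


@dataclass
class AtomSite:
    """Hypothetical stand-in for one row of the ATOM_SITE loop."""
    label: str
    type_symbol: str
    occupancy: float
    site_symmetry_multiplicity: int


def number_in_cell(symbol: str, atom_sites: list[AtomSite]) -> float:
    """Sum occupancy * multiplicity over the sites of the given atom type."""
    return sum(
        a.occupancy * a.site_symmetry_multiplicity
        for a in atom_sites
        if a.type_symbol == symbol
    )


# Made-up model: one O site, one full H site and one half-occupied H site.
sites = [
    AtomSite("O1", "O", 1.0, 4),
    AtomSite("H1", "H", 1.0, 4),
    AtomSite("H2", "H", 0.5, 4),
]
hydrogens_in_cell = number_in_cell("H", sites)
```

An atom that was expected but never modelled has no row in `sites`, so it adds nothing to the count.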

From all this, it would follow that _exptl_crystal.density_diffrn always agrees with the formula weight, as the _chemical_formula.weight definition says it should, only if all the different chemical formulae were also calculated from the atomic coordinates. While this might make sense for the summary formula, the structural and moiety formulae deal with interatomic connectivity and are thus more likely to reflect the expected composition.
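A small numerical sketch of the discrepancy, with entirely made-up values (Z = 4, a 500 Å^3 cell and an ethanol-like formula in which one hydrogen atom could not be located), assuming the density follows from the formula weight as 1.6605 * Z * weight / volume:

```python
# Standard atomic masses in Da, rounded.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


def formula_weight(formula: dict[str, float]) -> float:
    """Mass of one formula unit given element counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def density(mass_in_cell: float, volume: float) -> float:
    """Density in g/cm^3 from mass in Da and volume in Å^3."""
    return 1.6605 * mass_in_cell / volume


z = 4            # hypothetical number of formula units per cell
volume = 500.0   # hypothetical cell volume in Å^3

expected_formula = {"C": 2, "H": 6, "O": 1}  # interpretation 1: expected compound
modelled_formula = {"C": 2, "H": 5, "O": 1}  # interpretation 2: one H not modelled

rho_expected = density(z * formula_weight(expected_formula), volume)
rho_modelled = density(z * formula_weight(modelled_formula), volume)
difference = rho_expected - rho_modelled
```

With these numbers the two densities differ by roughly 0.013 g/cm^3, so a density derived from the ATOM_SITE loop cannot match a formula weight that follows interpretation 1 whenever part of the structure is unmodelled.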
