In addition to the data extracted from source compound set files, a large amount of data is harvested from external sources: names, external identifiers, bioactivity data, associated pathways, and target and pathway ontologies.
The table below lists all external data sources and services used, together with the data harvested from each:
All potency values (pValues) for binding and functional assays are extracted for specific targets, cell lines, and whole organisms. When more than one value is available for a ligand-target complex, the average of these values is calculated.
All activity data are extracted, including records for which no bioactivity value is known.
Reactome pathways are matched according to target UniProt IDs.
Data are matched according to target UniProt IDs.
The UniChem service is used to harvest external IDs from all available external sources. Harvesting is done with the chembl_webresource_client Python package.
[DISCONTINUED] Compounds are matched using the ChemSpiPy Python package.
Discontinued in P&D 02.2021
PubChem is the main source for the manual extraction of compound structures. Generally, when a compound's structure is missing (or wrong), it is looked up by the name given by the supplier/provider. In many cases, compounds are also identified by their PubChem CIDs/SIDs.
From MolPort, information about the in-stock availability of compounds is used.
From Mcule, information about the in-stock availability of compounds is used.
Compounds are matched according to their BindingDB ligand IDs harvested from UniChem.
From Chemspace, information about the in-stock availability of compounds is used.
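The averaging rule for potency values described above (one averaged pValue per ligand-target complex when several measurements exist) can be illustrated with a minimal sketch. The function name, the record layout, and the identifiers are hypothetical and chosen only for illustration, and the sketch assumes a plain arithmetic mean. It does not reproduce the actual harvesting code.

```python
from collections import defaultdict
from statistics import mean


def average_pvalues(records):
    """Average pValues per (ligand, target) combination.

    `records` is an iterable of (ligand_id, target_id, pvalue) tuples.
    This layout is an assumption made for the example only.
    """
    grouped = defaultdict(list)
    for ligand_id, target_id, pvalue in records:
        # Collect every measured value for the same ligand-target combination.
        grouped[(ligand_id, target_id)].append(pvalue)
    # Reduce each group to a single value (arithmetic mean assumed).
    return {pair: mean(values) for pair, values in grouped.items()}


# Hypothetical example data: two measurements for one combination, one for another.
example = [
    ("ligand_A", "target_X", 6.0),
    ("ligand_A", "target_X", 8.0),
    ("ligand_B", "target_X", 5.5),
]
averaged = average_pvalues(example)
```

Records without a known bioactivity value would need to be kept separately, since they cannot contribute to a mean.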