Data sources
In addition to the data extracted from the source compound set files, further data are harvested from external data sources: names, external identifiers, bioactivity data, associated pathways, and target and pathway ontologies.
The table below lists all external data sources and services used, together with a description of the data harvested from each:
Source | Version | Comment | License
--- | --- | --- | ---
 | 25 | All potency values (pValues) for binding and functional assays are extracted for specific targets, cell lines, and whole organisms. When more than one value is available for a ligand-target complex, the average of these values is calculated. |
 | 2019.03 | All activity data are extracted, including records for which no bioactivity value is known. |
Reactome | 68 | Reactome pathways are matched according to target UniProt IDs. |
 | 07.2019 | Data are matched according to target UniProt IDs. |
UniChem | 07.2019 | The UniChem service is used to harvest external IDs from all available external sources. Harvesting is done with the chembl_webresource_client Python package. |
PubChem | 07.2019 | PubChem is the main source for the manual extraction of compound structures. Generally, when a compound's structure is missing (or wrong), it is looked up by the name given by the supplier/provider. In many cases, compounds are also identified by their PubChem CIDs/SIDs. |
MolPort | 07.2019 | Information about the availability of compounds is taken from MolPort. All external IDs to MolPort are harvested through the UniChem service. |
Mcule | 07.2019 | Information about the availability of compounds is taken from Mcule. All external IDs to Mcule are harvested through the UniChem service. |
BindingDB | 07.2019 | Compounds are matched according to their BindingDB ligand IDs harvested from UniChem. |
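
The averaging of multiple potency values for one ligand-target complex, mentioned above, can be illustrated with a minimal sketch. It assumes a plain arithmetic mean of the pValues and a hypothetical record layout of (ligand ID, target ID, pValue) tuples; the function name, the identifiers, and the numbers are made up for illustration and do not reflect the actual processing pipeline.

```python
from collections import defaultdict
from statistics import mean


def average_pvalues(records):
    """Average potency values per (ligand, target) pair.

    `records` is an iterable of (ligand_id, target_id, pvalue) tuples.
    This layout is an assumption made for the example only.
    """
    grouped = defaultdict(list)
    for ligand_id, target_id, pvalue in records:
        grouped[(ligand_id, target_id)].append(pvalue)
    # One arithmetic mean per ligand-target pair.
    return {pair: mean(values) for pair, values in grouped.items()}


# Hypothetical input: two measurements for LIG1, one for LIG2.
records = [
    ("LIG1", "P00533", 7.0),
    ("LIG1", "P00533", 8.0),
    ("LIG2", "P00533", 6.5),
]
averaged = average_pvalues(records)
```

Pairs with a single measurement keep that value unchanged; pairs with several measurements are collapsed to one averaged pValue.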