Subject: CASP: Call for targets for the 2024 CASP modeling experiment
Date: Sat, 16 Mar 2024 18:04:05 -0700 (PDT)
From: Prediction Center <casp@predictioncenter.org>
CASP (Critical Assessment of Structure Prediction) experiments are held every two years. Recent rounds have seen dramatic increases in modeling accuracy, resulting from the introduction of deep learning methods: in 2018, for the first time, the folds of most proteins were correctly computed [1]; in 2020, the accuracy of many computed protein structures rivaled that of the corresponding experimental ones [2]; in 2022, there was an enormous increase in the accuracy of computed protein complexes [3].
We have seen the beginning of what deep learning methods may achieve in structural biology. In addition to further increases in the accuracy of protein complexes, methods are being developed for RNA structures, organic ligand-protein complexes, and for moving beyond single macromolecular structures to compute conformational ensembles. Accurate computational methods together with experimental data also offer the prospect of probing previously inaccessible biological systems. CASP has expanded its scope to provide critical assessment in all these areas.
CASP is only possible with the generous participation of the experimental structural biology community in providing suitable targets: A total of over 1100 targets have been obtained over the previous CASP rounds. We are now requesting targets for the 2024 CASP16 experiment. We need challenge targets in the following areas:
Single protein structures: The 2020 and 2022 CASPs showed that, so far, AlphaFold2 and methods built around it are by far the most accurate [4]. But there are limitations, particularly for proteins where only a shallow sequence alignment is available and for very large proteins (more than 1000 amino acids). The best results also require substantial computing resources, well beyond those of the AlphaFold2 default settings. Many new methods continue to appear, and these may remove some of the remaining difficulties. All types of protein targets are needed, but especially those with shallow sequence alignments, those without structural templates, and large proteins.
Protein complexes: In the 2022 CASP15, advanced deep learning methods were applied to protein complexes for the first time [5]. The result was a huge improvement in accuracy compared with classical docking approaches. But overall, the results are still not at the level achieved for single proteins. So, in CASP16 we need all sorts of targets in this area to determine what progress has been made since then. We particularly need complexes where there is no evolutionary information across the protein-protein interfaces, for example, antibody-antigen complexes. (This CASP category is conducted in close collaboration with our colleagues at CAPRI, the Critical Assessment of PRedicted Interactions [6].)
Nucleic acid structures and complexes: In recognition of the major role nucleic acid structures and complexes play in biology, CASP now includes this class of target. A number of papers claiming successful RNA structure computation using deep learning methods have been published, but the deep learning methods that participated in the 2022 CASP RNA category performed less well than classical approaches, and no methods were able to effectively address the two RNA-protein complexes included [7]. CASP needs a wide variety of RNA, DNA, and complexes as targets to see if this situation has changed. (This CASP category is conducted in close collaboration with RNA-Puzzles [8].)
Organic ligand-protein complexes: This area is of major importance for computer-aided drug discovery. Several community experiments have assessed the accuracy of methods in this area, particularly SAMPL, CSAR, and D3R, and a new one, CACHE, has recently started (cache-challenge.org). These challenges have drawn strong international participation from researchers in both academia and industry. Here too, a number of promising deep learning papers have appeared, but in the 2022 CASP15 pilot, classical methods were still superior [9]. So, we need appropriate targets to see if progress has been made since. Ideally, these should be sets of three-dimensional protein-ligand complexes from drug discovery projects, but single targets would also be appreciated. Additionally, where available, we will assess non-structural quantities such as affinities or affinity rankings, and other properties of pharmaceutical interest (small-molecule pKs and DMPK-related properties).
Ensembles of macromolecule conformations: It is now widely recognized that proteins and nucleic acids often adopt multiple conformations that can underpin their functions. In these cases, considering only a single protein or RNA conformation may be a significant oversimplification. The 2022 CASP15 included a pilot experiment to assess methods for computing multiple conformations, with encouraging results [10], but with limitations imposed by the available experimental data. For 2024, we seek not only cases of multiple experimental three-dimensional structures for the same macromolecule but also other types of data, such as cryo-EM, NMR, X-ray crystallography, SAXS, and/or cross-link data, that might be used to assess computed conformational ensembles.
Integrative modeling: More powerful computational methods open up new possibilities for combination with sparse or low-resolution experimental data to investigate previously inaccessible biological structures and machines. CASP is interested in exploring these possibilities and so requests experimentally difficult targets for which a structure has nevertheless been obtained. In suitable cases, we expect to be able to collaborate with other experimental groups to provide appropriate data from NMR, cross-linking, or SAXS.
There are three avenues to contribute a target to CASP:
The timeline for the 2024 CASP requires that targets be submitted between now and July 1. We would like to hear from you as soon as possible if you have, or may have, something suitable or have suggestions about other target sources. In order to maintain rigor, the experimental data for a target must not be publicly available until after computed structures have been collected. For assessment, CASP requires the experimental data by August 15, but the data can remain confidential after that. Target providers are invited to contribute to papers [11-15] for a special CASP issue of the journal Proteins.