Predictive Modeling in Cheminformatics

Virtual screening is the computational, or in-silico, screening of biological compounds and complements the high-throughput screening (HTS) process. It is used to aid the selection of compounds for screening in HTS bioassays or for inclusion in a compound-screening library.

Virtual screening can utilise several computational techniques, depending on the amount and type of information available about the compounds and the target. Protein-based methods are employed when the 3D structure of the bioassay target is known; the computational techniques involve the docking (virtual binding), and subsequent scoring, of candidate ligands (the part of the compound that is capable of binding) to the protein target.

Ligand-based approaches are usually used when there are compounds known to be active or inactive for a specific target. If only a few active compounds are known, structure-similarity techniques may be used; if the activity of several compounds is known, discriminant analysis techniques, such as machine learning approaches, may be applied. This is achieved by choosing several compounds that have known activity for a specific biological target and building predictive models that can discriminate between the active and inactive compounds. The goal is then to apply these models to other, unscreened compounds so that the compounds most likely to be active can be selected for screening. This is the approach taken in this research.

The rationale behind the use of machine learning is to discover patterns and signatures in data sets from high-throughput in-vitro assays.
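The two ligand-based ideas above can be sketched in a few lines. The module itself works in R, so the following is only a minimal Python illustration using the standard library, not the method used in the video. It assumes each compound is represented by a fingerprint stored as a set of "on" bit positions. The compound names, the fingerprints, and the choice of a k-nearest-neighbour vote as the discriminating model are all hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical toy data: name -> (fingerprint, known activity).
training: Dict[str, Tuple[Fingerprint, bool]] = {
    "cpd_A": (frozenset({1, 2, 3, 4}), True),
    "cpd_B": (frozenset({1, 2, 3, 5}), True),
    "cpd_C": (frozenset({7, 8, 9}), False),
    "cpd_D": (frozenset({7, 8, 10, 11}), False),
}

# Hypothetical unscreened compounds whose activity we want to predict.
unscreened: Dict[str, Fingerprint] = {
    "cpd_X": frozenset({1, 2, 4, 5}),
    "cpd_Y": frozenset({8, 9, 10}),
}


def rank_by_similarity_to_actives(
    candidates: Dict[str, Fingerprint],
    known: Dict[str, Tuple[Fingerprint, bool]],
) -> List[Tuple[str, float]]:
    """Similarity search: score each candidate by its best match to a known active."""
    actives = [fp for fp, is_active in known.values() if is_active]
    scores = {
        name: max(tanimoto(fp, active_fp) for active_fp in actives)
        for name, fp in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def predict_active(
    fp: Fingerprint,
    known: Dict[str, Tuple[Fingerprint, bool]],
    k: int = 3,
) -> bool:
    """Simple discriminating model: majority vote of the k most similar training compounds."""
    neighbours = sorted(
        known.values(), key=lambda item: tanimoto(fp, item[0]), reverse=True
    )[:k]
    active_votes = sum(1 for _, is_active in neighbours if is_active)
    return active_votes * 2 > len(neighbours)


ranking = rank_by_similarity_to_actives(unscreened, training)
predictions = {name: predict_active(fp, training) for name, fp in unscreened.items()}

for name, score in ranking:
    print(name, round(score, 2), "predicted active" if predictions[name] else "predicted inactive")
```

In practice the fingerprints would come from a cheminformatics toolkit and the model would be evaluated on held-out compounds; the sketch only shows how known activities are used to prioritise unscreened compounds.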

In this module, Abhik Seal describes the technical process of bringing together a variety of computational tools in the R statistics package to enable predictive modeling of compound-target interactions using supervised machine learning methods.

If you followed the video on how to perform predictive modeling, try your hand at this KDD dataset (Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin) and post your results. This is not an assignment, and the results will not be graded.

Max Kuhn (Director at Pfizer), the developer of the caret package, demonstrates how to use caret in R. It is a popular package for predictive modeling.

Links to the papers for Predictive Modeling.

There are various kinds of classification models. Below I have listed some of them, along with their different properties, from Tom Mitchell's book; here is the link

Resources for Learning Machine Learning

The following knowledge is a prerequisite for making sense of machine learning:
Once the prerequisites are complete, the following are good lecture series on machine learning:

Basic ML:

Advanced ML:

Most of the above links have been filtered from

Other Important Links:

Lectures 21-28 by Gilbert Strang: optimization from a linear algebra perspective.