Data Science | DSDHT
Practical drug-protein predictive modeling with R
Virtual screening is the computational, or in-silico, screening of biological compounds, and it complements the high-throughput screening (HTS) process. It is used to aid the selection of compounds for screening in HTS bioassays or for inclusion in a compound-screening library.
Virtual screening can use several computational techniques, depending on the amount and type of information available about the compounds and the target. Protein-based methods are employed when the 3D structure of the bioassay target is known. These techniques involve docking (virtual binding) candidate ligands (the part of the compound that is capable of binding) to the protein target and then scoring them.
Ligand-based approaches are usually used when there are compounds known to be active or inactive for a specific target. If a few active compounds are known, structure-similarity techniques may be used. If the activity of several compounds is known, discriminant analysis techniques, such as machine learning approaches, may be applied. This is done by choosing several compounds with known activity for a specific biological target and building predictive models that can discriminate between the active and inactive compounds. The goal is then to apply these models to other, unscreened compounds so that those most likely to be active can be selected for screening. This is the approach taken in this research.
The rationale behind the use of machine learning is to discover patterns and signatures in data sets from high-throughput in-vitro assays.
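As a rough illustration of the structure-similarity idea mentioned above, the sketch below ranks a small library against one known active using the Tanimoto coefficient. It is a minimal Python sketch, not the method used in this module: the fingerprints are made-up sets of "on" bit indices, and the compound names (cmpd_A and so on) are hypothetical. In practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints,
    each given as a collection of 'on' bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / union


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one known active and three unscreened compounds.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
}

ranking = rank_by_similarity(known_active, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Compounds at the top of the ranking share the most fingerprint bits with the known active, so they would be the first candidates to send to the bioassay.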
In this module, Abhik Seal describes the technical process of bringing together a variety of computational tools in the R statistical environment to enable predictive modeling of compound-target interaction using supervised machine learning methods.
If you followed the video on how to perform predictive modeling, give this exercise a try:
Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin. Try to post your results. This is not an assignment, and the results will not be graded.
Max Kuhn (Director at Pfizer), the developer of the caret package for R, demonstrates how to use caret. It is a popular package for predictive modeling.
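The videos above carry out supervised modeling in R with caret. As a language-neutral illustration of the same train-then-predict idea, here is a minimal Python sketch of a Bernoulli naive Bayes classifier over binary fingerprints. This is an assumption-laden toy, not the workflow from the videos: the classifier choice, the function names, and the six-bit fingerprints are all hypothetical, and a real study would use a proper library and held-out validation.

```python
import math


def train_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Fit per-class bit probabilities with Laplace smoothing (alpha)."""
    model = {}
    n_total = len(labels)
    for cls in set(labels):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        n_cls = len(rows)
        # Smoothed estimate of P(bit is on | class) for every bit position.
        p_on = [
            (sum(1 for fp in rows if bit in fp) + alpha) / (n_cls + 2 * alpha)
            for bit in range(n_bits)
        ]
        model[cls] = (math.log(n_cls / n_total), p_on)
    return model


def log_score(model, fp, cls):
    """Unnormalised log posterior of class `cls` for fingerprint `fp`."""
    log_prior, p_on = model[cls]
    total = log_prior
    for bit, p in enumerate(p_on):
        total += math.log(p) if bit in fp else math.log(1.0 - p)
    return total


def predict(model, fp):
    """Return the class with the highest log posterior."""
    return max(model, key=lambda cls: log_score(model, fp, cls))


# Hypothetical training set: fingerprints as sets of 'on' bit indices.
train_fps = [{0, 1, 2}, {0, 1, 3}, {0, 2}, {3, 4, 5}, {4, 5}, {2, 4, 5}]
train_labels = ["active"] * 3 + ["inactive"] * 3

model = train_bernoulli_nb(train_fps, train_labels, n_bits=6)

# Score two unscreened (hypothetical) compounds.
print(predict(model, {0, 1}))  # resembles the actives
print(predict(model, {4, 5}))  # resembles the inactives
```

Unscreened compounds could also be ranked by the difference between their "active" and "inactive" log scores, so that the most promising ones are screened first.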
Links to the papers for Predictive Modeling.
Introduction to ROC analysis
Virtual Screening of Bioassay Data
In-silico predictive mutagenicity model generation using supervised learning approaches
Pubchem as a source of polypharmacology
Open Source platform to benchmark fingerprints for ligand based virtual screening
Modeling of non-additive mixture properties using the Online CHEmical database and Modeling environment (OCHEM)
There are various kinds of classification models. Below I have listed some of them, along with their different properties, taken from
Tom Mitchell's book
here is the
Resources for Learning Machine Learning
The following background knowledge is a prerequisite for making sense of machine learning:
Linear Algebra by Gilbert Strang:
Convex Optimization by Boyd
Probability and statistics for ML:
Some mathematical tools for ML:
Video and audio quality are very poor.
Probability primer (measure theory and probability theory):
Machine Learning Cheat sheet
Once the prerequisites are complete, the following are good lecture series on machine learning.
Andrew Ng’s Video Lectures(CS229) :
Andrew Ng’s online course offering:
Learning from Data by Yaser Abu-Mostafa
Tom Mitchell’s video lectures(10-701) :
Videos on Machine Learning
Clustering, EM, SVM, Naive Bayes, PCA
SVMs and kernel methods, Schölkopf:
Basics of Support Vector Machines and related kernel methods. Video and audio quality are very poor.
Kernel methods and Support Vector Machines, Smola:
Introduction of the main ideas of statistical learning theory, Support Vector Machines, Kernel Feature Spaces, An overview of the applications of Kernel Methods.
Easily one of the best talks on SVM. Almost like a run-down tutorial.
Introduction to Learning Theory, Olivier Bousquet.
This tutorial focuses on the "larger picture" rather than on mathematical proofs, and it is not restricted to statistical learning theory. 5 lectures.
Statistical Learning Theory, Olivier Bousquet,
This course gives a detailed introduction to Learning Theory with a focus on the Classification problem.
Statistical Learning Theory, John Shawe-Taylor, University of London. 7 lectures.
Advanced Statistical Learning Theory, Olivier Bousquet. 3 lectures.
Most of the above links have been filtered from
Other Important Links:
Channel for the probability primer and machine learning:
A comprehensive blog collecting the best resources for ML:
Another great blog for ML:
Lectures 21-28 by Gilbert Strang, linear algebra way of optimization.