Friday, 12 June 2015

Bioinformatics Assignment: The Bioinformatics Software and Its Applications.


SCHOOL OF PHARMACY

Course Title: Bioinformatics
Course Code: SPH 1062

Assignment 2
Title: The Bioinformatics Software and Its Applications.


Prepared by:
Arivalagi A/P Sabaramanian (012014052239)
Ashwarnee A/P Saravana Murthy (012014052232)
Nareeta Kaur A/P Narinder Singh (012014052234)
Sivasankari A/P Raman (012014052494)
Subashini A/P Nadaraja (012014052407)
Course: Bachelor of Pharmacy (BPH)
Intake: August 2014
Lecturer: Mr Mohammed Kaleemullah
Date of Submission: 12th June 2015





1.0 INTRODUCTION
1.1 Definition of Bioinformatics
Generally, bioinformatics derives knowledge from the computer-based study of biological data. These data include the information stored in the genetic code, experimental results from various sources, patient statistics, and scientific literature (Nilges & Linge, 2010). Furthermore, research in bioinformatics comprises the development of methods for storage, retrieval, and analysis of the data (Nilges & Linge, 2010). Bioinformatics is a fast-developing branch of biology that is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics (Bioinformaticsweb.tk, 2005).


Figure 1: *Adapted from B.E. Biotechnology, Department of Biotechnology, SIT, Tumkur, Karnataka. (2014). What is Bioinformatics?



1.2 Goals of learning Bioinformatics
Formally speaking, bioinformatics is the application of mathematics and computer science methodology to solve problems in biology, and in particular, molecular biology. Thus, there are many goals of learning Bioinformatics which are beneficial for us in daily life and research.
A central goal of Bioinformatics for Biologists is therefore to convey why computational methods are so vital to a thorough understanding of molecular biology (Howard Hughes Medical Institute, 2012). As biologists learn more about the nature of life at the molecular level, they discover more life processes that exhibit a surprising level of structure. DNA replication, in which an organism's "genetic code" is reproduced, is one such process. The word "code" itself suggests that mathematics is at work: the structure of DNA is based on the ordering of only four nucleotide bases (Howard Hughes Medical Institute, 2012).
In addition, the development and implementation of computer programs that enable efficient access to, use of and management of various types of information is considered one of the important goals of learning Bioinformatics (Rai University, 2015). It also supports the development of new algorithms and statistical measures with which to assess relationships among members of large data sets (Rai University, 2015).
Besides that, learning Bioinformatics provides understanding in various biological processes. The examples of biological processes are as follows (Rai University, 2015):
o   Pattern recognition
o   Sequence alignment
o   Gene finding
o   Assembly
o   Drug designing
o   Protein structure alignment
o   Gene expression
o   Genome annotation

Moreover, Bioinformatics provides a better understanding of a living cell and how it functions at the molecular level (B.E. Biotechnology, Department of Biotechnology, SIT, Tumkur, Karnataka, 2014). It also enhances the analysis of raw molecular sequence and structural data, generates new insights and provides a global perspective of the cell (B.E. Biotechnology, Department of Biotechnology, SIT, Tumkur, Karnataka, 2014).


Figure 2: *Adapted from B.E. Biotechnology, Department of Biotechnology, SIT, Tumkur, Karnataka (2014). The central dogma.
Furthermore, Bioinformatics provides integrated knowledge and technical skills drawn from the diverse scientific disciplines of biochemical, mathematical, computational and life sciences, which improves understanding of the key problems, possible solutions, and latest advances in bioinformatics (Howard Hughes Medical Institute, 2012). Another goal of Bioinformatics is to provide an understanding of the process of scientific inquiry, preparation for rigorous research, quantitative problem-solving skills, data analysis and interpretation of results (Nilges & Linge, 2010).

2.0 APPLICATIONS OF BIOINFORMATICS

In particular, bioinformatics utilizes a broad range of computational methods including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, gene finding, expression data clustering, and prediction of protein structure (Department of Molecular Biophysics and Biochemistry, 2001). In general, the applications of bioinformatics are crucial in today’s world, and bioinformatics software has been used in many ways to improve human lives. Below are some examples of fields in which bioinformatics software is applied.

  • Application of bioinformatics in drug discovery
  • Application of bioinformatics in cancer research
  • Application of bioinformatics in genetics
  • Application of bioinformatics in medicine
  • Application of bioinformatics in biotechnology
  • Application of bioinformatics in agriculture
  • Application of bioinformatics in microbiology
  • Application of bioinformatics in molecular medicine

Nevertheless, all applications of bioinformatics can be classified into three main groups: sequence analysis, function analysis and structure analysis (Shrestha, 2010). Firstly, sequence analysis covers applications that analyze various types of sequence information and compare similar types of information. Secondly, function analysis covers applications that examine the function encoded within a sequence and help predict the functional interactions between various genes or proteins. Thirdly, structure analysis concerns the structures of RNA and proteins, which play a major role in how these molecules interact with others; applications in this group predict the structure of proteins or RNA and the possible roles of that structure (Shrestha, 2010). Table 1 summarizes this classification.

Table 1: Summary of the classification of bioinformatics applications

Sequence Analysis:
o   Sequence Database Searching
o   Sequence Alignment
o   Genome Comparison
o   Gene & Promoter Prediction
o   Phylogeny
o   Motif Discovery

Function Analysis:
o   Gene Expression Predicting
o   Metabolic Pathway Modeling
o   Protein Interaction Prediction
o   Protein Sub-cellular Localization Prediction

Structure Analysis:
o   Nucleic Acid Structure Prediction
o   Protein Structure Prediction
o   Protein Structure Classification
o   Protein Structure Comparison

*Adapted from Shrestha, R. (2010, June 5). What is Bioinformatics? – A general perspective.

With regard to pharmacy, bioinformatics is linked to pharmaceutics bioinformatics. Pharmaceutics bioinformatics is defined as a wide scientific area of computer-based technologies, informatics and computational methods that interacts with all areas linked to the discovery and development of drugs (Pharmaceutics Bioinformatics, 2012). It is also a tool for mapping the processes of cells and understanding how to use their properties efficiently for the development of new drugs. In other words, pharmaceutics bioinformatics supports rational drug discovery, as it decreases the number of trials needed in the screening of drug compounds (Babu, n.d.). Consequently, pharmaceutics bioinformatics is able to identify potential drug targets for a particular disease by means of high-powered computing workstations and software. Applications of this kind have also led to a new area, pharmacogenomics, in which potential targets for drug development are inferred from genome sequences (Babu, n.d.). For instance, BLAST is used in relation to pharmacogenomics.
3.0 EXAMPLES OF THE BIOINFORMATICS APPLICATION
3.1 Computer Aided Drug Design
Computer-aided drug design (CADD) uses computational methods to simulate drug-receptor interactions (Casey, 2005). CADD software depends heavily on bioinformatics tools, applications and databases in order to function, and it exploits state-of-the-art technologies to speed up the drug development process. The ideas underlying CADD date back to around 1900, when the receptor and lock-and-key concepts were introduced. CADD has since been developed further to improve its quality and function, and the latest CADD software draws on information from the human genome, bioinformatics, combinatorial chemistry and high-throughput screening.


Using a variety of algorithms, CADD can generate in silico approximations of the binding free energy of a chemical compound to a molecular target. CADD also allows users to speed up the development of new drugs and reduces the cost of research. In addition, it enables rapid testing of new, unsynthesised classes of compounds.
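To make the idea of a binding free energy more concrete, the short Python sketch below relates it to the dissociation constant Kd through the standard thermodynamic relation ΔG° = RT ln(Kd), with Kd expressed in mol/L. This is only an illustration of the quantity that CADD programs try to approximate; real CADD scoring functions are far more elaborate, and the function names and candidate values here are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd); a smaller Kd (tighter binding) gives a more
    negative free energy.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def rank_by_affinity(kd_by_name):
    """Return candidate names ordered from tightest to weakest predicted binder."""
    return sorted(kd_by_name, key=lambda name: binding_free_energy(kd_by_name[name]))


# Hypothetical candidates with made-up dissociation constants (mol/L).
candidates = {"cand_A": 1e-6, "cand_B": 1e-9, "cand_C": 5e-8}
for name in rank_by_affinity(candidates):
    print(name, round(binding_free_energy(candidates[name]), 2), "kcal/mol")
```

A nanomolar binder (Kd = 1e-9 mol/L) comes out at roughly -12.3 kcal/mol at 298.15 K, which is why small differences in computed free energy correspond to large differences in affinity.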

3.2 Rational Drug Design
Rational drug design is a focused approach that uses information about the structure of a drug receptor or its natural ligands to identify or create candidate drugs. The three-dimensional structure of a protein can be determined using methods such as X-ray crystallography or nuclear magnetic resonance spectroscopy (Twyman, 2002). With this information, researchers in the pharmaceutical industry can use powerful computer programs to search through databases containing the structures of many different chemical compounds.
Rational drug design can be divided into two categories. Category A is the development of small molecules with desired properties for targets, that is, biomolecules (proteins or nucleic acids) whose functional roles in cellular processes and 3D structural information are known (Soma Mandal, 2009). This approach to drug design is well established and is applied extensively by the pharmaceutical industry. Category B is the development of small molecules with predefined properties for targets whose cellular functions and structural information may be known or unknown (Soma Mandal, 2009). Knowledge of unknown targets (genes and proteins) can be obtained by using advanced computational tools to analyse global gene expression data from samples untreated and treated with a drug. The steps involved in these two approaches, and the evaluation of other properties in rational drug design, are presented in flow charts 1, 2, and 3.








Once the target is identified, both approaches A and B for the development of small molecules require the examinations stated in flow chart 3. These include the evaluation of binding scores such as affinity and specificity; the balance between hydrophilicity and lipophilicity; absorption, distribution, metabolism and excretion (ADME); and susceptibility to electrophilic, nucleophilic, and radical attack (biodegradation) (Soma Mandal, 2009). They also include evaluation of the toxicity of the parent small molecules and of the products of biotransformation in the different phases of metabolism, as well as quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) analyses.
In addition, the design of the small molecule can initially be performed using computational tools. After the initial evaluation and identification of lead molecules, gene expression profiling and bioinformatics analysis are particularly important for gaining insight into gene expression patterns. This knowledge can then be used to improve drugs so that they achieve desirable attributes such as disease-free survival, eradication of disease, elimination or minimization of toxic side effects, reduction of undesirable biotransformation, improvement in distribution (bioavailability), overcoming of drug resistance, and improvement of immune responses (Soma Mandal, 2009). Therefore, rational drug design is an integral approach to drug development and discovery.

SANJEEVINI (A Complete Drug Designing Software Suite)



SANJEEVINI software has been developed as a computational pathway towards automating lead design (Prof B. Jayaram & Co-workers., 2011). The pathway makes any number of known or new candidate molecules out of a small but versatile set of building blocks called templates, screens them for drug likeness, optimizes their geometry, determines partial atomic charges and assigns other force field parameters. It then docks the candidates in the active site of a given biological target, estimates the interaction/binding energy, and performs molecular dynamics simulations with explicit solvent and salt on the biomolecular target, the candidate and the complex, followed by a rigorous analysis of the binding free energy for further optimization.
Recently, the developers have coupled Sanjeevini with AMBER and GAMESS for molecular mechanics and quantum mechanics calculations, respectively. A total of six modules make Sanjeevini a complete drug design software suite. The source code for all modules is written in FORTRAN, C and C++, with numerous interfacing UNIX shell scripts that make the modules work like a pipeline, so that the output of one step becomes the input of the next. The modules under Sanjeevini can also be used independently of the pathway.







The Six SANJEEVINI Modules

Module 1: Template Library
Chemical templates are conceived as building blocks/structural frameworks for the assembly and generation of new molecules.

Module 2: Molecule Generator
As a step towards de novo lead design, candidates are generated from the chemical templates introduced in the previous step.

Module 3: Molecular Descriptors and Drug-like Filters
A successful lead discovery strategy must ensure bioavailability from the very start when generating leads, while eliminating unsuitable candidates from consideration.

Module 4: Molecular Docking
Drug activity is obtained through the molecular binding of one molecule (the ligand) to the active site of another molecule (the receptor), which in the majority of cases is a protein. Computer-aided methods at this stage involve docking and scoring.

Module 5: Energy Minimization of the Resultant Complexes
The structures of the complexes generated above are subjected to energy minimization (1000 steps of hydrogen minimization followed by 2000 steps of all-atom minimization) using the SANDER module of the AMBER molecular modelling package.

Module 6: Binding Affinity Computations on Energy Minimized Complexes
The statistical mechanics of binding and the approximations inherent in elucidating free energies from single points in configuration space are assessed.
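
As an illustration of the kind of drug-likeness screen described under Module 3, the Python sketch below applies Lipinski's rule of five to a list of candidate molecules. This is a stand-in chosen for illustration: the source does not state which filters SANJEEVINI actually applies, and the class, function names and descriptor values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c):
    """Count how many of Lipinski's four criteria the candidate breaks."""
    broken = [
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(broken)


def drug_like(candidates, max_violations=1):
    """Keep candidates with at most `max_violations` rule violations."""
    return [c for c in candidates if lipinski_violations(c) <= max_violations]


# Hypothetical molecules with made-up descriptor values.
library = [
    Candidate("cand_A", 350.4, 2.1, 2, 5),
    Candidate("cand_B", 712.9, 6.3, 6, 12),
    Candidate("cand_C", 480.0, 5.4, 3, 8),
]
print([c.name for c in drug_like(library)])
```

Filtering early in this way reflects the point made in Module 3: candidates that are unlikely to be orally bioavailable are discarded before the more expensive docking and free energy steps.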

                                       



SANJEEVINI Pathway: Active site directed lead compound molecule in silico.








Apart from that, SANJEEVINI is a comprehensive active-site-directed lead compound design software suite, based on the ongoing research in its developers' laboratory. The computational pathway integrates several protocols proceeding from the design of chemical templates to lead-like molecules, given the three-dimensional structure of the target protein and a definition of its active site (Prof B. Jayaram & Co-workers., 2011). A conscious attempt has been made to handle the target biomolecule and the candidate drug molecules at the atomic level, retaining system independence while providing access for systematic improvements at the force field level. Concerns related to the geometry of the molecules, partial atomic charges, docking of candidates in the active site, flexibility and solvent effects are accounted for at the current state of the art. To ensure theoretical rigor, binding free energy estimates are developed for candidate molecules with the target protein within the framework of statistical mechanics.

3.3 Pharmacogenomics in Bioinformatics

Pharmacogenomics is defined as the study of how genes affect a person’s response to drugs. This relatively new field combines pharmacology and genomics to develop effective, safe medications and doses that are tailored to a person’s genetic makeup (Genetics Home Reference, 2015).
Many drugs that are currently available are “one size fits all,” but they don’t work the same way for everyone. It can be difficult to predict who will benefit from a medication, who will not respond at all, and who will experience adverse drug reactions. Inherited genetic differences can be used to predict whether a medication will be effective for a particular person and to help prevent adverse drug reactions (Genetics Home Reference, 2015). In the future, pharmacogenomics will allow the development of tailored drugs to treat a wide range of health problems, including cardiovascular disease, cancer, HIV/AIDS, and asthma. Pharmacogenomics is therefore used in drug development to prevent adverse effects and to make drugs more effective and safe.
With the help of bioinformatics tools, rational drug design can be carried out more easily on the basis of pharmacogenomics, since protein, DNA or RNA sequences of potential interest can be identified. One example of such software is BLAST (Basic Local Alignment Search Tool), which is widely used in the drug development process. The following are uses of BLAST, based on Altschul et al. (1990):
a)      Identifying species
With the use of BLAST, you can identify a species or find homologous species. This can be useful, for example, when you are working with a DNA sequence from an unknown species.
b)      Locating domains
When working with a protein sequence you can input it into BLAST, to locate known domains within the sequence of interest.
c)      Establishing phylogeny
Using the results received through BLAST you can create a phylogenetic tree using the BLAST web-page. Phylogenies based on BLAST alone are less reliable than other purpose-built computational phylogenetic methods, so should only be relied upon for "first pass" phylogenetic analyses.
d)     DNA mapping
When working with a known species, and looking to sequence a gene at an unknown location, BLAST can compare the chromosomal position of the sequence of interest, to relevant sequences in the database(s).
e)      Comparison
When working with genes, BLAST can locate common genes in two related species, and can be used to map annotations from one organism to another.

BLAST uses a heuristic method that finds similar sequences not by comparing either sequence in its entirety, but by locating short matches between the two sequences. To run, BLAST requires a query sequence to search for, and either a sequence to search against (also called the target sequence) or a sequence database containing multiple such sequences. BLAST will find subsequences in the database which are similar to subsequences in the query. In typical usage, the query sequence is much smaller than the database; for example, the query may be one thousand nucleotides while the database is several billion nucleotides.
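
The idea of locating short matches first can be illustrated with a small Python sketch. It indexes every k-letter word of a target sequence and then looks up each k-letter word of the query, returning the positions where the two share an exact word ("seeds"). This is a simplified illustration rather than BLAST's actual implementation; the function names, the word length and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_word_index(sequence, k):
    """Map every k-letter word in `sequence` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, target, k=3):
    """Return (query_pos, target_pos) pairs where query and target share an exact k-letter word."""
    index = build_word_index(target, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            seeds.append((q, t))
    return seeds


# Made-up example sequences.
print(find_seeds("ACGTAC", "TTACGTTACG", k=4))
```

Because the index is built once and each query word is a single dictionary lookup, the cost grows with the number of shared words rather than with the product of the two sequence lengths, which is what makes the heuristic fast on large databases.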

The main idea of BLAST is that a statistically significant alignment often contains high-scoring segment pairs (HSPs). BLAST searches for high-scoring sequence alignments between the query sequence and sequences in the database using a heuristic approach that approximates the Smith-Waterman algorithm (Altschul et al., 1990). Based on Mount (2004), an overview of the BLASTP algorithm (a protein-to-protein search) is as follows:
1)      Remove low-complexity regions or sequence repeats in the query sequence.
2)      Make a k-letter word list of the query sequence.
3)      List the possible matching words.
4)      Organize the remaining high-scoring words into an efficient search tree.
5)      Repeat steps 3 and 4 for each k-letter word in the query sequence.
6)      Scan the database sequences for exact matches with the remaining high-scoring words.
7)      Extend the exact matches to high-scoring segment pairs (HSPs).
8)      List all of the HSPs in the database whose score is high enough to be considered.
9)      Evaluate the significance of the HSP score.
10)     Make two or more HSP regions into a longer alignment.
11)     Show the gapped Smith-Waterman local alignments of the query and each of the matched database sequences.
12)     Report every match whose expect score is lower than a threshold parameter E.
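
The extension and significance steps listed above can be sketched in Python as follows. The first function extends an exact word match in both directions without gaps and stops once the running score falls too far below the best score seen. The second computes an expect value with the Karlin-Altschul formula E = K·m·n·e^(-λS), where m and n are the query and database lengths. The match/mismatch scores, the drop-off limit and the values of K and λ are illustrative assumptions (real BLASTP scores residues with a substitution matrix), and the function names are hypothetical.

```python
import math


def extend_seed(query, target, q_pos, t_pos, k, match=1, mismatch=-1, x_drop=3):
    """Extend an exact k-letter seed without gaps.

    Returns (score, q_start, q_end) so that query[q_start:q_end] is the
    query side of the high-scoring segment pair.
    """
    best = running = k * match
    best_right = 0
    i = 0
    # Extend to the right of the seed.
    while q_pos + k + i < len(query) and t_pos + k + i < len(target):
        running += match if query[q_pos + k + i] == target[t_pos + k + i] else mismatch
        i += 1
        if running > best:
            best, best_right = running, i
        elif best - running > x_drop:
            break
    running = best
    best_left = 0
    i = 0
    # Extend to the left of the seed, starting from the best score so far.
    while q_pos - 1 - i >= 0 and t_pos - 1 - i >= 0:
        running += match if query[q_pos - 1 - i] == target[t_pos - 1 - i] else mismatch
        i += 1
        if running > best:
            best, best_left = running, i
        elif best - running > x_drop:
            break
    return best, q_pos - best_left, q_pos + k + best_right


def expect_value(score, query_len, db_len, lam, k_const):
    """Karlin-Altschul expect value: E = K * m * n * exp(-lambda * S)."""
    return k_const * query_len * db_len * math.exp(-lam * score)


# Made-up sequences sharing the word "ACG" at position 2 of each.
score, start, end = extend_seed("GGACGTTC", "AAACGTTA", 2, 2, 3)
# lam and k_const below are placeholder values chosen only for the example.
e_value = expect_value(score, 8, 8, lam=math.log(3), k_const=0.1)
print(score, start, end, e_value)
```

A hit would then be reported only if its expect value is below the chosen threshold E, which is why higher-scoring segment pairs are more likely to appear in the output.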

4.0 REFERENCES

Altschul, S., Gish, W., Miller, W., Myers, E., & Lipman, D. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403–410. doi:10.1016/S0022-2836(05)80360-2
Babu, M. M. (n.d.). Bioinformatics – An aid for biological research. Retrieved June 9, 2015, from Medical Research Council: http://www.mrc-lmb.cam.ac.uk/genomes/madanm/articles/bioinfo.htm
B.E. Biotechnology, Department of Biotechnology, SIT, Tumkur, Karnataka. (2014, September 19). Retrieved June 1, 2015, from Bioinformatics: http://www.slideshare.net/VivekChandraMohanC/bioinformatics-39293400
Bioinformaticsweb.tk. (2005). BIW. Retrieved June 1, 2015, from Bioinformatics definition- A review: http://bioinformaticsweb.net/definition.html
Casey, D. R. (2005, May 10). Bioinformatics in Computer-Aided Drug Design. Retrieved from Beyenetwork: http://www.b-eye-network.com/view/852
Department of Molecular Biophysics and Biochemistry. (2001). What is bioinformatics? A proposed definition and overview of the field. Retrieved June 9, 2015, from NCBI: http://www.ncbi.nlm.nih.gov/pubmed/11552348
Genetics Home Reference. (2015). What is pharmacogenomics? Retrieved June 9, 2015, from http://ghr.nlm.nih.gov/handbook/genomicresearch/pharmacogenomics%20
Howard Hughes Medical Institute. (2012). Textbook overview. Retrieved June 1, 2015, from What is Bioinformatics: http://cseweb.ucsd.edu/~ppevzner/B4B/overview.html
Mount, D. W. (2004). Bioinformatics: Sequence and Genome Analysis (2nd ed.). Cold Spring Harbor Press. ISBN 978-0-8796-9712-9.
Nilges, M., & Linge, J. P. (2010). Unité de Bio–Informatique Structurale, Institut Pasteur. Retrieved June 1, 2015, from Bioinformatics: http://www.pasteur.fr/recherche/unites/Binfs/definition/bioinformatics_definition.html
Pharmaceutics Bioinformatics. (2012). What is pharmaceutics bioinformatics? Retrieved June 9, 2015, from http://www.pharmbio.org/pages/pharmaceutical-bioinformatics
Prof B. Jayaram & Co-workers. (2011, May 30). SANJEEVINI (A Complete Drug Designing Software Suite). Retrieved from Sanjeevini Software: http://www.scfbio-iitd.res.in/sanjeevini.jsp
Rai University. (2015, January 10). Retrieved June 1, 2015, from Bioinformatics: http://www.slideshare.net/raiuniversity/bsc-biochem-i-bobi-u1-introduction-to-bioinformatics
Shrestha, R. (2010, June 5). What is Bioinformatics? – A general perspective. Retrieved June 9, 2015, from WordPress.com: https://raunakms.wordpress.com/2010/06/05/what-is-bioinformatics-%E2%80%93-a-general-perspective/
Soma Mandal, M. M. (2009, October 14). Retrieved from Rational drug design: http://www.udel.edu/chem/bahnson/chem645/Rational_drug_design_Abhijit.pdf
Twyman, R. (2002, August 27). Retrieved from Rational drug design: http://genome.wellcome.ac.uk/doc_WTD020912.html