STATEMENT OF PURPOSE
Bioinformatics and Computational Biology SOP
“I Can, I Will and I Must”
The above words have always been the driving force behind my learning. They have helped me frame my targets and reach the goals of my career so far. I have been blessed with parents who instilled good values and principles in me and who have supported me in accomplishing the goals I set for myself.
With this mindset I worked to succeed in my studies and to be among the best in my class. At school I used to ask myself how DNA, proteins and enzymes could carry out such a heavy load of work despite being such tiny molecules. That question led me to decide that I would pursue my education in the biological sciences after school. Hard work coupled with measured self-confidence earned me a distinction of 92.14% in my matriculation exam. My childhood passion for biology and mathematics became reality when I secured 98% in biology and 99% in mathematics in my 12th grade examinations.
The many challenges posed by rapid developments in science and technology led me to choose the engineering profession. My burning desire to study biological science led me to take up Industrial Biotechnology at EDITED College of Technology, affiliated to Anna University, one of the world-renowned universities. It was here that my thirst for knowledge and my desire to succeed took full form in the field of biotechnology. I have been exposed to many fields of study, including biochemistry, cell biology, bioinformatics, microbiology, molecular biology, genetics, genetic engineering, protein engineering, enzyme engineering, immunology, bioprocess engineering and animal biotechnology. I volunteered to take part in national-level seminars and symposia, in and outside college, on topics such as ‘Bioleaching’, ‘Golden rice’, ‘Anti-sense technology’ and ‘Starch blockers’, where I could demonstrate my presentation and oratorical skills in my reports and speeches. In my second year I fermented wine from grape juice and exhibited the fermenter model at an All India Conference of the Indian Society for Technical Education.
I spent a lot of time reading journals such as ‘Nature’, ‘Advanced Biotechnology’, ‘Phytochemistry’ and ‘Phytomedicine’, and I asked myself why I, a student at a well-renowned college, could not carry out an innovative project and publish it in an international journal. I began looking for such a project, and from my reading of journals and magazines I learned that many of the synthetic drugs used to treat diabetes cause severe side effects and toxicity. I therefore proposed using medicinal plants, which do not have these side effects, for the treatment of diabetes, and my professors welcomed the idea. I started my mini-project titled “Screening of alpha-Amylase inhibitor from medicinal plants”. My objective was to slow down the activity of pancreatic alpha-amylase, the enzyme responsible for breaking down dietary starches into glucose, so as to reduce the usual rise in blood sugar (hyperglycemia) in both healthy and diabetic people. Although the first months of my work met with a series of failures, my professors' words that “every failure is a lesson” encouraged me, and by the end of my third year I had successfully screened out the medicinal plants Syzygium cumini Linn (seeds), Cassia auriculata and Ocimum sanctum as having anti-diabetic properties. I have learnt from both my victories and my failures.
In my final year, my proposal to identify the structure of anti-diabetic compounds in the medicinal plants was approved and financially supported by TNSCST (Tamil Nadu State Council for Science and Technology), and I am very happy to say that mine is the only project in our college to have received approval and financial support from TNSCST. With strong zeal I started the final-year project titled “Identification of starch blocker in medicinal plants” with a team of two members, guided by three experienced professors, Dr. S. Edited, Dr. T. Edited and Dr. B. Edited. I gained practical knowledge while working alongside these experienced researchers, and in the end I successfully determined the structure of the anti-diabetic compounds in the medicinal plants. I have sent two research papers to international journals, the Journal of Enzyme Inhibition & Medicinal Chemistry and the International Journal of Phytomedicine respectively, and they are awaiting publication. For my diligent work I received the TNSCST Award in our college. I am very proud, and I hope that my papers will be published in these international journals and make my dreams come true.
It is my conviction that a complete professional is formed not merely by reading books but also by acquiring knowledge from various other sources. To gain practical exposure in the field of biotechnology, I underwent in-plant training and worked on a project titled “Biochemical characterization of cellulase” at SPIC (Southern Petrochemical Industries Corporation Limited), completed summer training at the Pasteur Institute of India, one of the leading vaccine producers in India, and trained in the effluent treatment plant at TNPL (Tamil Nadu Newsprint and Papers Limited). I also had a day's interaction with the scientists at IARI (Indian Agricultural Research Institute) on the “Wheat Improvement Programme”. Apart from this, as part of our college curriculum we visited leading biotech companies in India such as Biocon, Shantha Biotech and Genei.
I feel that extracurricular activities are as important as studies in assessing a person's overall capacity. I therefore became a member of the N.S.S. in our college, which gave me the opportunity to help the poor. I was also active in Yoga and in the Indian Society for Technical Education (ISTE).
While determining the structure of the compounds in my project I encountered the tools of bioinformatics, and I feel that wet-lab analysis can be made easier and faster by using them. I understand bioinformatics and computational biology as a general approach to solving scientific problems. This quest to understand the complexity and order of biological systems has made me opt for the bioinformatics and computational biology graduate programme.
The path towards my goal has been well paved since my schooling, and so I now seek admission to the Master's program at Arizona State University, which places an emphasis on research across the various areas of biotechnology. I was inspired by the breadth of research carried out by the professors in this department. Dr. Lokesh Joshi’s research on posttranslational modification of biomolecules and the work of Dr. Vincent B. Pizziconi on bioresponsive and biomimetic materials deserve special mention, and I really look forward to being a part of it.
I feel that graduate study at your University will be the most logical extension of my academic pursuits and a major step towards achieving my objectives. I would be grateful if I were given the opportunity to pursue my graduate studies with financial assistance at your institution, and I will strive to justify your faith in me. I look forward to pursuing my enduring strengths and passions, starting at Arizona.
Friday, August 7, 2009
Bioinformatics SOP
Statement of purpose
It was about 10:30 at night, and except for a small desk lamp and the glow from my computer monitor, it was dark. The genome of the photosynthetic bacterium Prochlorococcus marinus, responsible for providing nearly 40% of the world's energy needs, had just been sequenced. I loaded the draft version of the GenBank file into the software pipeline I had spent the last two years developing. Minutes later, a genome-scale reconstruction of the P. marinus metabolism appeared on my screen. In that moment, I knew I was the first person to view the entire metabolism of this cyanobacterium. I remember the chills going down my spine, the pounding of my heart, and the rush that comes only from breathing the rarefied air of discovery.
The original motivation for this research came from a paper written in 1999 called "Towards Metabolic Phenomics: An analysis of Genomics using Flux Balances". In this paper, Schilling, Edwards and Palsson noted that with the rapid completion of bacterial genomes, ORFeomes and proteomes, we are on the brink of having a complete 'part catalog' of many organisms. Based on that observation, they predicted that our ability to understand the complex relationship between genotype and phenotype will be limited "not by the data, but by our tools to analyze and interpret this data." Finally, they proposed a bioinformatics pipeline to automatically generate metabolic flux models from an annotated genome, arguing that a rigorous constraint-based analysis of these models would enable us to iteratively refine our knowledge.
Inspired by this argument, I implemented their proposal for the Church Lab at Harvard Medical School, enabling us to generate experimentally verifiable flux predictions based on different hypotheses for bacterial growth. This work is described in a paper we published with the ungainly title of "From annotated genomes to metabolic flux models and kinetic parameter fitting." From this experience, I learned several surprising and valuable lessons.
Lesson 1: "A good representation is the key to good problem solving" --Patrick Winston
Although these words were said in the context of problems in Artificial Intelligence, the principle applies directly to the problem of mapping genome annotations to metabolic flux models. Such a mapping requires a rich ontology capable of representing the subtle relationships between genes, proteins, enzymes, biochemical reactions, and metabolic pathways. Using the representation underlying SRI's BioCyc database, I was able to develop a bioinformatics pipeline to generate metabolic flux models directly from an annotated genome, perform consistency checks on the data using their powerful query language, and represent the metabolic flux models in a form that could be analyzed using Flux Balance Analysis (FBA) and Minimization of Metabolic Adjustment (MOMA).
Lesson 2: "Standard is better than best" --Gerald J. Sussman
Because license restrictions on the BioCyc database prevented me from publishing most of the models I generated, I decided to collaborate with SRI to develop an open standard for the representation of metabolic pathways called BioPAX. Because we planned to develop open source semantic web technologies to infer metabolic flux models from annotated genomes, aggregate pathways from multiple data sources, and perform consistency checks on the pathway data, we decided to use the W3C-recommended Web Ontology Language (OWL) to represent the BioPAX ontology. According to the Pathway resource list at http://biopax.org, over 150 biological pathway databases currently exist. However, to consolidate all this knowledge for a particular organism, it is necessary to extract the pathways from each database, transform each pathway into a standard data representation, and load the data into a repository. As part of the BioPAX working group, which developed the BioPAX ontology to facilitate this goal, I now direct a community effort to extract, transform and load metabolic pathways into BioPAX.
Lesson 3: "The great thing about standards is that there are so many from which to choose" -- unknown
Until recently, the method for exchanging metabolic flux models was a hodgepodge of spreadsheets, flat files, and binary images. It was impossible to recover the information about how a model was developed from this data, and in many cases, the semantic interpretation of the model was open to question. Interoperability consisted of writing converters between different kinds of metabolic analysis tools, each of which expected a different format. To address these issues, we adopted the Systems Biology Markup Language (SBML) to standardize the representation of our models. By consolidating our tool set around a standard, we could focus on simulations rather than data manipulations. Furthermore, by using BioPAX metadata to annotate SBML, each metabolic pathway can be traced back to the database from which it came.
Lesson 4: "Six weeks in the laboratory can save you six minutes at the computer" --Tom Knight
Of course, the only result these efforts accomplished was to shift the rate-limiting step back to the generation of experimental data. At a recent genome annotation meeting in Washington D.C., Peter Karp showed that nearly 40% of the known biochemical reactions have orphaned enzymes. Even for E. coli, over one hundred enzymes responsible for catalyzing known biochemical reactions have an unknown sequence. This finding exposes the fragile state of the underlying genomic infrastructure. It is simply not true that we have the "part catalog" of many organisms in hand when nearly 40% of known biochemical function space is inaccessible to sequence homology searches like BLAST. What is needed is a call to arms from the biochemistry labs: an all-hands-on-deck approach to fill the gaps in our knowledge. The goal is straightforward: to make it truly possible to generate complete and consistent models derived entirely from the genome. Attainment of such a goal will finally bridge the gap between functional genomics data and system models.
Lesson 5: "Above all, one must have a feeling for the organism" --Barbara McClintock
In the Introductory Systems Biology course I am co-developing for Harvard's Department of Molecular Cell Biology, we plan to show our students a movie of a neutrophil chasing a bacterium and eventually engulfing it. The goal of the course is to use modeling and simulation to understand the complex regulatory mechanisms involved in chemotaxis. By developing mathematical insights into the model, we will teach our students how to develop intuitions about what is necessary to model in detail and what is appropriate to abstract. More importantly, by systematically and rigorously testing the knowledge inferred by these models, we want them to gain a feeling for the organism.
My longer-term career goals center around the representation, integration, modeling and simulation of biochemical pathways to elucidate the complex relationship between genotype and phenotype.
Statement of Purpose PhD
Name: Jing Xia
Date of Birth: Sep. 21st 1983
Kansas State University
I have been a student in the graduate program at Kansas State University (K-State) for more than two
years now. Together with the years I was a student before coming to K-State, the time I have spent in
school so far may sound like a very long time. However, I believe this is just the starting point of my
career as a researcher. In fact, I see myself as a perpetual learner. I have a strong interest in the sciences
and am especially fascinated by research fields such as algorithms, artificial intelligence and
computational biology. For the past two years, I have spent as much time as I could acquiring knowledge
and preparing for Ph.D. study. I am therefore now applying for admission to the Ph.D. program in your
department. I believe my M.S. experience at K-State has prepared me well for new experiences as a Ph.D.
student.
My original motivation for research in bioinformatics came from my desire to understand cancer through
computational approaches. The “cancer problem” is broad and intricate. Although it has been
extensively researched, it remains a major challenge for science. Its complexity motivated me to study
and review the literature on cancer.
During my M.S. studies at K-State, I first looked for areas and courses relevant to cancer research. I
have taken 43 credits of graduate courses as part of my M.S. studies. Not only have I learned a lot from
these courses, but some of them also gave me the opportunity to begin doing research. My M.S. thesis
started as a class project for the Introduction to Bioinformatics course that I took at K-State in spring
2007. In the Artificial Intelligence course that I took in fall 2006, my course project focused on building
an ontology for proteins and protein interactions. To improve my biological knowledge, I also audited
Modern Genetics in the Division of Biology for two semesters. The Probability Theory class that I took in
the Mathematics Department was the most rewarding part of my last summer vacation. As evidence of the
breadth of my knowledge, I also passed all the Ph.D. preliminary exams required at K-State (six
sub-exams: programming language (high pass), operating system, database, algorithm and formal
language) in May 2008.
Bioinformatics and computational biology are the topics most relevant to my major and closest to my
motivation. This is why I took the Introduction to Bioinformatics course and tried to identify a
cancer-related research topic. I chose pre-mRNA alternative splicing (AS) as the topic for my class
project because I learned that alternative splicing of pre-mRNA produces transcript isoforms, and that
these isoforms eventually form different types of proteins. Furthermore, some protein disorders caused by
alternative splicing eventually result in cancer. The starting point of the project was to identify
alternative splicing events computationally by aligning ESTs to the genome of the model organism
Tribolium castaneum. Second, I tried state-of-the-art approaches, such as machine learning methods, to
address the problem of predicting alternative splicing events by taking into account biological features
that are known to be relevant for alternative splicing. Following this motivation, I researched and
reviewed the literature on the biological background of alternative splicing. I learned that biological
analyses focus on exploring the relationships between alternative splicing of certain genes and the
sequence information, secondary structure and other biological features of these genes. I therefore
attempted to include these features in computational analyses. We succeeded in finishing this project,
and the results were included in a proposal (1) and will be part of a journal paper (in preparation) (2).
In my M.S. thesis work (3), I also used machine learning, specifically Support Vector Machines, to
address the question of how to predict whether a pre-mRNA undergoes alternative splicing. My main
contribution consists of the exploration of a comprehensive set of biological features relevant to the
alternative splicing problem. A paper related to motif derivation for alternative splicing was published
at the IEEE BIBM Conference, 2008 (4).
My current research began with ideas I got while attending a course on probability theory. Among other
things, I learned about the “random walk” in that course. I looked into topics that use the random walk
idea and found that it has been widely used, including in the famous PageRank algorithm. Following my
“discoveries”, I read a book about the basic theory underlying random walks, specifically a book
about Markov chains, and I learned that a Markov chain is a powerful tool for modeling stochastic
processes and time series. This made me think about approaches for computing the steady-state
distribution of a Markov chain and for applying it to new areas. I learned that this distribution has been
widely used as a measure on graphs, as in PageRank. Learning that the problem can be reduced to the
problem of finding the eigenvectors of a matrix, I started to read the book Matrix Computations by Gene
Golub and thought about how to apply these general approaches to Markov chains. Meanwhile, I read an
overview of methods for solving for the distribution of a Markov chain written by Dianne O'Leary. I then
looked for solutions specific to Markov chains and found William Stewart’s book, Introduction to the
Numerical Solution of Markov Chains. Reading this book, I found that many of the approaches in it have
been applied to speed up the calculation of PageRank. My research now continues with trying to find a
better way to put a matrix into nearly completely decomposable (NCD) form, find an application for the
method and formalize the problem. These are the motivations for my research, and I would like to acquire
new knowledge related to these problems and delve into scientific problems.
Meanwhile, I also read the book Statistical Learning and Pattern Recognition by C. Bishop. Last
semester, I discussed with Rodney R. Howell the problem of asymptotic notation with multiple
variables treated in his unpublished paper (5), and I read the unpublished paper (6) co-written by
Charles E. Leiserson on the same topic. I am also interested in machine learning theory and have studied
the lecture slides of machine learning classes taught by Avrim Blum.
For future research, I have a strong interest in a broad area of computer science, and I would like to
learn more, especially about reasoning and learning in artificial intelligence and about the theory of
algorithms. In addition to the desire to acquire fundamental knowledge of computer science, my dream is
to apply computer science to problems in the life sciences. For a long time my goal has not varied at all,
even when I was sometimes frustrated by not being able to learn things more efficiently. Faced with
frustration and obstacles, my credo is to study harder and more diligently, and never to give up what I
want to pursue. I am determined to dedicate myself to science as much as I can. I ask your department to
offer me the opportunity to realize my dream.
1. Caragea, D.; Brown, S.; Park, Y.S.; Xia, J. Genomics Studies on Arthropod Affecting Human, Animals and Plant Health.
2. Kim, S.H.; Xia, J.; Caragea, D.; Brown, S. BeetleBase: the model organism database for Tribolium castaneum - 2008 Update. Nucleic Acids Research, Database Issue, 2009.
3. Xia, J. M.Sc. Thesis, Bioinformatics Analyses of Alternative Splicing, EST-based and Machine Learning-Based Prediction. CIS Department, Kansas State University, USA, Sep. 2008.
4. Xia, J.; Caragea, D.; Brown, S. Exploring Alternative Splicing Features using Support Vector Machines. Regular paper, in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Philadelphia, PA, Nov. 2008.
5. Howell, Rodney R. On Asymptotic Notation with Multiple Variables. Technical Report 2007-4, KSU.
6. Demaine, Erik D.; Leiserson, Charles E. Multivariate Asymptotic Notation: O No! Apr. 2nd, 2008.
Name: Jing Xia
Date of Birth: Sep. 21st 1983
Kansas State University
I have been a student in the graduate program at Kansas State University (K-State) for more than two
years now. Together with the years I was a student before coming to K-State, the time I spent in school so
far may sound as a very long time. However, I do believe this is just the starting point for my career as a
researcher. In fact, I see myself as a perpetual learner. I have a strong interest in sciences, being especially
fascinated by research fields such as algorithms, artificial intelligence and computational biology. For the
past two years, I have spent as much time as I could on acquiring knowledge and getting ready for Ph.D.
Therefore, I am now applying for admission to the Ph.D. program in your department. I believe my M.S.
experiences at K-State have prepared me well for new experiences as a Ph.D. student.
My original motivation for research in bioinformatics has come from my desire to understand the disease
of cancer through computational approaches. The “cancer problem” is a comprehensive and sophisticated
problem. Despite the fact that it has been extensively researched, this problem is still a big challenge for
science. The complexity of this problem motivated me to learn and review the literatures on cancer.
During my M.S. studies at K-State, I first looked for areas and courses relevant to cancer research. I
have taken 43 credits of graduate courses as part of my M.S. studies. Not only have I learned a lot from
these courses, but some of them also gave me the opportunity to start doing research. My M.S. thesis
started as a class project for the Introduction to Bioinformatics course that I took at K-State in spring
2007. In the Artificial Intelligence course that I took in fall 2006, my course project focused on
building an ontology for proteins and protein interactions. To improve my biological knowledge, I also
audited Modern Genetics in the Division of Biology for two semesters. The Probability Theory class that I
took in the Mathematics Department was the most rewarding part of my last summer vacation. If Ph.D.
preliminary exams can be taken as evidence of the breadth of my knowledge, I passed all the Ph.D.
preliminary exams required at K-State (six sub-exams: programming language (high pass), operating system,
database, algorithm and formal language) in May 2008.
Bioinformatics and computational biology are the fields most relevant to my major and best matched to my
motivation. This is why I took the Introduction to Bioinformatics course and tried to identify a
cancer-related research topic. I chose pre-mRNA alternative splicing (AS) as the topic for my class
project because I learned that alternative splicing of pre-mRNA produces transcript isoforms, and these
isoforms are eventually translated into different proteins. Furthermore, some protein disorders caused by
alternative splicing eventually result in cancer. The first step of the project was to identify
alternative splicing events computationally by aligning ESTs to the genome of the model organism Tribolium
castaneum. Second, I tried state-of-the-art approaches, such as machine learning methods, to predict
alternative splicing events, taking into account biological features that are known to be relevant for
alternative splicing. With this goal, I reviewed the literature on the biological background of
alternative splicing. I learned that biological analyses focus on exploring the relationships between
alternative splicing of certain genes and the sequence information, secondary structure and other
biological features of these genes. My aim was therefore to include these features in computational
analyses. We completed this project, and the results were included in a proposal (1) and will be part of
a journal paper (in preparation) (2). In my M.S. thesis work (3), I also used machine learning,
specifically Support Vector Machines, to predict whether a pre-mRNA undergoes alternative splicing. My
main contribution is the exploration of a comprehensive set of biological features relevant to the
alternative splicing problem. A paper on motif derivation for alternative splicing was published at the
IEEE BIBM Conference, 2008 (4).
My current research began with ideas I got while attending a course on probability theory. Among other
things, I learned about the “random walk” in that course. I looked into topics that use the random walk
idea and found that it is widely used, including in the famous PageRank algorithm. Following these
“discoveries”, I read a book on the basic theory underlying random walks, specifically a book on Markov
chains, and I learned that a Markov chain is a powerful tool for modeling stochastic processes and time
series. This made me think about approaches for computing the stationary (steady-state) distribution of a
Markov chain and for applying it to new areas. I learned that this distribution is widely used as a
measure on graphs, as in PageRank. Learning that the problem can be reduced to finding an eigenvector of
a matrix, I started to read the book Matrix Computations, co-authored by Gene Golub, and thought about
how to apply these general approaches to Markov chains. Meanwhile, I read an overview by Dianne O'Leary
on solving for the distribution of a Markov chain. I then looked for solutions specific to Markov chains
and found William Stewart’s book, Introduction to the Numerical Solution of Markov Chains. Reading this
book, I found that many of the approaches in it have been applied to speed up the calculation of
PageRank. My research now continues by trying to find a better way to put a matrix into nearly completely
decomposable (NCD) form, to find an application for the method and to formalize the problem. These are
the motivations for my research, and I would like to gain new knowledge related to these problems and
delve into scientific problems.
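The reduction from the stationary distribution of a Markov chain to an eigenvector problem can be illustrated with a minimal sketch. It assumes a small row-stochastic transition matrix P and uses plain power iteration (repeatedly replacing pi with pi P), which is only one of many possible methods and not necessarily the one treated in the books mentioned above. The function name stationary_distribution and the two-state example chain are hypothetical.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi = pi P for a row-stochastic matrix P (list of rows).

    Power iteration: start from the uniform distribution and apply P until the
    largest component-wise change drops below tol. Convergence is assumed,
    which holds for irreducible, aperiodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the chain: next_pi[j] = sum_i pi[i] * P[i][j]
        next_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(next_pi, pi)) < tol:
            return next_pi
        pi = next_pi
    return pi


# Hypothetical two-state chain, used only for illustration.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
```

For this chain the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives pi = (5/6, 1/6), so pi is the left eigenvector of P for eigenvalue 1. PageRank applies the same idea to a very large, modified link matrix, which is why faster methods than plain power iteration are of interest.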
Meanwhile, I also read the book Statistical Learning and Pattern Recognition by C. Bishop. Last
semester, I discussed the problem of asymptotic notation with multiple variables with Rodney R. Howell,
based on his unpublished paper (5), and read the unpublished paper (6) by Erik D. Demaine and Charles E.
Leiserson on the same topic. I am also interested in machine learning theory and have studied the lecture
slides of machine learning classes taught by Avrim Blum.
For future research, I have a strong interest in a broad range of computer science, and I would like to
learn more, especially about reasoning and learning in artificial intelligence and about the theory of
algorithms. Beyond acquiring fundamental knowledge of computer science, my dream is to apply computer
science to problems in the life sciences. My goal has not changed for a long time, even when I was
frustrated at not being able to learn things more efficiently. Faced with frustration and obstacles, my
credo is to study harder and more diligently, and never to give up what I want to pursue. I am determined
to dedicate myself to science as much as I can. I ask your department to give me the opportunity to
realize my dream!
1. Caragea, D.; Brown, S.; Park, Y.S.; Xia, J. Genomics Studies on Arthropod Affecting Human, Animals
and Plant Health.
2. Kim, S.H.; Xia, J.; Caragea, D.; Brown, S. BeetleBase: the model organism database for Tribolium
castaneum - 2008 Update. Nucleic Acids Research, Database Issue, 2009.
3. Xia, J. MSc. Thesis, Bioinformatics Analyses of Alternative Splicing, EST-based and Machine
Learning-Based Prediction. CIS Department, Kansas State University, USA, Sep. 2008.
4. Xia, J.; Caragea, D.; Brown, S. Exploring Alternative Splicing Features using Support Vector
Machines. Regular paper, in Proceedings of the IEEE International Conference on Bioinformatics and
Biomedicine (BIBM), Philadelphia, PA, Nov. 2008.
5. Howell, R.R. On Asymptotic Notation with Multiple Variables. KSU Technical Report, 2007-4.
6. Demaine, E.D.; Leiserson, C.E. Multivariate Asymptotic Notation: O No! Apr. 2nd, 2008.
Sample statement of purpose bioinformatics sop
Research Interests
“We wish to suggest a structure for the salt of deoxyribose nucleic acid (DNA); this structure has novel features which are of considerable biological interest”
J. D. Watson and F. H. C. Crick, Nature, April 1953
Given my experience with Combinatorial Optimization, Evolutionary Computing and Molecular Biology, I am very interested in bioinformatics. As a relatively young field, bioinformatics offers a wealth of research opportunities.
My main focus is on proteins. Proteins are the building blocks and motor molecules of every cell. They are chains built from twenty kinds of amino acids, and the sequence of these amino acids is called the primary structure of the protein. This primary structure specifies the folding pattern that the protein will adopt. The primary structure can be obtained by sequencing the corresponding DNA; nonetheless, predicting the folding of a protein is extremely difficult. Today the conformation is determined experimentally by X-ray crystallography and nuclear magnetic resonance, both of which are expensive. This leads to one of my main research interests: how can the folding of a protein be predicted accurately? Besides the non-covalent bonds between amino acids, what other forces affect this folding? What does the feature space look like for each probable folding pattern?
Another interesting issue is the identification of active sites in proteins. Two main approaches have been followed. The first, called structural, is based on finding clefts or crevices in the protein structure, since it is well known that the active site of a protein is typically located in such zones. The second, based on the protein sequence, compares the sequence against the sequences of other proteins whose structures are well known, using variants of string matching algorithms.
This last summer I worked with the second, sequence-based approach, looking for phylogenetic motifs in proteins. A motif is a highly conserved sequence region, and phylogeny refers to the evolutionary relationships between organisms. Using these two concepts, it is possible to find the regions of the protein sequence that are most likely to be active.
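The idea of locating highly conserved regions can be sketched with a toy example. The sketch below is not the phylogenetic motif method itself: it assumes a small, already aligned set of protein sequences and scores each alignment column by the fraction of sequences that share its most common residue. The function names, the window width, the threshold and the toy alignment are all hypothetical, and gap characters would simply be counted like any other symbol.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return the fraction of sequences that
    share the most common residue in that column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def conserved_windows(alignment, width, threshold):
    """Return the start positions of all windows of the given width whose
    mean column conservation is at least the threshold."""
    scores = column_conservation(alignment)
    hits = []
    for start in range(len(scores) - width + 1):
        window = scores[start:start + width]
        if sum(window) / width >= threshold:
            hits.append(start)
    return hits


# Hypothetical toy alignment of three sequences, for illustration only.
alignment = ["MKHGDSA", "MRHGDTA", "LKHGDSV"]
hits = conserved_windows(alignment, width=3, threshold=1.0)
```

In this toy alignment only the window starting at position 2 (residues HGD) is identical in all three sequences, so it is the only one reported. A phylogenetic approach would additionally take the evolutionary relationships between the sequences into account rather than relying on raw residue identity.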
As can be inferred, some problems in bioinformatics can be seen as combinatorial problems, which leads to another of my interests.
Besides the topics mentioned above, I am very interested in algorithm design, which is where the methodology I will use to tackle these problems lies. The area of meta-heuristics is so wide and powerful that I would like to explore search algorithms for these problems, and not just any algorithm but an efficient one.