Statement of Purpose PhD
Name: Jing Xia
Date of Birth: Sep. 21st 1983
Kansas State University
I have been a student in the graduate program at Kansas State University (K-State) for more than two
years. Together with the years I studied before coming to K-State, the time I have spent in school so
far may sound like a very long time. However, I believe this is just the starting point of my career as a
researcher. In fact, I see myself as a perpetual learner. I have a strong interest in the sciences and am especially
fascinated by research fields such as algorithms, artificial intelligence and computational biology. For the
past two years, I have spent as much time as I could acquiring knowledge and preparing for Ph.D. study.
I am therefore applying for admission to the Ph.D. program in your department. I believe my M.S.
experience at K-State has prepared me well for new experiences as a Ph.D. student.
My original motivation for research in bioinformatics came from my desire to understand cancer
through computational approaches. The “cancer problem” is broad and complex.
Although it has been extensively researched, it remains a major challenge for
science. Its complexity motivated me to study and review the literature on cancer.
During my M.S. studies at K-State, I first looked for areas and courses relevant to cancer research. I
have taken 43 credits of graduate courses as part of my M.S. studies. Not only did I learn a
lot from these courses, but some of them also gave me the opportunity to begin doing
research. My M.S. thesis started as a class project for the Introduction to Bioinformatics course that I took
at K-State in spring 2007. In the Artificial Intelligence course that I took in fall 2006, my course project
focused on building an ontology for proteins and protein interactions. To improve my biological
knowledge, I also audited Modern Genetics in the Division of Biology for two semesters. The
Probability Theory class that I took in the Mathematics Department was the most rewarding part of
last summer vacation. As evidence of the breadth of my knowledge, I also
passed all the Ph.D. preliminary exams required at K-State (six sub-exams, including programming languages (high
pass), operating systems, databases, algorithms and formal languages) in May 2008.
Bioinformatics and computational biology are the topics most relevant to my major and to my motivation.
This is why I took the Introduction to Bioinformatics course and tried to identify a cancer-related research
topic. I chose pre-mRNA alternative splicing (AS) as the research topic for my class project
because I learned that alternative splicing of pre-mRNA produces transcript isoforms, and these
isoforms eventually give rise to different proteins. Furthermore, some protein disorders caused by
alternative splicing eventually result in cancer. The starting point of the project was to identify alternative
splicing events computationally by aligning ESTs to the genome of the model organism Tribolium
castaneum. Next, I applied state-of-the-art approaches, such as machine learning methods, to the
problem of predicting alternative splicing events, taking into account biological features that are
known to be relevant to alternative splicing. To that end, I reviewed the literature
on the biological background of alternative splicing. I learned that biological analyses focus on
exploring the relationships between the alternative splicing of certain genes and the sequence information,
secondary structure and other biological features of these genes. I therefore tried to incorporate
these features into computational analyses. We completed this project, and the results were
included in a proposal (1) and will be part of a journal paper (in preparation) (2). In my M.S. thesis work
(3), I also used machine learning, specifically Support Vector Machines, to predict
whether a pre-mRNA undergoes alternative splicing. My main contribution was
the exploration of a comprehensive set of biological features relevant to the alternative splicing
problem. A paper on motif derivation for alternative splicing was published at the IEEE BIBM
Conference, 2008 (4).
My current research began with ideas I got while attending a course on probability theory. Among other
things, I learned about the “random walk” in that course. I looked into topics that use the random walk
idea and found that it has been widely used, including in the famous PageRank algorithm. Following these
“discoveries”, I read a book about the basic theory underlying random walks, specifically a book
about Markov chains, and I learned that a Markov chain is a powerful tool for modeling stochastic
processes and time series. This made me think about approaches for computing the stationary (steady-state)
distribution of a Markov chain and for applying it to new areas. I learned that this distribution has been
widely used as a measure on graphs, as in PageRank. Learning that the problem can be reduced to
finding an eigenvector of a matrix, I started to read the book Matrix Computations by Gene
Golub and thought about how to apply these general approaches to Markov chains. Meanwhile, I read an
overview by Dianne O'Leary of methods for computing the stationary distribution of a Markov chain. I then did
further research on solutions specific to Markov chains and found William Stewart’s book, Introduction to
the Numerical Solution of Markov Chains. Reading this book, I found that many of its approaches have been
applied to speed up the computation of PageRank. My research now proceeds by trying to find a better way
to put a matrix into nearly completely decomposable (NCD) form, to find an application for the method
and to formalize the problem. These are the motivations for my research, and I would like to learn more
about these problems and delve into the scientific questions behind them.
Meanwhile, I also read the book Pattern Recognition and Machine Learning by C. Bishop. Last
semester, I discussed with Rodney R. Howell the problem of asymptotic notation with multiple
variables treated in his unpublished paper (5) and read the unpublished paper (6) by Erik D. Demaine and
Charles E. Leiserson on the same topic. I am also interested in machine learning theory and have studied the
lecture slides of machine learning classes taught by Avrim Blum.
For future research, I have a strong interest in a broad range of computer science areas, and I would like to
learn more, especially about reasoning and learning in artificial intelligence and about the theory of
algorithms. In addition to acquiring fundamental knowledge of computer science, my dream is
to apply computer science to problems in the life sciences. For a long time, my goal has not varied at all, even
when I was sometimes frustrated by not being able to learn things more efficiently. Faced with frustration
and obstacles, my credo is to study harder and more diligently, and never to give up what I want to
pursue. I am determined to dedicate myself to science as much as I can. I respectfully ask your department to
give me the opportunity to realize my dream.
1. Caragea, D.; Brown, S.; Park, Y.S.; Xia, J. Genomics Studies on Arthropod Affecting Human, Animals
and Plant Health.
2. Kim, S.H.; Xia, J.; Caragea, D.; Brown, S. BeetleBase: the model organism database for Tribolium
castaneum - 2008 Update. Nucleic Acids Research, Database Issue, 2009.
3. Xia, J. M.S. Thesis, Bioinformatics Analyses of Alternative Splicing, EST-based and Machine
Learning-Based Prediction. CIS Department, Kansas State University, USA, Sep. 2008.
4. Xia, J.; Caragea, D.; Brown, S. Exploring Alternative Splicing Features using Support Vector
Machines. Regular paper, in Proceedings of IEEE on Bioinformatics and Biomedicine
(BIBM), Philadelphia, PA, Nov. 2008.
5. Howell, Rodney R. On Asymptotic Notation with Multiple Variables. Technical Report 2007-4, KSU.
6. Demaine, Erik D.; Leiserson, Charles E. Multivariate Asymptotic Notation: O No! Apr. 2nd, 2008.