William R. Pearson


  • BS, University of Illinois, Urbana-Champaign
  • PhD, California Institute of Technology, Pasadena, CA
  • Postdoc, California Institute of Technology
  • Postdoc, Johns Hopkins School of Medicine

Primary Appointment

  • Professor, Biochemistry and Molecular Genetics


Research Interest(s)

Protein Evolution; Computational Biology

Research Description

We have a long-standing interest in exploiting protein sequence information, both to better understand how new protein sequences arise and to understand the relationship between protein sequence and protein structure. Since the description of the FASTP program in 1985, our group has been developing more effective methods for identifying distantly related protein sequences. Over the past 10 years, state-of-the-art methods have improved to the point where proteins that have diverged from a common ancestor in the past billion years are likely to be detected by sequence similarity searching. We hope to push that threshold back beyond 2 billion years (near the time when prokaryotes and eukaryotes diverged), but it is already possible to identify novel proteins that are likely to have emerged in the last 500-800 million years. If we can identify proteins that emerged in the last 100-250 million years, it may be possible to identify the mechanisms by which new proteins are formed.

Our laboratory is also investigating the molecular mechanisms responsible for differential gene expression in higher organisms. We are studying the genes that encode glutathione transferases, a family of inducible detoxification enzymes. Glutathione transferases bind or catalyze the inactivation of a variety of carcinogens, and induction of these enzymes by the food preservative butylated hydroxyanisole (BHA) can protect rodents against potent chemical carcinogens. More than a dozen genes encode glutathione transferases, and we are using this multiplicity to study the factors responsible for their expression and induction. We are also studying glutathione transferase gene expression in humans, where polymorphisms in members of the glutathione transferase gene family have been associated with increased risk of lung cancer. We have extensively characterized a polymorphic deletion of one glutathione transferase gene and are beginning to examine a polymorphism in a second gene. Genetic diversity in detoxification genes may play an important role in cancer risk.

Selected Publications

  • An introduction to sequence similarity ("homology") searching. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.]. 2013; Unit 3.1. PMID: 23749753
  • BLAST and FASTA Similarity Searching for Multiple Sequence Alignment. Methods in molecular biology (Clifton, N.J.). 2013;1079: 75-101. PMID: 24170396
  • Mills L, Pearson W. Adjusting scoring matrices to correct overextended alignments. Bioinformatics (Oxford, England). 2013;29(23): 3007-13. PMID: 23995390
  • Li W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson W. PSI-Search: iterative HOE-reduced profile SSEARCH searching. Bioinformatics (Oxford, England). 2012;28(12): 1650-1. PMID: 22539666
  • Holliday G, Andreini C, Fischer J, Rahman S, Almonacid D, Williams S, Pearson W. MACiE: exploring the diversity of biochemical reactions. Nucleic acids research. 2011;40: D783-9. PMID: 22058127
  • Gonzalez M, Pearson W. Homologous over-extension: a challenge for iterative similarity searches. Nucleic acids research. 2010;38(7): 2177-89. PMID: 20064877
  • Gonzalez M, Pearson W. RefProtDom: a protein database with improved domain boundaries and homology relationships. Bioinformatics (Oxford, England). 2010;26(18): 2361-2. PMID: 20693322
  • Sierk M, Smoot M, Bass E, Pearson W. Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments. BMC bioinformatics. 2010;11: 146. PMID: 20307279 | PMCID: PMC2850363
  • Lavelle D, Pearson W. Globally, unrelated protein sequences appear random. Bioinformatics (Oxford, England). 2009;26(3): 310-8. PMID: 19948773 | PMCID: PMC2852211
  • Pearson W, Sierk M. The limits of protein sequence comparison? Current opinion in structural biology. 2005;15(3): 254-60. PMID: 15919194