Saturday, December 10, 2011

Sequencing species.... by the thousands

 When I was taking my first steps in the field of comparative genomics, there was not much to think about when deciding which genomic datasets to use: one would just take them all. With only a few dozen genomes, mostly bacterial, one could have everything at hand on the local disk, and only needed to update it every couple of months by adding one or two more...

 Those times have definitely passed, and now the flow of newly sequenced genomes is... well, overwhelming (see figure below, taken from Genomes Online). This is both a blessing and a curse for those of us doing comparative genomics: we have an unprecedented amount of data, which enables more resolution, but we are increasingly facing novel technical and analytical challenges.

 Just to give a taste of the coming avalanche of genomes from different species (projects that sequence many genomes of a single species, such as the 1000 Genomes Project, are another story), I list here some of the projects I am aware of that aim at sequencing thousands of genomes from a given taxonomic group.

As expected, in this kind of project it is much easier to come up with a bold number than to define the list of species that are actually going to be sequenced. At least this is what I can tell from my involvement in the i5K initiative, in which prioritising the species to be sequenced is not simple, since one usually wants to weigh different criteria (phylogenetic relevance; biological, economic, and clinical importance; etc.).

 I'm sure I missed some and, in addition, there is a growing flow of genomes sequenced by independent groups, including my own modest group. One common weakness of these initiatives, large and small, is that they sometimes come with funding to cover the genome sequencing but do not account for the bioinformatics analyses needed to actually make sense of the data. With sequencing costs dropping and the potential analyses becoming more complex, the actual costs of sequencing projects will increasingly lie in the analysis beyond the assembly and annotation phases. As a result, many bioinformatics groups are stretching their resources to contribute to genomics projects without getting any specific funding.

In my opinion, the planning of a sequencing project should account for all the downstream phases and their associated costs. With such an approach we may end up with a handful fewer genomes, but we will definitely learn more from them.

Monday, December 5, 2011

Watch the talks from the CRG Symposium: Computational Biology of Molecular Sequences.

 If you missed the opportunity to attend in person our recent symposium on "Computational Biology of Molecular Sequences" (see this past post), you can now watch the videos of the talks (read the message below).

Dear all,

All contents of the 10th CRG Annual Symposium on Computational Biology of Molecular Sequences, held on 10 and 11 November, are now available online.

Leading scientists in computational biology came together in Barcelona on the occasion of the tenth edition of the CRG Annual Symposium, which focused on computational biology of molecular sequences and was organized by the Centre for Genomic Regulation (CRG). The auditorium of the Barcelona Biomedical Research Park (PRBB) hosted the event, held from Thursday 10 to Friday 11 November 2011.
In the microsite you can find the inaugural video of the Symposium, videos of the talks, interviews with some of the speakers, participants and organizers of the event, and two summary videos that capture the major points of all sessions. Two articles that summarize the talks and news related to the field of computational biology of sequencing are also available.

We hope that these resources are useful for you!

Click here to visit the 10th CRG Annual Symposium website.

Saturday, December 3, 2011

SESBE: Spanish Society for Evolutionary Biology

 Last week I went to Madrid to attend the 3rd congress of the Spanish Society for Evolutionary Biology (SESBE). This is a relatively new society (7 years old) that embraces evolutionary biology as a whole, from palaeontology and systematics to evolutionary genomics and Darwinian medicine. Thus, the meetings are very diverse and one can listen to talks on a wide range of topics, always with evolutionary theory as the common framework of analysis.

Due to other commitments I could only stay two days, but it was worth it: I enjoyed most of the talks and, most of all, meeting colleagues from around Spain. I would highlight here the talk by Nick Lane on the evolution of eukaryotes and the role played by mitochondrial endosymbiosis. Nick, who is also a prolific writer of popular science books, gave a very nice talk that seduced the whole audience, including me. I had the opportunity to talk with him, and it was nice to discuss once again the big theories on the evolution of eukaryotes, a big theme that I am passionate about.

This year the SESBE elected a new board, on which I will serve as secretary. Not that I am very keen on holding such a position, but I was asked, and I think one should be prepared to contribute one's two cents to noble causes, such as this society's aim of promoting the study of evolution and its communication to society in our country.

Sunday, November 20, 2011

XI Jornadas de Bioinformatica in Barcelona (23-25 January)

  A short note to spread the word on the joint Spanish and Portuguese Meeting on Bioinformatics.  This is a yearly meeting that is gaining momentum every year, and it is a great opportunity to meet most groups doing bioinformatics in the region. Talks are in English and everybody is welcome to attend.

 As in other years, the meeting has an associated regional (Spain, Portugal and North Africa) ISCB student symposium. This year the symposium is co-organized by Salvador Capella-Gutierrez, one of the members of my lab.

 If you plan to submit a communication, there is time till the end of November.

 See you there.

Tuesday, November 8, 2011

ALPHY 2012: French-Spanish meeting on Bioinformatics and Evolutionary Genomics (March 19 -21, Banyuls-sur-Mer)

 I am glad to announce ALPHY 2012, which for the first time is co-organized by French and Spanish researchers. I was very glad to be invited by my French colleagues to sit on the organizing committee. I think it is a great opportunity to bring together two communities with ample experience in phylogenetics-related research.

ALPHY is an annual meeting, organized in France since 1995, dedicated to the field of Bioinformatics and Comparative Genomics (ALPHY = ALignments and PHYlogeny). The main goal of this meeting is to promote informal exchanges in this highly multidisciplinary field, and to encourage young scientists to present their work. The official invitation follows, plus a very tempting picture of the location.

This year, ALPHY is co-organized by Spanish and French scientists, in the nice city of Banyuls. There will be two invited speakers (Henrik Kaessmann and Jose Castresana), and the program will be open to contributions for 20’ talks.
The registration to the meeting is free, but mandatory. Please use the link (top left of this page) to register. If you wish to present your work, submit your abstract in the registration form.
Important dates:
  • Deadline for abstract submission: January 10 2012
  • Deadline for registration : February 1st 2012
Hasta pronto – A bientôt – fins aviat - see you in Banyuls!

Thursday, November 3, 2011

Sad news from CIPF: the rise and fall of the "flagship" of valencian research

For those who don't know, I am originally from Valencia. There, one of the deepest traditions and the main festivity is the so-called "Falles", which in part consists of building huge temporary cardboard sculptures that are exhibited for little more than a week and then burned in a big fire. For some people it is hard to understand how so much time and money is invested in something that is then left to the flames.

Apparently, something similar is happening with a research centre!!!

 The "Centro de Investigación Príncipe Felipe" (CIPF) was created in 2005 by the Valencian regional government with the idea of making it the "flagship" of research in the region. It came with strong investment from the regional and central governments and soon attracted many scientists. I was one of the seduced scientists: originally from Valencia and at that time working in the Netherlands, I was enthusiastic about a move that aimed to put biomedical research in Valencia at the forefront.

 Five years after its creation, the cuts started. The crisis had hit the Spanish economy and many regional governments had big debts, particularly that of Valencia, which has been famous for investing in huge events such as the America's Cup or the Formula 1 competition. When things got complicated, research was seen as one of the most superfluous things in which a government could invest, and thus cuts were announced. This year the centre is firing 40% of its personnel, including PhD candidates in the middle of their PhDs. I guess many of the remaining researchers will leave this downsized centre for a better life elsewhere. The flagship is now sinking, "burned" after so much investment and effort; the comparison to our "Falles" is unavoidable.

The whole story is reported by Nature and in many articles in the Spanish press. As Juli Peretó reports, the local government is letting CIPF fall while it keeps investing in other kinds of events, such as an international golf tournament in Castelló, or increasing the funds for a motorbike circuit. This is most ironic, and deeply sad.

 I just wish the best to my many ex-colleagues who are still at CIPF, and I hope this is not the kind of science policy planned by the future government of Spain (which, according to the polls, is likely to be formed by the same conservative party that now governs Valencia).

Tuesday, November 1, 2011

Educational video on the Tree of Life

In the blog of Jun-Hoe Lee, a former visiting student in my lab, I found this interesting video from Yale University on the Tree of Life and the efforts to reconstruct it.


I think it is a good piece of popular science communication and conveys the problem reasonably well. Of course, there are simplifications, and some important aspects, such as horizontal gene transfer, symbioses, and their effects on the tree, are not covered, but it provides an attractive and educational introduction to the problem of assembling the tree of life.

Monday, October 24, 2011

RECOMB 2012 (Barcelona)

 The next RECOMB meeting will be held in Barcelona. Our department is part of the local organizing committee and the list of confirmed speakers looks very promising.

 Submission opened in September, and you still have time to submit papers until the end of the week. Do not miss the deadline.

Saturday, September 24, 2011

Special BiB issue on "Orthology and Applications"

 A special issue on "Orthology and Applications" is out in the journal Briefings in Bioinformatics.

 This special issue has been edited by Christophe Dessimoz and comprises a number of interesting papers, including several comprehensive reviews and also original research articles. Some of the papers emerge from efforts on orthology benchmarking and standardization of datasets that were initiated during the first "Quest for Orthologs" meeting in 2009. See this letter reporting from that meeting. We contributed an article reporting on the comparison of expression patterns between across-species orthologs and paralogs of a similar evolutionary age.

Wednesday, September 21, 2011

On the "orthology conjecture"


 Jonathan Eisen has opened a thread in his blog to discuss the recent paper by Hahn and colleagues on the "ortholog conjecture". You can read more about the discussions raised by this paper here.

This is what I wrote, a text which I had to split in three pieces in Eisen's blog given the word limit for comments!!


I appreciate the effort by Matthew Hahn in explaining the story behind his paper on the so-called "ortholog conjecture" and in facing some of the criticism. This paper attracted my interest, as it did that of many others who work on or just use orthology. For instance, it was chosen by one of my postdocs for our "Journal Club" meeting, and it was discussed during our last "Quest for Orthologs" meeting in Cambridge. I think it is raising a necessary discussion and therefore I think it is a good paper. This does not mean that I fully agree with the interpretation and conclusions ;-). I hope to modestly contribute to this debate with the following post.

I think one of the reasons this paper has caused so much debate is that the conclusions seem to challenge common practice (inferring function from orthologs), and could be interpreted as a call to change the strategies of genome annotation. I think, however, that one should interpret these results carefully before starting to annotate based on paralogous proteins. As I will discuss below, one of the problems is that we need to agree on what the conjecture is in order to agree on how to test it. I see three main points that can be a source of confusion: i) the issue of what is actually stated by this conjecture, ii) the issue of annotation, and iii) the issue of time.

i) What is the "ortholog conjecture"?
Or, in other terms, when should we expect orthologs to be more likely to share function than paralogs? Always? Of course not. All of us would agree that two recently duplicated paralogs are likely to be more similar in function than two distant orthologs, so it is obvious that the conjecture is not simply "orthologs are more similar in function than paralogs". In reality the expectation that orthologs are more likely to be similar in function than paralogs, at least as I interpret it, is directly related to the effect that duplication has on functional divergence. If gene duplication has some effect on functional divergence (even if not in 100% of the cases), then, all other things being equal (divergence time, history of speciation/duplication events, except for the duplication defining the paralogs), one would expect orthologs to be more likely to conserve function.

I think this complexity is not well considered (by many authors, in general). Hahn refers to the famous review of orthology by Koonin (2005) as the source for the term "ortholog conjecture". However, in that paper the conjecture is always discussed within the context of genes across two particular species, whereas in Hahn's paper it is extended to other contexts as well. Thus, the proper context in which to test this conjecture is only between orthologs and between-species paralogs. As we can see, the red and purple lines in Figure 2 of Hahn's paper do not show any clear difference.

 Secondly, Koonin was very cautious in his paper, stating that he was referring to "equivalent functions" and not exactly the same "function", correctly implying that the functional contexts would be different in the two different species. This brings me to the next point.

ii) annotation
If the expectation of functional conservation of orthologs refers to a given pair of species, then it makes no sense to test that expectation between paralogs within the same species and orthologs in different species. We were interested in this issue and it took us some effort to control for this "species" influence on the comparison. If you are interested, you can read our paper on divergence of expression profiles between orthologs and paralogs.

As Hahn finds, and as was anticipated by Koonin in that review, there is a huge influence of the "species context", a big constraint on what fraction of the function is shared. Indeed, I think it is the dominant signal in Hahn's paper. Why is that? One possibility is that the functional context determines the function, I agree. However, we should not discard biases in how the different communities working around a model species define processes and function, and also in the type of experiments that are usually done. For instance, experimental inference from KO mutants might be common in mouse, but I guess that is not the case in humans (!!). I think this may be having a big influence and might even be the dominant signal in Hahn's paper.

Finally, function has many levels and I expect subfunctionalization to mostly affect the lower (i.e. more specific) levels. Biases may also exist in the level of annotation between species or between families of different size (contributing more or less to the orthologs/paralogs class).

Microarray data are less likely to be subject to biases (although some may exist); at least they should be expected to be free of "human interpretation biases", and so Hahn and colleagues did well, in my opinion, to test that dataset. It is important to note that for microarrays, and for orthologs and between-species paralogs (which I think is the right frame for testing the conjecture), orthologs are more likely to share an expression context. This is compatible with what we found in the paper mentioned above, and compatible with the ortholog conjecture as stated by Koonin (across species).

iii) time
 Finally, one aspect which I think is fundamental is the notion of "divergence time". Since paralogs can emerge at different time-scales, they form a heterogeneous set of protein pairs. Most comparisons of orthologs and paralogs (Hahn's as well) use sequence divergence as a proxy for time. However, this is only a poor estimate, especially when duplications are involved, as they are here (we explored this issue in the past). This means that, for a given divergence time, paralogs may have larger sequence divergence than orthologs at the same divergence time, or the other way round (if gene conversion is playing a role). Is the conjecture based on sequence divergence or on divergence time? I think the original rationale for using orthology to annotate across species is based on the notion of comparing things at the same evolutionary distance. Thus basing our conclusions on divergence times might not be the proper way of doing it.


To conclude, and with the intention of going beyond this particular paper,
I would finish by saying that the key to the problem lies in how we interpret the so-called "ortholog conjecture", or in what our expectations are about how function evolves. What I get from re-reading Eugene Koonin's paper, and how I am using that "assumption" in my day-to-day work, is the following:

"Orthologs in two given species are more likely to share equivalent functions than paralogs between these two species"

Therefore the notion of "across the same pair of species" is important, and thus only part of the comparisons made by Hahn and colleagues could directly test this. Looking at the microarray and between-species comparison data, the conjecture may even hold true!!

I do think, however, that the conjecture as stated above is limited and does not capture the complexity of orthology relationships. Indeed we, like many other researchers, tune the confidence of orthology-based annotation according to whether the orthologs are one-to-one, one-to-many or many-to-many, even when orthologs are "super-orthologs" (with no duplication event in the lineages separating the two orthologs).
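
As an illustration of the one-to-one, one-to-many and many-to-many distinction, here is a minimal Python sketch. It assumes orthology predictions between two species are already available as a plain list of gene pairs; the function name and the gene names (hsA, mmA, ...) are hypothetical and only serve the example. It labels each pair by counting how many partners each gene has in the other species, which is a simplification of how orthology groups are handled in practice.

```python
from collections import defaultdict

def classify_orthology(pairs):
    """Label each orthologous pair as one-to-one, one-to-many or many-to-many.

    `pairs` is an iterable of (gene_in_species_a, gene_in_species_b) tuples.
    """
    pairs = list(pairs)
    partners_a = defaultdict(set)  # gene in species A -> its orthologs in B
    partners_b = defaultdict(set)  # gene in species B -> its orthologs in A
    for a, b in pairs:
        partners_a[a].add(b)
        partners_b[b].add(a)

    labels = {}
    for a, b in pairs:
        a_multi = len(partners_a[a]) > 1  # a has several orthologs in B
        b_multi = len(partners_b[b]) > 1  # b has several orthologs in A
        if a_multi and b_multi:
            labels[(a, b)] = "many-to-many"
        elif a_multi or b_multi:
            labels[(a, b)] = "one-to-many"
        else:
            labels[(a, b)] = "one-to-one"
    return labels

# Hypothetical predictions between two species (invented gene names).
example_pairs = [
    ("hsA", "mmA"),                        # no duplication on either side
    ("hsB", "mmB1"), ("hsB", "mmB2"),      # duplication in one lineage
    ("hsC1", "mmC1"), ("hsC1", "mmC2"),    # duplications in both lineages
    ("hsC2", "mmC1"), ("hsC2", "mmC2"),
]
labels = classify_orthology(example_pairs)
for pair, label in labels.items():
    print(pair, label)
```

Under the expectation discussed here, one would place the most confidence in function transfer for the pairs labelled one-to-one, and less for the other two classes.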

Since the underlying assumption of the ortholog conjecture is that duplication may (not necessarily always) promote functional shifts, many-to-many orthology relationships will tend to include orthologous pairs with different functions.

 Thus I would re-state the conjecture (or expectation) as follows:

 "In the absence of additional duplication events in the lineages separating them, two orthologous genes from two given species are more likely to share equivalent functions than two paralogs between these two species"

 This would be a more conservative expectation, which is closer to the current use of orthology-based annotation that tends to identify one-to-one orthologs, rather than any type.

 When duplications start appearing in subsequent lineages, thus creating one-to-many or many-to-many orthology relationships, the situation is less clear. Following the assumption that duplications may promote functional divergence, one could expand the conjecture as "the more duplications in the evolutionary history separating two genes, the lower the expectation that these two genes would share equivalent functions".
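
To make the expanded expectation concrete, here is a rough Python sketch that counts the duplications separating two genes in a gene tree. It assumes a toy representation that is not from any particular package: a leaf is a gene name, and an internal node is a tuple (event, left, right) with event "S" for speciation or "D" for duplication. The function names, the tree and the gene names are all hypothetical.

```python
def leaves(node):
    """Return the set of gene names below `node`."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def duplications_to_leaf(node, gene):
    """Count duplication nodes strictly below `node` on the path down to `gene`."""
    count = 0
    while not isinstance(node, str):
        _, left, right = node
        node = left if gene in leaves(left) else right
        if not isinstance(node, str) and node[0] == "D":
            count += 1
    return count

def duplications_between(tree, gene_x, gene_y):
    """Return (event at the last common ancestor, duplications below it)."""
    if gene_x == gene_y or not {gene_x, gene_y} <= leaves(tree):
        raise ValueError("need two distinct genes that are both in the tree")
    node = tree
    while True:
        _, left, right = node
        found_left = {gene_x, gene_y} & leaves(left)
        if len(found_left) == 2:
            node = left       # both genes are in the left subtree
        elif len(found_left) == 0:
            node = right      # both genes are in the right subtree
        else:
            break             # the genes split here: last common ancestor
    below = duplications_to_leaf(node, gene_x) + duplications_to_leaf(node, gene_y)
    return node[0], below

# Hypothetical gene tree: an ancient duplication gave subfamilies A and B,
# and subfamily B duplicated again in the human lineage.
tree = ("D",
        ("S", "human_A", "mouse_A"),
        ("S", ("D", "human_B1", "human_B2"), "mouse_B"))

print(duplications_between(tree, "human_A", "mouse_A"))
print(duplications_between(tree, "human_B1", "mouse_B"))
print(duplications_between(tree, "human_A", "mouse_B"))
```

A pair whose last common ancestor is "S" is orthologous, and the second number is the count of later duplications in the lineages separating the two genes. Under the re-stated conjecture, a pair with ("S", 0) would carry the strongest expectation of equivalent function.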

 I wrote this contribution on the fly, and surely there are ways of expressing this in more appropriate terms. In any case I hope I made clear the idea that the conjecture emerges from the notion of duplications causing functional shifts and that our expectations will be clearer if expressed in those terms. This goes along the lines of what Jonathan Eisen mentioned about considering the whole phylogenetic history to annotate genes.

 From this perspective, the really important hypothesis is that "duplications tend to promote functional shifts". I think this is based on solid grounds and has been tested intensively in the past.


Toni Gabaldón

Wednesday, September 14, 2011

CRG Symposium: Computational Biology of Molecular Sequences. 10-11 November

Registration is open for the CRG symposium organized by our Bioinformatics and Genomics programme. This meeting will host internationally renowned scientists in the bioinformatics field. To name a few: Smith, Tramontano, Ponting, Sankoff, Koonin, Bairoch, Brunak... Below you'll find the symposium overview and the complete list of speakers.

Advances in methods to sequence nucleic acids, coupled with more general advances in automation, robotization, and multiplexing, have resulted in the capacity to survey the phenomena of life in a global manner and with unprecedented resolution. As a result, Biology, traditionally an analytic science in which the natural world is dissected into its elemental components in order to be comprehended, is becoming a synthetic science, in which the phenomena of life are approached in a more systemic way. In parallel, Biology, a science in which human effort has been directed until very recently towards data acquisition, is increasingly becoming a discipline in which data is obtained with almost no human intervention, and the effort is being directed towards data analysis. Computational systems to store, analyze and model biological data have thus become an essential part of research in Biology. The connection between Biology and Computation, however, runs much deeper, as we are coming to realize that the unfolding of the instructions in the genome is, stricto sensu, a computation on the DNA sequence. Biology, thus, cannot be understood without Computation. The two-day CRG symposium on “Computational Biology of Molecular Sequences” will bring together renowned Computational Biologists from around the world, including both pioneers in the field and promising young scientists. Presentations, discussions and dialogue during the Symposium will contribute to survey the status of a discipline that, at the intersection of Biology and Computation, will have an enormous impact on the world of the XXIst century.
Confirmed Speakers
Amos BAIROCH Swiss Institute of Bioinformatics (SIB) and University Geneva, Geneva CH
Mathieu BLANCHETTE McGill University, Montréal CA
Søren BRUNAK Technical University of Denmark, Kongens Lyngby DK
Philipp BUCHER Swiss Institute for Experimental Cancer Research (ISREC), Lausanne CH
Brendan FREY University of Toronto, Toronto CA
Mark GERSTEIN Yale University, New Haven US
Nick GOLDMAN European Bioinformatics Institute, Hinxton UK
Tim HUBBARD Wellcome Trust Sanger Institute, Hinxton UK
Eugene V. KOONIN National Center for Biotechnology Information, Bethesda US
Gene MYERS Janelia Farm Research Campus, Ashburn US
Chris PONTING University of Oxford, Oxford UK
David SANKOFF University of Ottawa, Ottawa CA
Ron SHAMIR Tel-Aviv University, Tel-Aviv IL
Temple F. SMITH BioMolecular Engineering Resource Center, Boston US
Terry SPEED Walter & Eliza Hall Institute of Medical Research, Parkville AU
Peter STADLER Universität Leipzig, Leipzig DE
Gary STORMO Washington University School of Medicine, Saint Louis US
Ana TRAMONTANO Sapienza University, Rome IT
Michele VENDRUSCOLO University of Cambridge, Cambridge UK
Martin VINGRON Max Planck Institute for Molecular Genetics, Berlin DE

Monday, September 5, 2011

Article collection on the Tree Of Life in Biology Direct

The journal Biology Direct has initiated an article collection entitled "Beyond the tree of Life". The main focus of this series seems to be the challenges posed by evolutionary mechanisms, such as Horizontal Gene Transfer, that may blur or completely destroy the classical view of a bifurcating tree of life representing the evolutionary relationships of organisms, especially prokaryotes.

 The issue is not new, but the current wealth of genomic data and the availability of new methodological approaches to measure and compare the evolutionary signals of thousands of protein families has prompted a revival of the debate on how strong are tree-like and network-like signals in the different domains of life.

The series started last June and is being updated regularly. There are already interesting articles from various authors including William Martin, Eric Bapteste, and Eugene Koonin.

A nice add-on, which is one of the features I like the most about Biology Direct, is that the reviewers' reports can be read along with the paper, thus providing a view complementary to the authors' interpretation of the data.

Thursday, September 1, 2011

A new journal for "big" Science

 I have mixed feelings about the current proliferation of scientific journals. On the one hand, I feel that it is a natural response to the increase in the number of researchers in the world and the growing specialization of science. Moreover, it serves to open up the publication system to the wider community and sometimes breaks up dangerous closed circles that monopolize the access to publication in certain areas. On the other hand, as a researcher with broad interests and one who wants to follow the progress of my field, I feel overwhelmed. The times when browsing the tables of contents (TOCs) of a handful of journals was enough to identify almost all relevant papers are definitely over. Nowadays one needs complex literature-mining strategies to try to cope with the flow. It seems one also needs to search for potential new journals that may become the forum for papers relevant to one's research. I'm keen to use this blog to spread the word about new journals that are relevant to my field, as I did in the past. Now I am doing it again.

When I recently heard about the new BMC-based journal GigaScience, I felt it would be worth keeping an eye on. As they say, "GigaScience aims to revolutionize data dissemination, organization, understanding, and use. An on-line open-access open-data journal, we publish 'big-data' studies from the entire spectrum of life and biomedical sciences."

 The original idea of this journal is that it links standard publication with a database to store and search all associated data. I personally think this journal will fill an important gap and seems perfectly prepared to do so, judging by the editorial board, which includes many researchers from centres that are at the forefront of massive data production, such as BGI, Wellcome Trust, EBI and JCVI.

I am looking forward to the first articles to see direct examples of how effective this system is and how the database fits the needs of inherently diverse types of data, but at first glance it seems that this journal may meet the needs of upcoming studies on massive data such as those coming from genomics or systems biology.

Sunday, August 14, 2011

The best of....... SMBE2011

With this post I initiate a series that will highlight some talks or posters in (some of) the meetings I attend. I want to note from the very beginning that this is very subjective and is according to my own taste and interests. I hope, however, that these highlights may also be interesting for some of the readers of this blog.

 I recently came back from the latest meeting of the Society for Molecular Biology and Evolution (SMBE 2011) in Kyoto, Japan. The meeting was marked by the recent natural disaster, the 2011 earthquake that affected the Fukushima nuclear power station, and attendance was significantly lower than at recent SMBE meetings (around 650 attendees, compared to 2000 at SMBE 2010 in Lyon). Despite this, the quality of the meeting was really high, with plenty of interesting presentations in the form of posters or talks.

 The poster that most caught my attention was one presenting the "Centroid Wheel Tree" representation, which allows alternative topologies to be represented within the same phylogenetic tree.
Wheel Tree Representation

I still have to explore that possibility and how it differs from the more standard network representations, but it looks promising and well suited to accommodate our interest in accounting for the topological variation within phylomes.

Among the selected oral presentations, my favourite was that of Shigehiro Kuraku (Konstanz University, Germany) on the debated positions of the two rounds of whole-genome duplication in the early vertebrates.

 From the invited speakers I would choose the talk of Nancy Moran, which went through many fascinating examples of insect endosymbiotic bacteria showing extremely reduced genomes.

 And, finally, one of the interesting parts of the meeting was a special session organized to commemorate Walter Fitch, who passed away earlier this year (see my previous post). It was interesting to hear many anecdotes from Masatoshi Nei, who shared with him the effort of initiating the Society for Molecular Biology and Evolution and the MBE journal.

Masatoshi Nei commemorating Walter Fitch at SMBE 2011

 Of course these are just some very personal highlights from a very interesting meeting. I will most probably attend the next meeting, SMBE 2012, in Dublin.

 PS- I just noticed that my blog has surpassed 1,000 visits. This is encouraging.

Monday, July 18, 2011

Brief introductory article to phylogenomics

I would like to share a brief introductory article on Phylogenomics and Genome Evolution that I wrote at the request of the Roche Institute portal. The idea was to provide a general audience with an overview of what phylogenomics is about and what the main challenges ahead are.

The article is also available in Spanish.

Sunday, July 3, 2011

Reading what Darwin read (and noted)

 I definitely enjoyed reading Charles Darwin's The Origin of Species (several times, indeed, at different stages of my career). Besides its being one of the most influential books in Biology, I always recommend reading it because it is a great example of how a unifying theory is built. Indeed, the book reveals the author's thought process, from facts, to hypotheses, to weak points, and is therefore a much better example of how the philosophy of science works than the condensed pieces of scientific results that we digest in our daily reading of journals.

Thanks to the efforts of several institutions, we are now much closer to getting the whole picture of how Darwin came up with his theory. You probably all know by now that a comprehensive collection of notebooks from the British scientist is available online. Now we will also have public access to a digital version of 330 of the 1480 titles in Darwin's personal library. Thus, we can now read the books that Darwin used to read. Of note, full transcriptions of his annotations and marks in those books are provided. Besides the broadly acknowledged books by Lyell and Malthus, I am sure many others may provide hints on how the theory emerged in Darwin's mind.

Sunday, May 15, 2011

Learn the Neighbor Joining method in a 1-minute video

 Have a ruler, a pen, scissors and some tape and glue around? That is enough to reconstruct the tree of apes!

I came across a short video that illustrates, for the general public, how to build a phylogenetic tree from pairwise distances. The video was made by Hidetoshi Shimodaira, the guy behind the CONSEL package. I am already using it for teaching purposes.

Will we have one on ML and Bayesian reconstruction?
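
For readers who prefer code to scissors and glue, here is a minimal Python sketch of the standard neighbor-joining procedure (Saitou and Nei). It is only an illustration and is not taken from the video: the five-taxon distance matrix is a made-up additive toy example rather than real ape data, and names such as `neighbor_joining` and `node1` are arbitrary choices.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree from pairwise distances.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, node, branch_length) edges.
    """
    d = dict(dist)          # working copy; grows as internal nodes are added
    nodes = list(names)     # nodes still waiting to be joined
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        counter += 1
        u = f"node{counter}"  # hypothetical label for the new internal node
        # Branch lengths from the new node to the two joined nodes.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        edges += [(u, i, li), (u, j, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes with the remaining distance.
    node_a, node_b = nodes
    edges.append((node_a, node_b, d[frozenset((node_a, node_b))]))
    return edges


# Toy additive distance matrix (illustrative values only).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {
    frozenset((names[x], names[y])): matrix[x][y]
    for x, y in combinations(range(len(names)), 2)
}
edges = neighbor_joining(names, dist)
for left, right, length in edges:
    print(left, right, length)
```

Ties in the Q criterion are broken here by simply taking the first minimal pair; for an additive matrix like this one, any choice should lead to the same unrooted tree. For real data I would rather rely on an established phylogenetics package than on a sketch like this.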

Sunday, May 1, 2011

Bioinformatics Summer School in Bratislava

Just a quick announcement for a 1-week bioinformatics summer school in Bratislava, at which I will be lecturing. You can find more info here.

 Here is a short description from the course website:

The summer school will provide an overview of several areas of computational biology, covering concrete tools, examples of their use, and underlying models and methods. Intended audience includes biologists who want to become more experienced bioinformatics users as well as computer scientists, mathematicians and others who are interested in this exciting research area. The summer school is primarily targeted at doctoral students and postdocs, although more experienced researchers or Master students are welcome to attend as well. The program will include lectures, practical workshops, and research seminars given by experienced researchers from several countries. Working language is English.

Friday, March 18, 2011

The father of the orthology and paralogy concepts passes away

Last week Walter Fitch, a founder of the field of molecular evolution, passed away. Among many other contributions to the field, he coined the concepts of orthology and paralogy. Fitch's seminal work therefore provides the foundations for a big part of what I am doing now. He has left us, but his work will continue to propel research in phylogenetics and comparative genomics.

Friday, February 11, 2011

On the debate of recognising mitochondria as bacteria

I recently read Mark Pallen's provocative opinion article on whether we should recognise mitochondria as bacteria, and therefore give them their own taxonomic classification (now they are simply considered an organelle of the host cell). This paper has been featured and discussed in other blogs such as that of Jonathan Eisen.

 This issue is close to my heart, since I did my PhD precisely on tracing the origin and evolution of these organelles and their intermingling with their hosts. I agree with many of the arguments raised by Pallen, but I do not necessarily conclude that this should lead us to create a distinct taxonomic class for mitochondria. In practice, thinking of "mitochondria as bacteria" can coexist with their current classification as organelles, as long as we are aware of their bacterial ancestry. Indeed, I think that this is the dominant view among people doing research on mitochondria, so I do not think that classifying mitochondria as a new bacterial class will radically enhance our possibilities of understanding or manipulating these organelles. On the other hand, one could argue that strictly considering mitochondria as bacteria will close our eyes to their critical organellar properties, thereby hampering our potential to understand them. I imagine a future opinion paper entitled "time to recognise that Mitochondriaceae are organelles?", with the same kind of arguments in favour of recognising mitochondria as true organelles of the host cell, and listing their many similarities with other membrane-bounded organelles in the cell.

 Since evolution has crossed the border from free-living bacterium to organelle at least a couple of times, it follows that these two stages are united by a continuous evolutionary timeline of small stages separated by a discrete number of changes. The current diversity only allows us to infer some of these intermediate stages, and these are usually what we base our "categories" on. Making a separation is of course arbitrary, but useful to describe things that appear different to us. Despite the parallelisms mentioned with some reduced endosymbionts or pathogens, there is a quantitative jump in the level of integration with the host cell that we can easily recognize. Setting a divide there between what we call an organelle and what we call a bacterial species seems reasonable to me, although we could clearly think of other operational definitions.

 All the rest is deciding what degree of purity we want to apply to our definitions. As a biologist I am used to the limitations of our central concepts such as "species" or "genes", which similarly have to accommodate exceptions and different interpretations depending on what organisms we are dealing with. We humans have a natural tendency to classify things into simple schemes, and we have to recognize the advantage of using operational classifications that are "generally correct", without becoming too uneasy when we realise that the actual complexity is much greater. The important thing is to be aware of the exceptions and of the "provisional and approximate nature" of practical definitions, until we find better ones.

 In summary, I am in favour of changes of our current paradigms to newer ones that better fit our current knowledge, but I am of the opinion that simply giving mitochondria the level of a taxonomic family is not solving anything, nor improving our understanding of these "highly derived bacteria" or "bacterial-derived organelles", as you prefer to call them.

Wednesday, February 9, 2011

EMBO-meeting Comparative Genomics of Eukaryotic Microorganisms

This is a quick note to announce this year's EMBO meeting on: Comparative genomics of eukaryotic microorganisms: understanding the complexity of diversity

I have attended this meeting since 2006 and have always enjoyed it. Besides the impressive panel of speakers, there is plenty of time to meet all attendees in an informal environment, so it is one of those fruitful meetings from which you return with broader knowledge and new ideas. Recommended.

15 - 20 October | 2011 | Sant Feliu de Guíxols | Spain

Sunday, January 30, 2011

Why do only hungry K. lactis have sex?

One of my professors at the University of Valencia used to tell us that a yeast's life was "mainly driven by food and sex", referring to the relevance and impact, in these single-celled organisms, of the signalling pathways that respond to starvation, the presence of nutrients, or pheromones. One particular species of yeast, the dairy yeast Kluyveromyces lactis, seemed to have combined both stimuli into a single pathway, requiring both starvation and pheromone signals to mate. Although this had been known for decades, the specific mechanism and how it had evolved remained a mystery.

In a recent paper by the group of Alexander Johnson (UCSF), the origin of this phenotype has been established by comparing the regulation of mating genes in K. lactis, Saccharomyces cerevisiae, and Candida albicans. The evolutionary mechanism involved is a transcriptional rewiring, in which the core mating genes have been put under the control of the gene responsible for signalling starvation (RME1), which in turn is now also controlled by the mating factors (a/alpha). This intercalation of a new step within the mating signalling pathway effectively results in both stimuli being necessary for mating.

How could this happen? The implied scenario involves RME1 acquiring regulation by the mating factors, at least four core mating genes losing their responsive elements to the mating factors (rather than a change in the binding site of the factor, which was found to be similar to that in the other yeasts), and the same genes acquiring responsive elements to RME1. That makes 9 transitions in total: one for RME1, plus a loss and a gain in each of four core genes. The first one (RME1 under control of mating) also occurs in S. cerevisiae, so it seems to have pre-dated the re-programming of the control of the core mating genes, effectively paving the way for the final rewiring. To unveil the order of the other 8 transitions, one would need to find intermediate states in other yeasts. Given the potentially deleterious effects of a mating gene losing pheromone control, and the low probability of losing the binding site for one factor while acquiring that for the other in four genes, I envision an intermediate state in which the genes were responding to both RME1 and the mating factors. Then, the loss of the direct response to mating factors in a single core mating gene would render the pheromone-responsive elements in the other core genes non-functional (three of these core genes encode proteins that must combine into a heterotrimer to function), thus leaving the route through RME1 as the only functional one. Accumulating mutations would then have simply removed the pheromone-response sites.

An interesting story of how regulation can effectively be altered by evolution in small steps. Another important connection is that for many fungi, most particularly pathogens such as Candida glabrata, we lack direct observation of the mating cycle, although they conserve the mating genes intact and for some we have indirect evidence that they mate. Perhaps it all comes down to very specific requirements for mating, achieved by intercalating layers of regulation of mating genes such as that found in K. lactis.

Friday, January 28, 2011

Map of scientific collaborations

My brother pointed me to this map of scientific collaboration between researchers, made by Olivier H. Beauchesne. Inspired by a similar map drawn for Facebook friends, O. Beauchesne used a bibliographical database to trace links between universities whose researchers were co-authoring articles.

The result is an amazing picture, which shows that, as with many other things, research is unevenly distributed around the world. With a clear North/South divide, research and economic centres overlap almost completely. As expected, North America, Europe and Japan are the most densely connected areas. Emerging research countries such as India, China, and Brazil can also be recognized. Within Europe, the south of the UK, Paris, the Netherlands, and Switzerland/Austria seem to form research hubs. Perhaps influenced by language and cultural ties, Spain and Portugal are relatively well connected to Central and South American countries.

Barcelona, where my lab is located, appears to be the most densely connected research pole in Spain.

Wednesday, January 19, 2011

A PLoS currents for the Tree of Life

 I recently discovered that PLoS Currents has opened a new track for the Tree of Life: PLoS Currents: Tree Of Life.

 PLoS Currents is yet another form of publishing scientific results. A small group of editors and reviewers reviews every paper to check that it is "a legitimate work of science and does not contain any obvious methodological, ethical or legal violations." If that is the case, papers are published immediately (and indexed in PubMed). Another novelty is that the whole publication procedure is based on a web-based tool called Google Knol.

 So far there are only 4 articles (or "knols"?) and they all seem pertinent to the topic. One of them was very useful to me, since it described a compilation of benchmark datasets for phylogeny.

 It looks worth keeping an eye on.