| |
Name of Collaboratory :
|
|
Cyberinfrastructure for Phylogenetic Research (CIPRes)
|
|
| |
URL :
|
|
http://www.phylo.org |
|
| |
Collaboratory Status :
|
|
| Operational
|
|
Start Date : 2003
|
End Date : 2008
|
|
|
| |
Primary Collaboratory Function :
|
|
Community Infrastructure Development |
|
| |
Secondary Collaboratory Functions :
|
|
Distributed Research Center |
|
| |
Domain(s) :
|
|
Evolutionary biology, Phylogeny |
|
| |
Brief Description of the Collaboratory :
|
|
The goal of CIPRes is to establish the parameters of the cyberinfrastructure needed to reconstruct the evolutionary history, or the "Tree of Life," of all species on the planet. The project is a collaboration between biologists and computer scientists and has an almost equal number of researchers from each domain. Many of the biologists involved in CIPRes are accomplished software developers who have created programs that are widely used in their scientific community. One of the aims of CIPRes is to bring together these dispersed efforts and to build on them by employing the latest advances in algorithm development and optimization. CIPRes is committed to providing open-source software.
CIPRes faces significant computational challenges. Current algorithms are able to process a few hundred species, but the number of described species is over a million, and biologists estimate that 10-20 million species actually exist. Lack of data is another challenge in the effort to reconstruct evolutionary history. Even though most species have yet to be described, technology for genotyping and sequencing is moving quickly, and it is possible to conceive of the day when much more data will be available. Until that time comes, however, CIPRes must simulate data in order to develop a cyberinfrastructure that can one day build a Tree of Life that includes millions of species.
CIPRes is a widely distributed project organized into five main groups. The goal of the simulation group is to constantly refine models of evolution and to use these models to provide the simulated datasets needed to benchmark project efforts. The algorithm group devotes its efforts to long-term thinking about how to scale up reconstruction processes and solve difficult optimization problems. Members of the software architecture group guide the overall software development for the entire project, and a team of professional programmers implements the vision of the software architects. The outreach group's mission is to help people of all ages understand the concept of evolution.
An Executive Committee oversees the general activities of CIPRes. This eight-member committee meets monthly and comprises the Principal Investigators (PIs) from the five lead institutions and the leaders of the database, simulation, and outreach groups. The PI at the University of New Mexico serves as the Director; he receives administrative and accounting support from a part-time project manager. Day-to-day activities are the responsibility of the focus group leaders, who organize their own meetings and report each month to the Executive Committee. An Advisory Committee helps CIPRes assess its short- and long-term goals and overall direction.
|
|
| |
Access to Instruments :
|
|
CIPRes is in the process of acquiring a large machine. The exact specifications of the machine will not be known until a contract is awarded; however, it is expected to have 100-200 processors and at least 300 GB of memory. It will also have a database server. The machine's main purpose is to test CIPRes models and algorithms, but it will also be available to the entire community, so that researchers who are not part of CIPRes can benchmark their own approaches, and so that biologists who are not interested in software development or installation can use a web server to analyze their data on the CIPRes platform. The machine should be installed in the first quarter of 2005 and open to community access by summer 2005. |
|
| |
Access to Information Resources :
|
|
Project documents, focus group email archives, presentations, slide templates, and design documents are available on the CIPRes intranet. |
|
| |
Access to People as Resources :
|
|
Most of the biologists in the project are also competent computer programmers, so they are able to contribute to the software development. Computer scientists are accustomed to thinking in terms of models, so they can help propose models of evolution to compute with. Each domain is able to contribute to the other, which makes CIPRes a very interdisciplinary project.
CIPRes participants are widely distributed, so they communicate primarily by phone conferences and email. Project-wide meetings are held twice each year. |
|
| |
Funding Agency or Sponsor : |
|
| National Science Foundation (NSF) |
|
|
|
| |
| |
|
Notes on Funding Agencies/Sponsors:
CIPRes is supported by the NSF Information Technology Research program; it is a large ITR project. It was funded through NSF's collaborative funding mechanism, which allows investigators from two or more institutions to collaborate on a unified research project. The University of New Mexico is the lead institution. The other institutions that submitted proposals are Florida State University, University of California-Berkeley, University of California-San Diego, and University of Texas at Austin. There is a Principal Investigator at each of these locations. Subcontracts were made to eight other participating institutions.
|
|
|
|
|
| |
Organizations with Funded Participants:
|
|
|
| Organization name | Approx # of participants | Description of organization's role(s) |
| American Museum of Natural History (AMNH) | 1 | Outreach/Education |
| University of Pennsylvania | | |
| Department of Computer and Information Science (UPenn) | 1 | Modeling |
| Biology Department (UPenn) | 1 | Modeling |
| University of California, Berkeley | | |
| Jepson Herbarium (UC-Berkeley) | 1 | Outreach/Education; Databases |
| Computer Science Division (UC-Berkeley) | 7 | Algorithms |
| Florida State University | | |
| Department of Biological Science (FSU) | 2 | Software development |
| University of New Mexico (UNM) | | |
| Department of Computer Science (UNM) | 3 | Algorithms |
| University of Texas at Austin | | |
| School of Biological Sciences (UT-Austin) | 3 | Algorithms; Modeling |
| Department of Computer Sciences (UT-Austin) | 8 | Algorithms; Databases; Software development |
| Yale University | | |
| Peabody Museum (Yale) | 1 | Outreach/Education |
| Department of Ecology & Evolutionary Biology (Yale) | 1 | Modeling; Databases |
| University of Connecticut (UConn) | | |
| Department of Ecology and Evolutionary Biology (UConn) (EEB) | 1 | Software development |
| University of Arizona | | |
| Department of Entomology (U-Arizona) | 1 | Software development; Databases; Outreach/Education |
| North Carolina State University | | |
| Department of Statistics (NCSU) | 1 | Modeling |
| University of British Columbia | | |
| Department of Zoology (UBC) | 1 | Software development; Databases |
| University of California, San Diego (UCSD) | | |
| San Diego Supercomputer Center (SDSC) | 9 | Algorithms; Databases; Software development |
| State University of New York (SUNY) | | |
| University at Buffalo | 1 | Databases |
| TOTAL PARTICIPANTS | 43 | |
|
|
Notes on Participants/Organizations:
More institutions are likely to join in the future because some of the students who were instrumental in developing parts of the project are graduating and moving to other universities. The project also has collaborators from Europe, New Zealand, and Singapore. Foreign participants do not receive funds except for travel to meetings.
|
|
|
|
| |
|
|
|
|
| |
Communications Technology Used :
|
|
The primary communication technologies used are email and phone. The Executive Committee (EC) tried to use videoconferencing, but they found that it was expensive and that the technology did not work well with eight people. Most members of the EC have direct access to the Grid, so they also tried to conference using the Grid, but it is not yet mature enough to support this kind of activity. |
|
| |
Technical Capabilities :
|
|
Management of technical resources: Access control/login facilities
Asynchronous object sharing: Common file space
Asynchronous conversation: Threaded discussion, Email
Synchronous conversation: Audio
|
|
| |
Key Articles : |
|
|
|
| |
Project-reported performance data :
|
|
A key deliverable of the CIPRes project is platform-independent software that can be downloaded and installed on a PC or a large supercomputer. A current version of the software is available at: http://www.phylo.org/software.html |
|