Creating and providing free, Web-based public resources to thousands of researchers around the world depends on top-notch software development, complex data management, and a high-powered computing infrastructure.
The Allen Institute's data pipeline was designed by a team of computer science professionals, led by Chinh Dang, with expertise in web and database development, high-performance computing, and large-scale data management. It is built for high-throughput data processing and for developing and hosting web applications.
Capable of processing more than 4 terabytes (4,000 gigabytes) of image data per day, the Allen Institute's data processing pipeline combines commercially available hardware with custom software development. The 2,200-square-foot computing facility houses a state-of-the-art storage infrastructure and a large-memory server farm. A high-speed gigabit fiber network connects this equipment to other servers and microcomputers throughout the Allen Institute for on-the-fly data analysis. Since the inception of the Allen Institute, more than one petabyte (1,000 terabytes) of data has been generated.
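As a rough illustration of those throughput figures, the short calculation below (a hypothetical back-of-envelope sketch, not part of the Institute's pipeline software) converts 4 terabytes per day into the sustained transfer rate it implies and estimates how long it would take to accumulate a petabyte at that pace.

```python
# Back-of-envelope throughput figures for a 4 TB/day image pipeline.
# Hypothetical illustration only; assumes decimal units (1 TB = 10**12 bytes).

TB = 10**12                       # bytes in a terabyte (decimal)
daily_volume_bytes = 4 * TB       # ~4 TB of image data per day
seconds_per_day = 24 * 60 * 60

# Sustained rate needed to move one day's images in 24 hours.
rate_bytes_per_s = daily_volume_bytes / seconds_per_day
rate_gbit_per_s = rate_bytes_per_s * 8 / 10**9

# Time to accumulate one petabyte (1,000 TB) at that pace.
days_to_petabyte = (1000 * TB) / daily_volume_bytes

print(f"Sustained rate: ~{rate_gbit_per_s:.2f} Gbit/s")  # ~0.37 Gbit/s
print(f"Days to reach 1 PB: {days_to_petabyte:.0f}")     # 250 days
```

At that average pace, a single gigabit link covers the daily volume with headroom to spare, which is consistent with the gigabit fiber backbone described above, although real traffic would arrive in bursts rather than as a steady stream.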
A custom-developed software system handles data management and automated data analysis. Generated images are automatically processed and mapped according to predefined workflows built from modules our informatics team creates for each project. The result is an efficient pipeline that can execute multiple large-scale atlases and projects simultaneously.
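To make the workflow idea concrete, here is a minimal sketch of how a module-based pipeline of this kind could be organized. The Workflow class and the module names (align, segment, map_to_atlas) are hypothetical illustrations under that assumption, not the Institute's actual software.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A processing module takes an image record and returns an updated record.
Module = Callable[[dict], dict]

@dataclass
class Workflow:
    """An ordered list of processing modules predefined for one project."""
    name: str
    modules: List[Module] = field(default_factory=list)

    def run(self, image: dict) -> dict:
        for module in self.modules:
            image = module(image)
        return image

# Hypothetical per-project modules; real modules would perform image
# alignment, signal detection, mapping to a reference atlas, and so on.
def align(image: dict) -> dict:
    return {**image, "aligned": True}

def segment(image: dict) -> dict:
    return {**image, "segmented": True}

def map_to_atlas(image: dict) -> dict:
    return {**image, "atlas_coords": "reference-space"}

# Each project registers its own predefined workflow; several projects
# can be processed side by side from the same incoming image stream.
workflows: Dict[str, Workflow] = {
    "mouse_brain": Workflow("mouse_brain", [align, segment, map_to_atlas]),
    "developing_brain": Workflow("developing_brain", [align, map_to_atlas]),
}

incoming = [
    {"id": 1, "project": "mouse_brain"},
    {"id": 2, "project": "developing_brain"},
]

for image in incoming:
    print(workflows[image["project"]].run(image))
```

Keeping each step as a small, swappable module is what lets a single pipeline serve several atlases at once: a new project only needs a new workflow definition, not a new system.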
Public data are presented through a web application portal, the ALLEN BRAIN ATLAS data portal, with customized visualizations for each project. To maximize availability and performance, the portal is hosted by the Allen Institute at a separate co-location facility and, by our collaborator the International Neuroinformatics Coordinating Facility (http://www.incf.org/), at a satellite center in Sweden.
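As a small illustration of how hosting the portal in more than one location improves availability, the sketch below shows one way a client could fall back from a primary host to a mirror. The URLs and the fetch_with_fallback helper are hypothetical placeholders, not part of the portal's actual implementation.

```python
import urllib.request
import urllib.error

# Hypothetical portal endpoints: a primary host and a mirror site.
# These URLs are illustrative placeholders, not the portal's real addresses.
MIRRORS = [
    "https://primary.example.org/data",
    "https://mirror-sweden.example.org/data",
]

def fetch_with_fallback(path: str, timeout: float = 5.0) -> bytes:
    """Try each mirror in order and return the first successful response."""
    last_error = None
    for base in MIRRORS:
        url = f"{base}/{path}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # remember the failure and try the next mirror
    raise RuntimeError(f"All mirrors failed; last error: {last_error}")

# Example usage (would issue real HTTP requests if the hosts existed):
# data = fetch_with_fallback("gene/12345")
```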