A shared data science platform: bringing power and simplicity to research

Guillaume Moutier

Université Laval

Guillaume Moutier holds an engineering degree from the École Centrale de Lyon in France (1996). He is the Director of the Architecture office at Université Laval. As an architect and project manager for CGI and Université Laval, Guillaume Moutier has led numerous IT projects, ranging from server and networking infrastructure to application integration and development. As head of the architecture team, he currently oversees a major data science project that will provide all research domains and teams with the technologies and tools they need to harvest the full potential of their data.

Abstract

Most of today’s research produces or uses data that must be stored, transferred, analyzed, published and preserved. This poses a challenge for many researchers whose local resources are not sufficient to support the required workflows or to offer the necessary level of performance, security and reliability. Moreover, useful research data produced in laboratories are often not easily discoverable by, or accessible to, other researchers.

Université Laval is currently building a service that will help researchers get the most value from their data. It will provide all research domains and teams with the technologies and tools that will enable them to harvest its full potential. This new service will complement the resources already available through Calcul Québec and Compute Canada.

The three main objectives of this project are to:

Build a new data center specialized in data collection, processing and valorization. Complementing the two other main data centers, located in different areas of the campus, this Tier-3 facility will allow for scalable and reliable architectures;
Set up all the infrastructure, applications and tools needed for data storage and processing. With a shared object-based data lake and databases, data discovery, transfer, ingestion, analysis and visualization will be made easy through ready-to-use or on-demand data science environments (Hadoop, Spark, VMs, containers, notebooks, etc.);
Create a service offering for data valorization. A team of data scientists and data engineers will provide advice and support on using the platform, as well as training for its users, and a research data management framework (governance, security, ethics) will enable our researchers to quick-start their data projects following recognized rules and best practices.

This presentation will give an overview of the project, describing the services that will be offered, the different technological infrastructures being deployed (100 Gbps networking, network virtualization, object storage, OpenStack and OpenShift environments, etc.), and the tools and applications being used and how they are configured (data catalogs, shared notebooks, scalable container-based Spark environments, etc.). We will also give a quick demonstration of this new integrated environment and show how it simplifies research and enables new collaborations and interdisciplinary studies, such as the PULSAR project, an innovative and sustainable approach to health.

Authors

Guillaume Moutier (Université Laval)
Florent Parent (Université Laval)

Topic Area

Research Computing: Bridging the gap between Research Computing and IT

Session

D2-S5-04 » Tuesday Session 5 - 4 (15:30 - Tuesday, 19th June, DAC Lower Floor)

