The GDSC’s cross-disciplinary research directions include:

(i) Topological Data Analysis. High-dimensional, incomplete, and noisy data pose major challenges, but in many applications it is possible to exploit the topological structure of the problem. GDSC aims to develop new fundamental methods and theory to rigorously explore the promise of this approach.

(ii) Data Representation. Data compression, embeddings, and dimension reduction play a fundamental role in data science. Inspired by new core challenges in biomedical imaging, genomics, and neural spike-train data, GDSC aims to develop novel source models and distortion measures, and ultimately to seek a unifying theoretical framework across domains and disciplines.

(iii) Network & Graph Learning. Many of the fundamental challenges in applying data science to non-homogeneous populations are best explored through a network or graph structure. GDSC aims to develop new techniques for parameter-dependent eigenvalue problems in spectral community detection, density-estimation methods on networks, and a theoretical framework for time-varying graphical models to study dynamic variable relations in time-evolving networks.

(iv) Decisions, Control & Dynamic Learning. In medicine, sequential decisions carry high stakes. GDSC aims to use systems and control-engineering methods to improve health and disease management, and to develop new foundational theories and methods for label-efficient active learning and dynamic treatment regimes.

(v) Diverse & Complex Modalities. Big data is complex data, and major innovations are needed to handle it. GDSC aims to develop theoretical frameworks for inference under computational and privacy constraints and for high-dimensional data without parametric model assumptions. Text, image, and audio data present further challenges. To address them, GDSC aims to explore transition systems for graph parsing of natural language and new fusion approaches for fully multimodal analysis.