CCPortal
CAREER: Scalable methods for discovering multivariate dependencies in high dimensional data.
Project number: 1916787
Balakanapathy Rajaratnam
Host institution: University of California-Davis
Start date: 2017-08-28
End date: 2022-06-30
English abstract: This proposal aims to develop principled methods for discovering multivariate dependencies which cater to ultra high dimensional settings. A common theme that unites the proposed methods is scalability and identification of their limitations. A popular approach to identifying sparse inverse covariance matrices is through penalized likelihood methods. We propose a novel approach for solving the penalized Gaussian log-likelihood that is faster than its competitors by many orders of magnitude. The second research component in the proposal investigates the statistical properties of thresholded matrices in finite samples, with a view to obtaining a positive definite covariance estimation method which is highly scalable. The third research aspect of the project investigates quantifying the variability and uncertainty of estimated graphical network models. A methodology that takes advantage of a convex pseudo-likelihood formulation of the graphical model selection problem is introduced. This allows for the development of a highly scalable uncertainty quantification method with theoretical safeguards. The fourth research aspect of the project examines the use of the methodology proposed in the previous three sub-components to an application in the area of climate change, where high dimensional covariance estimation is required. The proposal also has a significant teaching and outreach component which aims to introduce statistics to aspiring young scientists at various stages of their undergraduate and graduate studies.

The availability of high-throughput data from various applications, including genomics, environmental sciences and others, has created an urgent need for methodology and tools for analyzing high dimensional data. Extracting and making sense of the many complex relationships and multivariate dependencies in the data and developing principled inferential procedures is one of the major challenges facing statisticians and data scientists. The theoretical and methodological work proposed in this project is motivated by applications and interdisciplinary collaborations in fields as diverse as the earth and environmental sciences, genomics and cancer research, and the social sciences. In genomics for instance, one is often interested to know how various genes are associated, and how these associations differ between an experimental (diseased) and control group. Gene regulatory networks also serve as important tools to study the evolutions of diseases. In the context of the climate change debate, modeling temperature at different points on the globe requires parsimonious modeling of the way in which these variables are related. Modeling correlations also arises naturally in material sciences and engineering where one is interested in seeing how different atomic particles interact when new materials are produced. Hence the proposed project for estimating correlations in very high dimensional settings will have widespread applications, since understanding associations/relationships between many variables is an endeavor that is common to many scientific disciplines. The proposed work, though firmly rooted in the statistical sciences, is very much interdisciplinary, and involves collaborations and partnerships between statisticians/data scientists and biomedical scientists, engineers and earth scientists.
Funding agency: US-NSF
Project funding: $293,153.00
Project type: Continuing Grant
Country: US
Language: English
Document type: Project
Item identifier: http://gcip.llas.ac.cn/handle/2XKMVOVA/211810
Recommended citation
GB/T 7714
Balakanapathy Rajaratnam. CAREER: Scalable methods for discovering multivariate dependencies in high dimensional data. 2017.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.