About

Hey! I’m Pranay Mundra, a Research Data Engineer II at the University of Rochester Medical Center, Department of Biostatistics & Computational Biology, working with the McCall Research Lab.

My Background

I graduated from the University of Rochester, Department of Computer Science with an M.Sc. in Computer Science, advised by Professor Fatemeh Nargesian. As a Graduate Research Assistant, I developed KOIOS, a novel filter verification system designed for top-k semantic similarity set search. Concurrently, I contributed to the development of Quok, a system tailored for approximate query answering over Open Knowledge. Additionally, I delved into the domain of coreset construction for machine learning, focusing on the creation of Fair K Coverage (FKC) coresets. My research aimed to select a substantially smaller subset of data, termed a coreset, capable of approximating the entire dataset. FKC coresets are designed to incorporate the vital data properties of fairness and coverage.

During my Master’s, I was a visiting researcher at the Paris Lodron Universität Salzburg Database Research Group, advised by Professor Martin Schäler. Together, we led the development of an alignment algorithm for uncovering language patterns and semantic similarities in biblical texts spanning numerous languages and historical epochs. I was also a visiting researcher at the MIT CSAIL Parallel Computing Group, where I worked with Professor Quanquan Liu and Professor Julian Shun. In this project, we are working on an implementation of a benchmark suite that paves the way for distributed privacy-preserving locally adjustable graph algorithms.

Previously, I graduated from the University of Washington in Seattle with a B.Sc. in Mathematics. Over those four years, I worked on a variety of software engineering projects, from research papers to side projects. One of my main projects was Maimon, a framework that aids in approximating acyclic schema discovery from relations using Multivalued Dependencies. Additionally, I’ve worked with LightDB and other Visual Database Management Systems. I also worked as a research intern at the Caltech Tensorlab, where I researched how effectively existing neural network architectures can understand the compositional nature of data and helped develop a new architecture called Tree Stack Memory Units.

My Research

My main areas of interest are Database Systems, Machine Learning, and Differential Privacy. I like working on data discovery and mining problems like minimizing data bias, improving data quality for machine learning, differential privacy systems, and query models. My current project involves selecting coresets for huge datasets in order to train ML models on the coresets with equivalent accuracy, lowering computational costs and enhancing speed. I’m also developing a solution that will allow for rapid aggregate queries over massive knowledge graphs with missing values. I also enjoy working on problems in number theory, group theory, probability, combinatorics, and graph theory with applications in computer science.