News
- September 2023: Started MMath CS at UWaterloo! Supervised by Kate Larson and Edith Law.
- April 2021: Awarded NSERC Undergraduate Student Research Award (USRA) Grant to pursue research with Dr. Charles Perin on emotion and data visualization.
Projects
Towards Online PU-Learning
PU-learning is the task of learning a positive vs. negative (PvN) classifier from only positive and unlabeled data. In general, this task splits into two main parts: (1) mixture proportion estimation (MPE), the task of estimating the proportion of positive examples in the unlabeled data, and (2) using this estimate to help train a PvN classifier. This project focused on MPE, with the goal of building the first algorithm for mixture proportion estimation in the online setting. To that end, I built TOM-ON, an MPE algorithm that runs in constant time and accurately predicts the mixture proportion. I also developed TOM, an algorithm for the batch setting that achieves state-of-the-art performance.
Context, Color and Emotion in Data Visualization
The goal of this project is to understand the relationship between the emotions induced by a data visualization and the visual features of the visualization itself. To uncover these relationships, we are using traditional hypothesis testing. Phase 1 of this study focused on the relationship between the color of a visualization and emotion; Phase 2 focused on the relationship between properties of the data (e.g. trend) and emotion; and Phase 3, which we are working on now, focuses on the relationship between data labelling and the induced emotion. This project is funded by an NSERC USRA grant.
Visualization of the Canadian Community Health Survey from 2015-2019
Statistics Canada releases the anonymized results from their Canadian Community Health Survey every year, but the data tends to sit lifelessly in tables. The goal of this project was to make this valuable resource more interpretable by visualizing it using D3.js.
Work Experience
Junior Data Scientist
NannyML
January 2023 - Present
- Developed novel, linear-time univariate and multivariate drift detection methods based on the Maximum Mean Discrepancy.
- Implemented standard drift detection methods in our open-source library, including Earth Mover’s distance, Hellinger distance, Jensen-Shannon distance, and L-Infinity distance.
- Wrote documentation outlining the strengths, weaknesses and differences between various drift detection methods for continuous and categorical features. Developed experiments and visualizations to demonstrate these differences.
Data Science Intern
NannyML
July 2022 - December 2022
- Processed many image, text, and tabular data sets and built models on them to test and validate new monitoring features.
- Built and deployed an end-to-end model using Docker and Kubernetes to demonstrate product capabilities in real-world use cases to potential customers.
Teaching Assistant, Python Programming
University of Victoria
September 2021 - December 2021
- Taught four 2-hour lab sessions per week and graded midterm and final exams.
- Accompanied Dr. Celina Berg in lectures to aid in answering student questions.
Teaching Assistant, Java Programming
University of Victoria
September 2020 - December 2020
- Co-led three 2-hour lab sessions per week and graded assignments and exams.
Research Assistant & Software Developer
Applied and Theoretical Neuroscience Lab, University of Victoria
September 2020 - December 2020
- Built software from the ground up to remove artifacts from MUSE EEG data using ICA.
- Refined data handling processes.
Technical Skills
Languages: Python, R, C++, CUDA, SQL, Java, C, MATLAB, JavaScript, LaTeX
Developer Tools: Git, VS Code, Atom, Jupyter
Libraries: PyTorch, Scikit-Learn, SciPy, AutoKeras, NumPy, Pandas, NLTK, Matplotlib, seaborn, D3.js
Applications: Excel