
BIOINFORMATICIAN

AI/ML Research Scientist specializing in computational methods for analysis, modeling, and prediction in medical research and protein structure. Experienced in developing and deploying bioinformatics pipelines for large-scale datasets, including next-generation sequencing data. Proficient in Python and Linux/Unix, with expertise in experimental design, data interpretation, and machine learning applications.



TECHNICAL SKILLS

Programming

Python, Bash, Go, Java

Packages

Hail, PyMOL, pandas, NumPy, PyTorch, Matplotlib, scikit-learn

Tools

Git, GCP, Linux (Ubuntu), Conda, Amber (molecular dynamics package)



EDUCATION


Master of Science in Computational Biology
Carnegie Mellon University
2021 - 2023

Bachelor of Arts in Biology
Grinnell College
2012 - 2018



RESEARCH


Substance Abuse Prediction in Depression Patients using Deep Learning
Mar 2024 - Ongoing

Project Advisor: Dr. LiRong Wang
University of Pittsburgh, Pittsburgh, PA

Designed and implemented a Python pipeline for processing electronic health record (EHR) data, adapting a BERT transformer to use EHR codes for a single-disease prediction task on the NIH All of Us platform.

Predicted the risk of alcohol and substance abuse in at-risk patients with a ROC-AUC of 0.95 and an accuracy of 92%, surpassing the baseline logistic regression model by 0.13 in ROC-AUC and 5% in accuracy.


Hidden Markov Model-Guided Predictive Enzyme Mutagenesis
Oct 2023 - Jan 2024

Project Advisor: Dr. Peng Liu
University of Pittsburgh, Pittsburgh, PA

Used the Amber package to run protein-ligand molecular dynamics (MD) simulations and extract time-series atomic distance data from the MD trajectories.

Implemented a Python pipeline that applies a hidden Markov model (HMM) to the atomic-distance data to identify candidate active-site positions for mutation, with the goal of improving enzyme-substrate binding specificity.


Preterm Infant Growth Trajectory Prediction using Microbiome
May 2022 - Aug 2022

Project Advisors: Ziv Bar-Joseph, Jose Lugo-Martinez
CMU, Pittsburgh, PA

Developed an HMM classifier from scratch in Python to predict growth trajectories (normal vs. faltering) in preterm infants from gut microbiome data.

Achieved a moderate ROC-AUC of 0.67 and identified limitations of standard HMMs on sparse microbiome datasets, which guided the research toward a more suitable model, the input-output HMM (IOHMM), to improve prediction.
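The classifier design is not spelled out above; a common setup trains one HMM per class and labels a new sequence with the class whose model assigns it the highest likelihood, computed with the forward algorithm. The sketch below shows only that scoring step, assuming discretized (symbol-coded) observations and already-estimated parameters. The toy parameters and the names `log_forward` and `classify` are hypothetical, not taken from the project.

```python
import numpy as np


def log_forward(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    pi: (S,) initial state probabilities
    A:  (S, S) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:  (S, K) emissions, B[i, k] = P(symbol k | state i)
    All entries are assumed strictly positive so their logs are finite.
    """
    log_A, log_B = np.log(A), np.log(B)
    log_alpha = np.log(pi) + log_B[:, obs[0]]
    for o in obs[1:]:
        # Sum over the previous state i (axis 0) in log space, then emit symbol o.
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(log_alpha)


def classify(obs, models):
    """Pick the class whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda label: log_forward(obs, *models[label]))


# Hypothetical toy parameters: 2 hidden states, 2 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
models = {
    "normal": (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]])),
    "faltering": (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]])),
}
print(classify([0, 0, 1, 0], models))  # prints "normal"
```

Working in log space with `np.logaddexp.reduce` avoids the underflow that raw forward probabilities run into on long sequences; parameter training (for example Baum-Welch) is not shown.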





COURSEWORK PROJECTS


Small-Molecule Antibacterial Potency Screening using Active Learning
Feb 2023 - May 2023

Implemented the expected model change (EMC) active learning method in a Python script, using logistic regression as the base model; the script can be applied to general binary classification tasks on tabular datasets.

Using EMC, reached 71% accuracy in predicting the antibacterial effect of small molecules with 40% fewer samples than random sampling needed to reach the same accuracy, showing that EMC reduces the training-set size needed by selecting informative samples.
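One common way to instantiate expected model change for logistic regression is the expected gradient length: for each unlabeled candidate, average the norm of the log-loss gradient over both possible labels, weighted by the current model's predicted probabilities, and query the candidate with the largest value. This is a minimal sketch, not the original script; it assumes a bias-augmented feature matrix and weights from an already-fitted model, and `emc_scores` and `select_query` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def emc_scores(X_pool, w):
    """Expected gradient length of the logistic log-loss for each pool row.

    X_pool: (n, d) unlabeled candidates; include a column of ones for the bias.
    w:      (d,) weights of the current logistic regression model.
    """
    p = sigmoid(X_pool @ w)  # P(y = 1 | x) under the current model
    norms = np.linalg.norm(X_pool, axis=1)
    # The log-loss gradient w.r.t. w for label y is (p - y) * x, so
    # E_y ||grad|| = p * (1 - p) * ||x|| + (1 - p) * p * ||x||.
    return 2.0 * p * (1.0 - p) * norms


def select_query(X_pool, w, already_labeled=()):
    """Index of the candidate whose label is expected to change the model most."""
    scores = emc_scores(X_pool, w)
    scores[np.asarray(list(already_labeled), dtype=int)] = -np.inf
    return int(np.argmax(scores))


# Toy pool: bias column plus one feature; w would come from fitting on labeled data.
X_pool = np.array([[1.0, 0.0], [1.0, 5.0], [1.0, -5.0]])
w = np.array([0.0, 1.0])
print(select_query(X_pool, w))  # prints 0 (the point on the decision boundary)
```

The factor p(1 - p) favors candidates near the decision boundary, while ||x|| favors candidates that would move the weights more. In a full loop, the selected sample is labeled, added to the training set, and the model is refit before the next query.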


Motif Search in Glioblastoma Protein-Protein Interaction Network
Feb 2023 - May 2023

Developed a Python pipeline for identifying motifs in glioblastoma protein-protein interaction (PPI) networks, using the NetworkX package for network generation and a subgraph isomorphism algorithm for motif search.

Designed a comparative analysis of PPI motifs in the glioblastoma network versus random networks, which revealed significantly different motif distributions, and predicted motif functions through gene/protein ontology analysis.
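A minimal sketch of subgraph-isomorphism motif counting with NetworkX, assuming a small undirected motif such as a triangle and a toy graph standing in for the PPI network. `GraphMatcher.subgraph_isomorphisms_iter` yields node-induced matches, and each occurrence appears once per automorphism of the motif, so the sketch counts distinct node sets. The helpers `count_motif` and `random_baseline` are hypothetical, and the random model (same numbers of nodes and edges via `nx.gnm_random_graph`) is an assumption; degree-preserving rewiring such as `nx.double_edge_swap` is another common choice.

```python
import networkx as nx
from networkx.algorithms import isomorphism


def count_motif(G, motif):
    """Number of distinct node sets in G that induce a subgraph isomorphic to motif."""
    matcher = isomorphism.GraphMatcher(G, motif)
    # Each mapping is {node in G: node in motif}; one occurrence is found once per
    # automorphism of the motif (6 times for a triangle), so deduplicate by node set.
    return len({frozenset(m) for m in matcher.subgraph_isomorphisms_iter()})


def random_baseline(G, motif, n_random=10, seed=0):
    """Mean motif count over random graphs with the same numbers of nodes and edges."""
    counts = [
        count_motif(nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=seed + i), motif)
        for i in range(n_random)
    ]
    return sum(counts) / n_random


triangle = nx.complete_graph(3)
network = nx.karate_club_graph()  # toy stand-in for a PPI network
print(count_motif(network, triangle), random_baseline(network, triangle))
```

Because matches are node-induced, a three-node path is not counted inside a triangle; a motif census therefore assigns each connected node triple to exactly one three-node motif.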


MAZE GAME

Exit-finding maze game created with pygame.
Implemented using Python 3.11.5 and pygame 2.5.1.


Link to Repository



This is an exit-finding maze game played with a mouse and keyboard. Each time the player clicks a difficulty level (easy, medium, or hard), a maze-generation algorithm builds a new maze in pygame. Every maze has exactly one correct path from start to finish.

Maze Generation

The maze grid is a nested list of Node objects. Each node stores the directions (neighbors) it can move to. The direction back to each node's parent, together with the directions each node selects, forms the paths of the maze. When the maze is initialized, the start and end nodes are placed at diagonally opposite positions.
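A minimal sketch of such a grid, assuming each `Node` records its position, the node it was reached from (`parent`), and its open directions. The field names, the direction encoding, and the helpers `make_grid`, `neighbor`, and `connect` are hypothetical, since the repository's code is not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical direction encoding: name -> (row offset, column offset).
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


@dataclass(eq=False)  # compare and hash nodes by identity
class Node:
    row: int
    col: int
    parent: "Node | None" = None                 # node this one was reached from
    open_dirs: set = field(default_factory=set)  # directions with no wall


def make_grid(rows, cols):
    """Nested list of nodes, with start and end in diagonally opposite corners."""
    grid = [[Node(r, c) for c in range(cols)] for r in range(rows)]
    return grid, grid[0][0], grid[rows - 1][cols - 1]


def neighbor(grid, node, direction):
    """Adjacent node in `direction`, or None at the edge of the grid."""
    dr, dc = DIRECTIONS[direction]
    r, c = node.row + dr, node.col + dc
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        return grid[r][c]
    return None


def connect(grid, node, direction):
    """Open a passage from node toward direction and record node as the parent."""
    nxt = neighbor(grid, node, direction)
    if nxt is None:
        raise ValueError(f"no neighbor {direction} of ({node.row}, {node.col})")
    node.open_dirs.add(direction)
    nxt.open_dirs.add(OPPOSITE[direction])
    nxt.parent = node
    return nxt


# Usage: a 3 x 4 grid where the first step from the start goes right.
grid, start, end = make_grid(3, 4)
second = connect(grid, start, "right")
```

`eq=False` keeps identity-based comparison and hashing, so nodes can be stored in sets without comparing their recursive parent chains.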

After implementing the maze-generation algorithm, I learned that it is a variant of the "Hunt and Kill" algorithm. In the walk (or "kill") phase, the algorithm carves a path from the current node into unvisited neighbors until it reaches a dead end or the end node. It then switches to the "hunt" phase, scanning the visited nodes for one that still has an unvisited neighbor and starting a new path from there. When no visited node has an unvisited neighbor left, the algorithm finishes.
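A minimal pure-Python sketch of Hunt-and-Kill generation on a rectangular grid: a random walk carves passages into unvisited neighbors until it gets stuck, then the algorithm scans for a visited cell that still has an unvisited neighbor and resumes walking from it, stopping when none remains. It stores the maze as a hypothetical `{cell: set of connected cells}` dictionary rather than the project's `Node` objects, omits the original's rule of also ending a walk at the end node, and adds a hypothetical breadth-first `solve` helper to recover the single start-to-end path.

```python
import random
from collections import deque


def hunt_and_kill(rows, cols, seed=None):
    """Perfect maze as {cell: set of cells it has an open passage to}."""
    rng = random.Random(seed)
    passages = {(r, c): set() for r in range(rows) for c in range(cols)}
    visited = {(0, 0)}

    def unvisited_neighbors(cell):
        r, c = cell
        around = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [n for n in around if n in passages and n not in visited]

    current = (0, 0)  # start corner
    while current is not None:
        # Walk: carve into random unvisited neighbors until a dead end.
        options = unvisited_neighbors(current)
        while options:
            nxt = rng.choice(options)
            passages[current].add(nxt)
            passages[nxt].add(current)
            visited.add(nxt)
            current = nxt
            options = unvisited_neighbors(current)
        # Hunt: scan visited cells in row-major order for one with an unvisited
        # neighbor and resume the walk from it; stop when none is left.
        current = next((cell for cell in sorted(visited) if unvisited_neighbors(cell)), None)
    return passages


def solve(passages, start, end):
    """Breadth-first search; in a perfect maze this is the only start-to-end path."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for nxt in passages[cell]:
            if nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


maze = hunt_and_kill(5, 6, seed=1)
solution = solve(maze, (0, 0), (4, 5))
```

Every carved passage leads to a previously unvisited cell, so the result is a spanning tree of the grid: every cell is reachable and there is exactly one path between any two cells.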

Other maze-generation algorithms are explained here:

Link

The algorithm was tuned so that, when carving paths, it stays close to either the walls or already-visited nodes, which produces nicer-looking mazes.


Pygame Framework

In pygame, the walls, the player, and the end location are drawn as rectangles. When the player's rectangle collides with a wall, the player stops moving, so the player can only move along the paths of the maze.
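pygame provides `pygame.Rect.colliderect` for this kind of overlap test. To keep the example dependency-free, the sketch below re-implements the same axis-aligned check on a hypothetical `Rect` dataclass and shows one common pattern: compute the moved rectangle and keep the old position if it would overlap any wall. Whether the original game blocks the move this way is an assumption, and `move_player` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def colliderect(self, other):
        # Same rule as pygame.Rect.colliderect for positive sizes: the overlap
        # must have positive area, so rectangles that only share an edge do not collide.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def move_player(player, dx, dy, walls):
    """Return the moved player, or the unchanged player if it would hit a wall."""
    moved = Rect(player.x + dx, player.y + dy, player.w, player.h)
    if any(moved.colliderect(wall) for wall in walls):
        return player
    return moved


walls = [Rect(20, 0, 5, 40)]  # one vertical wall segment
player = Rect(0, 0, 10, 10)
player = move_player(player, 10, 0, walls)  # right edge now touches the wall: allowed
player = move_player(player, 5, 0, walls)   # would overlap the wall: blocked
print(player.x)  # prints 10
```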

Button.py was taken from the following repository:

Link

The background music and the song that plays when a maze is finished are both from Bensound, a royalty-free music website.