Education

University of California San Diego — M.S. in Computer Science

Sept 2024 — Ongoing • GPA: 3.96/4.0

Relevant coursework: Graduate Computer Security, Algorithm Design and Analysis, Principles of Database Systems, Principles of Programming Languages, Intro to Embedded Computing, Virtualization and Cloud Computing, Networked Services, Probabilistic Reasoning and Learning, Parallel Programming, Systems for LLM and AI Agents

Coursework Highlights and Skills

CUDA Matrix Multiplication Optimization

  • Implemented high-performance GPU matrix multiplication with tiling, shared memory, coalesced accesses, and instruction-level parallelism (ILP).
  • Achieved up to 3.27 TFLOPS, a 25x speedup over the naive implementation, with performance competitive with cuBLAS.
  • Performed roofline analysis and tuned for edge cases (matrix dimensions that are not multiples of the tile or warp size).

2D Wave Simulation using MPI

  • Parallelized a 2D wave equation solver using MPI with a 5-point stencil.
  • Implemented ghost-cell communication for arbitrary 1D and 2D processor geometries.
  • Optimized memory layout for distributed grids to enable scaling across up to 384 cores.
  • Conducted performance and scaling experiments on SDSC Expanse, achieving near-linear speedup.
  • Validated correctness against a serial implementation and measured MPI communication overhead.

University of California Santa Cruz — B.S. in Computer Science

Sept 2020 — June 2024 • GPA: 3.85/4.0

Graduated Cum Laude (top ~15% of graduating class by GPA).

Coursework Highlights and Skills

Operating System Kernel Project (PintOS)

  • Implemented kernel subsystems including process management, system calls, and synchronization primitives in C.
  • Built a virtual memory system with lazy loading, swap management, and clock page replacement.
  • Designed and implemented a file system with indexed inodes and fine-grained locking.
  • Developed user–kernel memory safety checks and page fault handling.

Distributed Key-Value Store with Causal Consistency (Python)

  • Designed and implemented a replicated key-value store with causal consistency using vector clocks.
  • Implemented replication and broadcast mechanisms using multithreaded worker queues.
  • Built failure detection and view management through HTTP-based replica health monitoring.
  • Deferred processing of requests whose causal dependencies, tracked with vector clocks, were not yet satisfied.
  • Developed concurrent request handling using Python threading and worker pools.
  • Containerized the distributed system using Docker.