Anh Tran

I am a Master's student in Electrical Engineering at the University of Pennsylvania. I earned my Bachelor's degree in Electrical Engineering from VinUniversity.

My background covers computer architecture, parallel computing, FPGAs, and deep learning. I am particularly interested in combining these fields to build more efficient hardware and software systems.


Projects

I mainly work on computer architecture, digital design, and machine learning systems. In my free time, I also enjoy exploring computer vision, control, and robotics.

SoC Design for Real-Time Data Deduplication and Compression
project page / code

Designed a full System-on-Chip (SoC) pipeline on the Ultra96-V2 FPGA combining content-defined chunking, SIMD-accelerated SHA-256 hashing, and a hardware LZW compressor for real-time deduplication. Achieved over 800 Mb/s end-to-end throughput with a 0.65 compression ratio.
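As a rough software illustration of the three stages named above (this is not the project's hardware design), the sketch below splits a byte stream at content-defined boundaries, detects duplicate chunks by their SHA-256 digest, and LZW-compresses only the chunks seen for the first time. The function names, the polynomial rolling hash, and the window, mask, and size limits are all assumptions chosen for readability.

```python
import hashlib

WINDOW = 16            # bytes covered by the rolling hash (assumed value)
MASK = 0xFF            # cut when the low 8 hash bits are zero (assumed value)
BASE = 257             # polynomial base for the rolling hash (assumed value)
MOD = (1 << 31) - 1    # modulus for the rolling hash (assumed value)


def cdc_chunks(data, min_size=64, max_size=1024):
    """Split data at content-defined boundaries using a rolling hash."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so h covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        size = i - start + 1
        if (size >= min_size and (h & MASK) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def lzw_compress(chunk):
    """Textbook LZW: emit a list of integer codes for a byte string."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    cur = b""
    for b in chunk:
        nxt = cur + bytes([b])
        if nxt in table:
            cur = nxt
        else:
            codes.append(table[cur])
            table[nxt] = len(table)  # next free code
            cur = bytes([b])
    if cur:
        codes.append(table[cur])
    return codes


def dedup_and_compress(data):
    """Return ('new', codes) for unseen chunks and ('dup', index) otherwise."""
    seen = {}      # SHA-256 digest -> index of the first chunk with that content
    records = []
    for chunk in cdc_chunks(data):
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            records.append(("dup", seen[digest]))
        else:
            seen[digest] = len(seen)
            records.append(("new", lzw_compress(chunk)))
    return records
```

Because the minimum chunk size is larger than the hash window, each cut point depends only on the most recent bytes, so an insertion early in the stream shifts nearby boundaries without disturbing later ones.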

CSRSPMM: Optimized CUDA Kernels for CSR Sparse × Dense MatMul
code

Developed optimized CUDA kernels for CSR sparse–dense matrix multiplication using warp-per-row scheduling, shared-memory buffering, and vectorized loads/stores to handle irregular sparsity patterns. Outperforms cuSPARSE and torch.sparse across various sparsity levels and matrix sizes.
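For readers unfamiliar with the operation, here is a minimal pure-Python reference for CSR sparse × dense multiplication. It is only a sequential model of the arithmetic, not the CUDA kernels themselves; the function name `csr_spmm` and the argument names are assumptions. The comments mark one possible way the loops could map onto a warp-per-row schedule.

```python
def csr_spmm(row_ptr, col_idx, values, dense):
    """Multiply a CSR sparse matrix by a dense matrix given as a list of rows."""
    n_rows = len(row_ptr) - 1
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):                       # one warp per sparse row
        for k in range(row_ptr[r], row_ptr[r + 1]):
            a = values[k]                         # nonzero A[r, col_idx[k]]
            b_row = dense[col_idx[k]]             # matching row of the dense matrix
            for j in range(n_cols):               # lanes could split these columns
                out[r][j] += a * b_row[j]
    return out


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 0]] stored in CSR form
row_ptr = [0, 2, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(csr_spmm(row_ptr, col_idx, values, B))
# [[11.0, 14.0], [0.0, 0.0], [9.0, 12.0]]
```

The empty middle row shows why irregular sparsity matters: rows carry very different amounts of work, which is the load-balancing problem a per-row schedule has to manage.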

Pipelined RISC-V Processor
project page / code

Built a 32-bit RISC-V core in SystemVerilog with a five-stage pipeline, full data forwarding, hazard control, instruction and data caches, and an AXI memory interface. Validated against the official RISC-V test suite and successfully deployed on a Lattice ECP5 FPGA.

Efficient DNN via Pruning & Sparse Matrix Compression
project page / code

Conducted an empirical analysis of pruning patterns across varying sparsity levels and designed a custom CSR-based linear layer to reduce memory footprint and accelerate inference while preserving model accuracy.

Configurable Logic Block Design, Verification and Optimization
project page

Designed a 16-bit Configurable Logic Block (CLB) in Cadence 45 nm Salicide (1.0 V / 1.8 V, 1P11M) technology. The CLB integrates a 16:1 LUT, SRAM-based memory, SIPO register, and a clock generator. Thoroughly verified and optimized to mitigate timing hazards while balancing power and performance.

Auto-driving Engines for F1/10
project page / code

Built an autonomous driving stack for F1/10-scale cars, combining SLAM-based localization, motion planning, optimal control, and deep learning perception into a unified pipeline. The full system was integrated and tested on the F1TENTH RoboRacer platform.

LC3 Simulator
code

Implemented a C-based simulator for the LC-3 (Little Computer 3) instruction set architecture. Validated by running benchmark assembly programs including interactive applications such as 2048, Hangman, and Rogue.

Automatic Solar Tracker
code

Built a dual-axis solar tracking system using LDR-based light sensing and closed-loop servo control to continuously orient a panel toward peak irradiance. The control logic was implemented in embedded C on a Tiva TM4C123GH6PM microcontroller.
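The closed-loop idea can be sketched as a single control step. This is a Python model for illustration only, not the embedded C firmware, and it assumes details the description does not state: four LDRs arranged in quadrants, raw ADC readings, a proportional gain, a deadband, and servo angles limited to 0 to 180 degrees. The function name `tracker_step` and all parameter values are hypothetical.

```python
def tracker_step(ldr, pan, tilt, gain=0.05, deadband=20, lo=0.0, hi=180.0):
    """Return updated (pan, tilt) servo angles from four LDR readings.

    ldr maps 'tl', 'tr', 'bl', 'br' (top-left, ... bottom-right) to ADC counts.
    """
    left = ldr["tl"] + ldr["bl"]
    right = ldr["tr"] + ldr["br"]
    top = ldr["tl"] + ldr["tr"]
    bottom = ldr["bl"] + ldr["br"]

    err_pan = right - left     # positive: more light on the right side
    err_tilt = top - bottom    # positive: more light on the top side

    # Move only when the imbalance exceeds the deadband, to avoid jitter.
    if abs(err_pan) > deadband:
        pan += gain * err_pan
    if abs(err_tilt) > deadband:
        tilt += gain * err_tilt

    # Clamp to the assumed mechanical range of the servos.
    pan = min(hi, max(lo, pan))
    tilt = min(hi, max(lo, tilt))
    return pan, tilt
```

Calling this step repeatedly drives both differences toward zero, which is the point where all four sensors see equal light and the panel faces the brightest direction.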

Contact

Feel free to reach out via email:
anhthh@seas.upenn.edu
tranhuyhoanganh220702@gmail.com

Or connect on LinkedIn as Huy Hoang Anh Tran.