Anh Tran
I am currently an Electrical Engineering Master's student at the
University of Pennsylvania.
I earned my Bachelor's degree in Electrical Engineering from
VinUniversity.
My background covers computer architecture, parallel computing, FPGA design, and deep learning.
I am particularly interested in bridging these fields to build faster and more efficient computing systems.
Email /
LinkedIn /
Facebook /
GitHub
|
Projects
I mainly work on computer architecture, digital design, and machine learning systems. In my free time, I also enjoy exploring computer vision, control, and robotics.
|
SoC Design for Real-Time Data Deduplication and Compression
project page /
code
Designed a full System-on-Chip (SoC) pipeline on the Ultra96-V2 FPGA board, combining content-defined chunking, SIMD-accelerated SHA-256 hashing, and a hardware LZW compressor for real-time deduplication and compression. Achieved over 800 Mb/s end-to-end throughput with a 0.65 compression ratio.
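As a rough software illustration of the chunking and hashing stages (not the project's hardware implementation), the sketch below splits a byte stream at content-defined boundaries and stores each unique chunk once under its SHA-256 digest. The rolling-sum hash, the `window`, `mask`, and `min_size` parameters, and the function names `cdc_chunks` and `deduplicate` are all assumptions made for this example; the LZW stage is omitted.

```python
import hashlib


def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F, min_size: int = 32):
    """Split data where a rolling sum over the last `window` bytes matches a bit pattern."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward by one byte
        size = i + 1 - start
        # Cut only once the chunk is long enough and the low hash bits are all zero.
        if size >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def deduplicate(data: bytes):
    """Return (store, recipe): unique chunks keyed by SHA-256, plus the digest order."""
    store, recipe = {}, []
    for chunk in cdc_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of each chunk
        recipe.append(digest)
    return store, recipe
```

Because boundaries depend on the bytes inside the window rather than on fixed offsets, repeated content tends to produce repeated chunks, which is what lets the digest lookup skip duplicates. The original stream is recovered by concatenating `store[d]` for each digest `d` in `recipe`.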
|
CSRSPMM: Optimized CUDA Kernels for CSR Sparse × Dense MatMul
code
Developed optimized CUDA kernels for CSR sparse–dense matrix multiplication using warp-per-row scheduling, shared-memory buffering, and vectorized loads/stores to handle irregular sparsity patterns. The kernels outperform cuSPARSE and torch.sparse across a range of sparsity levels and matrix sizes.
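For readers unfamiliar with the CSR layout, here is a minimal pure-Python reference of the computation the kernels perform. It is a sketch for illustration only, not the CUDA code: the function name `csr_spmm` and the list-of-lists dense format are assumptions. The comments mark where a warp-per-row kernel would typically distribute the work.

```python
def csr_spmm(indptr, indices, values, dense):
    """Multiply a CSR matrix (indptr, indices, values) by a dense row-major matrix."""
    n_cols = len(dense[0]) if dense else 0
    out = []
    for row in range(len(indptr) - 1):  # in a warp-per-row kernel, one warp owns one row
        acc = [0.0] * n_cols
        # Nonzeros of this row live in values[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            col, val = indices[k], values[k]
            for j in range(n_cols):  # the warp's lanes would split this inner loop
                acc[j] += val * dense[col][j]
        out.append(acc)
    return out


# A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]] stored in CSR form.
indptr, indices, values = [0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = csr_spmm(indptr, indices, values, dense)
```

Rows with very different nonzero counts do very different amounts of work in the middle loop, which is the irregularity the scheduling and buffering choices above are meant to absorb.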
|
Pipelined RISC-V Processor
project page /
code
Built a 32-bit RISC-V core in SystemVerilog with a five-stage pipeline, full data forwarding, hazard control, instruction and data caches, and an AXI memory interface. Validated against the official RISC-V test suite and successfully deployed on a Lattice ECP5 FPGA.
|
Efficient DNN via Pruning & Sparse Matrix Compression
project page /
code
Conducted an empirical analysis of pruning patterns across varying sparsity levels and designed a custom CSR-based linear layer to reduce memory footprint and accelerate inference while preserving model accuracy.
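The idea of a CSR-based linear layer can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions, not the project's layer: it assumes simple magnitude pruning with a fixed `threshold`, and the names `prune_to_csr` and `csr_linear` are hypothetical.

```python
def prune_to_csr(weights, threshold):
    """Drop entries with |w| < threshold and return the rest as (indptr, indices, values)."""
    indptr, indices, values = [0], [], []
    for row in weights:
        for col, w in enumerate(row):
            if abs(w) >= threshold:
                indices.append(col)
                values.append(w)
        indptr.append(len(indices))  # end offset of this row's nonzeros
    return indptr, indices, values


def csr_linear(indptr, indices, values, x, bias):
    """Compute y = W x + b for one input vector, touching only the stored weights."""
    y = list(bias)
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += values[k] * x[indices[k]]
    return y


weights = [[0.5, -0.01, 0.0], [0.02, -2.0, 1.0]]
indptr, indices, values = prune_to_csr(weights, threshold=0.1)
y = csr_linear(indptr, indices, values, x=[2.0, 1.0, 3.0], bias=[1.0, 0.0])
```

Only the surviving weights and their column indices are stored, so both memory use and multiply-accumulate count scale with the number of nonzeros rather than with the full layer size.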
|
Configurable Logic Block Design, Verification and Optimization
project page
Designed a 16-bit Configurable Logic Block (CLB) in Cadence 45 nm Salicide (1.0 V / 1.8 V, 1P11M) technology. The CLB integrates a 16:1 LUT, SRAM-based memory, SIPO register, and a clock generator. Thoroughly verified and optimized to mitigate timing hazards while balancing power and performance.
|
Auto-driving Engines for F1/10
project page /
code
Built an autonomous driving stack for F1/10-scale cars, combining SLAM-based localization, motion planning, optimal control, and deep learning perception into a unified pipeline. The full system was integrated and tested on the F1TENTH RoboRacer platform.
|
LC-3 Simulator
code
Implemented a simulator in C for the LC-3 (Little Computer 3) instruction set architecture. Validated it by running benchmark assembly programs, including interactive applications such as 2048, Hangman, and Rogue.
|
Automatic Solar Tracker
code
Built a dual-axis solar tracking system using LDR-based light sensing and closed-loop servo control to continuously orient a panel toward peak irradiance. The control logic was implemented in embedded C on a Tiva TM4C123GH6PM microcontroller.