Senior AI Researcher, Software Engineer, UofT'20
As a Sr. ML Researcher, I conduct cutting-edge research on language and vision models, with a focus on alignment, efficiency, and generation. I have led and contributed to several projects on training and fine-tuning large language models (LLMs), retrieval-augmented generation, and multimodal foundation models, using deep learning frameworks such as PyTorch and TensorFlow. With a background in both software and hardware, my work has spanned model architecture design and optimization, inference/training runtime performance, memory optimization, and HW/SW co-design. I have a strong academic background with a master's degree from the University of Toronto, and I am proficient in Python/C++/CUDA, ML frameworks, and LLMs. I am passionate about advancing the state of the art in AI and applying it to real-world problems.
M.A.Sc., Electrical and Computer Engineering, UNIVERSITY OF TORONTO
Chair of Embedded Security, RUHR UNIVERSITY BOCHUM
B.Sc., Electrical and Electronics Engineering, GERMAN UNIVERSITY IN CAIRO
Below are some of my skills, and I'm always looking to learn more.
Extensive research optimizing CNN, RNN, and transformer-based models targeting CV, NLP, and speech applications. My research includes model compression techniques (e.g., low-rank tensor decomposition and layer truncation), knowledge distillation, and data-efficient training (e.g., data sampling and token dropping) of large transformer-based models. Models I've worked on include BERT, CPM, GPT-2/3, ViT, Linformer, Conformer, and Swin Transformer.
In-depth experience with PyTorch and TensorFlow frameworks through numerous R&D projects.
Experience with Python for programming and scripting.
In-depth experience with C/C++ programming, efficient implementation of data structures and algorithms, and use of the C++ STL. Projects I've worked on include building DNNsim, a cycle-accurate simulator for custom ML hardware accelerators.
Hands-on experience with GPU programming using CUDA and code optimization through various projects, including building a Compressed-Memory Sparse DNN Inference Accelerator on GPU (a small illustrative sketch of the sparse-inference idea follows this skills list).
Hands-on experience with Perl, TCL, Bash scripting through various FPGA and ASIC design projects.
In-depth experience with FPGA and ASIC design using Verilog, SystemVerilog, and VHDL, including designing FPRaker, a novel ML training hardware accelerator (published at MICRO'21).
Hands-on experience with DevOps tools such as Git, Docker, Conda, and Jira through various projects.
Hands-on experience with various FPGA and ASIC design tools such as Intel Quartus Prime, Xilinx ISE, Synopsys Design Compiler, HSPICE & HSIM, and Cadence SoC Encounter, Innovus & Virtuoso.
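As a rough, self-contained illustration of the sparse-inference idea behind the GPU accelerator project above (a PyTorch toy, not the actual CUDA accelerator code; the tensor shapes and sparsity level are made up):

```python
# Toy sketch of sparse DNN inference: store pruned weights in a compressed
# sparse format and multiply against dense activations so zero weights are
# never fetched or multiplied. Illustrative only; shapes/sparsity are arbitrary.
import torch

dense_w = torch.randn(512, 512)
dense_w[torch.rand_like(dense_w) > 0.1] = 0.0   # ~90% unstructured sparsity
sparse_w = dense_w.to_sparse()                  # compressed (COO) storage

x = torch.randn(512, 64)                        # dense activations
y = torch.sparse.mm(sparse_w, x)                # sparse x dense matmul
assert torch.allclose(y, dense_w @ x, atol=1e-4)
```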
Here you can see some of the open-source projects I've done on my own time.
In my free time, I continue to work on personal projects and have many ideas just waiting to be realized.
Winner of Huawei Quarterly Outstanding Contribution to Project Award. [October 2020]
University of Toronto Edward S. Rogers Sr. Graduate Scholarship for 2 years. [2019 & 2020]
Ruhr University Bochum Undergraduate Research Award for 1 year. [2017]
German University in Cairo High School Excellence Scholarship for 5 years. [2013-2018]
Omar Mohamed Awad. 2019
M. Nikolić, G. Hacene, C. Bannon, A. Lascorz, M. Courbariaux, O. Mohamed Awad, I. Vivancos, Y. Bengio, V. Gripon, A. Moshovos
2024 IEEE International Symposium on Circuits and Systems (ISCAS)
F. Ataiefard, W. Ahmed, H. Hajimolahoseini, S. Asani, F. Javadi, M. Hassanpour, O. Mohamed Awad, A. Wen, K. Liu, Y. Liu
Association for the Advancement of Artificial Intelligence (AAAI 2024)
H. Hajimolahoseini, O. Mohamed Awad, W. Ahmed, A. Wen, S. Asani, M. Hassanpour, F. Javadi, M. Ahmadi, F. Ataiefard, K. Liu, Y. Liu
37th Conference on Neural Information Processing Systems (NeurIPS 2023)
F. Javadi, W. Ahmed, H. Hajimolahoseini, F. Ataiefard, M. Hassanpour, S. Asani, A. Wen, O. Mohamed Awad, K. Liu, Y. Liu
37th Conference on Neural Information Processing Systems (NeurIPS 2023)
M. Elgammal, O. Mohamed Awad, I. Edo, A. Moshovos, V. Betz
HEART ’23: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
H. Hajimolahoseini, M. Rezagholizadeh, V. Partovinia, M. Tahaei, O. Mohamed Awad, Y. Liu
35th Conference on Neural Information Processing Systems (NeurIPS 2021), Track on Datasets and Benchmarks
O. Mohamed Awad, H. Hajimolahoseini, M. Lim, G. Gosal, W. Ahmed, Y. Liu, G. Deng
Hardware Aware Efficient Training (HAET) at ICLR 2021
(Winning paper of the Hardware Aware Efficient Training competition at ICLR 2021)
O. Mohamed Awad, M. Mahmoud, I. Edo, A. Hadi Zadeh, C. Bannon, A. Moshovos
54th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2021. [Acceptance Rate: 21%]
A. Hadi Zadeh, I. Edo, O. Mohamed Awad, A. Moshovos
53rd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. [Acceptance Rate: 19%]
M. Mahmoud, I. Edo, A. Hadi Zadeh, O. Mohamed Awad, J. Albericio, A. Moshovos
53rd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. [Acceptance Rate: 19%]
A. Delmás, S. Sharify, I. Edo, D. Malone Stuart, O. Mohamed Awad, P. Judd, M. Mahmoud, M. Nikolic, K. Siu, Z. Poulos, and A. Moshovos
52nd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019. [Acceptance Rate: 23%]
C. Kison, O. Mohamed Awad, M. Fyrbiak, C. Paar
IEEE Transactions on Information Forensics and Security, 2019
A short summary of my work experience.
- Mentoring new hires on the team during their 3-month ramp-up program (setting goals, providing technical and non-technical mentorship, and reviewing/evaluating progress).
- Interviewing candidates for our internal SW/HW co-design project (exploring the interplay between ML architectures/training algorithms and hardware to influence the next release of Huawei's custom ML accelerator).
- Designing efficient Transformer + SSM hybrid LLMs.
- Leading a HW/SW co-design project on sub-8-bit training and inference of LLMs.
- Training memory-augmented LLMs to support long or unbounded context.
- Researching data-efficient training algorithms that provide significant time-to-accuracy savings in pretraining LLMs (a generic illustration of the data-sampling idea follows this list).
- Working on a data-efficient training library that provides various out-of-the-box algorithms to accelerate the training of arbitrary LLMs.
- Main contributor to two internal patents (one for a novel dataset sampling method to accelerate model training, and another for a novel method to accelerate MoE transformer models).
- Winner of the Data Application Acceleration Lab's Individual Award of Q1, 2023.
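The patented sampling method itself is internal and not reproduced here; the snippet below is only a generic, hypothetical sketch of loss-aware batch sampling in PyTorch, i.e., the broad family of data-efficient training ideas referenced above (`model`, `optimizer`, and `keep_ratio` are placeholder names):

```python
# Generic illustration of loss-aware data sampling (NOT the internal/patented
# method): keep only the hardest fraction of each batch, measured by per-sample
# loss, so gradient updates focus on the most informative examples.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, inputs, labels, keep_ratio=0.5):
    """One training step that back-propagates only through the hardest samples."""
    with torch.no_grad():
        # Score every sample cheaply with the current model.
        per_sample_loss = F.cross_entropy(model(inputs), labels, reduction="none")

    # Select the top `keep_ratio` fraction of samples by loss.
    k = max(1, int(keep_ratio * inputs.size(0)))
    idx = per_sample_loss.topk(k).indices

    # Full forward/backward pass only on the selected subset.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs[idx]), labels[idx])
    loss.backward()
    optimizer.step()
    return loss.item()
```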
- Analyzing, debugging, and tuning the end-to-end performance of deep learning models on the Cerebras CS-2 wafer-scale system (from the model implementation in TensorFlow/PyTorch all the way down to the microcode running on chip).
- Performance modeling/projection of upcoming models (e.g., Vision Transformer, Linformer) and kernels (e.g., attention) to be supported.
- Optimizing the training performance of various state-of-the-art NLP models (BERT, CPM, GPT-2/3) on Huawei's Ascend 910 AI training server.
- Kernel development and performance optimization for Huawei's Ascend 910 AI training server.
- Researching model compression techniques, e.g., low-rank tensor decomposition and layer truncation (a hedged sketch of the low-rank idea follows this list).
- Researching knowledge distillation techniques to improve the accuracy of compressed models.
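For illustration only, here is a minimal PyTorch sketch of the low-rank decomposition idea mentioned in the compression bullets (a truncated SVD of a linear layer; not the exact method used in these projects, and `low_rank_factorize` is a hypothetical helper):

```python
# Minimal sketch: compress a trained nn.Linear with a truncated SVD so that
# W (out x in) is approximated by U_r @ V_r, replacing one big matmul with
# two smaller ones. Hypothetical helper, not the exact project method.
import torch
import torch.nn as nn

def low_rank_factorize(layer: nn.Linear, rank: int) -> nn.Sequential:
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # (out_features, rank)
    V_r = Vh[:rank, :]                         # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

# Example: compress a 1024x1024 projection to rank 128 (~4x fewer parameters).
compressed = low_rank_factorize(nn.Linear(1024, 1024), rank=128)
```

Replacing one 1024x1024 matmul with a 1024x128 and a 128x1024 pair cuts the layer's parameters and MACs by roughly 4x, at the cost of an approximation error controlled by the chosen rank.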
- Design of a neural network training accelerator based on a novel processing element architecture that exploits fine-grain unstructured sparsity to increase the performance and energy efficiency of the training process by 1.47× and 1.39×, respectively, on average over the studied models.
- Development of a custom cycle-accurate, trace-based simulator (C/C++) to model the execution time and memory access of the proposed accelerator compared to a baseline value-agnostic accelerator.
- Exploiting the narrow distribution of floating-point values during training through exponent base-delta encoding compression, reducing off-chip memory bandwidth by 30% on average (see the sketch below).
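The base-delta idea above can be sketched in a few lines of NumPy (a simplified toy, not the actual hardware encoder): because the exponents of gradients and activations cluster in a narrow band during training, storing one base exponent per block plus small per-value deltas needs far fewer bits than the full 8-bit FP32 exponent field.

```python
# Toy illustration of exponent base-delta encoding (simplified; not the actual
# hardware compression scheme): FP32 exponents during training cluster in a
# narrow band, so each exponent can be stored as a small delta from a shared base.
import numpy as np

def exponent_base_delta(values: np.ndarray):
    bits = values.astype(np.float32).view(np.uint32)
    exponents = (bits >> 23) & 0xFF            # 8-bit biased exponent field
    base = exponents.min()                     # shared base exponent for the block
    deltas = exponents - base                  # small non-negative deltas
    delta_bits = max(1, int(np.ceil(np.log2(deltas.max() + 1))))  # bits per delta
    return base, deltas, delta_bits

# Gradients/activations typically span a narrow dynamic range:
block = np.random.normal(scale=1e-3, size=256).astype(np.float32)
base, deltas, delta_bits = exponent_base_delta(block)
print(f"base exponent: {base}, bits per delta: {delta_bits} (vs. 8 bits uncompressed)")
```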