Sajjad Ayoubi Email : Mobile : +98-910-163-0998
Data scientist with more than 3 years of experience building predictive models in NLP and CV using advanced
DL techniques and classical ML methods. I’m an active contributor to the ML open-source community and have
some datasets and models available for public consumption. Exploring data and training networks are two of my
passions. Using artificial intelligence and software engineering, I solve real-world data-driven problems.
2.1 Vocational Tehran, Iran
Machine Learning Engineer Oct. 2020 - June 2022
Question Answering: Mellat Bank’s central branch used our chatbot to answer questions from employees
working in other branches regarding the banking procedures. The system reduced the need for human experts to
answer the questions by 50 percent using a question answering model and rule-based conversation.
Automatic Speech Recognition: For our chatbot system to answer audio-based questions (phone calls), we
added an ASR model to convert them into text, and we collected over 10K hours of weakly-supervised ASR data.
With our Speech Recognition system based on Wav2vec, we were able to convert short audio into text.
Hamtech Tehran, Iran
Machine Learning Engineer Intern Summer 2020
Text Classification: Under the supervision of Dr. Khaligh Razavi, I created models for categorizing Persian
research papers in the IranDoc repository and compared different models such as bag-of-words, ULMFiT, and BERT.
Chatbot: Under the supervision of Dr. Khaligh Razavi, I created a chatbot system that answers domain-specific
questions based on the SQuADv2 question answering dataset, and deployed it as a Telegram bot.
2.2 Teaching
Hamtech Internship Program Tehran, Iran
Machine Learning Engineer Mentor 2021-22 Summer
Deep Learning Development (RecycleIt Project): Interns were supposed to create an ML-powered product
that could detect different types of bottles and recycle them. It was my responsibility to guide them through the
process of data collection, exploring the data, creating data pipelines, and creating product-ready models that could
be deployed and monitored.
ML Study Group: I led a study group that covered machine learning research papers, best practices, how to
explore the field, and how to implement ML research papers and applications so that participants could improve their resumes.
Open Source Projects
PersianQA: The first Persian question answering dataset, consisting of 10K question-answer pairs, along with a
BERT-based benchmark model trained for the task. Both the model and dataset are publicly available.
FaceLib: An open-source Python library for face analysis tasks such as face detection, face recognition, age and gender
estimation, face alignment, and emotion detection.
CLIPfa: The only Persian version of OpenAI’s CLIP that connects text and images in embedding space using
contrastive learning. For this project, we collected 400K pairs of images and Farsi captions to train such a model. Both
the model and dataset are publicly available as well as an online demo of text-based image search using CLIP.
Pars BigBird: The first Persian Transformer-based language model with O(n) complexity that can work over
texts with more than 4096 sub-tokens. It was trained on the Persian section of the OSCAR dataset. The English version
from Google Research inspired this work. It outperforms ParsBERT on various tasks.
B.Sc. in Computer (Software) Engineering Tehran, Iran
Shamsipour Technical and Vocational College; GPA: 3.60 Oct. 2021 - Present
Selected Courses: Algorithms (20/20) - Network Engineering (19/20) - Databases (20/20)
A.A.S. in Computer (Software) Engineering Tehran, Iran
Shamsipour Technical and Vocational College; GPA: 3.68 (18.15/20.00) Oct. 2019 - June 2021
Thesis: Face Analysis with Deep Learning approaches
Selected Courses: Discrete Math (20/20) - Data Structures (18/20) - Advanced Programming (20/20)
Technical Skills
Deep Learning/Machine Learning: Designing ML Pipelines, Data Visualization, Model Selection, Exploratory Data
Analysis, Decision Trees, Regression, Transformers, Generative Models, Graph Neural Nets, Convolutional Nets
Packages: PyTorch, TensorFlow/Keras, HuggingFace, Scikit-learn, Numpy, Pandas, OpenCV, Plotly
Other Skills: Familiar with Linux, Git, SQL, Docker, Data Structures, Algorithms
Online Courses
Mathematics for Machine Learning Specialization (Coursera): Imperial College London teaches Linear
Algebra, Vector Calculus, and Probability and their applications in machine learning. (Certificate here)
NLP Specialization (Coursera): Covers methods ranging from classic text processing to advanced
attention-based language models such as BERT and T5. (Certificate here)
Stanford’s Machine Learning (Stanford Online): Machine learning course by Stanford Online,
taught by Andrew Ng, covering basic ML topics such as the bias-variance tradeoff and linear models.
Full Stack Deep Learning (FSDL): Taught by the FSDL group, covering how to create an ML-powered application and
its best practices, from data collection to deployment.
Persian: Native proficiency
English: Limited working proficiency