Group Project: The 'DREAM-R' project, developed as part of the Rice D2K Lab Capstone Program in Fall 2023, is a deployable and generalizable genomic analytics Python package focused on Age-related Macular Degeneration (AMD). Using machine learning, the package helps identify key genomic signatures of AMD, supporting a deeper understanding of the disease's genetic aspects. GitHub Project Presentation Poster
Skills: Genomic data analysis | Feature engineering | Feature selection and reduction | Machine learning models | Statistical methods | Autoencoders | Python package
About
Welcome to my portfolio! I am a Data Science graduate student at the University of Houston and a visiting scholar at Rice University. My interest in data science began in the second year of my undergraduate studies, which led me to teach myself the field and take on internships applying machine learning and deep learning models to computer vision, NLP, and data analysis.
My recent internship in Fall 2023 involved data analysis and visualization, strengthening my skills in SQL, Python, Tableau, and KPI reporting. At the same time, I have been involved in graduate research on Large Language Models (LLMs), and I am actively expanding my skill set in data engineering and cloud services to keep up with the latest technologies and practices in the field.
I have over 4 years of hands-on experience and a strong academic background. As I near graduation in May 2024, I am actively looking for full-time roles where I can apply and expand my data science expertise. Take a closer look at my portfolio to learn more about my projects and skill set.
Skills
Programming Languages:





Tools:







Machine Learning Frameworks:






Big Data and Cloud:






Experience
Generative AI Engineer/Researcher
AICEBERG (May 2024 - Present)
- Enhanced Large Language Models (LLMs) such as Llama, Mistral, and Falcon, achieving a 25% improvement in performance and efficiency. Employed GPU-enabled machines with PyTorch and CUDA libraries, reducing training time by 40%.
- Designed and managed ETL pipelines for complex datasets, with a focus on instruction-based fine-tuning of LLMs. Boosted data scalability and reliability through comprehensive data processing and integration strategies.
- Handled datasets from sources like English Wikipedia, improving embedding utility for text similarity and analytics. Utilized custom Python scripts and Spark to enhance data extraction, cleaning, and manipulation, achieving a 50% efficiency increase.
- Leveraged AWS services such as Redshift and S3 for data warehousing, ensuring seamless integration with existing infrastructure. Improved data access and analysis capabilities for global teams, cutting data retrieval time by 35%.
- Conducted prompt analysis to assess the relevance of user inputs to customer data. Applied Retrieval-Augmented Generation (RAG) techniques to provide context-based responses, increasing response accuracy by 20%.
- Managed customer data onboarding using LangChain custom loaders, processing data in various formats, chunking it, and adding metadata. Fine-tuned LLMs on customer data using AWS S3, improving processing speed by 30%.
Data Processing, Analytics, and Data Visualization Specialist
MATA INVENTIVE (Aug 2023 - Dec 2023)
- Collaborated with cross-functional teams to understand business requirements and extract data-driven insights for decision-making.
- Automated data transformation tasks using Python scripts, reducing manual workloads and cutting processing time by 30%, which led to faster decision-making.
- Conducted A/B testing and statistical analysis to measure the impact of data-driven decisions on key performance indicators (KPIs). Improved KPIs by 15% through iterative testing and optimization, contributing to data-backed strategic decisions.
- Created data visualizations with PHP, highlighting key insights in machine monitoring and inventory data.
Graduate Research Assistant
UNIVERSITY OF HOUSTON (Jan 2023 - Jul 2023)
- Deployed and fine-tuned Transformer-based language models from the Hugging Face library for image-to-image translation tasks in the field of computer vision. Achieved a 15% improvement in translation accuracy compared to baseline models.
- Analyzed the architecture details of different Transformer models, including variations such as BERT, GPT-2, and RoBERTa, and conducted layer-by-layer analysis.
- Reduced model training time by 20% through optimization of attention mechanisms, enhancing overall efficiency.
- Extended the application of Transformer models beyond image-to-image translation to natural language tasks, fine-tuning models for text classification, sentiment analysis, and language generation.
Machine Learning Engineer Intern
INDIAN INSTITUTE OF TECHNOLOGY HYDERABAD (Jul 2021 - Aug 2022)
- Fine-tuned state-of-the-art object detection models to detect faces within datasets, attaining an accuracy of 78%.
- Played a crucial role in a team that developed a new architecture for image dehazing, a significant achievement that has been recognized through a publication.
Data Science Intern
NATIONAL INSTITUTE OF TECHNOLOGY TIRUCHIRAPPALLI (May 2021 - Jul 2021)
- Applied both supervised and unsupervised machine learning techniques to analyze ontology datasets.
- Used RDF (Resource Description Framework) and OWL (Web Ontology Language) to construct comprehensive ontologies in Extractive Metallurgy, leveraging their capabilities for complex data representation and interrelation. Used Protégé for ontology editing and integrated machine learning models to automate and enrich the ontological data.
- Contributed to a research paper submitted to the International Conference on Data Science and Security, demonstrating proficiency in data science methodologies.
Projects

Advanced network intrusion detection system developed using the KDD Cup 1999 dataset, integrating eight models including KNN, SVMs, Discriminant Analysis, Naive Bayes, Decision Tree, and Logistic Regression. Designed to enhance accuracy and reliability in identifying network anomalies. GitHub
Skills: Machine Learning | Data Analysis | Cybersecurity | Model Evaluation
Developed a machine learning model to assess the authenticity of disaster-related tweets, with a focus on improving crisis response and management. GitHub
Skills: NLTK | BERT | Text mining
This project performs facial landmark analysis and anomaly detection using facial landmark detection techniques. It calculates various facial angles and ratios, which can be useful for applications such as medical diagnosis, cosmetic surgery planning, and facial anomaly detection. GitHub
Skills: OpenCV | Facial landmark detection | Computer vision
Explore my Tableau Public profile showcasing a range of data visualizations and dashboards that highlight my skills in data analysis and storytelling.
GitHub
Tableau Public

This group project focuses on employing CNN and transformer models for the semantic segmentation of building damage from the 2017 Mexico City earthquake. Explore our work on GitHub. GitHub Project Poster
Skills: Semantic segmentation | Transformer-based models | CNN Models
This project involves customizing and enhancing a website template to create a unique and personalized web presence. It includes modifying the HTML5 UP 'Read Only' template to fit personal branding, incorporating responsive design, and adding custom features. GitHub
Skills: Website | Portfolio creation | HTML5 | CSS
Education


Master of Science in Engineering Data Science
UNIVERSITY OF HOUSTON
(Aug 2022 - May 2024)
Coursework: Introduction to Data Science, Introduction to Machine Learning, Probability and Statistics, Text Mining, Database Management System, Cybersecurity Data Analytics, Deep Learning, Information Visualization