About

Welcome to my portfolio! I am a Data Science graduate student at the University of Houston and a visiting scholar at Rice University. My interest in data science began in the second year of my undergraduate studies, leading me to teach myself the field and undertake several internships applying machine learning and deep learning models to computer vision, NLP, and data analysis.

My most recent internship, in fall 2023, involved data analysis and visualization and strengthened my skills in SQL, Python, Tableau, and KPI analysis. At the same time, I have been involved in graduate research on Large Language Models (LLMs), and I am actively expanding my skill set in data engineering and cloud services to stay abreast of the latest technologies and practices in the field.

With over 4 years of hands-on experience and a strong academic background, I am nearing graduation in May 2024 and actively looking for full-time roles to further apply and expand my data science expertise. Take a closer look at my portfolio to learn more about my projects and skill set.

Skills

Programming Languages:

Python
SQL
C/C++
PHP
HTML/CSS

Tools:

Tableau
Power BI
OpenCV

Machine Learning Frameworks:

NumPy
Pandas
Scikit-Learn
TensorFlow
PyTorch

Big Data and Cloud:

AWS
Azure
Hadoop
Snowflake
PySpark

Experience

Generative AI Engineer/Researcher

AICEBERG (May 2024 - Present)

  • Enhanced Large Language Models (LLMs) such as Llama, Mistral, and Falcon, achieving a 25% improvement in performance and efficiency. Employed GPU-enabled machines with PyTorch and CUDA libraries, reducing training time by 40%.
  • Designed and managed ETL pipelines for complex datasets, with a focus on instruction-based fine-tuning of LLMs. Boosted data scalability and reliability through comprehensive data processing and integration strategies.
  • Handled datasets from sources like English Wikipedia, improving embedding utility for text similarity and analytics. Utilized custom Python scripts and Spark to enhance data extraction, cleaning, and manipulation, achieving a 50% efficiency increase.
  • Leveraged AWS services such as Redshift and S3 for data warehousing, ensuring seamless integration with existing infrastructure. Improved data access and analysis capabilities for global teams, cutting data retrieval time by 35%.
  • Conducted prompt analysis to assess the relevance of user inputs to customer data. Applied Retrieval-Augmented Generation (RAG) techniques to provide context-based responses, increasing response accuracy by 20%.
  • Managed customer data onboarding using LangChain custom loaders, processing data in various formats, chunking it, and adding metadata. Fine-tuned LLMs on customer data using AWS S3, improving processing speed by 30%.

Data Processing, Analytics, and Data Visualization Specialist

MATA INVENTIVE (Aug 2023 - Dec 2023)

  • Collaborated with cross-functional teams to understand business requirements and extract data-driven insights for decision-making.
  • Automated data transformation tasks using Python scripts, reducing manual workload and cutting processing time by 30%, which led to faster decision-making.
  • Conducted A/B testing and statistical analysis to measure the impact of data-driven decisions on key performance indicators (KPIs). Improved KPIs by 15% through iterative testing and optimization, contributing to data-backed strategic decisions.
  • Created data visualizations with PHP, highlighting key insights in machine monitoring and inventory data.

Graduate Research Assistant

UNIVERSITY OF HOUSTON (Jan 2023 - Jul 2023)

  • Deployed and fine-tuned Transformer-based language models from the Hugging Face library for image-to-image translation tasks in the field of computer vision. Achieved a 15% improvement in translation accuracy compared to baseline models.
  • Analyzed the architecture details of different Transformer models, including variations such as BERT, GPT-2, and RoBERTa, and conducted layer-by-layer analysis.
  • Reduced model training time by 20% through optimization of attention mechanisms, enhancing overall efficiency.
  • Extended the application of Transformer models beyond image-to-image translation to natural language tasks. This involved fine-tuning models for tasks such as text classification, sentiment analysis, and language generation.

Machine Learning Engineer Intern

INDIAN INSTITUTE OF TECHNOLOGY HYDERABAD (Jul 2021 - Aug 2022)

  • Fine-tuned state-of-the-art object detection models to detect faces within datasets, attaining an accuracy of 78%.
  • Played a crucial role in a team that developed a new architecture for image dehazing, a significant achievement that has been recognized through a publication.

Data Science Intern

NATIONAL INSTITUTE OF TECHNOLOGY TIRUCHIRAPPALLI (May 2021 - Jul 2021)

  • Employed both supervised and unsupervised machine learning techniques to analyze ontology datasets.
  • Utilized RDF (Resource Description Framework) and OWL (Web Ontology Language) to construct comprehensive ontologies in Extractive Metallurgy, leveraging their capabilities for complex data representation and interrelation. Employed Protégé for ontology editing and integrated machine learning models to automate and enrich the ontological data.
  • Contributed to a research paper submitted to the International Conference on Data Science and Security, demonstrating proficiency in data science methodologies.

Projects

DREAM-R: Genomic Analytics for AMD

Group Project: The 'DREAM-R' project, developed as part of the Rice D2K Lab Capstone Program in Fall 2023, is a deployable and generalizable genomic analytics Python package focused on Age-related Macular Degeneration (AMD). Using machine learning, the package helps identify key genomic signatures of AMD, facilitating a deeper understanding of the disease's genetic aspects. GitHub Project Presentation Poster

Skills: Genomic data analysis | Feature engineering | Feature selection and reduction | Machine learning models | Statistical methods | Autoencoders | Python package
Classification of Internet Firewall Dataset

Developed a network intrusion detection system using the KDD Cup 1999 dataset, integrating eight models including KNN, SVMs, Discriminant Analysis, Naive Bayes, Decision Tree, and Logistic Regression. Designed to enhance accuracy and reliability in identifying network anomalies. GitHub

Skills: Machine Learning | Data Analysis | Cybersecurity | Model Evaluation
Natural Language Processing with Disaster Tweets

Developed a machine learning model to assess the authenticity of disaster-related tweets, with a focus on improving crisis response and management. GitHub

Skills: NLTK | BERT | Text mining
Facial Landmark Analysis and Anomaly Detection

This project performs facial landmark analysis and anomaly detection. It calculates various facial angles and ratios from detected landmarks, which can be useful for applications like medical diagnosis, cosmetic surgery planning, and facial anomaly detection. GitHub

Skills: OpenCV | Facial Landmark detection | Computer vision
My Tableau Public Profile

Explore my Tableau Public profile showcasing a range of data visualizations and dashboards that highlight my skills in data analysis and storytelling. GitHub Tableau Public

Skills: Data visualization | Tableau dashboards
Semantic Segmentation of Building Damage

This group project focuses on employing CNN and transformer models for the semantic segmentation of building damage from the 2017 Mexico City earthquake. Explore our work on GitHub. GitHub Project Poster

Skills: Semantic segmentation | Transformer-based models | CNN Models
Personal Website Template Modification and Creation

This project involves customizing and enhancing a website template to create a unique and personalized web presence. It includes modifying the HTML5 UP 'Read Only' template to fit personal branding, incorporating responsive design, and adding custom features. GitHub

Skills: Website | Portfolio creation | HTML5 | CSS

Education

Visiting Graduate - Data Science

RICE UNIVERSITY

(Aug 2023 - Dec 2023)

GPA: 4/4

Coursework: Data Science Capstone Project

Master of Science in Engineering Data Science

UNIVERSITY OF HOUSTON

(Aug 2022 - May 2024)

GPA: 3.82/4

Coursework: Introduction to Data Science, Introduction to Machine Learning, Probability and Statistics, Text Mining, Database Management System, Cybersecurity Data Analytics, Deep Learning, Information Visualization

B.Tech in Materials Science & Computer Science Minor

NIT TIRUCHIRAPPALLI

(Aug 2018 - Aug 2022)

GPA: 3.57/4

Coursework: Data Structures and Algorithms, Database Management Systems, Network Security, Operating Systems, Data Communication and Networks, Big Data Analytics

Publications

Abhi Jaiswal, Sharin Shahana K C, Sujitha Ravichandran, Adarsh K, Bharat Bhat, Biresh Kumar Joardar, Sumit K. Mandal

IEEE Journal on Emerging and Selected Topics in Circuits and Systems, July 2024

Onkar Susladkar, Gayatri Deshmukh, Subhrajit Nag, Ananya Mantravadi, Dhruv Makwana, Sujitha Ravichandran, Sai Chandra Teja R, Gajanan H Chavhan, C Krishna Mohan, Sparsh Mittal

Journal of Systems Architecture, Elsevier, November 2022

Sujitha Ravichandran and G. Deepak

2022 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 2022, pp. 01-05