Summary
Data Scientist | AI/ML & Generative AI | AWS Certified | Building Scalable Predictive & Data-Driven Solutions
AWS Certified Data Engineer with 4+ years of experience designing scalable data pipelines, automating ETL workflows, and delivering cloud-native solutions across healthcare, AI, and speech analytics domains.
Built real-time and batch pipelines using AWS (S3, Lambda, RDS, EC2), Python, SQL, and Spark, reducing processing latency by 25%, cutting manual effort by 50%, and improving data reliability.
Developed Power BI dashboards and optimized SQL/Oracle queries for 1M+ records, improving analytics speed and operational insights.
OnAir Post: Sankarsh Sanap
About
Education
George Mason University, Fairfax, Virginia (GMU)
Master of Science | Data Analytics Engineering
Jan 2024 – Dec 2025
Concentration: Applied ML, NLP with Deep Learning
CGPA: 3.93/4
Savitribai Phule Pune University (SPPU)
Bachelor of Engineering | Mechanical Engineering
July 2017 – June 2021
CGPA: 3.9/4
Experience
George Mason University, USA
Software Developer Intern
Aug 2025 – Present
Developed real-time speech analytics features using React Native, reducing latency by 25%
Engineered DSP algorithms for speech enhancement, boosting audio clarity by 35%
Built REST APIs with Firebase Cloud Functions and GitLab CI/CD on GCP
EPAM Systems, USA
Data Engineer Intern
Jun 2024 – Jun 2025
Developed scalable data pipelines using Python and Airflow with Hugging Face LLMs
Built AWS infrastructure (EC2, S3, IAM, CloudWatch) for real-time API deployment
Implemented data contracts and validation rules ensuring schema integrity in MongoDB
Tata Consultancy Services (Walgreens Boots Alliance), India
Data Engineer
Aug 2021 – Jan 2024
Led migration of healthcare data to AWS (S3, RDS, Lambda), reducing costs by 30%
Automated ETL workflows for 1M+ daily records, reducing runtime by 50%
Built predictive models in Python and R, improving forecast accuracy by 25%
Anvizon (PwC), India
Project Analyst
Jan 2021 – Jul 2021
Conducted clustering and PCA on 90K+ customer records, improving segmentation by 12%
Developed Faster R-CNN model for defect detection, reducing production downtime by 15%
Improved image segmentation accuracy from 76% to 88%
Skills
• Languages & Scripting: Python, SQL, Java, C#, Perl, Bash, JavaScript
• ML & AI: XGBoost, LightGBM, PyTorch, TensorFlow, scikit-learn, SageMaker, statistical modeling
• GenAI & NLP: LangChain, Bedrock, RAG, LangGraph, GPT APIs, spaCy, NLTK, Prompt Engineering
• Computer Vision: YOLOv7, Faster R-CNN, OpenCV, CNNs, Image Segmentation
• Cloud & MLOps: AWS (S3, Lambda, EC2, SageMaker, Bedrock, API Gateway), Docker, GitHub Actions, CI/CD
• Data Engineering: Pandas, NumPy, Feature Engineering, EDA, ETL Pipelines
• Frameworks & APIs: FastAPI, Flask, REST, GraphQL, Flask-SocketIO
• Databases: PostgreSQL, MySQL, SQLite, MongoDB
• Tools: Git, Jenkins, JIRA, VS Code, Postman, pytest
Certifications
• AWS Certified Data Engineer
• AWS Certified Cloud Practitioner
• Microsoft Certified Power BI Data Analyst Associate
• Introduction to Generative AI by Google
• Generative AI Fundamentals by Databricks
• Statistics for Data Science and Business Analysis
Web Links
Previous Projects
AniMagi – LLM Support Bot (LangChain + Pinecone)
• Designed and deployed RAG-powered assistant using FastAPI, Pinecone, and OpenAI GPT, automating 65% of FAQs and cutting escalations by 60%.
• Reduced average support resolution time from 2.4 minutes to 45 seconds, increasing agent productivity by 3x.
Computer Vision
• Achieved 85% accuracy training CNNs on FashionMNIST; built panoramic stitcher using OpenCV with 100% alignment precision across datasets.
• Used custom alpha blending and OpenCV homography for seamless image composition; improved rendering time by 30%.
Predicting Stock Market Trends Using Machine Learning (LSTM + ARIMA + Prophet)
• Developed and optimized time-series forecasting models (LSTM, ARIMA, Prophet) to predict stock market trends with improved accuracy through advanced feature engineering and hyperparameter tuning.
• Applied financial indicators (SMA, RSI, MACD) to enhance predictive performance, achieving measurable gains over baseline models.
LLM Threat Fusion: Risk Analysis for Large Language Models
• Conducted structured threat analysis of LLM deployments by synthesizing academic, open-source, and technical sources into a unified framework.
• Designed a risk evaluation framework identifying vulnerabilities (prompt injection, data leakage, model bias) and proposed mitigation strategies for enterprise adoption.
onAir Projects
Server Management
Working with Sai Sriram Uppada…
AI onAir Summariser
Working with Namita Chougule…
Data Engineering Hub
Administering the Data Engineering Hub
onAir LLM Search
Working with Todd …
