Sankarsh Sanap

Summary

Data Scientist | AI/ML & Generative AI | AWS Certified | Building Scalable Predictive & Data-Driven Solutions

AWS Certified Data Engineer with 4+ years of experience designing scalable data pipelines, automating ETL workflows, and delivering cloud-native solutions across healthcare, AI, and speech analytics domains.

Built real-time and batch pipelines using AWS (S3, Lambda, RDS, EC2), Python, SQL, and Spark, reducing processing latency by 25%, cutting manual effort by 50%, and enhancing data reliability.

Developed Power BI dashboards and optimized SQL/Oracle queries for 1M+ records, improving analytics speed and operational insights.

OnAir Post: Sankarsh Sanap

About

Education

George Mason University, Fairfax, Virginia (GMU)
Master of Science | Data Analytics Engineering
Jan 2024 – Dec 2025
Concentration: Applied ML, NLP with Deep Learning
CGPA: 3.93/4

Savitribai Phule Pune University (SPPU)
Bachelor of Engineering | Mechanical Engineering
July 2017 – June 2021
CGPA: 3.9/4

Experience

George Mason University, USA
Software Developer Intern
Aug 2025 – Present

Developed real-time speech analytics features using React Native, reducing latency by 25%
Engineered DSP algorithms for speech enhancement, boosting audio clarity by 35%
Built REST APIs with Firebase Cloud Functions and GitLab CI/CD on GCP

EPAM Systems, USA
Data Engineer Intern
Jun 2024 – Jun 2025

Developed scalable data pipelines using Python and Airflow with Hugging Face LLMs
Built AWS infrastructure (EC2, S3, IAM, CloudWatch) for real-time API deployment
Implemented data contracts and validation rules ensuring schema integrity in MongoDB

Tata Consultancy Services (Walgreens Boots Alliance), India
Data Engineer
Aug 2021 – Jan 2024

Led migration of healthcare data to AWS (S3, RDS, Lambda), reducing costs by 30%
Automated ETL workflows for 1M+ daily records, reducing runtime by 50%
Built predictive models in Python and R, improving forecast accuracy by 25%

Anvizon (PwC), India
Project Analyst
Jan 2021 – Jul 2021

Conducted clustering and PCA on 90K+ customer records, improving segmentation by 12%
Developed Faster R-CNN model for defect detection, reducing production downtime by 15%
Improved image segmentation accuracy from 76% to 88%

Skills

• Languages & Scripting: Python, SQL, Java, C#, Perl, Bash, JavaScript

• ML & AI: XGBoost, LightGBM, PyTorch, TensorFlow, scikit-learn, SageMaker, statistical modeling

• GenAI & NLP: LangChain, Bedrock, RAG, LangGraph, GPT APIs, SpaCy, NLTK, Prompt Engineering

• Computer Vision: YOLOv7, Faster R-CNN, OpenCV, CNNs, Image Segmentation

• Cloud & MLOps: AWS (S3, Lambda, EC2, SageMaker, Bedrock, API Gateway), Docker, GitHub Actions, CI/CD

• Data Engineering: Pandas, NumPy, Feature Engineering, EDA, ETL Pipelines

• Frameworks & APIs: FastAPI, Flask, REST, GraphQL, Flask-SocketIO

• Databases: PostgreSQL, MySQL, SQLite, MongoDB

• Tools: Git, Jenkins, JIRA, VSCode, Postman, Pytest

Certifications

• AWS Certified Data Engineer

• AWS Certified Cloud Practitioner

• Microsoft Certified Power BI Data Analyst Associate

• Introduction to Generative AI by Google

• Generative AI Fundamentals by Databricks

• Statistics for Data Science and Business Analysis

Web Links

Previous Projects

AniMagi – LLM Support Bot (LangChain + Pinecone)

• Designed and deployed a RAG-powered assistant using FastAPI, Pinecone, and OpenAI GPT, automating 65% of FAQs and cutting escalations by 60%.

• Reduced average support resolution time from 2.4 minutes to 45 seconds, increasing agent productivity by 3x.

Computer Vision

• Achieved 85% accuracy training CNNs on FashionMNIST; built panoramic stitcher using OpenCV with 100% alignment precision across datasets.

• Used custom alpha blending and OpenCV homography for seamless image composition; improved rendering time by 30%.

Predicting Stock Market Trends Using Machine Learning (LSTM + ARIMA + Prophet)

• Developed and optimized time-series forecasting models (LSTM, ARIMA, Prophet) to predict stock market trends with improved accuracy through advanced feature engineering and hyperparameter tuning.

• Applied financial indicators (SMA, RSI, MACD) to enhance predictive performance, achieving measurable gains over baseline models.

LLM Threat Fusion: Risk Analysis for Large Language Models

• Conducted structured threat analysis of LLM deployments by synthesizing academic, open-source, and technical sources into a unified framework.

• Designed a risk evaluation framework identifying vulnerabilities (prompt injection, data leakage, model bias) and proposed mitigation strategies for enterprise adoption.

onAir Projects

Server Management

Working with Sai Sriram Uppada…..

AI onAir Summariser

Working with Namita Chougule…

Data Engineering Hub

Administering the Data Engineering Hub

onAir LLM Search

Working with Todd …….

Discuss

OnAir membership is required. The lead Moderator for the discussions is Sankarsh Sanap. We encourage civil, honest, and safe discourse. For more information on commenting and giving feedback, see our Comment Guidelines.

This is an open discussion on the contents of this post.
