Summary
👋 Hi, I’m Sai Sriram Uppada, a Data Analytics Engineering graduate, AWS Certified Data Engineer, and AI enthusiast who enjoys building end-to-end data, ML, and LLM-powered systems that solve real problems in a scalable way.
I recently completed my Master’s in Data Analytics Engineering at George Mason University (GPA 3.85/4.0), where I focused on data engineering, machine learning, and LLM-based analytics. Over the past few years, through academic projects, research work, and internships, I’ve worked across the full data lifecycle: from ingestion and ETL to modeling, evaluation, and deployment for analytics and intelligent applications. I’m actively seeking full-time and internship opportunities as a Data Engineer, Data Analytics Engineer, or Data Analyst where I can contribute to data platforms, AI-driven products, and cloud-native solutions.
OnAir Post: Sai Sriram Uppada
News
onAir Tech – February 10, 2026
OnAir Tech Corporation is excited to announce that Sai Sriram Uppada (Ram) has joined the onAir Tech team. Ram will focus on server management and the onAir job matching system.
About
Biography
Data Analytics Engineering graduate with a Master’s degree from George Mason University (GPA: 3.85/4.0), specializing in building end-to-end data pipelines, machine learning models, decision-ready dashboards, and LLM-driven analytics systems. I bring hands-on expertise across the complete data lifecycle, from ETL design and cloud infrastructure to advanced machine learning, deep learning, and Large Language Model applications.
My technical foundation spans Python-driven data engineering, SQL, PySpark, big data and data processing frameworks, statistical modeling, modern BI tools, and AI-powered solutions. I hold the AWS Certified Data Engineer – Associate and AWS Certified Cloud Practitioner certifications, with practical experience architecting cloud-native data systems on AWS (S3, Glue, EMR, Aurora, Redshift, Lambda), designing operational data stores, and implementing production-grade ETL workflows using SnapLogic and Databricks. I have also demonstrated expertise in prompt engineering, database management, feature engineering, model evaluation, and data-driven storytelling.
I am actively seeking challenging data engineering and analytics roles where I can leverage my technical expertise in cloud data infrastructure, machine learning, and AI to build scalable, production-ready systems that drive measurable business impact.
Experience
Graduate Research Assistant
George Mason University
January 2025 – July 2025 · Hybrid, Fairfax County, VA
- Architected and evaluated an Operational Data Store (ODS) for eRebate’s transactional data infrastructure, designing scalable data pipelines for reliable reporting and analytics.
- Engineered end-to-end ETL workflows using Python and SnapLogic, orchestrating data pipelines between AWS S3, Aurora PostgreSQL, Lambda, and Glue for cloud-native data engineering.
- Optimized Aurora PostgreSQL schemas with strategic indexing, improving query performance and supporting scalability for downstream analytical use cases.
Overview
Served as a graduate research assistant on a project to design and evaluate an Operational Data Store (ODS) for eRebate’s transactional data infrastructure, architecting scalable data pipelines to support reliable reporting, analytics, and downstream decision-making. Built and tested end-to-end ETL workflows using Python and SnapLogic, processing flat-file data (CSV/TSV formats) and transforming it into structured, normalized formats suitable for cloud storage and analytics. Engineered data pipeline orchestration between AWS S3 (data lake), AWS Aurora PostgreSQL (analytical database), AWS Lambda (serverless compute), and AWS Glue (ETL service), demonstrating proficiency in cloud-native data engineering patterns.
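As a simplified illustration of one pipeline stage, here is a minimal sketch of an S3-triggered Lambda load into Aurora PostgreSQL. The bucket, table, column, and environment-variable names are hypothetical; the production pipelines were built in SnapLogic and Glue and were considerably more involved.

```python
# Minimal sketch of an S3 -> Aurora PostgreSQL load step. Names are
# hypothetical and credentials come from environment variables;
# illustrative only, not the production pipeline.
import csv
import io
import os

import boto3        # AWS SDK for Python
import psycopg2     # PostgreSQL driver (works with Aurora PostgreSQL)

def handler(event, context):
    """AWS Lambda entry point triggered by an S3 object-created event."""
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    # Read the flat file (CSV or TSV) from the S3 data lake.
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    delimiter = "\t" if key.endswith(".tsv") else ","
    rows = csv.reader(io.StringIO(body.decode("utf-8")), delimiter=delimiter)
    next(rows)  # skip the header row

    # Load the normalized rows into a hypothetical ODS table.
    conn = psycopg2.connect(
        host=os.environ["AURORA_HOST"],
        dbname=os.environ["AURORA_DB"],
        user=os.environ["AURORA_USER"],
        password=os.environ["AURORA_PASSWORD"],
    )
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.executemany(
            "INSERT INTO ods.transactions (txn_id, amount, txn_date) "
            "VALUES (%s, %s, %s)",
            ((r[0], r[1], r[2]) for r in rows),
        )
    conn.close()
    return {"status": "loaded", "object": key}
```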
Technologies
Python, SnapLogic, AWS S3, AWS Aurora PostgreSQL, AWS Lambda, AWS Glue, ETL, Data Pipelines, CSV/TSV Processing
Impact
Demonstrated hands-on competency in cloud data engineering, ETL tool implementation, schema optimization, and data governance, all critical skill sets for enterprise-scale analytics infrastructure.
Key Achievements
- Built and tested end-to-end ETL workflows using Python and SnapLogic, processing flat-file data (CSV/TSV formats) and transforming it into structured, normalized formats suitable for cloud storage and analytics
- Engineered data pipeline orchestration between AWS S3 (data lake), AWS Aurora PostgreSQL (analytical database), AWS Lambda (serverless compute), and AWS Glue (ETL service), demonstrating proficiency in cloud-native data engineering patterns
- Designed and optimized Aurora PostgreSQL schemas and table structures, evaluating trade-offs between normalization, indexing strategies, and query performance to support scalability and downstream analytical use cases (an indexing sketch follows this list)
- Conducted rigorous data quality assessments, documenting data profiling results, pipeline behavior, transformation anomalies, and quality findings in comprehensive internal documentation to support knowledge sharing and future enhancements
- Collaborated with data engineering and analytics teams in a research-driven environment, contributing to architectural discussions and best-practice documentation for ODS design and maintenance
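To give a concrete flavor of the indexing work referenced above, here is a hedged sketch of the kind of composite index and planner check that was evaluated. The table, columns, and query are hypothetical stand-ins, not eRebate’s actual schema.

```python
# Hypothetical example of indexing evaluation for the ODS: create a
# composite B-tree index for a common date-bounded lookup, then confirm
# the planner uses it with EXPLAIN ANALYZE. All names are illustrative.
import os

import psycopg2

conn = psycopg2.connect(
    host=os.environ["AURORA_HOST"],
    dbname=os.environ["AURORA_DB"],
    user=os.environ["AURORA_USER"],
    password=os.environ["AURORA_PASSWORD"],
)
with conn, conn.cursor() as cur:
    # Composite index matching the most common analytical filter pattern.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_txn_customer_date "
        "ON ods.transactions (customer_id, txn_date)"
    )
    # Verify the planner actually uses the index for the target query.
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM ods.transactions "
        "WHERE customer_id = %s AND txn_date >= %s",
        ("c-123", "2025-01-01"),
    )
    for line in cur.fetchall():
        print(line[0])
conn.close()
```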
Education
Master’s in Data Analytics Engineering
January 2024 – December 2025 | GPA: 3.85/4.0
George Mason University
Fairfax, VA
Specialized in core data engineering, AI, machine learning, and analytics with hands-on expertise in designing end-to-end data pipelines, LLM-powered analytics systems, and scalable AWS cloud architectures.
- AWS Certified Data Engineer – Associate
- Capstone: LLM Resume Matcher
- LLM-driven analytics systems
- Cloud-native architectures on AWS
Bachelor’s in Information Technology
August 2019 – May 2023
Gayatri Vidya Parishad (GVP) College of Engineering
Visakhapatnam, India
Focused on computer science and IT fundamentals, with emphasis on programming, data structures, data mining, data warehousing, AI/ML, and software development methodologies.
- Machine Learning Internships
- Data Analytics Internships
- Software Engineering Fundamentals
Certifications
AWS Certified Data Engineer – Associate
Issued by: Amazon Web Services
2025
Validates expertise in AWS data engineering services (Glue, EMR, Redshift, Aurora)
Cloud Computing
Skills
Technical expertise and competencies developed over several years of coursework, research, and internships.
I offer a range of services across data engineering, analytics, cloud deployment, AI and LLMs, and machine learning.
Programming Languages
Proficient in core programming languages for data engineering, analytics, and machine learning.
Python, R, SQL, PySpark
Core Competencies
Key areas of expertise and professional capabilities.
- End-to-End Data Pipeline Architecture & Orchestration
- Machine Learning Model Development & Evaluation
- Large Language Model (LLM) Integration & Prompt Engineering
- Cloud Data Engineering (AWS)
- ETL Workflow Design & Optimization
- Real-Time Data Processing & Anomaly Detection
- Feature Engineering & Statistical Analysis
- Data Visualization & Storytelling
- Model Interpretability & Explainability (SHAP, LIME)
- Full-Stack Application Development (Mobile & Web)
- SQL & NoSQL Database Design
- Agile & Research-Driven Collaboration
Contact
Email: OnAir Member
Locations
Fairfax
4308 Cotswolds Hill Ln,
Fairfax, VA, 22030
Phone: +1 (704) 956-9712
Past Projects
I have worked on several projects over the past four years.
Crash Analytics: Predictive Modeling of Road Accident Severity Using Machine Learning
Jan 2025 – May 2025
Led a comprehensive research initiative analyzing 132,000 global road accident records to investigate the interplay of driver behavior, environmental conditions, and temporal factors in determining accident severity, with direct applications to transportation safety policy and public health interventions.
PySpark, XGBoost, Random Forest, K-Means Clustering
Overview
Led a comprehensive research initiative analyzing 132,000 global road accident records to investigate the interplay of driver behavior, environmental conditions, and temporal factors in determining accident severity, with direct applications to transportation safety policy and public health interventions.
Conducted extensive exploratory data analysis (EDA) on large-scale accident datasets, identifying critical features such as driver alcohol level, visibility conditions, vehicle speed, and temporal patterns using Pandas, NumPy, and Matplotlib. Built ensemble machine learning models, including Random Forest, Logistic Regression, and XGBoost, achieving 80% classification accuracy and an F1-score of 0.80 in predicting accident severity across multiple categories. Addressed significant class imbalance in the dataset using SMOTE (Synthetic Minority Over-sampling Technique), improving minority class detection and ensuring balanced model performance across all severity levels.
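A condensed sketch of the imbalance-aware training step is below. The dataset path is a placeholder, and integer-encoded severity labels and all-numeric features are illustrative assumptions.

```python
# Sketch of SMOTE + XGBoost training: oversample the training split only,
# so the held-out test set keeps the real-world class distribution.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("accidents.csv")        # placeholder path
X = df.drop(columns=["severity"])        # assumed all-numeric features
y = df["severity"]                       # assumed integer-encoded labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Synthesize minority-class samples on the training data only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="mlogloss")
model.fit(X_res, y_res)
print(classification_report(y_test, model.predict(X_test)))
```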
Applied advanced model interpretability techniques, SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and Partial Dependence Plots (PDPs), to extract actionable insights and explain individual predictions to non-technical stakeholders.
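For example, computing SHAP values for a fitted tree ensemble takes only a few lines (continuing from the previous sketch, whose `model` and `X_test` are assumed here):

```python
# Sketch of global SHAP explanations for the fitted XGBoost model.
import shap

# TreeExplainer computes exact SHAP values for tree ensembles like XGBoost.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # multi-class models yield one array per class

# Global view: which features drive severity predictions overall.
shap.summary_plot(shap_values, X_test)
```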
Developed interactive web-based visualizations using Streamlit, enabling collaborative exploration, real-time model feedback, and stakeholder engagement throughout the analysis lifecycle. Synthesized findings into evidence-based policy recommendations targeting driver behavior modification, infrastructure safety improvements, and temporal risk mitigation strategies.
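The Streamlit front end can be sketched in a few lines; the file path, column names, and widget choices below are assumptions for illustration, not the full dashboard.

```python
# Minimal sketch of an interactive Streamlit view over the accident data.
import pandas as pd
import streamlit as st

st.title("Crash Severity Explorer")

df = pd.read_csv("accidents.csv")              # placeholder dataset path

# Aggregate view: distribution of accident severity across the dataset.
st.bar_chart(df["severity"].value_counts())

# Record-level view: inspect a single accident's features.
idx = st.slider("Accident record", 0, len(df) - 1, 0)
st.write(df.iloc[[idx]])
```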
Technologies
Python, Pandas, NumPy, Matplotlib, Random Forest, Logistic Regression, XGBoost, SMOTE, SHAP, LIME, Streamlit, Statistical Analysis
Impact
Demonstrated expertise in large-scale data analysis, ensemble modeling, interpretability frameworks, and stakeholder communication, delivering research-quality outputs suitable for academic publication and policy impact.
Key Achievements
- Conducted extensive exploratory data analysis on 132,000 road accident records
- Achieved 80% classification accuracy and F1-score of 0.80 using ensemble models
- Addressed significant class imbalance using SMOTE (Synthetic Minority Over-sampling Technique)
- Applied advanced model interpretability techniques: SHAP, LIME, and Partial Dependence Plots (PDPs)
- Developed interactive web-based visualizations using Streamlit
- Synthesized findings into evidence-based policy recommendations
Real Estate Price Prediction Using Socioeconomic Indicators
Aug 2024 – Dec 2024
Executed a data-driven research study examining how socioeconomic factors (crime rates, school accessibility, healthcare availability, employment statistics) influence real estate pricing across Connecticut, leveraging twenty years of property transaction data.
Overview
Executed a data-driven research study examining how socioeconomic factors (crime rates, school accessibility, healthcare availability, employment statistics) influence real estate pricing across Connecticut, leveraging twenty years of property transaction data. Aggregated and integrated heterogeneous datasets spanning property transactions, crime statistics, school ratings, healthcare facilities, and employment data, performing rigorous data cleaning and ZIP code–based spatial joins.
Executed advanced data preprocessing, including median imputation for missing values, ZIP code normalization, temporal feature engineering (time-series transformations), and log transformation to correct skewness and stabilize model predictions. Trained ensemble regression models (Random Forest and XGBoost) to capture nonlinear relationships and interaction effects between socioeconomic variables and property prices, achieving an R² of 0.90 and demonstrating high predictive fidelity.
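A condensed sketch of this preprocessing-plus-modeling flow appears below. The file path and feature names are hypothetical, and the R² printed here is computed on the log-price scale.

```python
# Sketch of the preprocessing and regression steps described above:
# median imputation, log1p transform of the skewed price target, and a
# Random Forest regressor. Column names are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("ct_property_joined.csv")      # placeholder path
X = df[["crime_rate", "school_rating", "hospitals_per_capita",
        "unemployment_rate", "sale_year"]]       # assumed feature names
y = np.log1p(df["sale_price"])                   # log1p corrects right skew

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = make_pipeline(
    SimpleImputer(strategy="median"),            # fill gaps from the joins
    RandomForestRegressor(n_estimators=500, random_state=42),
)
model.fit(X_train, y_train)
print("R^2 (log scale):", r2_score(y_test, model.predict(X_test)))
```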
Conducted feature importance analysis and residual error diagnostics, revealing property crime rate as a dominant factor in housing devaluation and uncovering regional disparities in price drivers. Created publication-quality visualizations using Tableau and Matplotlib, illustrating temporal trends, geographic patterns, and regional disparities to support urban planning and socioeconomic policy discussions. Delivered actionable recommendations for real estate valuation, neighborhood risk assessment, and targeted urban development interventions.
Technologies
Python, Pandas, NumPy, Matplotlib, Tableau, Random Forest, XGBoost, Feature Engineering, Spatial Analysis, Data Integration
Impact
Demonstrated expertise in feature engineering, regression modeling, heterogeneous data integration, and data-driven insights for urban policy, applicable to real estate analytics, urban planning, and socioeconomic research domains.
Key Achievements
- Aggregated and integrated heterogeneous datasets spanning twenty years of property transaction data
- Executed advanced data preprocessing, including median imputation, ZIP code normalization, and temporal feature engineering
- Achieved R² of 0.90, demonstrating high predictive fidelity using Random Forest and XGBoost
- Conducted feature importance analysis revealing property crime rate as a dominant factor in housing devaluation
- Created publication-quality visualizations using Tableau and Matplotlib
- Delivered actionable recommendations for real estate valuation and urban development interventions
Fashion Designing Using Generative Adversarial Networks (GANs)
Jan 2023 – May 2023
Developed a cross-platform mobile application showcasing AI-generated fashion designs, integrating a pre-trained Deep Convolutional Generative Adversarial Network (DCGAN) model optimized for on-device inference on smartphones.
Overview
Developed a cross-platform mobile application showcasing AI-generated fashion designs, integrating a pre-trained Deep Convolutional Generative Adversarial Network (DCGAN) model optimized for on-device inference on smartphones. Converted a pre-trained DCGAN model to TensorFlow Lite format, optimizing the model for on-device inference on mobile devices while maintaining generation quality and minimizing computational overhead.
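The conversion step can be sketched as follows, assuming the trained DCGAN generator was saved as a Keras model; the file names are placeholders.

```python
# Sketch of the DCGAN-to-TensorFlow-Lite conversion described above.
import tensorflow as tf

generator = tf.keras.models.load_model("dcgan_generator.h5")  # placeholder

converter = tf.lite.TFLiteConverter.from_keras_model(generator)
# Default optimizations quantize weights to shrink the model for phones.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("dcgan_generator.tflite", "wb") as f:
    f.write(tflite_model)
```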
Built a responsive mobile frontend using React Native and Expo, implementing multi-screen navigation architecture with React Navigation and persistent local data storage using Async Storage for browsing and bookmarking generated designs. Integrated Axios for backend API communication and implemented image handling capabilities using React Native Image Picker, enabling users to upload reference images for style inspiration and personalization. Deployed a lightweight Flask REST API backend on Heroku (free tier), handling model prediction requests and image preprocessing (resizing, normalization) using Pillow and NumPy libraries.
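A minimal sketch of the kind of Flask endpoint described is below, assuming a hypothetical /predict route and a 64×64 model input; the TFLite inference call itself is omitted.

```python
# Sketch of a Flask prediction endpoint: receive an uploaded image,
# resize and normalize it with Pillow/NumPy, then (in the real app)
# run TFLite inference. Route name and image size are assumptions.
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Resize to the model's expected input and scale pixels to [-1, 1],
    # the range DCGANs are typically trained on.
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    arr = np.asarray(img.resize((64, 64)), dtype=np.float32) / 127.5 - 1.0
    # ...TFLite inference on `arr` would go here (omitted for brevity)...
    return jsonify({"input_shape": list(arr.shape)})

if __name__ == "__main__":
    app.run()
```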
Designed user interface using React Native Paper, adhering to Material Design principles for consistency, accessibility, and visual polish across Android and iOS platforms. Tested application on Android Studio Emulator and Expo Go, ensuring cross-platform compatibility and user experience quality. Achieved 80% user satisfaction in usability testing, demonstrating the application’s effectiveness in communicating AI-generated designs to non-technical users.
Technologies
React Native, Expo, JavaScript, TensorFlow Lite, DCGAN, Flask, Python, Heroku, React Navigation, Async Storage, Material Design, Android, iOS
Impact
Delivered a production-ready mobile application demonstrating full-stack development expertise (frontend, backend, ML model integration), mobile optimization, and user-centric design, applicable to AI-powered consumer applications.
Key Achievements
- Converted pre-trained DCGAN model to TensorFlow Lite format for on-device inference
- Built a responsive mobile frontend using React Native and Expo with multi-screen navigation
- Deployed lightweight Flask REST API backend on Heroku for model predictions
- Implemented image handling capabilities using React Native Image Picker
- Designed user interface using React Native Paper, adhering to Material Design principles
- Tested on Android Studio Emulator and Expo Go for cross-platform compatibility
- Achieved 80% user satisfaction in usability testing
onAir Server Management
I am currently overseeing the management of onAir Servers on AWS.
I am also the Hub Manager for the AI Nexus GMU Custom onAir Hub.
AWS
I am currently working on the AWS servers that host onAir.
onAir Job Matching
GMU Capstone
LLM-Based Resume–Job Matching and SOC Classification System
Architected and deployed a sophisticated end-to-end Large Language Model (LLM) pipeline to intelligently match resumes with job opportunities using Standard Occupational Classification (SOC-2018) codes and multi-model LLM inference.
Overview
Architected and deployed a sophisticated end-to-end Large Language Model (LLM) pipeline to intelligently match resumes with job opportunities using Standard Occupational Classification (SOC-2018) codes and multi-model LLM inference. Designed a data ingestion layer to process and normalize resume and job-posting data from heterogeneous sources (CSV, PDF, and JSON formats), implementing robust data cleaning and personally identifiable information (PII) removal to ensure data privacy and consistency. Integrated and benchmarked five major LLMs across four providers, OpenAI GPT-4o and GPT-4o-mini, Anthropic Claude, Cohere, and Mistral, using standardized prompts and payload structures to enable fair, reproducible model comparisons across inference tasks.
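As an illustration of the PII-removal step, here is a minimal sketch using regex redaction; the patterns and placeholder tokens are illustrative assumptions, not the exact rules used in the pipeline.

```python
# Sketch of a PII-scrubbing step in the ingestion layer: redact emails
# and phone numbers before resumes are stored or sent to any LLM
# provider. Patterns are deliberately simple and illustrative.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

def scrub_pii(text: str) -> str:
    """Replace obvious PII with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub_pii("Reach me at jane.doe@example.com or +1 (555) 123-4567"))
```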
Engineered three distinct prompt engineering tiers (baseline, intermediate, comprehensive) and evaluated LLM performance by comparing model-generated SOC codes against manually labeled ground truth data, targeting 90%+ classification accuracy. Developed a Top-10 job recommendation engine that ranks and filters job postings per resume, improving user experience and matching precision through LLM-powered relevance scoring.
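A sketch of the standardized-prompt pattern is below, shown with OpenAI’s Python client; the other providers were wrapped behind the same interface. The tier texts and the function name are illustrative, not the exact prompts used in the project.

```python
# Sketch of standardized prompting: every provider receives the same
# instruction text so SOC-code outputs are directly comparable.
from openai import OpenAI

PROMPT_TIERS = {
    "baseline": "Return the best SOC-2018 code for this resume.",
    "intermediate": "You are an occupational classifier. Return the single "
                    "best SOC-2018 code and a one-line justification.",
    # the comprehensive tier added few-shot examples and output-format rules
}

def classify_resume(resume_text: str, tier: str = "baseline") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": PROMPT_TIERS[tier]},
            {"role": "user", "content": resume_text},
        ],
        temperature=0,  # deterministic output for fair model comparison
    )
    return response.choices[0].message.content
```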
Stored all predictions, ground truth labels, evaluation metrics, and model performance statistics in MongoDB, enabling comprehensive analytics and audit trails for model validation and deployment decisions. Implemented version control and documentation workflows on GitHub, ensuring experiment reproducibility, code quality, and knowledge transfer across team members.
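For illustration, persisting one prediction record with pymongo might look like the sketch below; the connection URI, database and collection names, and document fields are assumptions.

```python
# Sketch of storing a prediction plus its ground-truth label in MongoDB
# so accuracy can be audited per model and prompt tier.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI
runs = client["soc_matching"]["prediction_runs"]

runs.insert_one({
    "resume_id": "r-0001",            # hypothetical identifier
    "model": "gpt-4o-mini",
    "prompt_tier": "comprehensive",
    "predicted_soc": "15-2051",       # SOC-2018 code for Data Scientists
    "ground_truth_soc": "15-2051",
    "correct": True,
    "created_at": datetime.now(timezone.utc),
})
```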
Technologies
Python, OpenAI GPT APIs, Anthropic Claude, Cohere, Mistral, MongoDB, GitHub, Prompt Engineering, SOC-2018 Classification
Impact
Delivered a production-ready LLM pipeline demonstrating expertise in prompt engineering, multi-model LLM integration, evaluation methodologies, and data-driven decision-making, directly supporting talent acquisition and workforce matching initiatives.
Key Achievements
- Designed data ingestion layer to process resume and job-posting data from CSV, PDF, and JSON formats
- Integrated and benchmarked five major LLMs across four providers: OpenAI GPT-4o and GPT-4o-mini, Anthropic Claude, Cohere, and Mistral
- Engineered three distinct prompt engineering tiers (baseline, intermediate, comprehensive)
- Benchmarked model-generated SOC codes against manually labeled ground truth, targeting 90%+ classification accuracy
- Developed a Top-10 job recommendation engine with LLM-powered relevance scoring
- Stored all predictions, ground truth labels, evaluation metrics, and model performance statistics in MongoDB
- Implemented version control and documentation workflows on GitHub for experiment reproducibility
