Summary
👋 Hi, I’m Sai Sriram Uppada, a Data Analytics Engineering graduate, AWS Certified Data Engineer, and AI enthusiast who enjoys building end-to-end data, ML, and LLM-powered systems that solve real problems in a scalable way.
I recently completed my Master’s in Data Analytics Engineering at George Mason University (GPA 3.85/4.0), where I focused on data engineering, machine learning, and LLM-based analytics. Over the past few years, through academic projects, research work, and internships, I’ve worked across the full data lifecycle: from ingestion and ETL to modeling, evaluation, and deployment for analytics and intelligent applications. I’m actively seeking full-time and internship opportunities as a Data Engineer, Data Analytics Engineer, or Data Analyst where I can contribute to data platforms, AI-driven products, and cloud-native solutions.
OnAir Post: Sai Sriram Uppada
News
onAir Tech – February 10, 2026
OnAir Tech Corporation is excited to announce that Sai Sriram Uppada (Ram) has joined the onAir Tech team. Ram will focus on server management and the onAir job matching system.
About
Biography
Data Analytics Engineering graduate with a Master’s degree from George Mason University (GPA: 3.85/4.0), specializing in building end-to-end data pipelines, machine learning models, decision-ready dashboards, and LLM-driven analytics systems. I bring hands-on expertise across the complete data lifecycle, from ETL design and cloud infrastructure to advanced machine learning, deep learning, and Large Language Model applications.
My technical foundation spans Python-driven data engineering, SQL, PySpark, big data and data processing frameworks, statistical modeling, modern BI tools, and AI-powered solutions. I hold the AWS Certified Data Engineer – Associate and AWS Certified Cloud Practitioner certifications, with practical experience architecting cloud-native data systems on AWS (S3, Glue, EMR, Aurora, Redshift, Lambda), designing operational data stores, and implementing production-grade ETL workflows using SnapLogic and Databricks. I have also demonstrated expertise in prompt engineering, database management, feature engineering, model evaluation, and data-driven storytelling.
I am actively seeking challenging data engineering and analytics roles where I can leverage my technical expertise in cloud data infrastructure, machine learning, and AI to build scalable, production-ready systems that drive measurable business impact.
Experience
Graduate Research Assistant
George Mason University
January 2025 – July 2025 · Hybrid, Fairfax County, VA
- Architected and evaluated an Operational Data Store (ODS) for eRebate’s transactional data infrastructure, designing scalable data pipelines for reliable reporting and analytics.
- Engineered end-to-end ETL workflows using Python and SnapLogic, orchestrating data pipelines between AWS S3, Aurora PostgreSQL, Lambda, and Glue for cloud-native data engineering.
- Optimized Aurora PostgreSQL schemas with strategic indexing, improving query performance and supporting scalability for downstream analytical use cases.
Overview
Served as a graduate research assistant on a project to design and evaluate an Operational Data Store (ODS) for eRebate’s transactional data infrastructure, architecting scalable data pipelines to support reliable reporting, analytics, and downstream decision-making. Built and tested end-to-end ETL workflows using Python and SnapLogic, processing flat-file data (CSV/TSV formats) and transforming it into structured, normalized formats suitable for cloud storage and analytics. Engineered data pipeline orchestration between AWS S3 (data lake), AWS Aurora PostgreSQL (analytical database), AWS Lambda (serverless compute), and AWS Glue (ETL service), demonstrating proficiency in cloud-native data engineering patterns.
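As a simplified illustration of one pipeline stage, here is a minimal sketch of an S3-triggered Lambda load into Aurora PostgreSQL. The bucket, table, column, and environment-variable names are hypothetical; the production pipelines were built in SnapLogic and Glue and were considerably more involved.

```python
# Minimal sketch of an S3 -> Aurora PostgreSQL load step. Names are
# hypothetical and credentials come from environment variables;
# illustrative only, not the production pipeline.
import csv
import io
import os

import boto3        # AWS SDK for Python
import psycopg2     # PostgreSQL driver (works with Aurora PostgreSQL)

def handler(event, context):
    """AWS Lambda entry point triggered by an S3 object-created event."""
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    # Read the flat file (CSV or TSV) from the S3 data lake.
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    delimiter = "\t" if key.endswith(".tsv") else ","
    rows = csv.reader(io.StringIO(body.decode("utf-8")), delimiter=delimiter)
    next(rows)  # skip the header row

    # Load the normalized rows into a hypothetical ODS table.
    conn = psycopg2.connect(
        host=os.environ["AURORA_HOST"],
        dbname=os.environ["AURORA_DB"],
        user=os.environ["AURORA_USER"],
        password=os.environ["AURORA_PASSWORD"],
    )
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.executemany(
            "INSERT INTO ods.transactions (txn_id, amount, txn_date) "
            "VALUES (%s, %s, %s)",
            ((r[0], r[1], r[2]) for r in rows),
        )
    conn.close()
    return {"status": "loaded", "object": key}
```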
Technologies
Python, SnapLogic, AWS S3, AWS Aurora PostgreSQL, AWS Lambda, AWS Glue, ETL, Data Pipelines, CSV/TSV Processing
Impact
Demonstrated hands-on competency in cloud data engineering, ETL tool implementation, schema optimization, and data governance, all critical skill sets for enterprise-scale analytics infrastructure.
Key Achievements
- Built and tested end-to-end ETL workflows using Python and SnapLogic, processing flat-file data (CSV/TSV formats) and transforming it into structured, normalized formats suitable for cloud storage and analytics
- Engineered data pipeline orchestration between AWS S3 (data lake), AWS Aurora PostgreSQL (analytical database), AWS Lambda (serverless compute), and AWS Glue (ETL service), demonstrating proficiency in cloud-native data engineering patterns
- Designed and optimized Aurora PostgreSQL schemas and table structures, evaluating trade-offs between normalization, indexing strategies, and query performance to support scalability and downstream analytical use cases (an indexing sketch follows this list)
- Conducted rigorous data quality assessments, documenting data profiling results, pipeline behavior, transformation anomalies, and quality findings in comprehensive internal documentation to support knowledge sharing and future enhancements
- Collaborated with data engineering and analytics teams in a research-driven environment, contributing to architectural discussions and best-practice documentation for ODS design and maintenance
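To give a concrete flavor of the indexing work referenced above, here is a hedged sketch of the kind of composite index and planner check that was evaluated. The table, columns, and query are hypothetical stand-ins, not eRebate’s actual schema.

```python
# Hypothetical example of indexing evaluation for the ODS: create a
# composite B-tree index for a common date-bounded lookup, then confirm
# the planner uses it with EXPLAIN ANALYZE. All names are illustrative.
import os

import psycopg2

conn = psycopg2.connect(
    host=os.environ["AURORA_HOST"],
    dbname=os.environ["AURORA_DB"],
    user=os.environ["AURORA_USER"],
    password=os.environ["AURORA_PASSWORD"],
)
with conn, conn.cursor() as cur:
    # Composite index matching the most common analytical filter pattern.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_txn_customer_date "
        "ON ods.transactions (customer_id, txn_date)"
    )
    # Verify the planner actually uses the index for the target query.
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM ods.transactions "
        "WHERE customer_id = %s AND txn_date >= %s",
        ("c-123", "2025-01-01"),
    )
    for line in cur.fetchall():
        print(line[0])
conn.close()
```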
Education
Master’s in Data Analytics Engineering
January 2024 – December 2025 | GPA: 3.85/4.0
George Mason University
Fairfax, VA
Specialized in core data engineering, AI, machine learning, and analytics with hands-on expertise in designing end-to-end data pipelines, LLM-powered analytics systems, and scalable AWS cloud architectures.
- AWS Certified Data Engineer – Associate
- Capstone: LLM Resume Matcher
- LLM-driven analytics systems
- Cloud-native architectures on AWS
Bachelor’s in Information Technology
August 2019 – May 2023
Gayatri Vidya Parishad (GVP) College of Engineering
Visakhapatnam, India
Focused on computer science and IT fundamentals, with emphasis on programming, data structures, data mining, data warehousing, AI/ML, and software development methodologies.
- Machine Learning Internships
- Data Analytics Internships
- Software Engineering Fundamentals
Certifications
AWS Certified Data Engineer – Associate
Issued by: Amazon Web Services
2025
Validates expertise in AWS data engineering services (Glue, EMR, Redshift, Aurora)
Cloud Computing
Skills
Technical expertise and competencies developed over several years of coursework, research, and internships.
I offer a range of services across data engineering, analytics, cloud deployment, AI and LLMs, and machine learning.
Programming Languages
Proficient in core programming languages for data engineering, analytics, and machine learning.
Python, R, SQL, PySpark
Core Competencies
Key areas of expertise and professional capabilities.
- End-to-End Data Pipeline Architecture & Orchestration
- Machine Learning Model Development & Evaluation
- Large Language Model (LLM) Integration & Prompt Engineering
- Cloud Data Engineering (AWS)
- ETL Workflow Design & Optimization
- Real-Time Data Processing & Anomaly Detection
- Feature Engineering & Statistical Analysis
- Data Visualization & Storytelling
- Model Interpretability & Explainability (SHAP, LIME)
- Full-Stack Application Development (Mobile & Web)
- SQL & NoSQL Database Design
- Agile & Research-Driven Collaboration
Contact
Email: OnAir Member
Locations
Fairfax
4308 Cotswolds Hill Ln,
Fairfax, VA, 22030
Phone: +1 (704) 956-9712
Past Projects
I have worked on several projects over the past four years.
Crash Analytics: Predictive Modeling of Road Accident Severity Using Machine Learning
Jan 2025 – May 2025
Led a comprehensive research initiative analyzing 132,000 global road accident records to investigate the interplay of driver behavior, environmental conditions, and temporal factors in determining accident severity, with direct applications to transportation safety policy and public health interventions.
PySpark, XGBoost, Random Forest, K-Means Clustering
Overview
Led a comprehensive research initiative analyzing 132,000 global road accident records to investigate the interplay of driver behavior, environmental conditions, and temporal factors in determining accident severity, with direct applications to transportation safety policy and public health interventions.
Conducted extensive exploratory data analysis (EDA) on large-scale accident datasets, identifying critical features such as driver alcohol level, visibility conditions, vehicle speed, and temporal patterns using Pandas, NumPy, and Matplotlib. Built ensemble machine learning models, including Random Forest, Logistic Regression, and XGBoost, achieving 80% classification accuracy and an F1-score of 0.80 in predicting accident severity across multiple categories. Addressed significant class imbalance in the dataset using SMOTE (Synthetic Minority Over-sampling Technique), improving minority class detection and ensuring balanced model performance across all severity levels.
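A condensed sketch of the imbalance-aware training step is below. The dataset path is a placeholder, and integer-encoded severity labels and all-numeric features are illustrative assumptions.

```python
# Sketch of SMOTE + XGBoost training: oversample the training split only,
# so the held-out test set keeps the real-world class distribution.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("accidents.csv")        # placeholder path
X = df.drop(columns=["severity"])        # assumed all-numeric features
y = df["severity"]                       # assumed integer-encoded labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Synthesize minority-class samples on the training data only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="mlogloss")
model.fit(X_res, y_res)
print(classification_report(y_test, model.predict(X_test)))
```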
Applied advanced model interpretability techniques, SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and Partial Dependence Plots (PDPs), to extract actionable insights and explain individual predictions to non-technical stakeholders.
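For example, computing SHAP values for a fitted tree ensemble takes only a few lines (continuing from the previous sketch, whose `model` and `X_test` are assumed here):

```python
# Sketch of global SHAP explanations for the fitted XGBoost model.
import shap

# TreeExplainer computes exact SHAP values for tree ensembles like XGBoost.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # multi-class models yield one array per class

# Global view: which features drive severity predictions overall.
shap.summary_plot(shap_values, X_test)
```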
Developed interactive web-based visualizations using Streamlit, enabling collaborative exploration, real-time model feedback, and stakeholder engagement throughout the analysis lifecycle. Synthesized findings into evidence-based policy recommendations targeting driver behavior modification, infrastructure safety improvements, and temporal risk mitigation strategies.
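The Streamlit front end can be sketched in a few lines; the file path, column names, and widget choices below are assumptions for illustration, not the full dashboard.

```python
# Minimal sketch of an interactive Streamlit view over the accident data.
import pandas as pd
import streamlit as st

st.title("Crash Severity Explorer")

df = pd.read_csv("accidents.csv")              # placeholder dataset path

# Aggregate view: distribution of accident severity across the dataset.
st.bar_chart(df["severity"].value_counts())

# Record-level view: inspect a single accident's features.
idx = st.slider("Accident record", 0, len(df) - 1, 0)
st.write(df.iloc[[idx]])
```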
Technologies
Python, Pandas, NumPy, Matplotlib, Random Forest, Logistic Regression, XGBoost, SMOTE, SHAP, LIME, Streamlit, Statistical Analysis
Impact
Demonstrated expertise in large-scale data analysis, ensemble modeling, interpretability frameworks, and stakeholder communication, delivering research-quality outputs suitable for academic publication and policy impact.
Key Achievements
- Conducted extensive exploratory data analysis on 132,000 road accident records
- Achieved 80% classification accuracy and F1-score of 0.80 using ensemble models
- Addressed significant class imbalance using SMOTE (Synthetic Minority Over-sampling Technique)
- Applied advanced model interpretability techniques: SHAP, LIME, and Partial Dependence Plots (PDPs)
- Developed interactive web-based visualizations using Streamlit
- Synthesized findings into evidence-based policy recommendations
Real Estate Price Prediction Using Socioeconomic Indicators
Aug 2024 – Dec 2024
Executed a data-driven research study examining how socioeconomic factors (crime rates, school accessibility, healthcare availability, employment statistics) influence real estate pricing across Connecticut, leveraging twenty years of property transaction data.
Overview
Executed a data-driven research study examining how socioeconomic factors (crime rates, school accessibility, healthcare availability, employment statistics) influence real estate pricing across Connecticut, leveraging twenty years of property transaction data. Aggregated and integrated heterogeneous datasets spanning property transactions, crime statistics, school ratings, healthcare facilities, and employment data, performing rigorous data cleaning and ZIP code–based spatial joins.
Executed advanced data preprocessing, including median imputation for missing values, ZIP code normalization, temporal feature engineering (time-series transformations), and log transformation to correct skewness and stabilize model predictions. Trained ensemble regression models (Random Forest and XGBoost) to capture nonlinear relationships and interaction effects between socioeconomic variables and property prices, achieving an R² of 0.90 and demonstrating high predictive fidelity.
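A condensed sketch of this preprocessing-plus-modeling flow appears below. The file path and feature names are hypothetical, and the R² printed here is computed on the log-price scale.

```python
# Sketch of the preprocessing and regression steps described above:
# median imputation, log1p transform of the skewed price target, and a
# Random Forest regressor. Column names are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("ct_property_joined.csv")      # placeholder path
X = df[["crime_rate", "school_rating", "hospitals_per_capita",
        "unemployment_rate", "sale_year"]]       # assumed feature names
y = np.log1p(df["sale_price"])                   # log1p corrects right skew

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = make_pipeline(
    SimpleImputer(strategy="median"),            # fill gaps from the joins
    RandomForestRegressor(n_estimators=500, random_state=42),
)
model.fit(X_train, y_train)
print("R^2 (log scale):", r2_score(y_test, model.predict(X_test)))
```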
Conducted feature importance analysis and residual error diagnostics, revealing property crime rate as a dominant factor in housing devaluation and uncovering regional disparities in price drivers. Created publication-quality visualizations using Tableau and Matplotlib, illustrating temporal trends, geographic patterns, and regional disparities to support urban planning and socioeconomic policy discussions. Delivered actionable recommendations for real estate valuation, neighborhood risk assessment, and targeted urban development interventions.
Technologies
Python, Pandas, NumPy, Matplotlib, Tableau, Random Forest, XGBoost, Feature Engineering, Spatial Analysis, Data Integration
Impact
Demonstrated expertise in feature engineering, regression modeling, heterogeneous data integration, and data-driven insights for urban policy, applicable to real estate analytics, urban planning, and socioeconomic research domains.
Key Achievements
- Aggregated and integrated heterogeneous datasets spanning twenty years of property transaction data
- Executed advanced data preprocessing, including median imputation, ZIP code normalization, and temporal feature engineering
- Achieved R² of 0.90, demonstrating high predictive fidelity using Random Forest and XGBoost
- Conducted feature importance analysis revealing property crime rate as a dominant factor in housing devaluation
- Created publication-quality visualizations using Tableau and Matplotlib
- Delivered actionable recommendations for real estate valuation and urban development interventions
Fashion Designing Using Generative Adversarial Networks (GANs)
Jan 2023 – May 2023
Developed a cross-platform mobile application showcasing AI-generated fashion designs, integrating a pre-trained Deep Convolutional Generative Adversarial Network (DCGAN) model optimized for on-device inference on smartphones.
Overview
Developed a cross-platform mobile application showcasing AI-generated fashion designs, integrating a pre-trained Deep Convolutional Generative Adversarial Network (DCGAN) model optimized for on-device inference on smartphones. Converted a pre-trained DCGAN model to TensorFlow Lite format, optimizing the model for on-device inference on mobile devices while maintaining generation quality and minimizing computational overhead.
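The conversion step can be sketched as follows, assuming the trained DCGAN generator was saved as a Keras model; the file names are placeholders.

```python
# Sketch of the DCGAN-to-TensorFlow-Lite conversion described above.
import tensorflow as tf

generator = tf.keras.models.load_model("dcgan_generator.h5")  # placeholder

converter = tf.lite.TFLiteConverter.from_keras_model(generator)
# Default optimizations quantize weights to shrink the model for phones.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("dcgan_generator.tflite", "wb") as f:
    f.write(tflite_model)
```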
Built a responsive mobile frontend using React Native and Expo, implementing multi-screen navigation architecture with React Navigation and persistent local data storage using Async Storage for browsing and bookmarking generated designs. Integrated Axios for backend API communication and implemented image handling capabilities using React Native Image Picker, enabling users to upload reference images for style inspiration and personalization. Deployed a lightweight Flask REST API backend on Heroku (free tier), handling model prediction requests and image preprocessing (resizing, normalization) using Pillow and NumPy libraries.
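A minimal sketch of the kind of Flask endpoint described is below, assuming a hypothetical /predict route and a 64×64 model input; the TFLite inference call itself is omitted.

```python
# Sketch of a Flask prediction endpoint: receive an uploaded image,
# resize and normalize it with Pillow/NumPy, then (in the real app)
# run TFLite inference. Route name and image size are assumptions.
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Resize to the model's expected input and scale pixels to [-1, 1],
    # the range DCGANs are typically trained on.
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    arr = np.asarray(img.resize((64, 64)), dtype=np.float32) / 127.5 - 1.0
    # ...TFLite inference on `arr` would go here (omitted for brevity)...
    return jsonify({"input_shape": list(arr.shape)})

if __name__ == "__main__":
    app.run()
```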
Designed user interface using React Native Paper, adhering to Material Design principles for consistency, accessibility, and visual polish across Android and iOS platforms. Tested application on Android Studio Emulator and Expo Go, ensuring cross-platform compatibility and user experience quality. Achieved 80% user satisfaction in usability testing, demonstrating the application’s effectiveness in communicating AI-generated designs to non-technical users.
Technologies
React Native, Expo, JavaScript, TensorFlow Lite, DCGAN, Flask, Python, Heroku, React Navigation, Async Storage, Material Design, Android, iOS
Impact
Delivered a production-ready mobile application demonstrating full-stack development expertise (frontend, backend, ML model integration), mobile optimization, and user-centric design, applicable to AI-powered consumer applications.
Key Achievements
- Converted pre-trained DCGAN model to TensorFlow Lite format for on-device inference
- Built a responsive mobile frontend using React Native and Expo with multi-screen navigation
- Deployed lightweight Flask REST API backend on Heroku for model predictions
- Implemented image handling capabilities using React Native Image Picker
- Designed user interface using React Native Paper, adhering to Material Design principles
- Tested on Android Studio Emulator and Expo Go for cross-platform compatibility
- Achieved 80% user satisfaction in usability testing
onAir Server Management
I am currently overseeing the management of onAir Servers on AWS.
I am also the Hub Manager for the AI Nexus GMU Custom onAir Hub.
AWS
I am currently working on the AWS servers that host onAir.
onAir Job Matching
GMU Capstone
LLM-Based Resume–Job Matching and SOC Classification System
Architected and deployed a sophisticated end-to-end Large Language Model (LLM) pipeline to intelligently match resumes with job opportunities using Standard Occupational Classification (SOC-2018) codes and multi-model LLM inference.
Overview
Architected and deployed a sophisticated end-to-end Large Language Model (LLM) pipeline to intelligently match resumes with job opportunities using Standard Occupational Classification (SOC-2018) codes and multi-model LLM inference. Designed a data ingestion layer to process and normalize resume and job-posting data from heterogeneous sources (CSV, PDF, and JSON formats), implementing robust data cleaning and personally identifiable information (PII) removal to ensure data privacy and consistency. Integrated and benchmarked five major LLMs across four providers, OpenAI GPT-4o and GPT-4o-mini, Anthropic Claude, Cohere, and Mistral, using standardized prompts and payload structures to enable fair, reproducible model comparisons across inference tasks.
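As an illustration of the PII-removal step, here is a minimal sketch using regex redaction; the patterns and placeholder tokens are illustrative assumptions, not the exact rules used in the pipeline.

```python
# Sketch of a PII-scrubbing step in the ingestion layer: redact emails
# and phone numbers before resumes are stored or sent to any LLM
# provider. Patterns are deliberately simple and illustrative.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{8,}\d")

def scrub_pii(text: str) -> str:
    """Replace obvious PII with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub_pii("Reach me at jane.doe@example.com or +1 (555) 123-4567"))
```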
Engineered three distinct prompt engineering tiers (baseline, intermediate, comprehensive) and evaluated LLM performance by comparing model-generated SOC codes against manually labeled ground truth data, targeting 90%+ classification accuracy. Developed a Top-10 job recommendation engine that ranks and filters job postings per resume, improving user experience and matching precision through LLM-powered relevance scoring.
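A sketch of the standardized-prompt pattern is below, shown with OpenAI’s Python client; the other providers were wrapped behind the same interface. The tier texts and the function name are illustrative, not the exact prompts used in the project.

```python
# Sketch of standardized prompting: every provider receives the same
# instruction text so SOC-code outputs are directly comparable.
from openai import OpenAI

PROMPT_TIERS = {
    "baseline": "Return the best SOC-2018 code for this resume.",
    "intermediate": "You are an occupational classifier. Return the single "
                    "best SOC-2018 code and a one-line justification.",
    # the comprehensive tier added few-shot examples and output-format rules
}

def classify_resume(resume_text: str, tier: str = "baseline") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": PROMPT_TIERS[tier]},
            {"role": "user", "content": resume_text},
        ],
        temperature=0,  # deterministic output for fair model comparison
    )
    return response.choices[0].message.content
```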
Stored all predictions, ground truth labels, evaluation metrics, and model performance statistics in MongoDB, enabling comprehensive analytics and audit trails for model validation and deployment decisions. Implemented version control and documentation workflows on GitHub, ensuring experiment reproducibility, code quality, and knowledge transfer across team members.
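For illustration, persisting one prediction record with pymongo might look like the sketch below; the connection URI, database and collection names, and document fields are assumptions.

```python
# Sketch of storing a prediction plus its ground-truth label in MongoDB
# so accuracy can be audited per model and prompt tier.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder URI
runs = client["soc_matching"]["prediction_runs"]

runs.insert_one({
    "resume_id": "r-0001",            # hypothetical identifier
    "model": "gpt-4o-mini",
    "prompt_tier": "comprehensive",
    "predicted_soc": "15-2051",       # SOC-2018 code for Data Scientists
    "ground_truth_soc": "15-2051",
    "correct": True,
    "created_at": datetime.now(timezone.utc),
})
```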
Technologies
Python, OpenAI GPT APIs, Anthropic Claude, Cohere, Mistral, MongoDB, GitHub, Prompt Engineering, SOC-2018 Classification
Impact
Delivered a production-ready LLM pipeline demonstrating expertise in prompt engineering, multi-model LLM integration, evaluation methodologies, and data-driven decision-making, directly supporting talent acquisition and workforce matching initiatives.
Key Achievements
- Designed data ingestion layer to process resume and job-posting data from CSV, PDF, and JSON formats
- Integrated and benchmarked five major LLMs across four providers: OpenAI GPT-4o and GPT-4o-mini, Anthropic Claude, Cohere, and Mistral
- Engineered three distinct prompt engineering tiers (baseline, intermediate, comprehensive)
- Benchmarked model-generated SOC codes against manually labeled ground truth, targeting 90%+ classification accuracy
- Developed a Top-10 job recommendation engine with LLM-powered relevance scoring
- Stored all predictions, ground truth labels, evaluation metrics, and model performance statistics in MongoDB
- Implemented version control and documentation workflows on GitHub for experiment reproducibility
