I2S Master's/Doctoral Theses


All students and faculty are welcome to attend the final defense of I2S graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing completion of their M.S./Ph.D. research should schedule their final defense through the EECS graduate office at least THREE WEEKS PRIOR to the presentation date so that there is time to complete the degree-requirements check and to post the presentation announcement online.

Upcoming Defense Notices

Md Mashfiq Rizvee

Hierarchical Probabilistic Architectures for Scalable Biometric and Electronic Authentication in Secure Surveillance Ecosystems

When & Where:


Eaton Hall, Room 2001B

Degree Type:

PhD Comprehensive Defense

Committee Members:

Sumaiya Shomaji, Chair
Tamzidul Hoque
David Johnson
Hongyang Sun
Alexandra Kondyli

Abstract

Secure and scalable authentication has become a primary requirement in modern digital ecosystems, where both human biometrics and electronic identities must be verified under noise, large population growth, and resource constraints. Existing approaches often struggle to simultaneously provide storage efficiency, dynamic updates, and strong authentication reliability. The proposed work advances a unified probabilistic framework based on Hierarchical Bloom Filter (HBF) architectures to address these limitations across biometric and hardware domains. The first contribution establishes the Dynamic Hierarchical Bloom Filter (DHBF) as a noise-tolerant and dynamically updatable authentication structure for large-scale biometrics. Unlike static Bloom-based systems that require reconstruction upon updates, DHBF supports enrollment, querying, insertion, and deletion without structural rebuilds. Experimental evaluation on 30,000 facial biometric templates demonstrates 100% enrollment and query accuracy, including robust acceptance of noisy biometric inputs while maintaining correct rejection of non-enrolled identities. These results validate that hierarchical probabilistic encoding can preserve both scalability and authentication reliability in practical deployments. Building on this foundation, Bio-BloomChain integrates DHBF into a blockchain-based smart contract framework to provide tamper-evident, privacy-preserving biometric lifecycle management. The system stores only hashed, non-invertible commitments on-chain while maintaining probabilistic verification logic within the contract layer. Large-scale evaluation again reports 100% enrollment, insertion, query, and deletion accuracy across 30,000 templates, thereby addressing the long-standing inability of blockchain systems to authenticate noisy data.
Moreover, deployment analysis shows that execution on Polygon zkEVM reduces operational costs by several orders of magnitude compared to Ethereum, bringing enrollment and deletion costs below $0.001 per operation and demonstrating the feasibility of scalable blockchain biometric authentication in practice. Finally, the hierarchical probabilistic paradigm is extended to electronic hardware authentication through the Persistent Hierarchical Bloom Filter (PHBF). Applied to electronic fingerprints derived from physical unclonable functions (PUFs), PHBF demonstrates robust authentication under environmental variations such as temperature-induced noise. Experimental results show zero-error operation at the selected decision threshold and substantial system-level improvements, including over 10^5× faster query processing and significantly reduced storage requirements compared to large-scale tracking approaches.
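To make the abstract's core data structure concrete, the sketch below shows a minimal, non-hierarchical Bloom filter answering set-membership queries probabilistically. This is an illustrative sketch only: the class name and parameters are invented for the example, and the actual DHBF adds the hierarchy, noise tolerance, and dynamic insertion/deletion described above.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch (illustrative only; not the DHBF itself).
    A query answering False means "definitely not enrolled"; True means
    "enrolled with high probability" (false positives are possible)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def query(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.insert("template-001")
print(bf.query("template-001"))  # True: enrolled template is accepted
print(bf.query("template-999"))  # False with overwhelming probability
```

Note that a plain Bloom filter cannot delete items without rebuilding, which is precisely the limitation of static Bloom-based systems that the DHBF is designed to remove.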


Fatima Al-Shaikhli

Optical Measurements Leveraging Coherent Fiber Optics Transceivers

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

PhD Dissertation Defense

Committee Members:

Rongqing Hui, Chair
Shannon Blunt
Shima Fardad
Alessandro Salandrino
Judy Wu

Abstract

Recent advancements in optical technology are invaluable in a variety of fields, extending far beyond high-speed communications. These innovations enable optical sensing, which plays a critical role across diverse applications, from medical diagnostics to infrastructure monitoring and automotive systems. This research focuses on leveraging commercially available coherent optical transceivers to develop novel measurement techniques that extract detailed information about optical fiber characteristics as well as target information. Through this approach, we aim to enable accurate and fast assessments of fiber performance and integrity, while exploring the potential for utilizing existing optical communication networks to enhance fiber characterization capabilities. This goal is investigated through three distinct projects: (1) fiber type characterization based on the intensity-modulated electrostriction response; (2) a coherent Light Detection and Ranging (LiDAR) system for target range and velocity detection through different waveform designs, including experimental validation of frequency-modulated continuous-wave (FMCW) implementations and theoretical analysis of orthogonal frequency-division multiplexing (OFDM) based approaches; and (3) birefringence measurements using a coherent polarization-sensitive optical frequency-domain reflectometer (P-OFDR) system.

Electrostriction in an optical fiber is introduced by the interaction between the forward-propagating optical signal and acoustic standing waves in the radial direction, resonating between the center of the core and the cladding circumference of the fiber. The electrostriction response depends on fiber parameters, especially the mode field radius. We demonstrated a novel technique for identifying fiber types through measurement of the intensity-modulation-induced electrostriction response. As the spectral envelope of the electrostriction-induced propagation loss is anti-symmetrical, the signal-to-noise ratio can be significantly increased by subtracting the measured spectrum from its complex conjugate. We show that if the field distribution of the fiber propagation mode is Gaussian, the envelope of the electrostriction-induced loss spectrum closely follows a Maxwellian distribution whose shape can be specified by a single parameter determined by the mode field radius.

We also present a self-homodyne FMCW LiDAR system based on a coherent receiver. By using the same linearly chirped waveform for both the LiDAR signal and the local oscillator, the self-homodyne coherent receiver performs frequency de-chirping directly in the photodiodes, significantly simplifying signal processing. As a result, the required receiver bandwidth is much lower than the chirping bandwidth of the signal. Simultaneous range and velocity detection of multiple targets is demonstrated experimentally. Furthermore, we explore the use of commercially available coherent transceivers for joint communication and sensing using OFDM waveforms.
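The de-chirping advantage described above follows from the standard FMCW relation: a static target at range R produces a beat frequency f_b = 2BR/(cT), where B is the chirp bandwidth and T the sweep period. The sketch below uses this generic textbook relation with illustrative numbers, not the parameters of the experimental system:

```python
C = 3e8  # speed of light, m/s

def beat_frequency(range_m, chirp_bw_hz, sweep_s):
    """Beat frequency of a static target after FMCW de-chirping:
    f_b = 2 * B * R / (c * T)  (standard FMCW relation)."""
    return 2 * chirp_bw_hz * range_m / (C * sweep_s)

def range_from_beat(fb_hz, chirp_bw_hz, sweep_s):
    """Invert the relation to recover target range from the beat tone."""
    return C * sweep_s * fb_hz / (2 * chirp_bw_hz)

# Illustrative numbers: 1 GHz chirp swept over 100 us, target at 150 m.
fb = beat_frequency(150.0, 1e9, 100e-6)
print(fb)                                # 1e7 Hz beat tone
print(range_from_beat(fb, 1e9, 100e-6))  # 150.0 m

# The receiver only needs ~f_b of bandwidth (10 MHz here), far below the
# 1 GHz chirp bandwidth -- the simplification the abstract points out.
```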

In addition, we demonstrate a P-OFDR system utilizing a digital coherent optical transceiver to generate a linear frequency chirp via carrier-suppressed single-sideband modulation. This method ensures linearity of the chirp and phase continuity of the optical carrier. The coherent homodyne receiver, incorporating both polarization and phase diversity, recovers the state of polarization (SOP) of the backscattered optical signal along the fiber by mixing it with an identically chirped local oscillator. With a spatial resolution of approximately 5 mm, a 26 GHz chirping bandwidth, and a 200 µs measurement time, this system enables precise birefringence measurements. By employing three mutually orthogonal SOPs of the launched optical signal, we measure relative birefringence vectors along the fiber.


Past Defense Notices


Fairuz Shadmani Shishir

Toward Trustworthy Biomedical AI: Efficient Protein Language Models and Privacy-Aware Clinical Representations

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

PhD Comprehensive Defense

Committee Members:

Sumaiya Shomaji, Chair
Tamzidul Hoque
Cuncong Zhong
Bishnu Sarker
Michael Hageman

Abstract

Accurate biological sequence annotation and privacy-aware clinical modeling are central challenges in modern computational biology and biomedical AI. This dissertation presents scalable and interpretable deep learning frameworks spanning protein family classification, metal-ion binding prediction, and privacy-preserving electrocardiogram (ECG) representation learning. First, we introduce GPCR-SLM, a lightweight transformer-based framework for high-resolution classification of G-protein coupled receptors (GPCRs), one of the largest and most pharmacologically important protein families, targeted by approximately 35% of FDA-approved drugs. Unlike traditional homology-based tools such as BLAST and HMMER, which struggle to distinguish closely related families with low sequence similarity, our knowledge-distilled small language model achieves 99% accuracy across 86 GPCR families. The framework significantly outperforms BLAST (86.4%) and HMMER (91%) while delivering a 33.5× computational speedup compared to large protein language models, enabling scalable functional annotation as protein databases continue to expand. 

Second, we present an end-to-end deep learning pipeline for protein–metal-ion binding prediction. Binding site annotation is traditionally labor-intensive and limited by handcrafted features or predefined residue sets. We systematically evaluate five state-of-the-art protein language models and incorporate positional encoding to capture long-range residue dependencies. Our approach achieves a Matthews Correlation Coefficient (MCC) of 0.89 with precision, recall, and F1 scores exceeding 95% for six major metal ions under 10-fold cross-validation, demonstrating robust predictive performance and improved biological interpretability. Finally, we address fairness and privacy in clinical AI through a variational autoencoder (VAE) framework for ECG representation learning. Because ECGs inherently encode sensitive soft biometrics such as sex, age, and race, we design a dual-discriminator architecture that suppresses demographic information while preserving clinically relevant signals. The reconstructed ECGs substantially reduce demographic identifiability while maintaining strong predictive performance for reduced left ventricular ejection fraction, left ventricular hypertrophy, and 5-year mortality. 
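For reference, the Matthews Correlation Coefficient reported above is computed directly from confusion-matrix counts; the sketch below uses toy numbers, not the study's results:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    0 corresponds to random performance."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Toy balanced confusion matrix: 90 correct / 10 wrong per class.
print(mcc(90, 90, 10, 10))  # 0.8
```

Unlike accuracy, MCC stays informative on imbalanced residue-level data (binding sites are rare), which is presumably why it is the headline metric here.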

Collectively, this work advances parameter-efficient, scalable, and privacy-conscious deep learning methodologies for both molecular and clinical domains, bridging computational protein science and trustworthy biomedical AI. 


Shailesh Pandey

Vision-Based Motor Assessment in Autism: Deep Learning Methods for Detection, Classification, and Tracking

When & Where:


Zoom Meeting: https://kansas.zoom.us/j/87952337768 Meeting ID: 879 5233 7768 Passcode: 965792

Degree Type:

PhD Comprehensive Defense

Committee Members:

Sumaiya Shomaji, Chair
Shima Fardad
Zijun Yao
Cuncong Zhong
Lisa Dieker

Abstract

Motor difficulties show up in as many as 90% of people with autism, but surprisingly few, somewhere between 13% and 32%, ever get motor-focused help. A big part of the problem is that the tools we have for measuring motor skills either rely on a clinician's subjective judgment or require expensive lab equipment that most families will never have access to. This dissertation tries to close that gap with three projects, all built around the idea that a regular webcam and some well-designed deep learning models can do much of what costly motion-capture labs do today.

The first project asks a straightforward question: can a computer tell the difference between how someone with autism moves and how a typically developing person moves, just by watching a short video? The answer, it turns out, is yes. We built an ensemble of three neural networks, each one tuned to notice something different. One focuses on how joints coordinate with each other spatially, another zeroes in on the timing of movements, and the third learns which body-part relationships matter most for a given clip. We tested the system on 582 videos from 118 people (69 with ASD and 49 without) performing simple everyday actions like stirring or hammering. The ensemble correctly classifies 95.65% of cases. The timing-focused model on its own hits 92%, which is nearly 10 points better than a standard recurrent network baseline. And when all three models agree, accuracy climbs above 98%.

The second project deals with stimming, the repetitive behaviors like arm flapping, head banging, and spinning that are common in autism. Working with 302 publicly available videos, we trained a skeleton-based model that reaches 91% accuracy using body pose alone. That is more than double the 47% that previous work managed on the same benchmark. When we combine the pose information with what the raw video shows through a late fusion approach, accuracy jumps to 99.9%. Across the entire test set, only a single video was misclassified.

The third project is E-MotionSpec, a web platform designed for clinicians and researchers who want to track motor development over time. It runs in any browser, uses MediaPipe to estimate body pose in real time, and extracts 44 movement features grouped into seven domains covering things like how smoothly someone moves, how quickly they initiate actions, and how coordinated their limbs are. We validated the platform on the same 118-participant dataset and found 36 features with statistically significant differences between the ASD and typically developing groups. Smoothness and initiation timing stood out as the strongest discriminators. The platform also includes tools for comparing sessions over time using frequency analysis and dynamic time warping, so a clinician can actually see whether someone's motor patterns are changing across weeks or months.
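The session-comparison step above rests on dynamic time warping, which scores how well one movement trace aligns with another even when they run at different speeds. A minimal sketch of the classic DP recurrence follows (illustrative only; the platform's actual feature pipeline is not described at this level of detail):

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences, allowing
    one sequence to stretch or compress in time relative to the other."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

# A time-stretched copy of a movement trace still aligns perfectly,
# which is why DTW suits comparing sessions performed at different speeds.
print(dtw_distance([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))        # 0.0
print(dtw_distance([0, 1, 2, 1, 0], [0, 1, 1, 2, 1, 0, 0]))  # 0.0
```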

Taken together, these three projects offer a practical path toward earlier identification and better ongoing monitoring of motor difficulties in autism. Everything runs on a webcam and a web browser. No motion-capture suits, no force plates, no specialized labs. That matters most for the families, schools, and clinics that need these tools the most and can least afford the alternatives.


Md Abu Saeed

Comparative Analysis of Deep Learning Models for Guava Leaf Disease Diagnosis

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

Sumaiya Shomaji, Chair
David Johnson
Hongyang Sun


Abstract

Guava leaf diseases significantly affect crop yield and quality, making timely detection essential for effective disease management. This project presents an end-to-end software system for automated guava leaf disease detection using deep learning and transfer learning techniques. Multiple pretrained convolutional neural network (CNN) architectures, including ResNet, AlexNet, VGG, SqueezeNet, DenseNet, Inception-v3, and EfficientNet, were adapted through feature extraction and trained on a guava leaf image dataset.

The system allows users to either capture an image using a camera or upload an existing leaf image through a software interface. The input image is processed and classified by the trained deep learning model, and the predicted disease class is displayed to the user. The dataset was divided into training, validation, and test sets to ensure robust performance evaluation, and final test accuracy was used to measure generalization on unseen data.

Experimental results demonstrate that transfer learning enables accurate and efficient guava leaf disease classification. Among the evaluated models, the best-performing architecture achieved an accuracy between 97% and 99%. Overall, the developed software provides a practical and user-friendly solution for real-world agricultural disease diagnosis.


Zhaohui Wang

Detection and Mitigation of Cross-App Privacy Leakage and Interaction Threats in IoT Automation

When & Where:


Nichols Hall, Room 250 (Gemini Room); https://kansas.zoom.us/j/86399807556 Meeting ID: 863 9980 7556 Passcode: 697163

Degree Type:

PhD Dissertation Defense

Committee Members:

Fengjun Li, Chair
Alex Bardas
Drew Davidson
Bo Luo
Haiyang Chao

Abstract

The rapid growth of Internet of Things (IoT) technology has brought unprecedented convenience to everyday life, enabling users to deploy automation rules and develop IoT apps tailored to their specific needs. However, modern IoT ecosystems consist of numerous devices, applications, and platforms that interact continuously. As a result, users are increasingly exposed to complex and subtle security and privacy risks that are difficult to fully comprehend. Even interactions among seemingly harmless apps can introduce unforeseen security and privacy threats. In addition, violations of memory integrity can undermine the security guarantees on which IoT apps rely.

The first approach investigates hidden cross-app privacy leakage risks in IoT apps. These risks arise from cross-app interaction chains formed among multiple seemingly benign IoT apps. Our analysis reveals that interactions between apps can expose sensitive information such as user identity, location, tracking data, and activity patterns. We quantify these privacy leaks by assigning probability scores to evaluate risk levels based on inferences. In addition, we provide a fine-grained categorization of privacy threats to generate detailed alerts, enabling users to better understand and address specific privacy risks.

The second approach addresses cross-app interaction threats in IoT automation systems by leveraging a logic-based analysis model grounded in event relations. We formalize event relationships, detect event interferences, and classify rule conflicts, then generate risk scores and conflict rankings to enable comprehensive conflict detection and risk assessment. To mitigate the identified interaction threats, an optimization-based approach is employed to reduce risks while preserving system functionality. This approach ensures comprehensive coverage of cross-app interaction threats and provides a robust solution for detecting and resolving rule conflicts in IoT environments.

To support the development and rigorous evaluation of these security analyses, we further developed a large-scale, manually verified, and comprehensive dataset of real-world IoT apps. This clean and diverse benchmark dataset supports the development and validation of IoT security and privacy solutions. All proposed approaches are evaluated using this dataset of real-world apps, collectively offering valuable insights and practical tools for enhancing IoT security and privacy against cross-app threats. Furthermore, we examine the integrity of the execution environment that supports IoT apps. We show that, even under non-privileged execution, carefully crafted memory access patterns can induce bit flips in physical memory, allowing attackers to corrupt data and compromise system integrity without requiring elevated privileges.


Shawn Robertson

A Low-Power Low-Throughput Communications Solution for At-Risk Populations in Resource Constrained Contested Environments

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

PhD Dissertation Defense

Committee Members:

Alex Bardas, Chair
Drew Davidson
Fengjun Li
Bo Luo
Shawn Keshmiri

Abstract

In resource‑constrained contested environments (RCCEs), communications are routinely censored, surveilled, or disrupted by nation‑state adversaries, leaving at‑risk populations—including protesters, dissidents, disaster‑affected communities, and military units—without secure connectivity. This dissertation introduces MeshBLanket, a Bluetooth Mesh‑based framework designed for low‑power, low‑throughput messaging with minimal electromagnetic spectrum exposure. Built on commercial off‑the‑shelf hardware, MeshBLanket extends the Bluetooth Mesh specification with automated provisioning and network‑wide key refresh to enhance scalability and resilience.

We evaluated MeshBLanket through field experimentation (range, throughput, battery life, and security enhancements) and qualitative interviews with ten senior U.S. Army communications experts. Thematic analysis revealed priorities of availability, EMS footprint reduction, and simplicity of use, alongside adoption challenges and institutional skepticism. Results demonstrate that MeshBLanket maintains secure messaging under load, supports autonomous key refresh, and offers operational relevance at the forward edge of battlefields.

Beyond military contexts, parallels with protest environments highlight MeshBLanket’s broader applicability for civilian populations facing censorship and surveillance. By unifying technical experimentation with expert perspectives, this work contributes a proof‑of‑concept communications architecture that advances secure, resilient, and user‑centric connectivity in environments where traditional infrastructure is compromised or weaponized.




Sai Karthik Maddirala

Real-Estate Price Analysis and Prediction Using Ensemble Learning

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Morteza Hashemi
Prasad Kulkarni


Abstract

Accurate real-estate price estimation is crucial for buyers, sellers, investors, lenders, and policymakers, yet traditional valuation practices often rely on subjective judgment, inconsistent expertise, and incomplete market information. With the increasing availability of digital property listings, large volumes of structured real-estate data can now be leveraged to build objective, data-driven valuation systems. This project develops a comprehensive analytical framework for predicting prices for different property types using real-world listing data collected from 99acres.com across major Indian cities. The workflow includes automated web scraping, extensive data cleaning, normalization of heterogeneous property attributes, and exploratory data analysis to identify important pricing patterns and structural trends within the dataset. A multi-stage learning pipeline, consisting of feature preparation, hyperparameter tuning, cross-validation, and performance evaluation, is designed to ensure that the final predictive system is both reliable and generalizable. In addition to the core prediction engine, the project proposes a future extension using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) to provide transparent, context-aware explanations for each valuation. Overall, this work establishes the foundation for a scalable, interpretable, and data-centric real-estate valuation platform capable of supporting informed decision-making in diverse market contexts.


Ramya Harshitha Bolla

AI Academic Assistant for Summarization and Question Answering

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Rachel Jarvis
Prasad Kulkarni


Abstract

The rapid expansion of academic literature has made efficient information extraction increasingly difficult for researchers, leading to substantial time spent manually summarizing documents and identifying key insights. This project presents an AI-powered Academic Assistant designed to streamline academic reading through multi-level summarization, contextual question answering, and source-grounded traceability. The system incorporates a robust preprocessing pipeline including text extraction, artifact removal, noise filtering, and section segmentation to prepare documents for accurate analysis. After assessing the limitations of traditional NLP and transformer-based summarization models, the project adopts a Large Language Model (LLM) approach using the Gemini API, enabling deeper semantic understanding, long-context processing, and flexible summarization. The assistant provides structured short, medium, and long summaries; contextual keyword extraction; and interactive question answering with transparent source highlighting. Limitations include handling complex visual content and occasional API constraints. Overall, this project demonstrates how modern LLMs, combined with tailored prompt engineering and structured preprocessing, can significantly enhance the academic document analysis workflow.


Keerthi Sudha Borra

Intellinotes – AI-Powered Document Understanding Platform

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Han Wang


Abstract

This project presents Intellinotes, an AI-powered platform that transforms educational documents into multiple learning formats to address information-overload challenges in modern education. The system leverages large language models (GPT-4o-mini) to automatically generate four complementary outputs from a single document upload: educational summaries, conversational podcast scripts, hierarchical mind maps, and interactive flashcards.

The platform employs a three-tier architecture built with Next.js, FastAPI, and MongoDB, supporting multiple document formats (PDF, DOCX, PPTX, TXT, images) through a robust parsing pipeline. Comprehensive evaluation on 30 research documents demonstrates exceptional system reliability with a 100% feature success rate across 150 tests (5 features × 30 documents), and strong semantic understanding with a semantic similarity score of 0.72.

While ROUGE scores (ROUGE-1: 0.40, ROUGE-2: 0.09, ROUGE-L: 0.17) indicate moderate lexical overlap typical of abstractive summarization, the high semantic similarity demonstrates that the system effectively captures and conveys the conceptual meaning of source documents—an essential requirement for educational content. This validation of meaning preservation over word matching represents an important contribution to evaluating educational AI systems.
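The contrast drawn above between lexical and semantic metrics is easy to see in miniature: ROUGE-1 is simple unigram overlap, so an abstractive rewrite that preserves meaning but changes wording scores low. The sketch below is a simplified recall-oriented variant with toy sentences, not the project's evaluation code:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: fraction of reference unigrams that
    also appear in the candidate (clipped counts, whitespace tokens)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / sum(ref.values())

ref = "the system generates summaries mind maps and flashcards"
# An abstractive rewrite shares meaning but few exact words, which is
# why lexical ROUGE can be low while semantic similarity stays high.
cand = "it produces study notes concept diagrams and flashcards"
print(rouge1_recall(cand, ref))  # 0.25: only "and" and "flashcards" match
```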

The system processes documents in approximately 65 seconds with perfect reliability, providing students with comprehensive multi-modal learning materials that cater to diverse learning styles. This work contributes to the growing field of AI-assisted education by demonstrating a practical application of large language models for automated educational content generation supported by validated quality metrics.


Sowmya Ambati

AI-Powered Question Paper Generator

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Dongjie Wang


Abstract

Designing a well-balanced exam requires instructors to review extensive course materials, determine key concepts, and design questions that reflect appropriate difficulty and cognitive depth. This project develops an AI-powered Question Paper Generator that automates much of this process while keeping instructors in full control. The system accepts PDFs, Word documents, PPT slides, and text files, extracts their content, and builds a FAISS-based retrieval index using sentence-transformer embeddings. A large language model then generates multiple question types—MCQs, short answers, and true/false—guided by user-selected difficulty levels and Bloom’s Taxonomy distributions to ensure meaningful coverage. Each question is evaluated with a grounding score that measures how closely it aligns with the source material, improving transparency and reducing hallucination. A React frontend enables instructors to monitor progress, review questions, toggle answers, and export to PDF or Word, while an ASP.NET Core backend manages processing and metrics. The system reduces exam preparation time and enhances consistency across assessments.
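The retrieval step above (embedding course-material chunks and matching them against a query) reduces to nearest-neighbor search by cosine similarity. The sketch below is a brute-force stand-in for the FAISS index, with tiny hand-made vectors in place of sentence-transformer embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k(query_vec, chunk_vecs, k=2):
    """Indices of the k chunks most similar to the query vector.
    A FAISS index performs this search at scale; this is brute force."""
    scored = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return scored[:k]

# Toy 3-D "embeddings" standing in for sentence-transformer outputs.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, chunks))  # [0, 1]: the two chunks nearest the query
```

The same similarity score, applied between a generated question and its retrieved source chunk, is one plausible basis for the grounding score the abstract mentions.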


George Steven Muvva

Automated Fake Content Detection Using TF-IDF-Based Machine Learning and LSTM-Driven Deep Learning Models

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Rachel Jarvis
Prasad Kulkarni


Abstract

The rapid spread of misinformation across online platforms has made automated fake news detection essential. This project develops and compares machine learning (SVM, Decision Tree) and deep learning (LSTM) models to classify news headlines from the GossipCop and PolitiFact datasets as real or fake. After extensive preprocessing, including text cleaning, lemmatization, TF-IDF vectorization, and sequence tokenization, the models are trained and evaluated using standard performance metrics. Results show that SVM provides a strong baseline, but the LSTM model achieves higher accuracy and F1-scores by capturing deeper semantic and contextual patterns in the text. The study highlights the challenges of domain variation and subtle linguistic cues, while demonstrating that context-aware deep learning methods offer superior capability for automated fake content detection.
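The TF-IDF step in the pipeline above weights each headline term by how distinctive it is across the corpus: frequent-everywhere words are down-weighted, rare discriminative words are boosted. A minimal sketch follows (illustrative only; the project presumably used a library implementation, and smoothing conventions vary between libraries):

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF weights for a list of tokenized documents.
    Illustrative sketch; library implementations differ in details."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        })
    return weights

# Toy headlines, invented for the example (not from the datasets).
docs = [
    "senator denies fake claim".split(),
    "celebrity denies breakup rumor".split(),
    "study confirms fake claim".split(),
]
w = tfidf(docs)
# "denies" appears in 2 of 3 documents, so it is down-weighted relative
# to the rarer, more discriminative term "senator".
print(w[0]["senator"] > w[0]["denies"])  # True
```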