I2S Masters/ Doctoral Theses


All students and faculty are welcome to attend the final defense of I2S graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check, and post the presentation announcement online.

Upcoming Defense Notices

Abishek Doodgaon

Photorealistic Synthetic Data Generation for Deep Learning-based Structural Health Monitoring of Concrete Dams

When & Where:


LEEP2, Room 1415A

Degree Type:

MS Thesis Defense

Committee Members:

Zijun Yao, Chair
Caroline Bennett
Prasad Kulkarni
Remy Lequesne

Abstract

Regular inspections are crucial for identifying and assessing damage in concrete dams, including a wide range of damage states. Manual inspections of dams are often constrained by cost, time, safety, and inaccessibility. Automating dam inspections using artificial intelligence has the potential to improve the efficiency and accuracy of data analysis. Computer vision and deep learning models have proven effective in detecting a variety of damage features using images, but their success relies on the availability of high-quality and diverse training data. This is because supervised learning, a common machine-learning approach for classification problems, uses labeled examples, in which each training data point includes features (damage images) and a corresponding label (pixel annotation). Unfortunately, public datasets of annotated images of concrete dam surfaces are scarce and inconsistent in quality, quantity, and representation.

To address this challenge, we present a novel approach that involves synthesizing a realistic environment using a 3D model of a dam. By overlaying this model with synthetically created photorealistic damage textures, we can render images to generate large and realistic datasets with high-fidelity annotations. Our pipeline uses NX and Blender for 3D model generation and assembly, Substance 3D Designer and Substance Automation Toolkit for texture synthesis and automation, and Unreal Engine 5 for creating a realistic environment and rendering images. This generated synthetic data is then used to train deep learning models in the subsequent steps. The proposed approach offers several advantages. First, it allows generation of large quantities of data that are essential for training accurate deep learning models. Second, the texture synthesis ensures generation of high-fidelity ground truths (annotations) that are crucial for making accurate detections. Lastly, the automation capabilities of the software applications used in this process provides flexibility to generate data with varied textures elements, colors, lighting conditions, and image quality overcoming the constraints of time. Thus, the proposed approach can improve the automation of dam inspection by improving the quality and quantity of training data.


Sana Awan

Towards Robust and Privacy-preserving Federated Learning

When & Where:


Zoom (ID: 935 5019 8870 Passcode: 323434)

Degree Type:

PhD Dissertation Defense

Committee Members:

Fengjun Li, Chair
Alex Bardas
Cuncong Zhong
Mei Liu
Haiyang Chao

Abstract

Machine Learning (ML) has revolutionized various fields, from disease prediction to credit risk evaluation, by harnessing abundant data scattered across diverse sources. However, transporting data to a trusted server for centralized ML model training is not only costly but also raises privacy concerns, particularly with legislative standards like HIPAA in place. In response to these challenges, Federated Learning (FL) has emerged as a promising solution. FL involves training a collaborative model across a network of clients, each retaining its own private data. By conducting training locally on the participating clients, this approach eliminates the need to transfer entire training datasets while harnessing their computation capabilities. However, FL introduces unique privacy risks, security concerns, and robustness challenges. Firstly, FL is susceptible to malicious actors who may tamper with local data, manipulate the local training process, or intercept the shared model or gradients to implant backdoors that affect the robustness of the joint model. Secondly, due to the statistical and system heterogeneity within FL, substantial differences exist between the distribution of each local dataset and the global distribution, causing clients’ local objectives to deviate greatly from the global optima, resulting in a drift in local updates. Addressing such vulnerabilities and challenges is crucial before deploying FL systems in critical infrastructures.

In this dissertation, we present a multi-pronged approach to address the privacy, security, and robustness challenges in FL. This involves designing innovative privacy protection mechanisms and robust aggregation schemes to counter attacks during the training process. To address the privacy risk due to model or gradient interception, we present the design of a reliable and accountable blockchain-enabled privacy-preserving federated learning (PPFL) framework which leverages homomorphic encryption to protect individual client updates. The blockchain is adopted to support provenance of model updates during training so that malformed or malicious updates can be identified and traced back to the source. 

We studied the challenges in FL due to heterogeneous data distributions and found that existing FL algorithms often suffer from slow and unstable convergence and are vulnerable to poisoning attacks, particularly in extreme non-independent and identically distributed (non-IID) settings. We propose a robust aggregation scheme, named CONTRA, to mitigate data poisoning attacks and ensure an accuracy guarantee even under attack. This defense strategy identifies malicious clients by evaluating the cosine similarity of their gradient contributions and subsequently removes them from FL training. Finally, we introduce FL-GMM, an algorithm designed to tackle data heterogeneity while prioritizing privacy. It iteratively constructs a personalized classifier for each client while aligning local-global feature representations. By aligning local distributions with global semantic information, FL-GMM minimizes the impact of data diversity. Moreover, FL-GMM enhances security by transmitting derived model parameters via secure multiparty computation, thereby avoiding vulnerabilities to reconstruction attacks observed in other approaches. 


Past Defense Notices

Dates

Aiden Liang

Enhancing Healthcare Resource Demand Forecasting Using Machine Learning: An Integrated Approach to Addressing Temporal Dynamics and External Influences

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

MS Project Defense

Committee Members:

Prasad Kulkami, Chair
Fengjun Li
Zijun Yao


Abstract

This project aims to enhance predictive models for forecasting healthcare resource demand, particularly focusing on hospital bed occupancy and emergency room visits while considering external factors such as disease outbreaks and weather conditions. Utilizing a range of machine learning techniques, the research seeks to improve the accuracy and reliability of these forecasts, essential for optimizing healthcare resource management.

The project involves multiple phases, starting with the collection and preparation of historical data from public health databases and hospital records, enriched with external variables such as weather patterns and epidemiological data. Advanced feature engineering is key, transforming raw data into a machine learning-friendly format, including temporal and lag features to identify patterns and trends.

The study explores various machine learning methods, from traditional models like ARIMA to advanced techniques such as LSTM networks and GRU models, incorporating rigorous training and validation protocols to ensure robust performance. Model effectiveness is evaluated using metrics like MAE, RMSE, and MAPE, with a strong focus on model interpretability and explainability through techniques like SHAP and LIME. The project also addresses practical implementation challenges and ethical considerations, aiming to bridge academic research with practical healthcare applications. Findings are intended for dissemination through academic papers and conferences, ensuring that the models developed meet both the ethical standards and practical needs of the healthcare industry.


Thomas Atkins

Secure and Auditable Academic Collections via Hyperledger Fabric-Based Smart Contracts

When & Where:


Nichols Hall, Room 246

Degree Type:

MS Thesis Defense

Committee Members:

Drew Davidson, Chair
Fengjun Li
Bo Luo


Abstract

This paper introduces a novel approach to manage collections of artifacts through smart contract access control, rooted in on-chain role-based property-level access control. This smart contract facilitates the lifecycle of these artifacts including allowing for the creation, modification, removal, and historical auditing of the artifacts through both direct and suggested actions. This method introduces a collection object designed to store role privileges concerning state object properties. User roles are defined within an on-chain entity that maps users' signed identities to roles across different collections, enabling a single user to assume varying roles in distinct collections.

Unlike existing key-level endorsement mechanisms, this approach offers finer-grained privileges by defining them on a per-property basis, not at the key level. The outcome is a more flexible and fine-grained access control system seamlessly integrated into the smart contract itself, empowering administrators to manage access with precision and adaptability across diverse organizational contexts. This has the added benefit of allowing for the auditing of not only the history of the artifacts, but also for the permissions granted to the users. 


Theodore Harbison

Posting Passwords: How social media information can be leveraged in password guessing attacks

When & Where:


Zoom (ID: 7858648812, Pass: 348348

Degree Type:

MS Thesis Defense

Committee Members:

Hossein Saiedian, Chair
Fengjun Li
Heechul Yun


Abstract

The explosion of social media, while fostering connection, inadvertently exposes personal details that heighten password vulnerability. This thesis tackles this critical link, aiming to raise public awareness of the dangers of weak passwords and excessive online sharing. We introduce a novel password guessing algorithm, SocGuess, which capitalizes on the rich trove of information on social media profiles.

SocGuess leverages Named Entity Recognition (NER) to identify key data points within this information, such as dates, locations, and names. To further enhance its accuracy, SocGuess is trained on the rockyou dataset, a large collection of leaked passwords. By identifying different kinds of entities within these passwords, SocGuess can calculate the probability of these entities appearing in passwords.

Armed with this knowledge, SocGuess dynamically generates password guesses in order of probability by filling these entity placeholders with the corresponding data points harvested from the target’s social media profiles. This targeted approach shows SocGuess to crack 33% more passwords than existing algorithms during experimentation, demonstrably surpassing traditional methods.


Ethan Grantz

Swarm: A Backend-Agnostic Language for Simple Distributed Programming

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Degree Type:

MS Project Defense

Committee Members:

Drew Davidson, Chair
Perry Alexander
Prasad Kulkarni


Abstract

Writing algorithms for a parallel or distributed environment has always been plagued with a variety of challenges, from supervising synchronous reads and writes, to managing job queues and avoiding deadlock. While many languages have libraries or language constructs to mitigate these obstacles, very few attempt to remove those challenges entirely, and even fewer do so while divorcing the means of handling those problems from the means of parallelization or distribution. This project introduces a language called Swarm, which attempts to do just that.

Swarm is a first-class parallel/distributed programming language with modular, swappable parallel drivers. It is intended for everything from multi-threaded local computation on a single machine to large scientific computations split across many nodes in a cluster.

Swarm contains next to no explicit syntax for typical parallel logic, only containing keywords for declaring which variables should reside in shared memory, and describing what code should be parallelized. The remainder of the logic (such as waiting for the results from distributed jobs or locking shared accesses) are added in when compiling to a custom bytecode called Swarm Virtual Instructions (SVI). SVI is then executed by a virtual machine whose parallelization logic is abstracted out, such that the same SVI bytecode can be executed in any parallel/distributed environment.


Johnson Umeike

Optimizing gem5 Simulator Performance: Profiling Insights and Userspace Networking Enhancements

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Degree Type:

MS Thesis Defense

Committee Members:

Mohammad Alian, Chair
Prasad Kulkarni
Heechul Yun


Abstract

Full-system simulation of computer systems is critical for capturing the complex interplay between various hardware and software components in future systems. Modeling the network subsystem is indispensable for the fidelity of full-system simulations due to the increasing importance of scale-out systems. Over the last decade, the network software stack has undergone major changes, with userspace networking stacks and data-plane networks rapidly replacing the conventional kernel network stack. Nevertheless, the current state-of-the-art architectural simulator, gem5, still employs kernel networking, which precludes realistic network application scenarios.

First, we perform a comprehensive profiling study to identify and propose architectural optimizations to accelerate a state-of-the-art architectural simulator. We choose gem5 as the representative architectural simulator, run several simulations with various configurations, perform a detailed architectural analysis of the gem5 source code on different server platforms, tune both system and architectural settings for running simulations, and discuss the future opportunities in accelerating gem5 as an important application. Our detailed profiling of gem5 reveals that its performance is extremely sensitive to the size of the L1 cache. Our experimental results show that a RISC-V core with 32KB data and instruction cache improves gem5’s simulation speed by 31%∼61% compared with a baseline core with 8KB L1 caches. Second, this work extends gem5’s networking capabilities by integrating kernel-bypass/user-space networking based on the DPDK framework, significantly enhancing network throughput and reducing latency. By enabling user-space networking, the simulator achieves a substantial 6.3× improvement in network bandwidth compared to traditional Linux software stacks. Our hardware packet generator model (EtherLoadGen) provides up to a 2.1× speedup in simulation time. Additionally, we develop a suite of networking micro-benchmarks for stress testing the host network stack, allowing for efficient evaluation of gem5’s performance. Through detailed experimental analysis, we characterize the performance differences when running the DPDK network stack on both real systems and gem5, highlighting the sensitivity of DPDK performance to various system and microarchitecture parameters.


Adam Sarhage

Design of Multi-Section Coupled Line Coupler

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

Jim Stiles, Chair
Chris Allen
Glenn Prescott


Abstract

Coupled line couplers are used as directional couplers to enable measurement of forward and reverse power in RF transmitters. These measurements provide valuable feedback to the control loops regulating transmitter power output levels. This project seeks to synthesize, simulate, build, and test a broadband, five-stage coupled line coupler with a 20 dB coupling factor. The coupler synthesis is evaluated against ideal coupler components in Keysight ADS.  Fabrication of coupled line couplers is typically accomplished with a stripline topology, but a microstrip topology is additionally evaluated. Measurements from the fabricated coupled line couplers are then compared to the Keysight ADS EM simulations, and some explanations for the differences are provided. Additionally, measurements from a commercially available broadband directional coupler are provided to show what can be accomplished with the right budget.


Mohsen Nayebi Kerdabadi

Contrastive Learning of Temporal Distinctiveness for Survival Analysis in Electronic Health Records

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Degree Type:

MS Project Defense

Committee Members:

Zijun Yao, Chair
Fengjun Li
Cuncong Zhong


Abstract

Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies mostly employed ranking methods to pursue an ordering objective, contrastive methods which learn a discriminative embedding by having data contrast against each other, have not been explored thoroughly for survival analysis. Therefore, we propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework that utilizes survival durations from both censored and observed data to define temporal distinctiveness and construct negative sample pairs with adjustable hardness for contrastive learning. Specifically, we first use an ontological encoder and a sequential self-attention encoder to represent the longitudinal EHR data with rich contexts. Second, we design a temporal contrastive loss to capture varying survival durations in a supervised setting through a hardness-aware negative sampling mechanism. Last, we incorporate the contrastive task into the time-to-event predictive task with multiple loss components. We conduct extensive experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI), a critical and urgent medical condition. The effectiveness and explainability of the proposed model are validated through comprehensive quantitative and qualitative studies.


Jarrett Zeliff

An Analysis of Bluetooth Mesh Security Features in the Context of Secure Communications

When & Where:


Eaton Hall, Room 1

Degree Type:

MS Thesis Defense

Committee Members:

Alexadnru Bardas, Chair
Drew Davidson
Fengjun Li


Abstract

Significant developments in communication methods to help support at-risk populations have increased over the last 10 years. We view at-risk populations as a group of people present in environments where the use of infrastructure or electricity, including telecommunications, is censored and/or dangerous. Security features that accompany these communication mechanisms are essential to protect the confidentiality of its user base and the integrity and availability of the communication network.

In this work, we look at the feasibility of using Bluetooth Mesh as a communication network and analyze the security features that are inherent to the protocol. Through this analysis we determine the strengths and weaknesses of Bluetooth Mesh security features when used as a messaging medium for at risk populations and provide improvements to current shortcomings. Our analysis includes looking at the Bluetooth Mesh Networking Security Fundamentals as described by the Bluetooth Sig: Encryption and Authentication, Separation of Concerns, Area isolation, Key Refresh, Message Obfuscation, Replay Attack Protection, Trashcan Attack Protection, and Secure Device Provisioning.  We look at how each security feature is implemented and determine if these implementations are sufficient in protecting the users from various attack vectors. For example, we examined the Blue Mirror attack, a reflection attack during the provisioning process which leads to the compromise of network keys, while also assessing the under-researched key refresh mechanism. We propose a mechanism to address Blue-Mirror-oriented attacks with the goal of creating a more secure provisioning process.  To analyze the key refresh mechanism, we implemented our own full-fledged Bluetooth Mesh network and implemented a key refresh mechanism. Through this we form an assessment of the throughput, range, and impacts of a key refresh in both lab and field environments that demonstrate the suitability of our solution as a secure communication method.


Daniel Johnson

Probability-Aware Selective Protection for Sparse Iterative Solvers

When & Where:


Nichols Hall, Room 246

Degree Type:

MS Thesis Defense

Committee Members:

Hongyang Sun, Chair
Perry Alexander
Zijun Yao


Abstract

With the increasing scale of high-performance computing (HPC) systems, transient bit-flip errors are now more likely than ever, posing a threat to long-running scientific applications. A substantial portion of these applications involve the simulation of partial differential equations (PDEs) modeling physical processes over discretized spatial and temporal domains, with some requiring the solving of sparse linear systems. While these applications are often paired with system-level application-agnostic resilience techniques such as checkpointing and replication, the utilization of these techniques imposes significant overhead. In this work, we present a probability-aware framework that produces low-overhead selective protection schemes for the widely used Preconditioned Conjugate Gradient (PCG) method, whose performance can heavily degrade due to error propagation through the sparse matrix-vector multiplication (SpMV) operation. Through the use of a straightforward mathematical model and an optimized machine learning model, our selective protection schemes incorporate error probability to protect only certain crucial operations. An experimental evaluation using 15 matrices from the SuiteSparse Matrix Collection demonstrates that our protection schemes effectively reduce resilience overheads, often outperforming or matching both baseline and established protection schemes across all error probabilities.


Javaria Ahmad

Discovering Privacy Compliance Issues in IoT Apps and Alexa Skills Using AI and Presenting a Mechanism for Enforcing Privacy Compliance

When & Where:


LEEP2, Room 2425

Degree Type:

PhD Dissertation Defense

Committee Members:

Bo Luo, Chair
Alex Bardas
Tamzidul Hoque
Fengjun Li
Michael Zhuo Wang

Abstract

The growth of IoT and voice assistant (VA) apps poses increasing concerns about sensitive data leaks. While privacy policies are required to describe how these apps use private user data (i.e., data practice), problems such as missing, inaccurate, and inconsistent policies have been repeatedly reported. Therefore, it is important to assess the actual data practice in apps and identify the potential gaps between the actual and declared data usage. We find that app stores lack in regulating the compliance between the app practices and their declaration, so we use AI to discover the compliance issues in these apps to assist the regulators and developers. For VA apps, we also develop a mechanism to enforce the compliance using AI. In this work, we conduct a measurement study using our framework called IoTPrivComp, which applies an automated analysis of IoT apps’ code and privacy policies to identify compliance gaps. We collect 1,489 IoT apps with English privacy policies from the Play Store. IoTPrivComp detects 532 apps with sensitive external data flows, among which 408 (76.7%) apps have undisclosed data leaks. Moreover, 63.4% of the data flows that involve health and wellness data are inconsistent with the practices disclosed in the apps’ privacy policies. Next, we focus on the compliance issues in skills. VAs, such as Amazon Alexa, are integrated with numerous devices in homes and cars to process user requests using apps called skills. With their growing popularity, VAs also pose serious privacy concerns. Sensitive user data captured by VAs may be transmitted to third-party skills without users’ consent or knowledge about how their data is processed. Privacy policies are a standard medium to inform the users of the data practices performed by the skills. However, privacy policy compliance verification of such skills is challenging, since the source code is controlled by the skill developers, who can make arbitrary changes to the behaviors of the skill without being audited; hence, conventional defense mechanisms using static/dynamic code analysis can be easily escaped. We present Eunomia, the first real-time privacy compliance firewall for Alexa Skills. As the skills interact with the users, Eunomia monitors their actions by hijacking and examining the communications from the skills to the users, and validates them against the published privacy policies that are parsed using a BERT-based policy analysis module. When non-compliant skill behaviors are detected, Eunomia stops the interaction and warns the user. We evaluate Eunomia with 55,898 skills on Amazon skills store to demonstrate its effectiveness and to provide a privacy compliance landscape of Alexa skills.