I2S Master's/Doctoral Theses


All students and faculty are welcome to attend the final defense of I2S graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

Abishek Doodgaon

Photorealistic Synthetic Data Generation for Deep Learning-based Structural Health Monitoring of Concrete Dams

When & Where:


LEEP2, Room 1415A

Degree Type:

MS Thesis Defense

Committee Members:

Zijun Yao, Chair
Caroline Bennett
Prasad Kulkarni
Remy Lequesne

Abstract

Regular inspections are crucial for identifying and assessing damage in concrete dams, including a wide range of damage states. Manual inspections of dams are often constrained by cost, time, safety, and inaccessibility. Automating dam inspections using artificial intelligence has the potential to improve the efficiency and accuracy of data analysis. Computer vision and deep learning models have proven effective in detecting a variety of damage features using images, but their success relies on the availability of high-quality and diverse training data. This is because supervised learning, a common machine-learning approach for classification problems, uses labeled examples, in which each training data point includes features (damage images) and a corresponding label (pixel annotation). Unfortunately, public datasets of annotated images of concrete dam surfaces are scarce and inconsistent in quality, quantity, and representation.

To address this challenge, we present a novel approach that involves synthesizing a realistic environment using a 3D model of a dam. By overlaying this model with synthetically created photorealistic damage textures, we can render images to generate large and realistic datasets with high-fidelity annotations. Our pipeline uses NX and Blender for 3D model generation and assembly, Substance 3D Designer and Substance Automation Toolkit for texture synthesis and automation, and Unreal Engine 5 for creating a realistic environment and rendering images. The generated synthetic data is then used to train deep learning models in the subsequent steps. The proposed approach offers several advantages. First, it allows generation of the large quantities of data that are essential for training accurate deep learning models. Second, the texture synthesis ensures generation of high-fidelity ground truths (annotations) that are crucial for making accurate detections. Lastly, the automation capabilities of the software applications used in this process provide the flexibility to generate data with varied texture elements, colors, lighting conditions, and image quality, overcoming the constraints of time. Thus, the proposed approach can improve the automation of dam inspection by improving the quality and quantity of training data.


Sana Awan

Towards Robust and Privacy-preserving Federated Learning

When & Where:


Zoom (ID: 935 5019 8870 Passcode: 323434)

Degree Type:

PhD Dissertation Defense

Committee Members:

Fengjun Li, Chair
Alex Bardas
Cuncong Zhong
Mei Liu
Haiyang Chao

Abstract

Machine Learning (ML) has revolutionized various fields, from disease prediction to credit risk evaluation, by harnessing abundant data scattered across diverse sources. However, transporting data to a trusted server for centralized ML model training is not only costly but also raises privacy concerns, particularly with legislative standards like HIPAA in place. In response to these challenges, Federated Learning (FL) has emerged as a promising solution. FL involves training a collaborative model across a network of clients, each retaining its own private data. By conducting training locally on the participating clients, this approach eliminates the need to transfer entire training datasets while harnessing their computation capabilities. However, FL introduces unique privacy risks, security concerns, and robustness challenges. Firstly, FL is susceptible to malicious actors who may tamper with local data, manipulate the local training process, or intercept the shared model or gradients to implant backdoors that affect the robustness of the joint model. Secondly, due to the statistical and system heterogeneity within FL, substantial differences exist between the distribution of each local dataset and the global distribution, causing clients’ local objectives to deviate greatly from the global optima, resulting in a drift in local updates. Addressing such vulnerabilities and challenges is crucial before deploying FL systems in critical infrastructures.

In this dissertation, we present a multi-pronged approach to address the privacy, security, and robustness challenges in FL. This involves designing innovative privacy protection mechanisms and robust aggregation schemes to counter attacks during the training process. To address the privacy risk due to model or gradient interception, we present the design of a reliable and accountable blockchain-enabled privacy-preserving federated learning (PPFL) framework which leverages homomorphic encryption to protect individual client updates. The blockchain is adopted to support provenance of model updates during training so that malformed or malicious updates can be identified and traced back to the source. 

We studied the challenges in FL due to heterogeneous data distributions and found that existing FL algorithms often suffer from slow and unstable convergence and are vulnerable to poisoning attacks, particularly in extreme non-independent and identically distributed (non-IID) settings. We propose a robust aggregation scheme, named CONTRA, to mitigate data poisoning attacks and ensure an accuracy guarantee even under attack. This defense strategy identifies malicious clients by evaluating the cosine similarity of their gradient contributions and subsequently removes them from FL training. Finally, we introduce FL-GMM, an algorithm designed to tackle data heterogeneity while prioritizing privacy. It iteratively constructs a personalized classifier for each client while aligning local-global feature representations. By aligning local distributions with global semantic information, FL-GMM minimizes the impact of data diversity. Moreover, FL-GMM enhances security by transmitting derived model parameters via secure multiparty computation, thereby avoiding vulnerabilities to reconstruction attacks observed in other approaches. 
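The cosine-similarity screening mentioned above can be illustrated with a minimal sketch. The abstract does not give CONTRA's actual scoring rule, so the heuristic below (flag clients whose updates are almost parallel to another client's update, as colluding poisoners pushing the same objective might be) is an assumption made for illustration, as are the function names, the threshold, and the toy update vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero update as unaligned with everything
    return dot / (norm_u * norm_v)


def flag_suspicious_clients(updates, threshold=0.95):
    """Return the ids of clients whose update is nearly parallel to another's.

    `updates` maps a client id to its flattened gradient/update vector.
    The pairwise rule and the threshold are illustrative assumptions,
    not the scheme described in the dissertation.
    """
    flagged = set()
    ids = sorted(updates)
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine_similarity(updates[first], updates[second]) > threshold:
                flagged.update((first, second))
    return flagged
```

In a training loop, a server could call `flag_suspicious_clients` on each round's updates and leave the flagged ids out of the aggregation step.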


Past Defense Notices

Xiangyu Chen

Toward Efficient Deep Learning for Computer Vision Applications

When & Where:


Nichols Hall, Room 246

Degree Type:

PhD Dissertation Defense

Committee Members:

Cuncong Zhong, Chair
Prasad Kulkarni
Bo Luo
Fengjun Li
Hongguo Xu

Abstract

Deep learning delivers leading performance in many areas of computer vision. However, after a decade of research, it tends to require larger datasets and more complex models, leading to heightened resource consumption across all fronts. Regrettably, meeting these requirements proves challenging in many real-life scenarios. First, both data collection and labeling entail substantial labor and time investments. This challenge becomes especially pronounced in domains such as medicine, where identifying rare diseases demands meticulous data curation. Second, the large size of state-of-the-art models, such as ViT, Stable Diffusion, and ConvNeXt, hinders their deployment on resource-constrained platforms like mobile devices. Research indicates pervasive redundancies within current neural network structures, exacerbating the issue. Lastly, even with ample datasets and optimized models, the time required for training and inference remains prohibitive in certain contexts. Consequently, there is burgeoning interest among researchers in exploring avenues for efficient artificial intelligence.

This study examines several facets of efficiency within computer vision, including data efficiency, model efficiency, and training and inference efficiency. Data efficiency is improved by increasing the information carried by given image inputs and reducing the redundancies of RGB image formats. To achieve this, we propose integrating both spatial and frequency representations to fine-tune the classifier. Additionally, we propose explicitly increasing the input information density in the frequency domain by deleting unimportant frequency channels. For model efficiency, we scrutinize the redundancies present in widely used vision transformers. Our investigation reveals that trivial attention in their attention modules overshadows useful non-trivial attention because of its large quantity. We propose mitigating the impact of accumulated trivial attention weights. To increase training efficiency, we propose SuperLoRA, a generalization of the LoRA adapter, to fine-tune pretrained models with few iterations and extremely few parameters. Finally, a model simplification pipeline is proposed to further reduce inference time on mobile devices. By addressing these challenges, we aim to advance the practicality and performance of computer vision systems in real-world applications.


Grace Young

Quantum Polynomial-Time Reduction for the Dihedral Hidden Subgroup Problem

When & Where:


Nichols Hall, Room 246

Degree Type:

PhD Dissertation Defense

Committee Members:

Perry Alexander, Chair
Esam El-Araby
Matthew Moore
Cuncong Zhong
KC Kong

Abstract

The last century has seen incredible growth in the field of quantum computing. Quantum computation offers the opportunity to find efficient solutions to certain computational problems which are intractable on classical computers. One class of problems that seems to benefit from quantum computing is the Hidden Subgroup Problem (HSP). The HSP includes, as special cases, the problems of integer factoring, discrete logarithm, shortest vector, and subset sum, making the HSP incredibly important in various fields of research.

The presented research examines the HSP for Dihedral groups with order 2^n and proves a quantum polynomial-time reduction to the so-called Codomain Fiber Intersection Problem (CFIP). The usual approach to the HSP relies on harmonic analysis in the domain of the problem, and the best-known algorithm using this approach is sub-exponential, but still super-polynomial. The algorithm we will present deviates from the usual approach by focusing on the structure encoded in the codomain and uses this structure to direct a “walk” down the subgroup lattice terminating at the hidden subgroup.

Though the algorithm presented here is specifically designed for the DHSP, it has potential applications to many other types of the HSP. It is hypothesized that any group with a sufficiently structured subgroup lattice could benefit from the analysis developed here. As this approach diverges from the standard approach to the HSP, it could be a promising step in finding an efficient solution to this problem.


Daniel Herr

Information Theoretic Physical Waveform Design with Application to Waveform-Diverse Adaptive-on-Transmit Radar

When & Where:


Nichols Hall, Room 246

Degree Type:

PhD Comprehensive Defense

Committee Members:

James Stiles, Chair
Chris Allen
Shannon Blunt
Carl Leuschen
Chris Depcik

Abstract

Information theory provides methods for quantifying the information content of observed signals and has found application in the radar sensing space for many years. Here, we examine a type of information derived from Fisher information known as Marginal Fisher Information (MFI) and investigate its use to design pulse-agile waveforms. By maximizing this form of information, the expected error covariance about an estimation parameter space may be minimized. First, a novel method for designing MFI optimal waveforms given an arbitrary waveform model is proposed and analyzed. Next, a transformed domain approach is proposed in which the estimation problem is redefined such that information is maximized about a linear transform of the original estimation parameters. Finally, informationally optimal waveform design is paired with informationally optimal estimation (receive processing), and the two are combined into a cognitive radar concept. Initial experimental results are shown and a proposal for continued research is presented.


Rachel Chang

Designing Pseudo-Random Staggered PRI Sequences

When & Where:


Nichols Hall, Room 246

Degree Type:

MS Thesis Defense

Committee Members:

Shannon Blunt, Chair
Chris Allen
James Stiles


Abstract

In uniform pulse-Doppler radar, there is a well-known trade-off between unambiguous Doppler and unambiguous range. Pulse repetition interval (PRI) staggering, a technique that involves modulating the interpulse times, addresses this trade-space, allowing for expansion of the unambiguous Doppler domain with little range swath incursion. Random PRI staggering provides additional diversity, but comes at the cost of increased Doppler sidelobes. Thus, careful PRI sequence design is required to avoid spurious sidelobe peaks that could result in false alarms.
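The trade-off for a uniform PRI can be made concrete with the standard textbook relations for a monostatic radar: the unambiguous range is c·PRI/2, and the unambiguous radial-velocity span is λ/(2·PRI). The sketch below is only an illustration of those relations; the function names and the example wavelength are assumptions, not values from the thesis.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0


def unambiguous_range_m(pri_s):
    """Maximum range before echoes fold into the next pulse: c * PRI / 2."""
    return SPEED_OF_LIGHT_MPS * pri_s / 2.0


def unambiguous_velocity_span_mps(pri_s, wavelength_m):
    """Width of the unambiguous radial-velocity interval.

    The unambiguous Doppler span equals the PRF (1 / PRI) in Hz, and a
    Doppler shift f_d maps to radial velocity wavelength * f_d / 2.
    """
    return wavelength_m / (2.0 * pri_s)
```

Multiplying the two quantities gives c·λ/4, which does not depend on the PRI: lengthening a uniform PRI to gain range shrinks the velocity span by the same factor.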

In this thesis, two random PRI stagger models are defined and compared, and sidelobe peak mitigation is discussed. First, the co-array concept (borrowed from the intuitively related field of sparse array design in the spatial domain) is utilized to examine the effect of redundancy on sidelobe peaks for random PRI sequences. Then, a sidelobe peak suppression technique is introduced that involves a gradient-based optimization of the random PRI sequences, producing pseudo-random sequences that are shown to significantly reduce spurious Doppler sidelobes both in simulation and experimentally.
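As a rough illustration of the co-array idea, the sketch below applies the difference co-array from sparse array design to pulse transmit times: it counts how often each pairwise time difference occurs, so repeated differences show up as redundancy. The abstract does not give the thesis's formulation, so this mapping, the function names, and the integer time units are assumptions.

```python
from collections import Counter
from itertools import combinations


def pulse_times(pris):
    """Cumulative transmit times for a PRI sequence (first pulse at t = 0)."""
    times = [0]
    for pri in pris:
        times.append(times[-1] + pri)
    return times


def difference_coarray(pris):
    """Count how often each positive pairwise time difference occurs.

    PRIs are taken as integers (for example microseconds) so that equal
    differences compare exactly. A count above 1 marks a redundant lag.
    """
    times = pulse_times(pris)
    return Counter(later - earlier for earlier, later in combinations(times, 2))
```

For a uniform sequence such as `[10, 10, 10]` the lag 10 appears three times, while the staggered sequence `[10, 12, 15]` produces six distinct lags, each appearing once.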


Fatima Al-Shaikhli

Fiber Property Characterization based on Electrostriction

When & Where:


Nichols Hall 250 | Gemini Room

Degree Type:

MS Thesis Defense

Committee Members:

Ron Hui, Chair
Shannon Blunt
Shima Fardad


Abstract

Electrostriction in an optical fiber is introduced by the interaction between the forward propagated optical signal and the acoustic standing waves in the radial direction resonating between the center of the core and the cladding circumference of the fiber. The response of electrostriction is dependent on fiber parameters, especially the mode field radius. A novel technique is demonstrated to characterize fiber properties by means of measuring their electrostriction response under intensity modulation. As the spectral envelope of electrostriction-induced propagation loss is anti-symmetrical, the signal-to-noise ratio can be significantly increased by subtracting the measured spectrum from its complex conjugate. It is shown that if the transversal field distribution of the fiber propagation mode is Gaussian, the envelope of the electrostriction-induced loss spectrum closely follows a Maxwellian distribution whose shape can be specified by a single parameter determined by the mode field radius. 


Venkata Nadha Reddy Karasani

Implementing Web Presence For The History Of Black Writing

When & Where:


LEEP2, Room 1415

Degree Type:

MS Thesis Defense

Committee Members:

Drew Davidson, Chair
Perry Alexander
Hossein Saiedian


Abstract

The Black Literature Network Project is a comprehensive initiative to disseminate literary knowledge to students, academics, and the general public. It encompasses four distinct portals, each featuring content created and curated by scholars in the field. These portals include the Novel Generator Machine, Literary Data Gallery, Multithreaded Literary Briefs, and Remarkable Receptions Podcast Series. My main contribution to this project was creating a standalone website for the Current Archives and Collections Index that offers an easily searchable index of Black-themed collections. Additionally, I was solely responsible for the complete development of the novel generator tool. This application provides customized book recommendations based on user preferences. As a part of the History of Black Writing (HBW) Program, I had the opportunity to customize an open-source annotation tool called Hypothesis. This customization allows end users to use it on all websites related to the Black Literature Network Project. The Black Book Interactive Project (BBIP) collaborates with institutions and groups nationwide to promote access to Black-authored texts and digital publishing. Through BBIP, we plan to increase Black literature’s visibility in digital humanities research.


Sohaib Kiani

Exploring Trustworthy Machine Learning from a Broader Perspective: Advancements and Insights

When & Where:


Nichols Hall 250 | Gemini Room

Degree Type:

PhD Dissertation Defense

Committee Members:

Bo Luo, Chair
Alex Bardas
Fengjun Li
Cuncong Zhong
Xuemin Tu

Abstract

Machine learning (ML) has transformed numerous domains, demonstrating exceptional performance in autonomous driving, medical diagnosis, and decision-making tasks. Nevertheless, ensuring the trustworthiness of ML models remains a persistent challenge, particularly with the emergence of new applications. The primary challenges in this context are the selection of an appropriate solution from a multitude of options, mitigating adversarial attacks, and advancing towards a unified solution that can be applied universally.

The thesis comprises three interconnected parts, all contributing to the overarching goal of improving trustworthiness in machine learning. Firstly, it introduces an automated machine learning (AutoML) framework that streamlines the training process, achieves optimum performance, and incorporates existing solutions for handling trustworthiness concerns. Secondly, it focuses on enhancing the robustness of machine learning models, particularly against adversarial attacks. A robust detector named "Argos" is introduced as a defense mechanism, leveraging the concept of two "souls" within adversarial instances to ensure robustness against unknown attacks. It incorporates the visually unchanged content representing the true label and the added invisible perturbation corresponding to the misclassified label. Thirdly, the thesis explores the realm of causal ML, which plays a fundamental role in assisting decision-makers and addressing challenges such as interpretability and fairness in traditional ML. By overcoming the difficulties posed by selective confounding in real-world scenarios, the proposed scheme utilizes dual-treatment samples and two-step procedures with counterfactual predictors to learn causal relationships from observed data. The effectiveness of the proposed scheme is supported by theoretical error bounds and empirical evidence using synthetic and real-world child placement data. By reducing the requirement for observed confounders, the applicability of causal ML is enhanced, contributing to the overall trustworthiness of machine learning systems.


Oluwanisola Ibikunle

Deep Learning Algorithms for Radar Echogram Layer Tracking

When & Where:


Richard K. Moore Conference Room

Degree Type:

PhD Comprehensive Defense

Committee Members:

Shannon Blunt, Chair
Carl Leuschen
Jilu Li
James Stiles
Chris Depcik

Abstract

The accelerated melting of ice sheets in the polar regions of the world, specifically in Greenland and Antarctica, due to contemporary climate warming is contributing to global sea level rise. To understand and quantify this phenomenon, airborne radars have been deployed to create echogram images that map snow accumulation patterns in these regions. Using advanced radar systems developed by the Center for Remote Sensing and Integrated Systems (CReSIS), a significant amount (1.5 petabytes) of climate data has been collected. However, the process of extracting ice phenomenology information, such as accumulation rate, from the data is limited. This is because the radar echograms require tracking of the internal layers, a task that is still largely manual and time-consuming. Therefore, there is a need for automated tracking.

Machine learning and deep learning algorithms are well-suited for this problem given their near-human performance on optical images. Moreover, the significant overlap between classical radar signal processing and machine learning techniques suggests that fusion of concepts from both fields can lead to optimized solutions for the problem. However, supervised deep learning algorithms suffer from a circular problem: training the models first requires large amounts of labeled data, which do not currently exist.

In this work, we propose custom algorithms, including supervised, semi-supervised, and self-supervised approaches, to deal with the limited annotated data problem and achieve accurate tracking of radiostratigraphic layers in echograms. Firstly, we propose an iterative multi-class classification algorithm, called “Row Block,” which sequentially tracks internal layers from the top to the bottom of an echogram given the surface location. We aim to use the trained iterative model in an active learning paradigm to progressively increase the labeled dataset. We also investigate various deep learning semantic segmentation algorithms by casting the echogram layer tracking problem as a binary and multiclass classification problem. These require post-processing to create the desired vector-layer annotations; hence, we propose a custom connected-component algorithm as a post-processing routine. Additionally, we propose end-to-end algorithms that avoid the post-processing and directly create annotations as vectors. Furthermore, we propose semi-supervised algorithms using weakly-labeled annotations and unsupervised algorithms that can learn the latent distribution of echogram snow layers while reconstructing echogram images from a sparse embedding representation.

A concurrent objective of this work is to provide the deep learning and science community with a large fully-annotated dataset. To achieve this, we propose synchronizing radar data with outputs from a regional climate model to provide a dataset with overlapping measurements that can enhance the performance of the trained models.


Prashanthi Mallojula

On the Security of Mobile and Auto Companion Apps

When & Where:


Nichols Hall 246 | Executive Conference Room

Degree Type:

PhD Comprehensive Defense

Committee Members:

Bo Luo, Chair
Alex Bardas
Fengjun Li
Hongyang Sun
Huazhen Fang

Abstract

Today’s smartphone platforms have millions of applications, which not only access users’ private data but also information from the connected external services and IoT/CPS devices. Mobile application security involves protecting sensitive information and securing communication between the application and external services or devices. We focus on these two key aspects of mobile application security.

In the first part of this dissertation, we aim to ensure the security of user information collected by mobile apps. Mobile apps seek consent from users to approve various permissions to access sensitive information such as location and personal information. However, users often blindly accept permission requests, and apps have started to abuse this mechanism. As long as a permission is requested, state-of-the-art security mechanisms treat it as legitimate. We ask whether the permission requests are valid. We attempt to validate permission requests using statistical analysis on permission sets extracted from groups of functionally similar apps. We detect mobile applications with abusive permission access and measure the risk of information leaks through each mobile application.
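One simple way to picture the statistical validation described above is to flag, within a group of functionally similar apps, any permission that only a small fraction of the group requests. The abstract does not specify the statistic used, so the support threshold, the function name, and the app names below are hypothetical; the permission strings are ordinary Android permission names used as sample data.

```python
from collections import Counter


def rare_permissions(app_permissions, min_support=0.25):
    """For each app, list the permissions that few of its peers request.

    `app_permissions` maps an app name to the set of permissions it
    requests; all apps are assumed to belong to one functional group.
    A permission is reported when the fraction of apps in the group
    requesting it is below `min_support` (an illustrative threshold).
    """
    group_size = len(app_permissions)
    counts = Counter(
        perm for perms in app_permissions.values() for perm in set(perms)
    )
    return {
        app: sorted(p for p in set(perms) if counts[p] / group_size < min_support)
        for app, perms in app_permissions.items()
    }
```

For a hypothetical group of five camera apps in which only one requests `READ_CONTACTS`, that app is the only one reported with a non-empty list.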

Second, we propose to investigate the security of auto companion apps. Auto companion apps are mobile apps designed to remotely connect with cars to provide features such as diagnostics, navigation, entertainment, and safety alerts. However, this connectivity can lead to several security threats; for instance, onboard vehicle information can be tracked or altered through a malicious app. We design a comprehensive security analysis framework for automotive companion apps that covers all stages of communication and collaboration between vehicles and companion apps, such as connection establishment, authentication, encryption, information storage, and vehicle diagnostic and control command access. By conducting static and network traffic analysis of Android OBD apps, we identify a series of vulnerability scenarios. We further evaluate these vulnerabilities with vehicle-based testing and identify potential security threats associated with auto companion apps.


Michael Nieses

Trustworthy Measurements of a Linux Kernel and Layered Attestation via a Verified Microkernel

When & Where:


Nichols Hall, Room 246

Degree Type:

PhD Comprehensive Defense

Committee Members:

Perry Alexander, Chair
Drew Davidson
Matthew Moore
Cuncong Zhong
Corey Maley

Abstract

Layered attestation is a process by which one can establish trust in a remote party. It is a special case of attestation in which different layers of the attesting system are handled distinctly. This type of trust is desirable because a vast and growing number of people depend on networked devices to go about their daily lives. Current architectures for remote attestation are lacking in process isolation, which is evidenced by the existence of virtual machine escape exploits. This implies a deficiency of trustworthy ways to determine whether a networked Linux system has been exploited. The seL4 microkernel, uniquely in the world, has machine-checked proofs concerning process confidentiality and integrity. The seL4 microkernel is leveraged here to provide a verified level of software-based process isolation. When complemented with a comprehensive collection of measurements, this architecture can be trusted to report its own corruption. The architecture is described, implemented, and tested against a variety of exploits, which are detected using introspective measurement techniques.