I2S Master's/Doctoral Theses


All students and faculty are welcome to attend the final defense of I2S graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date, so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

Ashish Adhikari

Towards assessing the security of program binaries

When & Where:


Eaton Hall, Room 2001B

Degree Type:

PhD Comprehensive Defense

Committee Members:

Prasad Kulkarni, Chair
Alex Bardas
Fengjun Li
Bo Luo

Abstract

Software vulnerabilities are widespread, often resulting from coding weaknesses and poor development practices. These vulnerabilities can be exploited by attackers, posing risks to confidentiality, integrity, and availability. To protect themselves, end-users of software may have an interest in knowing whether the software they purchase and use is secure from potential attacks. Our work is motivated by this need to automatically assess and rate the security properties of binary software.

While many researchers focus on developing techniques and tools to detect and mitigate vulnerabilities in binaries, our approach is different. We aim to determine whether the software has been developed with proper care. Our hypothesis is that software created with meticulous attention to security is less likely to contain exploitable vulnerabilities. As a first step, we examined the current landscape of binary-level vulnerability detection. We categorized critical coding weaknesses in compiled programming languages and conducted a detailed survey comparing static analysis techniques and tools designed to detect these weaknesses. Additionally, we evaluated the effectiveness of open-source CWE detection tools and analyzed their challenges. To further understand their efficacy, we conducted independent assessments using standard benchmarks.

To determine whether software is carefully and securely developed, we propose several techniques. So far, we have used machine learning and deep learning methods to identify the programming language of a binary at the functional level, enabling us to handle complex cases like mixed-language binaries. We also assess whether vulnerable regions in the binary are protected with appropriate security mechanisms. Additionally, we explored the feasibility of detecting secure coding practices by examining adherence to SonarQube’s security-related coding conventions.

Next, we investigate whether compiler warnings generated during binary creation are properly addressed. We also aim to optimize array bounds detection in program binaries. This enhanced array bounds detection will in turn make it easier to detect secure coding conventions related to memory safety and buffer overflow vulnerabilities.

Our ultimate goal is to combine these techniques to rate the overall security quality of a given software binary.


Bayn Schrader

Implementation and Analysis of an Efficient Dual-Beam Radar-Communications Technique

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

MS Thesis Defense

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen


Abstract

Fully digital arrays enable realization of dual-function radar-communications systems which generate multiple simultaneous transmit beams with different modulation structures in different spatial directions. These spatially diverse transmissions are produced by designing the individual waveforms transmitted at each antenna element that combine in the far-field to synthesize the desired modulations at the specified directions. This thesis derives a look-up table (LUT) implementation of the existing Far-Field Radiated Emissions Design (FFRED) optimization framework. This LUT implementation requires a single optimization routine for a set of desired signals, rather than the pulse-to-pulse optimization required by the previous implementation, making the LUT approach more efficient. The LUT is generated by representing the waveforms transmitted by each element in the array as a sequence of beamformers, where the LUT contains beamformers based on the phase difference between the desired signal modulations. The globally optimal beamformers, in terms of power efficiency, can be realized via the Lagrange dual problem for most beam locations and powers. The Phase-Attached Radar-Communications (PARC) waveform is selected for the communications waveform alongside a Linear Frequency Modulated (LFM) waveform for the radar signal. A set of FFRED LUTs is then used to simulate a radar transmission to verify the utility of the radar system. The same LUTs are then used to estimate the communications performance of a system with varying levels of array knowledge uncertainty.


Will Thomas

Static Analysis and Synthesis of Layered Attestation Protocols

When & Where:


Eaton Hall, Room 2001B

Degree Type:

PhD Comprehensive Defense

Committee Members:

Perry Alexander, Chair
Alex Bardas
Drew Davidson
Sankha Guria
Eileen Nutting

Abstract

Trust is a fundamental issue in computer security. Frequently, systems implicitly trust other systems, especially if configured by the same administrator. This fallacious reasoning stems from the belief that systems starting from a known, presumably good, state can be trusted. However, this statement only holds for boot-time behavior; most non-trivial systems change state over time, and thus runtime behavior is an important, oft-overlooked aspect of implicit trust in system security.

To address this, attestation was developed, allowing a system to provide evidence of its runtime behavior to a verifier. This evidence allows a verifier to make an explicit, informed decision about the system’s trustworthiness. As systems grow more complex, scalable attestation mechanisms become increasingly important. To apply attestation to non-trivial systems, layered attestation was introduced, allowing attestation of individual components or layers, combined into a unified report about overall system behavior. This approach enables more granular trust assessments and facilitates attestation in complex, multi-layered architectures. With the complexity of layered attestation, discerning whether a given protocol sufficiently measures a system, is executable, and properly reports all measurements becomes increasingly challenging.

In this work, we will develop a framework for the static analysis and synthesis of layered attestation protocols, enabling more robust and adaptable attestation mechanisms for dynamic systems. A key focus will be the static verification of protocol correctness, ensuring the protocol behaves as intended and provides reliable evidence of the underlying system state. A type system will be added to the Copland layered attestation protocol description language to allow basic static checks, and extended static analysis techniques will be developed to verify more complex properties of protocols for a specific target system. Further, protocol synthesis will be explored, enabling the automatic generation of correct-by-construction protocols tailored to system requirements.


David Felton

Optimization and Evaluation of Physical Complementary Radar Waveforms

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

PhD Comprehensive Defense

Committee Members:

Shannon Blunt, Chair
Rachel Jarvis
Patrick McCormick
James Stiles
Zsolt Talata

Abstract

In high dynamic-range environments, matched-filter radar performance is often sidelobe-limited, with correlation error being fundamentally constrained by the time-bandwidth product (TB) of the collective emission. To contend with the regulatory necessity of spectral containment, the gradient-based complementary-FM framework was developed to produce complementary sidelobe cancellation (CSC) after coherently combining responses from distinct pulses from within a pulse-agile emission. In contrast to most complementary subsets, which were discovered via brute force under the notion of phase-coding, these comp-FM waveform subsets achieve CSC while preserving hardware compatibility since they are FM. Although comp-FM addressed a primary limitation of complementary signals (i.e., hardware distortion), CSC hinges on the exact reconstruction of autocorrelation terms to suppress sidelobes, so optimality is lost for Doppler-shifted signals. This work introduces a Doppler-generalized comp-FM (DG-comp-FM) framework that extends the cancellation condition to account for the anticipated unambiguous Doppler span after post-summing. While this framework is developed for use in a combine-before-Doppler processing manner, it can likewise be employed to design an entire coherent processing interval (CPI) to minimize range-sidelobe modulation (RSM) within the radar point-spread-function (PSF), thereby introducing the potential for cognitive operation if sufficient scattering knowledge is available a priori.

Some radar systems operate with multiple emitters, as in the case of multiple-input multiple-output (MIMO) radar. Whereas a single emitter must contend with the self-inflicted autocorrelation sidelobes, MIMO systems must likewise contend with the cross-correlation with coincident (in time and spectrum) emissions from other emitters. As such, the determination of "orthogonal waveforms" comprises a large portion of research within the MIMO space, with a small majority now recognizing that true orthogonality is not possible for band-limited signals (albeit with the exclusion of TDMA). The notion of complementary-FM is proposed for exploration within a MIMO context, whereby coherently combining responses can achieve CSC as well as cross-correlation cancellation for a wide Doppler space. Effectively minimizing the cross-correlation terms enables improved channel separation on receive as well as improved estimation capability due to reduced correlation error. Proposal items include further exploration/characterization of the space, incorporating an explicit spectral.


Jigyas Sharma

SEDPD: Sampling-Enhanced Differentially Private Defense against Backdoor Poisoning Attacks of Image Classification

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

MS Thesis Defense

Committee Members:

Han Wang, Chair
Drew Davidson
Dongjie Wang


Abstract

Recent advancements in explainable artificial intelligence (XAI) have brought significant transparency to machine learning by providing interpretable explanations alongside model predictions. However, this transparency has also introduced vulnerabilities, enhancing adversaries’ ability to exploit the model decision processes through explanation-guided attacks. In this paper, we propose a robust, model-agnostic defense framework to mitigate the vulnerabilities introduced by explanations while preserving the utility of XAI. Our framework employs a multinomial sampling approach that perturbs explanation values generated by techniques such as SHAP and LIME. These perturbations ensure differential privacy (DP) bounds, disrupting adversarial attempts to embed malicious triggers while maintaining explanation quality for legitimate users. To validate our defense, we introduce a threat model tailored to image classification tasks. By applying our defense framework, we train models with pixel-sampling strategies that integrate DP guarantees, enhancing robustness against backdoor poisoning attacks with XAI. Extensive experiments on widely used datasets, such as CIFAR-10, MNIST, CIFAR-100 and Imagenette, and models, including ConvMixer and ResNet-50, show that our approach effectively mitigates explanation-guided attacks without compromising the accuracy of the model. We also test our defense against other backdoor attacks, and the results show that the framework detects other types of backdoor triggers well. This work highlights the potential of DP in securing XAI systems and supports safer deployment of machine learning models in real-world applications.


Dimple Galla

Intelligent Application for Cold Email Generation: Business Outreach

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Dongjie Wang


Abstract

Cold emailing remains an effective strategy for software service companies to improve organizational reach by acquiring clients. Generic emails often fail to get a response.

This project leverages Generative AI to automate cold email generation. It is built with the Llama-3.1 model and a Chroma vector database that supports semantic search for keywords in the job description that match the project portfolio links of software service companies. The application automatically extracts the technology-related job openings for Fortune 500 companies. Users can either select from these extracted job postings or manually enter the URL of a job posting, after which the system generates an email and sends it upon approval. Advanced techniques like Chain-of-Thought Prompting and Few-Shot Learning were applied to improve the relevance of the emails, making them more likely to draw a response. This AI-driven approach improves engagement and simplifies the business development process for software service companies.


Past Defense Notices

Andrew Stratmann

Efficient Index-Based Multi-User Scheduling for Mobile mmWave Networks: Balancing Channel Quality and User Experience

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Thesis Defense

Committee Members:

Morteza Hashemi, Chair
Prasad Kulkarni
Erik Perrins


Abstract

Millimeter Wave (mmWave) communication technologies have the potential to establish high data rates for next-generation wireless networks, as well as enable novel applications that were previously untenable due to high throughput requirements. Yet reliable and efficient mmWave communication remains challenged by intermittent link quality due to user mobility and frequent line-of-sight (LoS) blockage, thereby making the links unavailable or more costly to use. These factors are further exacerbated in multi-user settings where beam alignment overhead, limited RF chains, and heterogeneous user requirements must be balanced. In this work, we present a hybrid multi-user scheduling solution that jointly accounts for mobility- and blockage-induced unavailability to enhance user experience in mmWave video streaming applications. Our approach integrates two key components: (i) a blockage-aware scheduling strategy modeled via a Restless Multi-Armed Bandit (RMAB) formulation and prioritized using Whittle Indexing, and (ii) a mobility-aware geometric model that estimates beam alignment overhead cost as a function of receiver motion. We develop a comprehensive and efficient index-based scheduler that fuses these models and leverages contextual information, such as receiver distance, mobility history, and queue state, to schedule multiple users in order to maximize throughput. Simulation results demonstrate that our approach reduces system queue backlog and improves fairness compared to round-robin and traditional index-based baselines.



Tianxiao Zhang

Efficient and Effective Object Detection and Recognition: from Convolutions to Transformers

When & Where:


Eaton Hall, Room 2001B

Degree Type:

PhD Dissertation Defense

Committee Members:

Bo Luo, Chair
Prasad Kulkarni
Fengjun Li
Cuncong Zhong
Guanghui Wang

Abstract

With the development of Convolutional Neural Networks (CNNs), computer vision has entered a new era, significantly enhancing the performance of tasks such as image classification, object detection, segmentation, and recognition. Furthermore, the introduction of Transformer architectures has brought the attention mechanism and a global perspective to computer vision, advancing the field to a new level. The inductive bias inherent in CNNs makes convolutional models particularly well-suited for processing images and videos. On the other hand, the attention mechanism in Transformer models allows them to capture global relationships between tokens. While Transformers often require more data and longer training periods compared to their convolutional counterparts, they have the potential to achieve comparable or even superior performance when the constraints of data availability and training time are mitigated.

In this work, we propose more efficient and effective CNNs and Transformers to increase the performance of object detection and recognition. (1) A novel approach is proposed for real-time detection and tracking of small golf balls by combining object detection with the Kalman filter. Several classical object detection models were implemented and compared in terms of detection precision and speed. (2) To address the domain shift problem in object detection, we employ generative adversarial networks (GANs) to generate images from different domains. The original RGB images are concatenated with the corresponding GAN-generated images to form a 6-channel representation, improving model performance across domains. (3) A dynamic strategy for improving label assignment in modern object detection models is proposed. Rather than relying on fixed or statistics-based adaptive thresholds, a dynamic paradigm is introduced to define positive and negative samples. This allows more high-quality samples to be selected as positives, reducing the gap between classification and IoU scores and producing more accurate bounding boxes. (4) An efficient hybrid architecture combining Vision Transformers and convolutional layers is introduced for object recognition, particularly for small datasets. Lightweight depth-wise convolution modules bypass the entire Transformer block to capture local details that the Transformer backbone might overlook. The majority of the computations and parameters remain within the Transformer architecture, resulting in significantly improved performance with minimal overhead. (5) An innovative Multi-Overlapped-Head Self-Attention mechanism is introduced to enhance information exchange between heads in the Multi-Head Self-Attention mechanism of Vision Transformers. By overlapping adjacent heads during self-attention computation, information can flow between heads, leading to further improvements in vision recognition.


Faris El-Katri

Source Separation using Sparse Bayesian Learning

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Thesis Defense

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
James Stiles


Abstract

Wireless communication in recent decades has allowed for a substantial increase in both the speed and capacity of information that can be transmitted over large distances. However, given the expanding societal needs coupled with a finite available spectrum, the question arises of how to increase the efficiency with which information can be transmitted. One natural answer to this question lies in spectrum sharing, that is, in allowing multiple noncooperative agents to inhabit the same spectrum bands. In order to achieve this, we must be able to reliably separate the desired signals from those of other agents in the background. However, since our agents are noncooperative, we must develop a model-agnostic approach to tackling this problem. For this work, we will consider cohabitation between radar signals and communication signals, with the former being the desired signal and the latter being the noncooperative agent. In order to approach such problems involving highly underdetermined linear systems, we propose utilizing Sparse Bayesian Learning and present our results on selected problems.


Koyel Pramanick

Detect Evidence of Compiler Triggered Security Measures in Binary Code

When & Where:


Eaton Hall, Room 2001B

Degree Type:

PhD Dissertation Defense

Committee Members:

Prasad Kulkarni, Chair
Drew Davidson
Fengjun Li
Bo Luo
John Symons

Abstract

The primary goal of this thesis is to develop and explore techniques to identify security measures added by compilers in software binaries. These measures, added automatically during the build process, include runtime security checks like stack canaries, AddressSanitizer (ASan), and Control Flow Integrity (CFI), which help protect against memory errors, buffer overflows, and control flow attacks. This work also investigates how unresolved compiler warnings, especially those related to security, can be identified in binaries when the source code is unavailable. By studying the patterns and markers left by these compiler features, this thesis provides methods to analyze and verify the security provisions embedded in software binaries. These efforts aim to bridge the gap between compile-time diagnostics and binary-level analysis, offering a way to better understand the security protections applied during software compilation. Ultimately, this work seeks to make software more transparent and give users the tools to independently assess the security measures present in compiled software, fostering greater trust and accountability in software systems.


Srinitha Kale

AUTOMATING SYMBOL RECOGNITION IN SPOT IT: ADVANCING AI-POWERED DETECTION

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Esam El-Araby
Prasad Kulkarni


Abstract

The "Spot It!" game, featuring 55 cards each with 8 unique symbols, presents a complex challenge of identifying a single matching symbol between any two cards. Addressing this challenge, machine learning has been employed to automate symbol recognition, enhancing gameplay and extending applications into areas like pattern recognition and visual search. Due to the scarcity of available datasets, a comprehensive collection of 57 distinct Spot It symbols was created, with each class consisting of 1,800 augmented images. These images were manipulated through techniques such as scaling, rotation, and resizing to represent various visual scenarios. A convolutional neural network (CNN) with five convolutional layers, batch normalization, and dropout layers was then developed, and the Adam optimizer was employed to train the model to accurately recognize these symbols. The robust dataset included over 102,600 images, each subject to extensive augmentation to improve the model's ability to generalize across different orientation and scaling conditions.

The model was evaluated using 55 scanned "Spot It!" cards, where symbols were extracted and preprocessed for prediction. It achieved high accuracy in symbol identification, demonstrating significant resilience to common challenges such as rotations and scaling. This project illustrates the effective integration of data augmentation, deep learning, and computer vision techniques in tackling complex pattern recognition tasks, proving that artificial intelligence can significantly enhance traditional gaming experiences and create new opportunities in various fields. This project delves into the design, implementation, and testing of the CNN, providing a detailed analysis of its performance and highlighting its potential as a transformative tool in image recognition and categorization.
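The card and symbol counts quoted in this abstract match a finite projective plane of order 7, which has 7 × 7 + 7 + 1 = 57 symbols and allows up to 57 cards of 8 symbols each; a 55-card deck is a subset of these. The following Python sketch is illustrative only and is not taken from the project. It builds such a deck with a standard textbook construction (the name `build_deck` is hypothetical), checks that any two cards share exactly one symbol, and reproduces the dataset size of 57 classes × 1,800 images.

```python
from itertools import combinations

def build_deck(n: int) -> list[frozenset[int]]:
    """Build n*n + n + 1 cards with n + 1 symbols each (n must be prime)."""
    cards = []
    # Cards that all contain symbol 0, each with its own block of n symbols.
    for i in range(n + 1):
        cards.append(frozenset([0] + [1 + i * n + j for j in range(n)]))
    # Remaining n*n cards: one symbol from 1..n plus one symbol from each
    # of the other n blocks, chosen along a "line" with slope i and offset j.
    for i in range(n):
        for j in range(n):
            line = [n + 1 + n * k + (i * k + j) % n for k in range(n)]
            cards.append(frozenset([1 + i] + line))
    return cards

deck = build_deck(7)
symbols = set().union(*deck)
# Set of intersection sizes over every pair of cards.
shared = {len(a & b) for a, b in combinations(deck, 2)}
# Dataset size quoted in the abstract: one class per symbol.
dataset_size = len(symbols) * 1800

print(len(deck), len(symbols), shared, dataset_size)
```

The single-match property is what makes the recognition task well defined: once the symbols on two cards are classified, the shared symbol is simply the intersection of the two predicted sets.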


Sudha Chandrika Yadlapalli

BERT-Driven Sentiment Analysis: Automated Course Feedback Classification and Ratings

When & Where:


Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Hongyang Sun


Abstract

Automating the analysis of unstructured textual data, such as student course feedback, is crucial for gaining actionable insights. This project focuses on developing a sentiment analysis system leveraging the DeBERTa-v3-base model, a variant of BERT (Bidirectional Encoder Representations from Transformers), to classify feedback sentiments and generate corresponding ratings on a 1-to-5 scale.

A dataset of 100,000+ student reviews was preprocessed, and the model was fine-tuned on it to handle class imbalances and capture contextual nuances. Training was conducted on high-performance A100 GPUs, which enhanced computational efficiency and reduced training times significantly. The trained BERT sentiment model demonstrated superior performance compared to traditional machine learning models, achieving ~82% accuracy in sentiment classification.

The model was integrated into a functional web application, providing a streamlined approach to evaluate and visualize course reviews dynamically. Key features include a course ratings dashboard, allowing students to view aggregated ratings for each course, and a review submission functionality where new feedback is analyzed for sentiment in real time. For the department, an admin page provides secure access to detailed analytics, such as the distribution of positive and negative reviews, visualized trends, and individual course reviews with their corresponding sentiment scores.

This project includes a comprehensive pipeline, starting from data preprocessing and model training to deploying an end-to-end application. Traditional machine learning models, such as Logistic Regression and Decision Tree, were initially tested but yielded suboptimal results. The adoption of BERT, trained on a large dataset of 100k reviews, significantly improved performance, showcasing the benefits of advanced transformer-based models for sentiment analysis tasks.


Rizwan Khan

Fatigue crack segmentation of steel bridges using deep learning models - a comparative study.

When & Where:


Learned Hall, Room 3131

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Hongyang Sun



Abstract

Structural health monitoring (SHM) is crucial for maintaining the safety and durability of infrastructure. To address the limitations of traditional inspection methods, this study leverages cutting-edge deep learning-based segmentation models for autonomous crack identification. Specifically, we utilized the recently launched YOLOv11 model, alongside the established DeepLabv3+ model for crack segmentation. Mask R-CNN, a widely recognized model in crack segmentation studies, is used as the baseline approach for comparison. Our approach integrates the CREC cropping strategy to optimize dataset preparation and employs post-processing techniques, such as dilation and erosion, to refine segmentation results. Experimental results demonstrate that our method—combining state-of-the-art models, innovative data preparation strategies, and targeted post-processing—achieves superior mean Intersection-over-Union (mIoU) performance compared to the baseline, showcasing its potential for precise and efficient crack detection in SHM systems.
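For readers unfamiliar with the metric, the Python sketch below shows how mean Intersection-over-Union is conventionally computed for a two-class (background/crack) segmentation. It is a minimal illustration, not the study's evaluation code; the function names and the tiny label maps are hypothetical.

```python
def class_iou(pred, truth, cls):
    """IoU for one class label over two same-shaped label maps (nested lists)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = p == cls, t == cls
            inter += in_pred and in_truth
            union += in_pred or in_truth
    # A class absent from both maps has no defined IoU.
    return inter / union if union else None

def mean_iou(pred, truth, classes=(0, 1)):
    """Average the per-class IoU, skipping classes absent from both maps."""
    scores = [class_iou(pred, truth, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)

# Hypothetical 2x3 label maps: 1 marks crack pixels, 0 marks background.
truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 0],
        [0, 1, 1]]

print(class_iou(pred, truth, 1), mean_iou(pred, truth))
```

Because cracks occupy few pixels, the background class tends to score high on its own, which is one reason per-class IoU is often reported alongside the mean.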


Zhaohui Wang

Enhancing Security and Privacy of IoT Systems: Uncovering and Resolving Cross-App Threats

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Degree Type:

PhD Comprehensive Defense

Committee Members:

Fengjun Li, Chair
Alex Bardas
Drew Davidson
Bo Luo
Haiyang Chao

Abstract

The rapid growth of Internet of Things (IoT) technology has brought unprecedented convenience to our daily lives, enabling users to customize automation rules and develop IoT apps to meet their specific needs. However, as IoT devices interact with multiple apps across various platforms, users are exposed to complex security and privacy risks. Even interactions among seemingly harmless apps can introduce unforeseen security and privacy threats.

In this work, we introduce two innovative approaches to uncover and address these concealed threats in IoT environments. The first approach investigates hidden cross-app privacy leakage risks in IoT apps. These risks arise from cross-app chains that are formed among multiple seemingly benign IoT apps. Our analysis reveals that interactions between apps can expose sensitive information such as user identity, location, tracking data, and activity patterns. We quantify these privacy leaks by assigning probability scores to evaluate the risks based on inferences. Additionally, we provide a fine-grained categorization of privacy threats to generate detailed alerts, enabling users to better understand and address specific privacy risks. The second approach targets cross-app interference threats: to detect them systematically, we propose to apply principles of logical fallacies to formalize conflicts in rule interactions. We identify and categorize cross-app interference by examining relations between events in IoT apps. We define new risk metrics for evaluating the severity of these interferences and use optimization techniques to resolve interference threats efficiently. This approach ensures comprehensive coverage of cross-app interference, offering a systematic solution compared to the ad hoc methods used in previous research.

To enhance forensic capabilities within IoT, we integrate blockchain technology to create a secure, immutable framework for digital forensics. This framework enables the identification, tracing, storage, and analysis of forensic information to detect anomalous behavior. Furthermore, we developed a large-scale, manually verified, comprehensive dataset of real-world IoT apps. This clean and diverse benchmark dataset supports the development and validation of IoT security and privacy solutions. Each of these approaches has been evaluated using our dataset of real-world apps, collectively offering valuable insights and tools for enhancing IoT security and privacy against cross-app threats.


Manu Chaudhary

Utilizing Quantum Computing for Solving Multidimensional Partial Differential Equations

When & Where:


Eaton Hall, Room 2001B

Degree Type:

PhD Comprehensive Defense

Committee Members:

Esam El-Araby, Chair
Perry Alexander
Tamzidul Hoque
Prasad Kulkarni
Tyrone Duncan

Abstract

Quantum computing has the potential to revolutionize computational problem-solving by leveraging the quantum mechanical phenomena of superposition and entanglement, which allow for processing a large amount of information simultaneously. This capability is significant in the numerical solution of complex and/or multidimensional partial differential equations (PDEs), which are fundamental to modeling various physical phenomena. Many quantum techniques are currently available for solving PDEs, most of them based on variational quantum circuits. However, the existing quantum PDE solvers, particularly those based on variational quantum eigensolver (VQE) techniques, suffer from several limitations. These include low accuracy, high execution times, and low scalability on quantum simulators as well as on noisy intermediate-scale quantum (NISQ) devices, especially for multidimensional PDEs.

In this work, we propose an efficient and scalable algorithm for solving multidimensional PDEs. We present two variants of our algorithm: the first leverages the finite-difference method (FDM), classical-to-quantum (C2Q) encoding, and numerical instantiation, while the second employs FDM, C2Q, and column-by-column decomposition (CCD). Both variants are designed to enhance accuracy and scalability while reducing execution times. We have validated and evaluated our algorithm using the multidimensional Poisson equation as a case study. Our results demonstrate higher accuracy, higher scalability, and faster execution times compared to VQE-based solvers on noise-free and noisy quantum simulators from IBM. Additionally, we validated our approach on hardware emulators and actual quantum hardware, employing noise mitigation techniques. We will also focus on extending these techniques to PDEs relevant to computational fluid dynamics and financial modeling, further bridging the gap between theoretical quantum algorithms and practical applications.
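As background on the finite-difference step named in this abstract, the Python sketch below discretizes a one-dimensional Poisson problem and solves the resulting linear system classically. It is a simplified illustration under stated assumptions: one dimension instead of several, zero boundary values, and a classical tridiagonal solve standing in for the C2Q encoding and quantum stages, which the notice does not describe in detail. The function name `solve_poisson_1d` is hypothetical.

```python
def solve_poisson_1d(f, n):
    """Solve -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0 at n interior points."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # Central differences give (-u[i-1] + 2*u[i] - u[i+1]) / h^2 = f(x_i),
    # a tridiagonal system with 2 on the diagonal and -1 off the diagonal.
    diag = [2.0] * n
    rhs = [h * h * f(x) for x in xs]
    # Thomas algorithm, forward elimination.
    for i in range(1, n):
        w = -1.0 / diag[i - 1]
        diag[i] -= w * -1.0
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return xs, u

# With f = 2 the exact solution is u(x) = x * (1 - x).
xs, u = solve_poisson_1d(lambda x: 2.0, 9)
max_error = max(abs(ui - x * (1 - x)) for x, ui in zip(xs, u))
print(max_error)
```

In a quantum workflow the same discretized system would be the object that gets encoded and solved, so the grid size chosen here directly sets the size of the state that has to be prepared.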


Hao Xuan

A Unified Algorithmic Framework for Biological Sequence Alignment

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Degree Type:

PhD Comprehensive Defense

Committee Members:

Cuncong Zhong, Chair
Fengjun Li
Suzanne Shontz
Hongyang Sun
Liang Xu

Abstract

Sequence alignment is pivotal in both homology searches and the mapping of reads from next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies. Currently, the majority of sequence alignment algorithms utilize the “seed-and-extend” paradigm, designed to filter out unrelated or nonhomologous sequences when no highly similar subregions are detected. A well-known implementation of this paradigm is BLAST, one of the most widely used multipurpose aligners. Over time, this paradigm has been optimized in various ways to suit different alignment tasks. However, while these specialized aligners often deliver high performance and efficiency, they are typically restricted to one or a few alignment applications. To the best of our knowledge, no existing aligner can perform all alignment tasks while maintaining superior performance and efficiency.

In this work, we introduce a unified sequence alignment framework to address this limitation. Our alignment framework is built on the seed-and-extend paradigm but incorporates novel designs in its seeding and indexing components to maximize both flexibility and efficiency. The resulting software, the Versatile Alignment Toolkit (VAT), allows users to switch seamlessly between nearly all major alignment tasks through command-line parameter configuration. VAT was rigorously benchmarked against leading aligners for DNA and protein homolog searches, NGS and TGS read mapping, and whole-genome alignment. The results demonstrated VAT’s top-tier performance across all benchmarks, underscoring the feasibility of using a unified algorithmic framework to handle diverse alignment tasks. VAT can simplify and standardize bioinformatic analysis workflows that involve multiple alignment tasks.
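The seed-and-extend paradigm mentioned in this abstract can be illustrated with a toy Python sketch. This is not VAT's algorithm, whose seeding and indexing designs are not described in this notice. It assumes exact k-mer seeds and extends a hit only while characters match exactly, whereas production aligners score mismatches and gaps. All function names are hypothetical.

```python
from collections import defaultdict

def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return index

def seed_and_extend(query: str, ref: str, k: int = 4):
    """Return (length, query_start, ref_start) of the longest exact hit, or None."""
    index = build_index(ref, k)
    best = None
    for q in range(len(query) - k + 1):
        # Seeding: look up the query k-mer; no hit means no extension work.
        for r in index.get(query[q:q + k], []):
            # Extension to the left of the seed.
            left = 0
            while (q - left > 0 and r - left > 0
                   and query[q - left - 1] == ref[r - left - 1]):
                left += 1
            # Extension to the right, starting just past the seed.
            right = k
            while (q + right < len(query) and r + right < len(ref)
                   and query[q + right] == ref[r + right]):
                right += 1
            hit = (left + right, q - left, r - left)
            if best is None or hit[0] > best[0]:
                best = hit
    return best

reference = "TTGACCGTAAGCTTAGG"
print(seed_and_extend("CCGTAAGC", reference))
print(seed_and_extend("AAAA", "CCCCCCCC"))
```

The filtering effect described in the abstract shows up in the second call: with no shared k-mer, the query is rejected after index lookups alone, without any extension.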