I2S Masters/ Doctoral Theses

All students and faculty are welcome to attend the final defense of I2S graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check, and post the presentation announcement online.

Upcoming Defense Notices

Shahima Kalluvettu Kuzhikkal

Machine Learning Based Predictive Maintenance for Automotive Systems

When & Where:

Wed, 10/29/2025 - 15:15
Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Rachel Jarvis
Prasad Kulkarni
Hongyang Sun

Abstract

Predictive maintenance plays a central role in reducing vehicle downtime and improving operational efficiency by using data-driven methods to classify the condition of automotive engines. Rather than relying on fixed service schedules or reacting to unexpected breakdowns, this approach leverages machine learning to distinguish between healthy and failed engines based on operational data.

In this project, engine telemetry data capturing key parameters such as engine speed, fuel pressure, and coolant temperature was used to train and evaluate several machine learning models, including logistic regression, random forest, k-nearest neighbors, and a neural network. To further enhance predictive performance, ensemble strategies such as soft voting and stacking were applied. The stacking ensemble, which combines the strengths of multiple classifiers through a meta-learning approach, demonstrated particularly effective results.

This classification-based framework demonstrates how data-driven fault detection can enhance automotive maintenance operations. By identifying engine failures more reliably, machine learning enables safer transportation, reduces maintenance costs, and enhances overall vehicle dependability. Beyond individual vehicles, such approaches have broader applications in fleet management, where proactive decision-making can improve service continuity, reduce operational risks, and increase customer satisfaction.

Jennifer Quirk

Aspects of Doppler-Tolerant Radar Waveforms

When & Where:

Fri, 10/24/2025 - 13:00
Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

PhD Comprehensive Defense

Committee Members:

Shannon Blunt, Chair
Patrick McCormick
Charles Mohr
James Stiles
Zsolt Talata

Abstract

The Doppler tolerance of a waveform refers to its behavior when subjected to a fast-time Doppler shift imposed by scattering that involves nonnegligible radial velocity. While previous efforts have established decision-based criteria that lead to a binary judgment of Doppler tolerant or intolerant, it is also useful to establish a measure of the degree of Doppler tolerance. The purpose in doing so is to establish a consistent standard, thereby permitting assessment across different parameterizations, as well as introducing a Doppler “quasi-tolerant” trade-space that can ultimately inform automated/cognitive waveform design in increasingly complex and dynamic radio frequency (RF) environments.

Separately, the application of slow-time coding (STC) to the Doppler-tolerant linear FM (LFM) waveform has been examined for disambiguation of multiple range ambiguities. However, using STC with non-adaptive Doppler processing often results in high Doppler “cross-ambiguity” side lobes that can hinder range disambiguation despite the degree of separability imparted by STC. To enhance this separability, a gradient-based optimization of STC sequences is developed, and a “multi-range” (MR) modification to the reiterative super-resolution (RISR) approach that accounts for the distinct range interval structures from STC is examined. The efficacy of these approaches is demonstrated using open-air measurements.

The proposed work to appear in the final dissertation focuses on the connection between Doppler tolerance and STC. The first proposal includes the development of a gradient-based optimization procedure to generate Doppler quasi-tolerant random FM (RFM) waveforms. Other proposals consider limitations of STC, particularly when processed with MR-RISR. The final proposal introduces an “intrapulse” modification of the STC/LFM structure to achieve enhanced sup pression of range-folded scattering in certain delay/Doppler regions while retaining a degree of Doppler tolerance.

Mary Jeevana Pudota

Assessing Processor Allocation Strategies for Online List Scheduling of Moldable Task Graphs

When & Where:

Mon, 10/20/2025 - 12:00
Eaton Hall, Room 2001B

Degree Type:

MS Thesis Defense

Committee Members:

Hongyang Sun, Chair
David Johnson
Prasad Kulkarni

Abstract

Scheduling a graph of moldable tasks, where each task can be executed by a varying number of processors with execution time depending on the processor allocation, represents a fundamental problem in high-performance computing (HPC). The online version of the scheduling problem introduces an additional constraint: each task is only discovered when all its predecessors have been completed. A key challenge for this online problem lies in making processor allocation decisions without complete knowledge of the future tasks or dependencies. This uncertainty can lead to inefficient resource utilization and increased overall completion time, or makespan. Recent studies have provided theoretical analysis (i.e., derived competitive ratios) for certain processor allocation algorithms. However, the algorithms’ practical performance remains under-explored, and their reliance on fixed parameter settings may not consistently yield optimal performance across varying workloads. In this thesis, we conduct a comprehensive evaluation of three processor allocation strategies by empirically assessing their performance under widely used speedup models and diverse graph structures. These algorithms are integrated into a List scheduling framework that greedily schedules ready tasks based on the current processor availability. We perform systematic tuning of the algorithms’ parameters and report the best observed makespan together with the corresponding parameter settings. Our findings highlight the critical role of parameter tuning in obtaining optimal makespan performance, regardless of the differences in allocation strategies. The insights gained in this study can guide the deployment of these algorithms in practical runtime systems.

Past Defense Notices

Aidan Schmelzle

Exploration of Human Design with Genetic Algorithms as Artistic Medium for Color Images

When & Where:

Wed, 08/27/2025 - 13:00
Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

Arvin Agah, Chair
David Johnson
Jennifer Lohoefener

Abstract

Genetic Algorithms (GAs), a subclass of evolutionary algorithms, seek to apply the concept of natural selection to promote the optimization and furtherance of “something” designated by the user. GAs generate a population of chromosomes represented as value strings, score each chromosome with a “fitness function” on a defined set of criteria, and mutate future generations depending on the scores ascribed to each chromosome. In this project, each chromosome is a bitstring representing one canvased color artwork. Artworks are scored with a variety of design fundamentals and user preference. The artworks are then evolved through thousands of generations and the final piece is computationally drawn for analysis. While the rise of gradient-based optimization has resulted in more limited use-cases of GAs, genetic algorithms still have applications in various settings such as hyperparameter tuning, mathematical optimization, reinforcement learning, and black box scenarios. Neural networks are favored presently in image generation due to their pattern recognition and ability to produce new content; however, in cases where a user is seeking to implement their own vision through careful algorithmic refinement, genetic algorithms still find a place in visual computing.

Andrew Riachi

An Investigation Into The Memory Consumption of Web Browsers and A Memory Profiling Tool Using Linux Smaps

When & Where:

Mon, 08/11/2025 - 11:00
Nichols Hall, Room 250 (Gemini Room)

Degree Type:

MS Thesis Defense

Committee Members:

Prasad Kulkami, Chair
Perry Alexander
Drew Davidson
Heechul Yun

Abstract

Web browsers are notorious for consuming large amounts of memory. Yet, they have become the dominant framework for writing GUIs because the web languages are ergonomic for programmers and have a cross-platform reach. These benefits are so enticing that even a large portion of mobile apps, which have to run on resource-constrained devices, are running a web browser under the hood. Therefore, it is important to keep the memory consumption of web browsers as low as practicable.

In this thesis, we investigate the memory consumption of web browsers, in particular, compared to applications written in native GUI frameworks. We introduce smaps-profiler, a tool to profile the overall memory consumption of Linux applications that can report memory usage other profilers simply do not measure. Using this tool, we conduct experiments which suggest that most of the extra memory usage compared to native applications could be due the size of the web browser program itself. We discuss our experiments and findings, and conclude that even more rigorous studies are needed to profile GUI applications.

Elizabeth Wyss

A New Frontier for Software Security: Diving Deep into npm

When & Where:

Wed, 07/23/2025 - 09:30
Eaton Hall, Room 2001B

Degree Type:

PhD Dissertation Defense

Committee Members:

Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker

Abstract

Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week.

However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.

This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains.

Alfred Fontes

Optimization and Trade-Space Analysis of Pulsed Radar-Communication Waveforms using Constant Envelope Modulations

When & Where:

Tue, 07/22/2025 - 14:00
Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

MS Thesis Defense

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen

Abstract

Dual function radar communications (DFRC) is a method of co-designing a single radio frequency system to perform simultaneous radar and communications service. DFRC is ultimately a compromise between radar sensing performance and communications data throughput due to the conflicting requirements between the sensing and information-bearing signals.

A novel waveform-based DFRC approach is phase attached radar communications (PARC), where a communications signal is embedded onto a radar pulse via the phase modulation between the two signals. The PARC framework is used here in a new waveform design technique that designs the radar component of a PARC signal to match the PARC DFRC waveform expected power spectral density (PSD) to a desired spectral template. This provides better control over the PARC signal spectrum, which mitigates the issue of PARC radar performance degradation from spectral growth due to the communications signal.

The characteristics of optimized PARC waveforms are then analyzed to establish a trade-space between radar and communications performance within a PARC DFRC scenario. This is done by sampling the DFRC trade-space continuum with waveforms that contain a varying degree of communications bandwidth, from a pure radar waveform (no embedded communications) to a pure communications waveform (no radar component). Radar performance, which is degraded by range sidelobe modulation (RSM) from the communications signal randomness, is measured from the PARC signal variance across pulses; data throughput is established as the communications performance metric. Comparing the values of these two measures as a function of communications symbol rate explores the trade-offs in performance between radar and communications with optimized PARC waveforms.

Qua Nguyen

Hybrid Array and Privacy-Preserving Signaling Optimization for NextG Wireless Communications

When & Where:

Tue, 07/22/2025 - 11:00
Zoom (ID: 87142881713 Passcode: 135902)

Degree Type:

PhD Dissertation Defense

Committee Members:

Erik Perrins, Chair
Morteza Hashemi
Zijun Yao
Taejoon Kim
KC Long

Abstract

This PhD research tackles two critical challenges in NextG wireless networks: hybrid precoder design for wideband sub-Terahertz (sub-THz) massive multiple-input multiple-output (MIMO) communications and privacy-preserving federated learning (FL) over wireless networks.

In the first part, we propose a novel hybrid precoding framework that integrates true-time delay (TTD) devices and phase shifters (PS) to counteract the beam squint effect - a significant challenge in the wideband sub-THz massive MIMO systems that leads to considerable loss in array gain. Unlike previous methods that only designed TTD values while fixed PS values and assuming unbounded time delay values, our approach jointly optimizes TTD and PS values under realistic time delays constraint. We determine the minimum number of TTD devices required to achieve a target array gain using our proposed approach. Then, we extend the framework to multi-user wideband systems and formulate a hybrid array optimization problem aiming to maximize the minimum data rate across users. This problem is decomposed into two sub-problems: fair subarray allocation, solved via continuous domain relaxation, and subarray gain maximization, addressed via a phase-domain transformation.

The second part focuses on preserving privacy in FL over wireless networks. First, we design a differentially-private FL algorithm that applies time-varying noise variance perturbation. Taking advantage of existing wireless channel noise, we jointly design differential privacy (DP) noise variances and users transmit power to resolve the tradeoffs between privacy and learning utility. Next, we tackle two critical challenges within FL networks: (i) privacy risks arising from model updates and (ii) reduced learning utility due to quantization heterogeneity. Prior work typically addresses only one of these challenges because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. We approach to improve the learning utility of a privacy-preserving FL that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that ensures a DP guarantee and minimal quantization distortion. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Lastly, inspired by the information-theoretic rate-distortion framework, a privacy-distortion tradeoff problem is formulated to minimize privacy loss under a given maximum allowable quantization distortion. The optimal solution to this problem is identified, revealing that the privacy loss decreases as the maximum allowable quantization distortion increases, and vice versa.

This research advances hybrid array optimization for wideband sub-THz massive MIMO and introduces novel algorithms for privacy-preserving quantized FL with diverse precision. These contributions enable high-throughput wideband MIMO communication systems and privacy-preserving AI-native designs, aligning with the performance and privacy protection demands of NextG networks.

Arin Dutta

Performance Analysis of Distributed Raman Amplification with Different Pumping Configurations

When & Where:

Mon, 07/21/2025 - 10:00
Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

PhD Dissertation Defense

Committee Members:

Rongqing Hui, Chair
Morteza Hashemi
Rachel Jarvis
Alessandro Saladrino
Hui Zhao

Abstract

As internet services like high-definition videos, cloud computing, and artificial intelligence keep growing, optical networks need to keep up with the demand for more capacity. Optical amplifiers play a crucial role in offsetting fiber loss and enabling long-distance wavelength division multiplexing (WDM) transmission in high-capacity systems. Various methods have been proposed to enhance the capacity and reach of fiber communication systems, including advanced modulation formats, dense wavelength division multiplexing (DWDM) over ultra-wide bands, space-division multiplexing, and high-performance digital signal processing (DSP) technologies. To maintain higher data rates along with maximizing the spectral efficiency of multi-level modulated signals, a higher Optical Signal-to-Noise Ratio (OSNR) is necessary. Despite advancements in coherent optical communication systems, the spectral efficiency of multi-level modulated signals is ultimately constrained by fiber nonlinearity. Raman amplification is an attractive solution for wide-band amplification with low noise figures in multi-band systems.

Distributed Raman Amplification (DRA) have been deployed in recent high-capacity transmission experiments to achieve a relatively flat signal power distribution along the optical path and offers the unique advantage of using conventional low-loss silica fibers as the gain medium, effectively transforming passive optical fibers into active or amplifying waveguides. Also, DRA provides gain at any wavelength by selecting the appropriate pump wavelength, enabling operation in signal bands outside the Erbium doped fiber amplifier (EDFA) bands. Forward (FW) Raman pumping configuration in DRA can be adopted to further improve the DRA performance as it is more efficient in OSNR improvement because the optical noise is generated near the beginning of the fiber span and attenuated along the fiber. Dual-order FW pumping scheme helps to reduce the non-linear effect of the optical signal and improves OSNR by more uniformly distributing the Raman gain along the transmission span.

The major concern with Forward Distributed Raman Amplification (FW DRA) is the fluctuation in pump power, known as relative intensity noise (RIN), which transfers from the pump laser to both the intensity and phase of the transmitted optical signal as they propagate in the same direction. Additionally, another concern of FW DRA is the rise in signal optical power near the start of the fiber span, leading to an increase in the non-linear phase shift of the signal. These factors, including RIN transfer-induced noise and non-linear noise, contribute to the degradation of system performance in FW DRA systems at the receiver.

As the performance of DRA with backward pumping is well understood with relatively low impact of RIN transfer, our research is focused on the FW pumping configuration, and is intended to provide a comprehensive analysis on the system performance impact of dual order FW Raman pumping, including signal intensity and phase noise induced by the RINs of both 1st and the 2nd order pump lasers, as well as the impacts of linear and nonlinear noise. The efficiencies of pump RIN to signal intensity and phase noise transfer are theoretically analyzed and experimentally verified by applying a shallow intensity modulation to the pump laser to mimic the RIN. The results indicate that the efficiency of the 2nd order pump RIN to signal phase noise transfer can be more than 2 orders of magnitude higher than that from the 1st order pump. Then the performance of the dual order FW Raman configurations is compared with that of single order Raman pumping to understand trade-offs of system parameters. The nonlinear interference (NLI) noise is analyzed to study the overall OSNR improvement when employing a 2nd order Raman pump. Finally, a DWDM system with 16-QAM modulation is used as an example to investigate the benefit of DRA with dual order Raman pumping and with different pump RIN levels. We also consider a DRA system using a 1st order incoherent pump together with a 2nd order coherent pump. Although dual order FW pumping corresponds to a slight increase of linear amplified spontaneous emission (ASE) compared to using only a 1st order pump, its major advantage comes from the reduction of nonlinear interference noise in a DWDM system. Because the RIN of the 2nd order pump has much higher impact than that of the 1st order pump, there should be more stringent requirement on the RIN of the 2nd order pump laser when dual order FW pumping scheme is used for DRA for efficient fiber-optic communication. Also, the result of system performance analysis reveals that higher baud rate systems, like those operating at 100Gbaud, are less affected by pump laser RIN due to the low-pass characteristics of the transfer of pump RIN to signal phase noise.

Audrey Mockenhaupt

Using Dual Function Radar Communication Waveforms for Synthetic Aperture Radar Automatic Target Recognition

When & Where:

Fri, 07/18/2025 - 14:30
Nichols Hall, Room 246 (Executive Conference Room)

Degree Type:

MS Thesis Defense

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen

Abstract

As machine learning (ML), artificial intelligence (AI), and deep learning continue to advance, their applications become more diverse – one such application is synthetic aperture radar (SAR) automatic target recognition (ATR). These SAR ATR networks use different forms of deep learning such as convolutional neural networks (CNN) to classify targets in SAR imagery. An emerging research area of SAR is dual function radar communication (DFRC) which performs both radar and communications functions using a single co-designed modulation. The utilization of DFRC emissions for SAR imaging impacts image quality, thereby influencing SAR ATR network training. Here, using the Civilian Vehicle Data Dome dataset from the AFRL, SAR ATR networks are trained and evaluated with simulated data generated using Gaussian Minimum Shift Keying (GMSK) and Linear Frequency Modulation (LFM) waveforms. The networks are used to compare how the target classification accuracy of the ATR network differ between DFRC (i.e., GMSK) and baseline (i.e., LFM) emissions. Furthermore, as is common in pulse-agile transmission structures, an effect known as ’range sidelobe modulation’ is examined, along with its impact on SAR ATR. Finally, it is shown that SAR ATR network can be trained for GMSK emissions using existing LFM datasets via two types of data augmentation.

Rich Simeon

Delay-Doppler Channel Estimation for High-Speed Aeronautical Mobile Telemetry Applications

When & Where:

Tue, 07/15/2025 - 11:30
Eaton Hall, Room 2001B

Degree Type:

PhD Comprehensive Defense

Committee Members:

Erik Perrins, Chair
Shannon Blunt
Morteza Hashemi
James Stiles
Craig McLaughlin

Abstract

The next generation of digital communications systems aims to operate in high-Doppler environments such as high-speed trains and non-terrestrial networks that utilize satellites in low-Earth orbit. Current generation systems use Orthogonal Frequency Division Multiplexing modulation which is known to suffer from inter-carrier interference (ICI) when different channel paths have dissimilar Doppler shifts.

A new Orthogonal Time Frequency Space (OTFS) modulation (also known as Delay-Doppler modulation) is proposed as a candidate modulation for 6G networks that is resilient to ICI. To date, OTFS demodulation designs have focused on the use cases of popular urban terrestrial channel models where path delay spread is a fraction of the OTFS symbol duration. However, wireless wide-area networks that operate in the aeronautical mobile telemetry (AMT) space can have large path delay spreads due to reflections from distant geographic features. This presents problems for existing channel estimation techniques which assume a small maximum expected channel delay, since data transmission is paused to sound the channel by an amount equal to twice the maximum channel delay. The dropout in data contributes to a reduction in spectral efficiency.

Our research addresses OTFS limitations in the AMT use case. We start with an exemplary OTFS framework with parameters optimized for AMT. Following system design, we focus on two distinct areas to improve OTFS performance in the AMT environment. First we propose a new channel estimation technique using a pilot signal superimposed over data that can measure large delay spread channels with no penalty in spectral efficiency. A successive interference cancellation algorithm is used to iteratively improve channel estimates and jointly decode data. A second aspect of our research aims to equalize in delay-Doppler space. In the delay-Doppler paradigm, the rapid channel variations seen in the time-frequency domain is transformed into a sparse quasi-stationary channel in the delay-Doppler domain. We propose to use machine learning using Gaussian Process Regression to take advantage of the sparse and stationary channel and learn the channel parameters to compensate for the effects of fractional Doppler in which simpler channel estimation techniques cannot mitigate. Both areas of research can advance the robustness of OTFS across all communications systems.

Mohammad Ful Hossain Seikh

AAFIYA: Antenna Analysis in Frequency-domain for Impedance and Yield Assessment

When & Where:

Thu, 07/10/2025 - 10:00
Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

Jim Stiles, Chair
Rachel Jarvis
Alessandro Salandrino

Abstract

This project presents AAFIYA (Antenna Analysis in Frequency-domain for Impedance and Yield Assessment), a modular Python toolkit developed to automate and streamline the characterization and analysis of radiofrequency (RF) antennas using both measurement and simulation data. Motivated by the need for reproducible, flexible, and publication-ready workflows in modern antenna research, AAFIYA provides comprehensive support for all major antenna metrics, including S-parameters, impedance, gain and beam patterns, polarization purity, and calibration-based yield estimation. The toolkit features robust data ingestion from standard formats (such as Touchstone files and beam pattern text files), vectorized computation of RF metrics, and high-quality plotting utilities suitable for scientific publication.

Validation was carried out using measurements from industry-standard electromagnetic anechoic chamber setups involving both Log Periodic Dipole Array (LPDA) reference antennas and Askaryan Radio Array (ARA) Bottom Vertically Polarized (BVPol) antennas, covering a frequency range of 50–1500 MHz. Key performance metrics, such as broadband impedance matching, S11 and S21 related calculations, 3D realized gain patterns, vector effective lengths, and cross-polarization ratio, were extracted and compared against full-wave electromagnetic simulations (using HFSS and WIPL-D). The results demonstrate close agreement between measurement and simulation, confirming the reliability of the workflow and calibration methodology.

AAFIYA’s open-source, extensible design enables rapid adaptation to new experiments and provides a foundation for future integration with machine learning and evolutionary optimization algorithms. This work not only delivers a validated toolkit for antenna research and pedagogy but also sets the stage for next-generation approaches in automated antenna design, optimization, and performance analysis.

Soumya Baddham

Battling Toxicity: A Comparative Analysis of Machine Learning Models for Content Moderation

When & Where:

Mon, 07/07/2025 - 14:00
Eaton Hall, Room 2001B

Degree Type:

MS Project Defense

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Hongyang Sun

Abstract

With the exponential growth of user-generated content, online platforms face unprecedented challenges in moderating toxic and harmful comments. Due to this, Automated content moderation has emerged as a critical application of machine learning, enabling platforms to ensure user safety and maintain community standards. Despite its importance, challenges such as severe class imbalance, contextual ambiguity, and the diverse nature of toxic language often compromise moderation accuracy, leading to biased classification performance.

This project presents a comparative analysis of machine learning approaches for a Multi-Label Toxic Comment Classification System using the Toxic Comment Classification dataset from Kaggle. The study examines the performance of traditional algorithms, such as Logistic Regression, Random Forest, and XGBoost, alongside deep architectures, including Bi-LSTM, CNN-Bi-LSTM, and DistilBERT. The proposed approach utilizes word-level embeddings across all models and examines the effects of architectural enhancements, hyperparameter optimization, and advanced training strategies on model robustness and predictive accuracy.

The study emphasizes the significance of loss function optimization and threshold adjustment strategies in improving the detection of minority classes. The comparative results reveal distinct performance trade-offs across model architectures, with transformer models achieving superior contextual understanding at the cost of computational complexity. At the same time, deep learning approaches(LSTM models) offer efficiency advantages. These findings establish evidence-based guidelines for model selection in real-world content moderation systems, striking a balance between accuracy requirements and operational constraints.