ICTACT Journal on Image and Video Processing

https://i-scholar.in/index.php/IJIVP
ICTACT Journal on Image and Video Processing (IJIVP) is a peer-reviewed international journal published quarterly. IJIVP welcomes scientists, researchers, academicians, and engineers to submit original research papers that are neither published nor currently under review by other journals or conferences. Papers should emphasize original results relating to both theoretical and application issues of image and video processing. Review articles focusing on multidisciplinary views are also welcome.
Publisher: ICT Academy of Tamil Nadu
ISSN: 0976-9099

Hybrid Transformer-CNN Models for Enhanced Autism Spectrum Disorder Classification using Clinical and Neuroimaging Data
https://i-scholar.in/index.php/IJIVP/article/view/226637
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by a highly heterogeneous presentation, posing significant challenges for early diagnosis. The subtle differences between ASD and non-ASD individuals, especially during early developmental stages, make accurate classification difficult. However, early detection plays a crucial role in improving developmental outcomes through timely intervention, enabling affected children and families to access specialized therapies and support systems. This study explores the potential of using clinical data combined with deep learning techniques for automated ASD classification. We evaluated various deep learning models, including 3D CNN ResNet50, sequential CNN, 2D CNN combined with XGBoost, 2D CNN ResNet101, and Transformer-based architectures such as the standard Transformer and the Swin Transformer integrated with CNN. The incorporation of clinical parameters alongside neuroimaging features facilitated more nuanced recognition of patterns associated with ASD. Conventional CNN models yielded moderate classification accuracy, ranging from 60% to 78%.
Transformer-based models demonstrated superior performance, with the Swin Transformer achieving an accuracy of 75%, highlighting their importance in capturing intricate patterns and relationships in the data. The Swin Transformer, or "Shifted Window Transformer," is a Vision Transformer (ViT) architecture designed for computer vision tasks. It introduces a hierarchical structure with multi-scale feature representation, making it more efficient for image recognition tasks than traditional ViTs. The results show that hybrid models, specifically the Hybrid CNN+Swin Transformer, outperform both traditional CNN architectures and pure transformer-based methods, achieving the maximum classification accuracy of 80%. This implies that a more thorough method of identifying ASD-related patterns in brain imaging data can be achieved by fusing the global contextual understanding of the Swin Transformer with the CNN's spatial feature extraction capabilities. These findings underscore the potential of using Transformer-based architectures in ASD classification, leveraging clinical data to improve precision in early detection. This research provides a foundation for future investigations into hybrid approaches that integrate multiple data sources, advancing automated diagnostic systems for neurodevelopmental disorders.
Sanju S Anand Shashidhar Kini 2024-11-01 2024-11-01 15

Optimizing Skin Lesion Classification with Confusion-Aware Loss Functions
https://i-scholar.in/index.php/IJIVP/article/view/226636
Early diagnosis of skin cancer is critical to treatment and to saving patients’ lives, and many studies have used Convolutional Neural Networks (CNNs) to achieve this goal. Traditional methods using the Cross Entropy (CE) loss function, however, often struggle with classes that are easily confused, such as Nevus and Melanoma, leading to reduced diagnostic accuracy.
To address this, we propose the Confusion-aware Cross Entropy (CCE) loss function, which enhances classification performance by focusing on these easily confused classes. Our method computes the mean of the negative class logits to identify these classes, ensuring the loss calculation prioritizes their accurate classification. Experiments conducted on the publicly available HAM10000 dataset using ResNet50, EfficientNet-B4, Inception-V3, and DenseNet121 demonstrate that our approach significantly outperforms the traditional CE loss function, achieving higher Accuracy, Sensitivity, and Precision. These results underscore the potential of the CCE loss function to improve clinical outcomes by providing more reliable skin lesion classifications.
Qichen Su Haza Nuzly Abdull Hamed 2024-11-01 2024-11-01 15

De-Noising Paddy Seed Images by Noisenixie Rejuvenation Filter: A Novel Preprocessing Algorithm for Enhanced Image Quality
https://i-scholar.in/index.php/IJIVP/article/view/226635
The digital age thrives on image processing, a technology critical for healthcare and security. This paper proposes a robust approach to improve image quality and empower further analysis through innovative preprocessing techniques. Our approach processes image data systematically so that it is ready for advanced processing. Standardization with Bicubic Interpolation: Input images are resized to a uniform dimension using Bicubic Interpolation. This ensures compatibility within datasets, regardless of the original image sizes, while preserving each image's proportions. Separating Brightness for Sharper Analysis: Images are converted from the RGB to the YCbCr color space. This separates the image data into brightness (luma) and color (chrominance) components. Focusing on the brightness information is crucial for noise reduction and edge detection. Enhanced Clarity with NoiseNixie Rejuvenation Filter: Our novel NoiseNixie Rejuvenation Filter (NNRF) tackles noise, a standard image quality hurdle.
This filter incorporates noise variation and light correction adjustments, resulting in sharper and clearer images. Fast Fourier Transform for Refined Processing: The Fast Fourier Transform (FFT) converts image data into the frequency domain. This transformation unveils hidden patterns within the image and allows for precise adjustments. The data is then converted back using the Inverse FFT, preparing the image for in-depth analysis. By implementing these techniques, our preprocessing pipeline empowers researchers and practitioners to unlock valuable insights from image data. This comprehensive approach paves the way for advancements in image processing across various applications. From medical imaging to autonomous vehicles, high-quality image analysis is essential, and this method provides a robust foundation for achieving that goal.
R. Subha Sree S. Karthikeyan 2024-11-01 2024-11-01 15

Image and Video Retrieval and Authentication Using AI-Driven Techniques for Secure Media Management
https://i-scholar.in/index.php/IJIVP/article/view/226634
The proliferation of digital media has led to an increased need for secure and efficient systems for image and video retrieval and authentication. Traditional approaches often struggle with scalability and vulnerability to tampering, compromising the integrity of media management systems. The rise of artificial intelligence and deep neural networks (DNNs) offers transformative potential to address these challenges. By leveraging DNNs, this study proposes an advanced framework for secure media management, integrating robust retrieval and authentication mechanisms. The method employs a convolutional neural network (CNN)-based encoder-decoder architecture to extract and match high-dimensional features for image and video retrieval. For authentication, blockchain-backed hash validation ensures the originality and integrity of media assets.
The system is trained and evaluated on benchmark datasets, such as MS-COCO and UCF101, with augmentation techniques enhancing its adaptability across diverse media formats and resolutions. Key performance metrics include retrieval accuracy, processing time, and authentication robustness. Experimental results show a retrieval accuracy of 96.8%, with a mean processing time of 0.85 seconds per query. Authentication robustness achieves a 99.2% success rate in detecting altered media, significantly outperforming existing systems. The proposed framework ensures both scalability and security, offering an innovative solution for media management in domains such as journalism, legal evidence management, and social media platforms.
Aparajita Dixit Suresh Kumar Sharma Mamta Dhaka Nisha Jain 2024-11-01 2024-11-01 15

Image Pattern Recognition with an Improvised Deep Learning Regression Technique
https://i-scholar.in/index.php/IJIVP/article/view/226633
Advancements in image pattern recognition have revolutionized diverse domains such as healthcare, autonomous systems, and security. Despite these advancements, existing deep learning techniques often encounter challenges in achieving high accuracy, particularly when handling complex image datasets with significant noise or variations. The need for an enhanced approach that balances computational efficiency with superior predictive performance has become critical. This study introduces an Improvised Deep Learning Regression Technique based on InceptionNet for robust image pattern recognition. The proposed method incorporates optimized inception modules with tailored hyperparameter tuning to address limitations in feature extraction and pattern generalization. By employing an adaptive learning rate and advanced regularization mechanisms, the model achieves better performance on large-scale, heterogeneous datasets.
The experimental evaluation was conducted using publicly available image datasets, including CIFAR-10 and ImageNet, to ensure comprehensive benchmarking. The results show significant improvements over existing methods. The proposed InceptionNet model achieved an accuracy of 96.5% on the CIFAR-10 dataset and a mean absolute error (MAE) reduction of 15.2% compared to traditional regression techniques. On the ImageNet dataset, the model recorded an accuracy improvement of 7.8% and reduced training time by 12%, validating its computational efficiency. The incorporation of deep inception modules contributed to precise recognition of intricate patterns and subtle variations, making the technique suitable for real-time applications.
D.K. Mohanty P. Joy Kiruba N. Ragunath P. Kanagaraju Aditya Bommaraju 2024-11-01 2024-11-01 15

Multiframe Image Restoration - Enhancing Image Quality Through Advanced Reconstruction Techniques
https://i-scholar.in/index.php/IJIVP/article/view/226632
The degradation of image quality due to noise, blur, and low contrast remains a significant challenge in various imaging applications, particularly in medical diagnostics, remote sensing, and surveillance. Effective restoration of such images is essential to enhance visual clarity and extract meaningful information. Conventional techniques often struggle to balance noise reduction and detail preservation. To address these limitations, this study proposes an advanced multiframe image restoration approach combining Contrast Limited Adaptive Histogram Equalization (CLAHE) and Deep Belief Networks (DBN). CLAHE is employed to enhance contrast adaptively, improving visibility in regions with varying luminance. Subsequently, DBN, a deep learning model, is applied to refine the reconstruction process by leveraging its feature extraction and noise suppression capabilities. This combination ensures that the restored images retain fine details while effectively mitigating noise and distortions.
Experimental evaluation was conducted on a dataset of 500 degraded images, including medical scans and natural scenes. The proposed method achieved a Peak Signal-to-Noise Ratio (PSNR) of 36.2 dB, a Structural Similarity Index (SSIM) of 0.92, and a contrast improvement rate of 48%, surpassing traditional methods such as Bilateral Filtering and Wavelet Transform. Processing time per image was maintained at an efficient 1.8 seconds, ensuring practicality for real-time applications. This novel integration of CLAHE and DBN shows significant advancements in multiframe image restoration, making it a valuable tool for applications requiring enhanced image quality. The approach combines the strengths of contrast enhancement and deep learning-based reconstruction, paving the way for improved image analysis and decision-making in critical domains.
Allen Paul Esteban Prithviraj Singh Chouhan Aman Ahlawat Tarunika Dursinhbhai Chaudhari 2024-11-01 2024-11-01 15

Video Segmentation and Object Tracking using Improvised Deep Learning Algorithms
https://i-scholar.in/index.php/IJIVP/article/view/226631
Video segmentation and object tracking are critical tasks in computer vision, with applications ranging from autonomous driving to surveillance and video analytics. Traditional approaches often struggle with challenges such as occlusion, background clutter, and high computational costs, limiting their accuracy and efficiency in real-world scenarios. This research addresses these issues by employing improvised deep learning algorithms, specifically Convolutional Neural Networks (CNN), VGG, and AlexNet, to enhance the precision and speed of video segmentation and object tracking. The proposed method integrates the feature extraction capabilities of CNN with the deeper architecture of VGG for improved feature representation and AlexNet's computational efficiency to ensure scalability.
A novel multistage training process is implemented, where CNN provides initial object localization, VGG refines segmentation boundaries, and AlexNet accelerates tracking in real time. The framework was trained and evaluated on benchmark datasets such as DAVIS and MOT17, covering diverse scenarios with varying complexities. The results show significant improvements in accuracy and speed compared to existing methods. On the DAVIS dataset, the approach achieved a segmentation accuracy of 89.7% and an Intersection over Union (IoU) score of 86.5%. For object tracking on MOT17, the system attained a Multi-Object Tracking Accuracy (MOTA) of 82.3% and an average frame processing rate of 35 frames per second (FPS), outperforming baseline methods by 8.5% in accuracy and 15% in computational efficiency. Combining CNN, VGG, and AlexNet in a unified framework offers a robust solution for video segmentation and object tracking, demonstrating enhanced accuracy, adaptability, and real-time performance. These findings hold promise for applications in areas requiring reliable and efficient visual analysis.
G. Shanmugapriya G. Pavithra M.K. Anandkumar D. Pavankumar 2024-11-01 2024-11-01 15

Holographic Video Processing with Multimedia Integration using AI and Machine Learning Algorithms
https://i-scholar.in/index.php/IJIVP/article/view/226630
The rise of holographic video processing has transformed multimedia experiences by providing highly immersive and realistic visuals. However, efficiently processing these high-dimensional holographic datasets poses significant computational challenges. Current methods often struggle with latency, scalability, and maintaining quality during real-time rendering. Addressing these limitations requires the integration of advanced Artificial Intelligence (AI) and Machine Learning (ML) techniques.
This research introduces a novel approach leveraging an adaptive Support Vector Machine (adaSVM) algorithm for holographic video processing, integrated with multimedia data fusion. The adaSVM dynamically adjusts its parameters based on input data complexity, ensuring robust classification and processing of holographic frames. The proposed method incorporates intelligent feature extraction, dimensionality reduction, and predictive modeling to optimize resource utilization while maintaining visual quality. Experimental evaluation using a dataset of 500 holographic video sequences showed superior performance. The adaSVM achieved an accuracy of 96.8%, a processing speed improvement of 34.2%, and a reduction in latency of 28.7% compared to traditional SVM and Convolutional Neural Network-based approaches. Additionally, the method showed enhanced scalability in handling large datasets, with consistent performance across varying resolutions and frame rates. The results underscore the potential of adaSVM in revolutionizing holographic video processing for applications in entertainment, education, and medical imaging. This integration of AI and ML represents a significant step toward efficient and scalable solutions for next-generation multimedia systems.
Anand Karuppannan E. Vijayakumar P. Bhanupriya Suneel Kumar Asileti 2024-11-01 2024-11-01 15

High Resolution Radar Target Recognition Using Deep Video Processing Technique
https://i-scholar.in/index.php/IJIVP/article/view/226629
Radar-based target recognition plays a crucial role in a variety of applications, such as surveillance, defense, and autonomous systems. High-resolution radar imagery, when processed effectively, can provide detailed information about objects of interest. However, due to the complex nature of radar signals and the limitations of traditional processing methods, extracting accurate and reliable target information remains challenging.
Recent advancements in deep learning, particularly in the domain of image and video processing, have opened new avenues for improving radar-based target recognition. The primary challenge in radar target recognition is the effective use of high-resolution radar imagery, which often contains noise, motion blur, and other distortions. Traditional signal processing techniques struggle to handle these complexities, leading to reduced accuracy in real-world applications. Further, most existing methods are not well equipped to handle the temporal dynamics and motion information inherent in radar-based video data, which is vital for identifying and tracking moving targets. This paper proposes a novel deep video processing technique designed for radar-based target recognition using high-resolution images. The approach leverages convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract spatial and temporal features from radar video sequences. By integrating image enhancement algorithms and advanced feature fusion techniques, the system is capable of processing high-resolution radar frames in real time. The method involves a two-stage process: first, extracting high-level spatial features from individual radar images using CNNs; second, capturing temporal relationships between frames with RNNs for robust target identification and tracking. Experimental results on a radar video dataset show significant improvements in target recognition accuracy. The proposed technique achieves a recognition rate of 94.3% in identifying static and dynamic targets, outperforming traditional methods by 15-20%. In terms of processing speed, the method demonstrates real-time performance with an average frame processing time of 32 ms, ensuring its suitability for operational environments. The system also demonstrates robustness against noise, with a decrease in false positive rates of 12%.
R. Krithika A.N. Jayanthi 2024-11-01 2024-11-01 15

An Enhanced Adaptive Image Filtering and Enhancement with Multimedia Video Streaming
https://i-scholar.in/index.php/IJIVP/article/view/226628
Image filtering and enhancement play a pivotal role in ensuring the quality and clarity of visual content, particularly in multimedia video streaming applications. Existing filtering techniques often struggle with balancing noise reduction, detail preservation, and real-time performance, resulting in suboptimal outcomes in dynamic video environments. Furthermore, video streaming systems demand adaptive solutions that cater to diverse lighting and noise conditions. To address these challenges, a novel Enhanced Adaptive Image Filtering and Enhancement framework combining Deep Artificial Neural Networks (Deep ANN) with Adaptive Histogram Equalization (AHE) is proposed. This method leverages the learning capabilities of Deep ANN to identify noise patterns and preserve critical details, while AHE dynamically adjusts contrast to improve visual quality in varying lighting conditions. The proposed framework is tested on real-time video streaming datasets, simulating environments with low light, noise, and high-motion scenarios. The results show significant improvements over traditional filtering methods. Experimental evaluations show an increase in Peak Signal-to-Noise Ratio (PSNR) to 42.3 dB, compared to 37.1 dB achieved by conventional methods. The Structural Similarity Index Measure (SSIM) reached 0.96, reflecting enhanced detail preservation and perceptual quality. Moreover, the framework achieved a 35% reduction in Mean Squared Error (MSE) and maintained an average processing speed of 28 frames per second, making it suitable for real-time applications. These findings highlight the potential of combining advanced neural network capabilities with adaptive histogram techniques to enhance multimedia video streaming quality.
This method ensures superior performance in diverse environments, paving the way for immersive and reliable video streaming experiences.
Renuka Deshpande Kavita Tukaram Patil Swati Sah Sameer Yadav 2024-11-01 2024-11-01 15
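
For readers relating the PSNR and MSE figures quoted in the abstract above, the two metrics are linked by the standard definition PSNR = 10 log10(MAX^2 / MSE). The following is a minimal Python sketch of that relationship, not the authors' implementation. The function names, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def mse(reference, test):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, max_value=255.0):
    """PSNR in dB for the given peak value (255 assumes 8-bit images).

    Returns infinity when the two images are identical (MSE of zero).
    """
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def psnr_gain_from_mse_reduction(fraction):
    """PSNR gain in dB when MSE drops by `fraction` (0 <= fraction < 1),
    with the reference image and peak value held fixed."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -10.0 * math.log10(1.0 - fraction)
```

Under this definition, a fractional MSE reduction r relative to a given baseline corresponds to a PSNR gain of -10 log10(1 - r) dB, provided the baseline and the peak value are the same for both measurements.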