Ragav Venkatesan

Zero-shot transfer of domain-adapted base networks.

Ragav Venkatesan, Gurumurthy Swaminathan, Xiong Zhou, Fedor Zhdanov,

Patents

Abstract

Techniques for zero-shot and few-shot transfer of domain-adapted base networks are described. Multiple machine learning task layers are trained using a shared base feature extractor network. At least one task layer is trained with samples and corresponding labels from a first domain as well as a second domain. At least one other task layer is trained with samples and corresponding labels from only the first domain. Ultimately, the other task layer(s) are adapted to generate labels for the first domain via the base network being weighted based on all trainings.

Decoupled machine learning training.

Ragav Venkatesan, Saurabh Gupta, Gurumurthy Swaminathan, Vineet Khare, Bharathan Balaji, Leo Parker Dirac, Sahika Genc

Patents

Abstract

A machine learning environment utilizing training data generated by customer environments. A reinforced learning machine learning environment receives and processes training data generated by independently hosted, or decoupled, customer environments. The reinforced learning machine learning environment corresponds to machine learning clusters that receive and process training data sets provided by the decoupled customer environments. The customer environments include an agent process that collects training data and forwards the training data to the machine learning clusters without exposing the customer environment. The machine learning clusters can be configured in a manner to automatically process the training data without requiring additional user inputs or controls to configure the application of the reinforced learning machine learning processes.

Searching compression profiles for trained neural networks.

Ragav Venkatesan, Gurumurthy Swaminathan, Xiong Zhou, Anna Luo, Vineet Khare

Patents

Abstract

Compression profiles may be searched for trained neural networks. An iterative compression profile search may be performed in response to a search request. Different prospective compression profiles may be generated for trained neural networks according to a search policy. Performance of compressed versions of the trained neural networks according to the compression profiles may be tracked. The search policy may be updated according to an evaluation of the performance of the compression profiles for the compressed versions of the trained neural networks using compression performance criteria. When a search criterion is satisfied, a result for the compression profile search may be provided.

Applying compression profiles across similar neural network architectures.

Ragav Venkatesan, Gurumurthy Swaminathan, Xiong Zhou, Anna Luo, Vineet Khare

Patents

Abstract

Neural networks with similar architectures may be compressed using shared compression profiles. A request to compress a trained neural network may be received and an architecture of the neural network identified. The identified architecture may be compared with the different network architectures mapped to compression profiles to select a compression profile for the neural network. The compression profile may be applied to remove features of the neural network to generate a compressed version of the neural network.

Reinforcement learning for training compression policies for machine learning models.

Ragav Venkatesan, Gurumurthy Swaminathan, Xiong Zhou, Anna Luo, Vineet Khare

Patents

Abstract

A compression policy to produce compression profiles for compressing trained machine learning models may be trained using reinforcement learning. An iterative reinforcement learning may be performed in response to a search request. Different prospective compression profiles may be generated for received machine learning models according to a compression policy being trained. Performance of compressed versions of the trained neural networks according to the compression profiles may be caused using data sets used to train the machine learning models. The compression policy may be updated according to a reward signal determined from an application of a reward function for performance criteria to performance results of the different versions of the machine learning models. When a search criterion is satisfied, the trained compression policy may be provided.

Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks.

Ansel MacLaughlin, Jwala Dhamala, Anoop Kumar, Sriram Venkatapathy, Ragav Venkatesan, Rahul Gupta

Conference Papers: Workshop on Insights from Negative Results in NLP at the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

Abstract

Neural Architecture Search (NAS) methods, which automatically learn entire neural model or individual neural cell architectures, have recently achieved competitive or state-of-the-art (SOTA) performance on a variety of natural language processing and computer vision tasks, including language modeling, natural language inference, and image classification. In this work, we explore the applicability of a SOTA NAS algorithm, Efficient Neural Architecture Search (ENAS), to two sentence-pair tasks, paraphrase detection and semantic textual similarity. We use ENAS to perform a micro-level search and learn a task-optimized RNN cell architecture as a drop-in replacement for an LSTM. We explore the effectiveness of ENAS through experiments on three datasets (MRPC, SICK, STS-B), with two different models (ESIM, BiLSTM-Max), and two sets of embeddings (GloVe, BERT). In contrast to prior work applying ENAS to NLP tasks, our results are mixed – we find that ENAS architectures sometimes, but not always, outperform LSTMs and perform similarly to random architecture search.

Domain Adaptation in Computer Vision with Deep Learning. - Chapter 1.

Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumdar

Books: Springer Publications.

About the book

On the one hand, deep neural networks are effective in learning large datasets. On the other, they are inefficient with their data usage. They often require copious amounts of labeled data to train their scads of parameters. Training larger and deeper networks is hard without appropriate regularization, particularly while using a small dataset. Laterally, collecting well-annotated data is expensive, time-consuming and often infeasible. A popular way to regularize these networks is to simply train the network with more data from an alternate representative dataset. This can lead to adverse effects if the statistics of the representative dataset are dissimilar to our target. This predicament is due to the problem of domain shift. Data from a shifted domain might not produce bespoke features when a feature extractor from the representative domain is used.

Several techniques of domain adaptation have been proposed in the past to solve this problem. In this paper, we propose a new technique (d-SNE) of domain adaptation that cleverly uses stochastic neighborhood embedding techniques and a novel modified-Hausdorff distance. The proposed technique is learnable end-to-end and is therefore ideally suited to train neural networks. Extensive experiments demonstrate that d-SNE outperforms the current state of the art and is robust to the variances in different datasets, even in the one-shot and semi-supervised learning settings. d-SNE also demonstrates the ability to generalize to multiple domains concurrently.

Out-of-the-box channel pruned networks

Ragav Venkatesan, Gurumurthy Swaminathan, Xiong Zhou, Anna Luo

arXiv Papers: arXiv:1705.00744, 2017.

Abstract

In the last decade, convolutional neural networks have become gargantuan. Pre-trained models, when used as initializers, are able to fine-tune ever larger networks on small datasets. Consequently, not all the convolutional features that these fine-tuned models detect are requisite for the end-task. Several works of channel pruning have been proposed to prune away compute and memory from models that were trained already. Typically, these involve policies that decide which and how many channels to remove from each layer, leading to channel-wise and/or layer-wise pruning profiles, respectively. In this paper, we conduct several baseline experiments and establish that profiles from random channel-wise pruning policies are as good as metric-based ones. We also establish that there may exist profiles from some layer-wise pruning policies that are measurably better than common baselines. We then demonstrate that the top layer-wise pruning profiles found using an exhaustive random search from one dataset are also among the top profiles for other datasets. This implies that we could identify out-of-the-box layer-wise pruning profiles using benchmark datasets and use these directly for new datasets. Furthermore, we develop a Reinforcement Learning (RL) policy-based search algorithm with a direct objective of finding transferable layer-wise pruning profiles using many models for the same architecture. We use a novel reward formulation that drives this RL search towards an expected compression while maximizing accuracy. Our results show that our transferred RL-based profiles are as good as or better than the best profiles found on the original dataset via exhaustive search. We then demonstrate that if we found the profiles using a mid-sized dataset such as CIFAR-10/100, we are able to transfer them to even a large dataset such as ImageNet.
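As a rough illustration of the terms used above, and not the implementation from the paper, the sketch below treats a layer-wise pruning profile as one prune fraction per layer and pairs it with a random channel-wise policy that picks which channels to drop. The function names, the layer widths and the profile values are all hypothetical.

```python
import random
from typing import List


def random_channel_mask(num_channels: int, prune_fraction: float,
                        rng: random.Random) -> List[bool]:
    """Return a keep-mask for one layer; False marks a pruned channel.

    The channels to prune are chosen uniformly at random (an assumed,
    simplest-possible channel-wise policy).
    """
    num_pruned = int(round(num_channels * prune_fraction))
    pruned = set(rng.sample(range(num_channels), num_pruned))
    return [c not in pruned for c in range(num_channels)]


def apply_profile(layer_widths: List[int], profile: List[float],
                  seed: int = 0) -> List[List[bool]]:
    """Apply a layer-wise profile (one prune fraction per layer)."""
    rng = random.Random(seed)
    return [random_channel_mask(width, fraction, rng)
            for width, fraction in zip(layer_widths, profile)]


# Hypothetical three-layer network and profile.
masks = apply_profile([8, 16, 32], [0.25, 0.5, 0.0])
kept_per_layer = [sum(mask) for mask in masks]
```

Because the profile only fixes how many channels each layer loses, the same profile can be reapplied to another model with the same architecture, which is the sense in which a profile is "transferable" in this sketch.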

Domain mapping for privacy preservation

Ragav Venkatesan, Gurumurthy Swaminathan

Patents

Abstract

Implementations detailed herein include description of a computer-implemented method. In an implementation, the computer-implemented method includes training a machine learning model using domain mapped third party data; and performing inference using the machine learning model by: receiving scoring data, domain mapping the received scoring data using a domain mapper that was used to generate the domain mapped third party data, and applying the machine learning model to the domain mapped received scoring data to generate an output result.

d-SNE: Domain Adaptation using Stochastic Neighborhood Embedding

Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumdar

Conference Papers: IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, California, USA, 2019. [ORAL with 5.5% acceptance rate.]

Abstract

On the one hand, deep neural networks are effective in learning large datasets. On the other, they are inefficient with their data usage. They often require copious amounts of labeled data to train their scads of parameters. Training larger and deeper networks is hard without appropriate regularization, particularly while using a small dataset. Laterally, collecting well-annotated data is expensive, time-consuming and often infeasible. A popular way to regularize these networks is to simply train the network with more data from an alternate representative dataset. This can lead to adverse effects if the statistics of the representative dataset are dissimilar to our target. This predicament is due to the problem of domain shift. Data from a shifted domain might not produce bespoke features when a feature extractor from the representative domain is used.

Several techniques of domain adaptation have been proposed in the past to solve this problem. In this paper, we propose a new technique (d-SNE) of domain adaptation that cleverly uses stochastic neighborhood embedding techniques and a novel modified-Hausdorff distance. The proposed technique is learnable end-to-end and is therefore ideally suited to train neural networks. Extensive experiments demonstrate that d-SNE outperforms the current state of the art and is robust to the variances in different datasets, even in the one-shot and semi-supervised learning settings. d-SNE also demonstrates the ability to generalize to multiple domains concurrently.

Feature Engineering for Machine Learning and Data Analytics. - Chapter 3.

Ragav Venkatesan, Parag Shridhar Chandakkar, Baoxin Li

Books: Chapman & Hall/CRC Press.

About the book

Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if there are few features to represent the underlying data objects, and the quality of results of those algorithms largely depends on the quality of the available features. Feature Engineering for Machine Learning and Data Analytics provides a comprehensive introduction to feature engineering, including feature generation, feature extraction, feature transformation, feature selection, and feature analysis and evaluation.

The book presents key concepts, methods, examples, and applications, as well as chapters on feature engineering for major data types such as texts, images, sequences, time series, graphs, streaming data, software engineering data, Twitter data, and social media data. It also contains generic feature generation approaches, as well as methods for generating tried-and-tested, hand-crafted, domain-specific features.

The first chapter defines the concepts of features and feature engineering, offers an overview of the book, and provides pointers to topics not covered in this book. The next six chapters are devoted to feature engineering, including feature generation for specific data types. The subsequent four chapters cover generic approaches for feature engineering, namely feature selection, feature transformation based feature engineering, deep learning based feature engineering, and pattern based feature generation and engineering. The last three chapters discuss feature engineering for social bot detection, software management, and Twitter-based applications respectively.

This book can be used as a reference for data analysts, big data scientists, data preprocessing workers, project managers, project developers, prediction modelers, professors, researchers, graduate students, and upper level undergraduate students. It can also be used as the primary text for courses on feature engineering, or as a supplement for courses on machine learning, data mining, and big data analytics.

Novel image representations and learning tasks.

Ragav Venkatesan

Doctoral Dissertation: Arizona State University, Tempe, 2017.

Abstract

Computer Vision as a field has gone through significant changes in the last decade. The field has seen tremendous success in designing learning systems with hand-crafted features and in using representation learning to extract better features. In this dissertation, some novel approaches to representation learning and task learning are studied. Multiple-instance learning, which is a generalization of supervised learning, is one example of task learning that is discussed. In particular, a novel non-parametric $k$-NN-based multiple-instance learning approach is proposed, which is shown to outperform other existing approaches. This solution is applied effectively to a diabetic retinopathy pathology detection problem.

In the case of representation learning, the generality of neural features is investigated first. This investigation leads to some critical understanding and results in feature generality among datasets. The possibility of learning from a mentor network instead of from labels is then investigated. Distillation of dark knowledge is used to efficiently mentor a small network from a pre-trained large mentor network. These studies help in understanding representation learning with smaller and compressed networks.

Convolutional Neural Networks in Visual Computing: A Concise Guide.

卷积神经网络与视觉计算.

Ragav Venkatesan, Baoxin Li

Books: CRC Press, Taylor and Francis Group, LLC. 2017

机械工业出版社. 2019

About the book

This book is intended as a guide for engineers and ML practitioners to learn the nuts and bolts of CNNs effortlessly and in a much simpler manner than a textbook would offer. We target this book at undergraduate students and graduate research students starting off with CNNs, but mainly at industrial practitioners and Kagglers. The book is available in English and in Chinese.

There is a toolbox associated with this work that I developed.

MIRank-KNN: Multiple Instance Retrieval of Clinically-Relevant Diabetic Retinopathy Images

Parag Shridhar Chandakkar, Ragav Venkatesan, Baoxin Li

Journal Papers: SPIE Journal of Medical Imaging, 2017.

Abstract

Diabetic retinopathy (DR) is a consequence of diabetes and is the leading cause of blindness among working adults. Regular screening is critical to early detection and treatment of DR. Computer-aided diagnosis has the potential of improving the practice in DR screening or diagnosis. This paper presents an automated approach to retrieving clinically-relevant images from a set of previously-diagnosed fundus camera images for improving the efficiency of screening and diagnosis of DR. Considering that DR lesions are often localized, we propose a multi-class multiple-instance framework for the retrieval task. Considering the special visual properties of DR images, we develop a feature space of a modified color correlogram appended with statistics of steerable Gaussian filter responses selected by fast symmetric radial transform points. Experiments with real DR images demonstrate that the proposed approach is able to outperform existing methods.

A strategy for an uncompromising incremental learner

Ragav Venkatesan, Hemanth Venkateswara, Sethuraman Panchanathan, Baoxin Li

arXiv Papers: arXiv:1705.00744, 2017.

Abstract

Multi-class supervised learning systems require the knowledge of the entire range of labels they predict. Often when learnt incrementally, they suffer from catastrophic forgetting. To avoid this, generous leeways have to be made to the philosophy of incremental learning that either forces a part of the machine to not learn, or to retrain the machine again with a selection of the historic data. While these tricks work to various degrees, they do not adhere to the spirit of incremental learning. In this article, we redefine incremental learning with stringent conditions that do not allow for any undesirable relaxations and assumptions. We design a strategy involving generative models and the distillation of dark knowledge as a means of hallucinating data along with appropriate targets from past distributions. We call this technique phantom sampling. We show that phantom sampling helps avoid catastrophic forgetting during incremental learning. Using an implementation based on deep neural networks, we demonstrate that phantom sampling dramatically avoids catastrophic forgetting. We apply these strategies to competitive multi-class incremental learning of deep neural networks. Using various benchmark datasets through our strategy, we demonstrate that strict incremental learning could be achieved.

Diving deeper into mentee networks.

Ragav Venkatesan, Baoxin Li

arXiv Papers: arXiv:1604.08220, 2016.

Abstract

Modern computer vision is all about the possession of powerful image representations. Deeper and deeper convolutional neural networks have been built using larger and larger datasets and are made publicly available. A large swath of computer vision scientists use these pre-trained networks with varying degrees of success in various tasks. Even though there is tremendous success in copying these networks, the representational space is not learnt from the target dataset in a traditional manner. One of the reasons for opting to use a pre-trained network over a network learnt from scratch is that small datasets provide less supervision and require meticulous regularization and smaller, careful tweaking of learning rates to even achieve stable learning without weight explosion. It is often the case that large deep networks are not portable, which necessitates the ability to learn mid-sized networks from scratch.

In this article, we dive deeper into training these mid-sized networks on small datasets from scratch by drawing additional supervision from a large pre-trained network. Such learning also provides better generalization accuracies than networks trained with common regularization techniques such as l2, l1 and dropout. We show that features learnt in this way are more general than those learnt independently. We studied various characteristics of such networks and found some interesting behaviors.

On the generality of neural image features.

Ragav Venkatesan, Jaya Vijetha Gattupalli, Baoxin Li

Conference Papers: IEEE International Conference on Image Processing, Phoenix, Arizona, USA, 2016. [ORAL]

Abstract

Often the filters learned by Convolutional Neural Networks (CNNs) from different image datasets appear similar. This similarity of filters is often exploited for the purposes of transfer learning. This is also being used as an initialization technique for different tasks in the same dataset or for the same task in similar datasets. Off-the-shelf CNN features have capitalized on this idea to promote their networks as best transferable and most general and are used in a cavalier manner in day-to-day computer vision tasks.

It is curious that while the filters learned by these CNNs are related to the atomic structures of the images from which they are learnt, all datasets learn similar looking low-level filters. With the understanding that a dataset that contains many such atomic structures learns general filters and is therefore useful to initialize other networks with, we propose a way to analyse and quantify generality. We applied this metric to several popular character recognition and natural image datasets and a medical image dataset, and arrived at some interesting conclusions. On further experimentation we also discovered that particular classes in a dataset are themselves more general than others.

Evolution of fashion brands on Twitter and Instagram.

Lydia Manikonda, Ragav Venkatesan, Subbarao Kambhampati, Baoxin Li

arXiv Papers: arXiv:1512.01174, 2015.

Abstract

Social media platforms are popular venues for fashion brand marketing and advertising. With the introduction of native advertising, users don't have to endure banner ads that hold very little saliency and are unattractive. Using images and subtle text overlays, even in a world of ever-depreciating attention span, brands can retain their audience and have a capacious creative potential. While an assortment of marketing strategies are conjectured, the subtle distinctions between various types of marketing strategies remain under-explored. This paper presents a qualitative analysis on the influence of social media platforms on different behaviors of fashion brand marketing. We employ both linguistic and computer vision techniques while comparing and contrasting strategic idiosyncrasies. We also analyze brand audience retention and social engagement hence providing suggestions in adapting advertising and marketing strategies over Twitter and Instagram.

Simpler non-parametric methods provide as good or better results to multiple-instance learning.

Ragav Venkatesan, Parag Shridhar Chandakkar, Baoxin Li

Conference Papers: IEEE International Conference on Computer Vision, Santiago, Chile, 2015. [Poster]

Abstract

Multiple-instance learning (MIL) is a unique learning problem in which training data labels are available only for collections of objects (called bags) instead of individual objects (called instances). A plethora of approaches have been developed to solve this problem in the past years. Popular methods include diverse density, MILIS and DD-SVM. While having been widely used, these methods, particularly those in computer vision, have attempted fairly sophisticated solutions to solve certain unique and particular configurations of the MIL space.

In this paper, we analyze the MIL feature space using modified versions of traditional non-parametric techniques like the Parzen window and k-nearest-neighbour, and develop a learning approach employing distances to k-nearest neighbours of a point in the feature space. We show that these methods work as well as, if not better than, most recently published methods on benchmark datasets. We compare and contrast our analysis with the well-established diverse density approach and its variants in recent literature, using benchmark datasets including the Musk, Andrews’ and Corel datasets, along with a diabetic retinopathy pathology diagnosis dataset. Experimental results demonstrate that, while enjoying an intuitive interpretation and supporting fast learning, these methods have the potential of delivering improved performance even for complex data arising from real-world applications.
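To make the idea of "distances to k-nearest neighbours of a point in the feature space" concrete, here is a minimal sketch of one possible bag-level decision rule. It is an assumption for illustration and not the method from the paper: the pooling of instances by bag label, the margin definition and the names `mean_knn_distance` and `classify_bag` are all hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def mean_knn_distance(x: Point, pool: List[Point], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest points in pool."""
    nearest = sorted(math.dist(x, p) for p in pool)[:k]
    return sum(nearest) / len(nearest)


def classify_bag(bag: List[Point], pos_pool: List[Point],
                 neg_pool: List[Point], k: int = 2) -> bool:
    """Label a bag positive if at least one instance sits closer to the
    instances pooled from positive bags than to those from negative bags."""
    margins = [mean_knn_distance(x, neg_pool, k) - mean_knn_distance(x, pos_pool, k)
               for x in bag]
    return max(margins) > 0


# Toy 2-D feature space with made-up instances.
pos_pool = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
neg_pool = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
mixed_bag = [(0.2, 0.1), (5.2, 5.1)]   # one instance near the positive pool
plain_bag = [(0.1, 0.2), (0.5, 0.5)]   # all instances near the negative pool
```

Taking the maximum margin over instances mirrors the usual MIL assumption that a single positive instance is enough to make the whole bag positive.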

Spatio-temporal video deinterlacing using control grid interpolation

Ragav Venkatesan, Christine Zwart, David Frakes, Baoxin Li

Journal Papers: SPIE Journal of Electronic Imaging, 2015.

Abstract

With the advent of progressive format display and broadcast technologies, video deinterlacing has become an important video processing technique. Numerous approaches exist in literature to accomplish deinterlacing. While most earlier methods were simple linear filtering-based approaches, the emergence of faster computing technologies and even dedicated video processing hardware in display units has allowed higher quality, but also more computationally intense, deinterlacing algorithms to become practical. Most modern approaches analyze motion and content in video to select different deinterlacing methods for various spatiotemporal regions. In this paper, we introduce a family of deinterlacers that employs spectral residue to choose between and weight control grid interpolation based spatial and temporal deinterlacing methods. The proposed approaches perform better than the prior state-of-the-art based on peak signal-to-noise ratio (PSNR), other visual quality metrics, and simple perception-based subjective evaluations conducted by human viewers. We further study the advantages of using soft and hard decision thresholds on the visual performance.

Video-Based Self-Positioning for Intelligent Transport Systems Applications

Parag Sridhar Chandakkar, Ragav Venkatesan, Baoxin Li

Conference Papers: International Symposium on Visual Computing, Las Vegas, USA, 2014. [ORAL]

Abstract

Many urban areas face traffic congestion. Automatic traffic management systems and congestion pricing are gaining prominence in recent research. An important stage in such systems is lane prediction and on-road self-positioning. We introduce a novel problem of vehicle self-positioning which involves predicting the number of lanes on the road and localizing the vehicle within those lanes, using the video captured by a dashboard camera. To overcome the disadvantages of most existing low-level vision-based techniques while tackling this complex problem, we formulate a model in which the video is a key observation. The model consists of the number of lanes and vehicle position in those lanes as parameters, hence allowing the use of high-level semantic knowledge. Under this formulation, we employ a lane-width-based model and a maximum-likelihood estimator, making the method tolerant to slight viewing angle variation. The overall approach is tested on real-world videos and is found to be effective.

Perception-Inspired Spatio-Temporal Video Deinterlacing

Ragav Venkatesan, Christine Zwart, Baoxin Li, David Frakes

Conference Papers: International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Phoenix, USA, 2014. [ORAL]

Abstract

With the advent of progressive format display and broadcast technologies, video deinterlacing has become an important video processing technique. Numerous approaches exist in literature to accomplish deinterlacing. While most earlier methods were simple linear filtering-based approaches, the emergence of faster computing technologies and even dedicated video processing hardware in display units has allowed higher quality, but also more computationally intense, deinterlacing algorithms to become practical. Most modern approaches analyze motion and content in video to select different deinterlacing methods for various spatiotemporal regions. In this paper, we introduce a family of deinterlacers that employs spectral residue to choose between and weight control grid interpolation based spatial and temporal deinterlacing methods. The proposed approaches perform better than the prior state-of-the-art based on peak signal-to-noise ratio (PSNR), other visual quality metrics, and simple perception-based subjective evaluations conducted by human viewers. We further study the advantages of using soft and hard decision thresholds on the visual performance.

Retrieving clinically relevant diabetic retinopathy images using a multi-class multiple-instance framework

Parag Sridhar Chandakkar, Ragav Venkatesan, Baoxin Li, Helen Li

Conference Papers: SPIE conference on Medical Imaging, Florida, USA, 2013. [ORAL]

Abstract

Diabetic retinopathy (DR) is a vision-threatening complication that arises due to prolonged presence of diabetes. When detected and diagnosed at early stages, the effect of DR on vision can be greatly reduced. Content-based image retrieval can be employed to provide a clinician with instant references to archival and standardized images that are clinically relevant to the image under diagnosis. This is an innovative way of utilizing the vast expert knowledge hidden in archives of previously diagnosed fundus camera images that helps an ophthalmologist in improving the performance of diagnosis. In this paper, with a focus on two significant DR clinical findings, namely, microaneurysm and neovascularization that are representative symptoms of non-proliferate and proliferate diabetic retinopathy, the authors propose a multi-class multiple-instance image retrieval framework that makes use of a modified color correlogram and statistics of steerable Gaussian Filter responses for retrieving clinically relevant images from a database. Experiments are performed using fundus camera images and the results compared with other prior art methods demonstrate the improved performance of the proposed approach.

Decomposed Multidimensional Control Grid Interpolation for Common Interpolation-Based Image Processing Applications in Consumer Electronics

Christine Zwart, Ragav Venkatesan, David Frakes

Journal Papers: SPIE Journal of Electronic Imaging, 2012.

Abstract

Interpolation is an essential and broadly employed function of signal processing. Accordingly, considerable development has focused on advancing interpolation algorithms toward optimal accuracy. Such development has motivated a clear shift in the state of the art from classical interpolation to more intelligent and resourceful approaches, registration-based interpolation for example. As a natural result, many of the most accurate current algorithms are highly complex, specific, and computationally demanding. However, the diverse hardware destinations for interpolation algorithms present unique constraints that often preclude use of the most accurate available options. For example, while computationally demanding interpolators may be suitable for highly equipped image processing platforms (e.g., computer workstations and clusters), only more efficient interpolators may be practical for less well equipped platforms (e.g., smartphones and tablet computers). The latter examples of consumer electronics present a design tradeoff in this regard: high accuracy interpolation benefits the consumer experience but computing capabilities are limited. It follows that interpolators with favorable combinations of accuracy and efficiency are of great practical value to the consumer electronics industry. We address multidimensional interpolation-based image processing problems that are common to consumer electronic devices through a decomposition approach. The multidimensional problems are first broken down into multiple, independent, one-dimensional (1-D) interpolation steps that are then executed with a newly modified registration-based one-dimensional control grid interpolator. The proposed approach, decomposed multidimensional control grid interpolation (DMCGI), combines the accuracy of registration-based interpolation with the simplicity, flexibility, and computational efficiency of a 1-D interpolation framework.
Results demonstrate that DMCGI provides improved interpolation accuracy (and other benefits) in image resizing, color sample demosaicing, and video deinterlacing applications, at a computational cost that is manageable or reduced in comparison to popular alternatives.
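As a rough illustration of the decomposition idea only, the sketch below resizes a 2-D image using nothing but 1-D interpolation passes, first along rows and then along columns. Plain linear interpolation stands in for the registration-based control grid interpolator, which is not reproduced here; the function names `interp_1d` and `resize_separable` are hypothetical, and running the passes one after the other is an assumption about how the 1-D steps might be combined.

```python
def interp_1d(samples, n_out):
    """Resample a 1-D sequence to n_out points by linear interpolation.

    Stand-in for a more accurate 1-D interpolator (an assumption of this sketch).
    """
    n_in = len(samples)
    if n_out == 1 or n_in == 1:
        return [float(samples[0])] * n_out
    out = []
    for k in range(n_out):
        # Map output index k onto the input sample axis.
        pos = k * (n_in - 1) / (n_out - 1)
        i = min(int(pos), n_in - 2)
        t = pos - i
        out.append((1 - t) * samples[i] + t * samples[i + 1])
    return out


def resize_separable(image, out_rows, out_cols):
    """Resize a 2-D image (list of rows) using two sets of 1-D passes."""
    # Pass 1: interpolate along each row (horizontal direction).
    wide = [interp_1d(row, out_cols) for row in image]
    # Pass 2: interpolate along each column (vertical direction).
    columns = [interp_1d(list(col), out_rows) for col in zip(*wide)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*columns)]


image = [[0.0, 2.0],
         [4.0, 6.0]]
resized = resize_separable(image, 3, 3)
for row in resized:
    print(row)
```

Because each pass only ever sees a 1-D signal, the 1-D interpolator can be swapped for a more accurate one without touching the 2-D driver.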

Clinically Relevant Diabetic Retinopathy Image Retrieval Using a Multi-Class Multiple Instance Framework

Parag Sridhar Chandakkar, Ragav Venkatesan, Baoxin Li, Helen Li

Poster Papers: ACM conference on Bio-informatics, Computational Biology and Biomedicine, Florida, USA, 2012. [Poster]

Abstract

Diabetic retinopathy (DR) is a vision-threatening complication that affects people suffering from diabetes. Diagnosis of DR during its early stages can significantly reduce the risk of severe vision loss. The process of DR severity grading is prone to human error and depends on the expertise of the ophthalmologist. As a result, many researchers have started exploring automated detection and evaluation of diabetic retinal lesions. Unfortunately, to date there is no automated system that can perform DR lesion detection with accuracy comparable to that of a human expert. In this poster, we present a novel way of employing content-based image retrieval to provide a clinician with instant reference to archival and standardized DR images, which assist the ophthalmologist in diagnosing a given DR image. The focus of the poster is on retrieving DR images with two significant DR clinical findings, namely microaneurysm (MA) and neovascularization (NV). We propose a multi-class, multiple-instance DR image retrieval framework that makes use of a modified color correlogram (CC) and statistics of steerable Gaussian filter (SGF) responses. Experiments using real DR images, with comparisons to prior-art methods, demonstrate the improved performance of the proposed approach.

Classification of Diabetic Retinopathy Images Using Multi-Class Multiple-Instance Learning Based on Color Correlogram Features

Ragav Venkatesan, Parag Sridhar Chandakkar, Baoxin Li, Helen Li

Conference Papers: IEEE International Conference Engineering in Medicine and Biology Society, San Diego, USA, 2012. [Poster]

Abstract

All people with diabetes are at risk of developing diabetic retinopathy (DR), a vision-threatening complication. Early detection and timely treatment can reduce the occurrence of blindness due to DR. Computer-aided diagnosis has the potential to improve the accuracy and speed of DR detection. This study is concerned with automatic classification of images with microaneurysm (MA) and neovascularization (NV), two important DR clinical findings. Together with normal images, this presents a 3-class classification problem. We propose a modified color auto-correlogram feature (AutoCC) with low dimensionality that is spectrally tuned towards DR images. Recognizing that images with or without MA or NV generally differ only in small, localized regions, we propose to employ a multi-class, multiple-instance learning framework for performing the classification task using the proposed feature. Extensive experiments, including comparison with a few state-of-the-art image classification approaches, have been performed, and the results suggest that the proposed approach is promising, as it outperforms other methods by a large margin.
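For readers unfamiliar with the underlying feature, the sketch below computes a plain color auto-correlogram on an image whose pixels are already quantized to color indices: for each color and distance d, it estimates the probability that a pixel at chessboard distance d from a pixel of that color has the same color. This is the textbook feature only, not the modified, spectrally tuned AutoCC proposed in the paper; the function name and the toy images are hypothetical.

```python
def auto_correlogram(image, n_colors, distances):
    """Return {(color, d): P(pixel at chessboard distance d has the same color)}.

    image is a list of rows of integer color indices in range(n_colors).
    """
    rows, cols = len(image), len(image[0])
    same = {(k, d): 0 for k in range(n_colors) for d in distances}
    total = {(k, d): 0 for k in range(n_colors) for d in distances}
    for r in range(rows):
        for col in range(cols):
            color = image[r][col]
            for d in distances:
                # Visit every pixel on the square ring at chessboard distance d.
                for dr in range(-d, d + 1):
                    for dc in range(-d, d + 1):
                        if max(abs(dr), abs(dc)) != d:
                            continue
                        rr, cc = r + dr, col + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            total[(color, d)] += 1
                            same[(color, d)] += int(image[rr][cc] == color)
    # Colors that never occur get probability 0.0.
    return {key: (same[key] / total[key] if total[key] else 0.0) for key in same}


uniform = [[0, 0], [0, 0]]
checker = [[0, 1], [1, 0]]
uniform_feature = auto_correlogram(uniform, n_colors=2, distances=[1])
checker_feature = auto_correlogram(checker, n_colors=2, distances=[1])
```

The feature length is the number of colors times the number of distances, so coarse color quantization keeps the dimensionality low.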

Video deinterlacing with control grid interpolation

Ragav Venkatesan, Christine Zwart, David Frakes

Conference Papers: IEEE International Conference on Image Processing, Florida, USA, 2012. [Poster]

Abstract

Video deinterlacing is a key technique in digital video processing, particularly with the widespread usage of LCD and plasma TVs. This paper proposes a novel spatio-temporal video deinterlacing technique that adaptively chooses between results from segment adaptive gradient angle interpolation (SAGA), vertical temporal filter (VTF) and temporal line averaging (LA). The proposed method performs better than several popular benchmarking methods in terms of both visual quality and PSNR and requires minimal computational overhead. The algorithm performs better than existing approaches on fine moving edges and semi-static regions of videos, which are recognized as particularly challenging deinterlacing cases.
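The general idea of adaptively choosing between spatial and temporal estimates can be sketched as follows. This is a generic motion-adaptive toy, not the paper's selection rule among SAGA, VTF and LA: it fills each missing line either by averaging the lines above and below (spatial) or by copying the previous frame (temporal), depending on a crude motion measure. The function name, the `motion_threshold` parameter and the motion measure are all assumptions made for illustration.

```python
def deinterlace_field(field_frame, prev_frame, parity, motion_threshold=10.0):
    """Fill the missing lines of one interlaced field.

    field_frame: full-height frame (at least two rows) in which only rows
                 with index % 2 == parity hold valid samples.
    prev_frame:  previously reconstructed full frame of the same size.
    """
    rows, cols = len(field_frame), len(field_frame[0])
    out = [[float(v) for v in row] for row in field_frame]
    for r in range(rows):
        if r % 2 == parity:
            continue  # This row was transmitted in the current field; keep it.
        # Nearest valid rows above and below, mirrored at the frame borders.
        above = r - 1 if r - 1 >= 0 else r + 1
        below = r + 1 if r + 1 < rows else r - 1
        for c in range(cols):
            spatial = 0.5 * (field_frame[above][c] + field_frame[below][c])
            temporal = float(prev_frame[r][c])
            # Crude motion measure: change of the valid neighbors since prev_frame.
            motion = 0.5 * (abs(field_frame[above][c] - prev_frame[above][c])
                            + abs(field_frame[below][c] - prev_frame[below][c]))
            # Moving regions use the spatial estimate, static regions the temporal one.
            out[r][c] = spatial if motion > motion_threshold else temporal
    return out


prev = [[10, 10], [20, 20], [30, 30], [40, 40]]
static_field = [[10, 10], [0, 0], [30, 30], [0, 0]]      # rows 1 and 3 are missing
moving_field = [[100, 100], [0, 0], [100, 100], [0, 0]]  # scene changed since prev
static_out = deinterlace_field(static_field, prev, parity=0)
moving_out = deinterlace_field(moving_field, prev, parity=0)
```

In the static case the missing lines are taken from the previous frame, which preserves vertical detail; in the moving case they are interpolated spatially, which avoids combing artifacts.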

Video Deinterlacing using Control Grid Interpolation Frameworks

Ragav Venkatesan

Masters Thesis: Arizona State University, Tempe, 2012.

Abstract

Video deinterlacing is a key technique in digital video processing, particularly with the widespread usage of LCD and plasma TVs. This thesis proposes a novel spatio-temporal, non-linear video deinterlacing technique that adaptively chooses between the results from one-dimensional control grid interpolation (1DCGI), vertical temporal filter (VTF) and temporal line averaging (LA). The proposed method performs better than several popular benchmarking methods in terms of both visual quality and peak signal-to-noise ratio (PSNR). The algorithm performs better than existing approaches such as edge-based line averaging (ELA) and spatio-temporal edge-based median filtering (STELA) on fine moving edges and semi-static regions of videos, which are recognized as particularly challenging deinterlacing cases. The proposed approach also performs better than the state-of-the-art content adaptive vertical temporal filtering (CAVTF) approach. Along with the main approach, several spin-off approaches are also proposed, each with its own characteristics.