Interview with Humayun Irshad: Leveraging Street View Imagery and Deep Learning Models to Build Parking Augmented Maps

One of our MLconf Program Committee Members, Reshama Shaikh, recently interviewed Humayun Irshad, Lead Scientist of Machine Learning at Figure Eight, about his work using deep learning models and street view imagery to build parking-augmented maps.

Finding a proper parking spot in cities such as San Francisco is a driver’s everyday struggle as streets reach their capacity. Parking signs are installed to inform drivers about the parking regulations that apply to each curb. However, in many cases parking signs create confusion, generate more traffic and harm transportation, the environment and pedestrian safety, as drivers spend a lot of time circling the streets and reading parking signs to avoid tickets. Parking rules often vary considerably from curb to curb: multiple rules may apply to the same curb, and signs may contain so much text that drivers struggle to read and interpret them correctly.

On the other hand, public street-level databases such as Google Street View contain an enormous amount of visual information about parking regulations that is locked inside the images. Computer vision models can leverage such databases to extract parking rules from images. Once digitized, the rules can be used to build tools that present parking regulations in a format that is easier for drivers to understand and follow, or that notify drivers when rules change. The key benefit of exploiting public street-level imagery such as Google Street View is that once parking signs are detected and read from images, they can be located on the same platform and used as map metadata to provide parking-related services to drivers navigating with those maps.

Building a computer-vision-based system for extracting and digitizing parking regulations involves several steps. First, parking sign data is collected and labeled for training deep learning models. Next, state-of-the-art object detectors are used to find the location of parking signs in the images. Parking rules are then extracted from the detected signs using multiple text-reading APIs combined with human verification. In the final step, metadata about the actual location of each panorama is used to place the parking signs on the map. In this study, we focus mainly on data collection, annotation, and the training and evaluation of deep learning models for parking sign detection in Google Street View images. –Humayun Irshad
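To make the pipeline above concrete, here is a rough Python sketch of the four stages. The helper functions (download_panoramas, detect_parking_signs, read_sign_text, geolocate_sign) are hypothetical placeholders for the steps described, not actual Figure Eight code.

```python
# Hypothetical sketch of the parking-sign digitization pipeline described above.
# The helper functions are illustrative placeholders, not real APIs.

def digitize_parking_rules(city_bbox):
    parking_records = []
    # 1. Data collection: pull street-level panoramas for the area of interest.
    for panorama in download_panoramas(city_bbox):
        # 2. Object detection: a trained detector proposes parking-sign boxes.
        for box in detect_parking_signs(panorama.image):
            crop = panorama.image.crop(box)
            # 3. Text extraction: OCR APIs plus human verification turn the
            #    sign crop into structured rules (hours, days, restrictions).
            rules = read_sign_text(crop, human_verify=True)
            # 4. Geolocation: combine the panorama's GPS metadata with the
            #    box position to place the sign on the map.
            location = geolocate_sign(panorama.metadata, box)
            parking_records.append({"location": location, "rules": rules})
    return parking_records
```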

RS)  Tell us briefly about yourself and your work.

HI) I am currently the Lead Scientist of Machine Learning at Figure Eight, the essential human-in-the-loop AI platform for data science and machine learning teams. My expertise is in developing machine learning, and more specifically deep learning, frameworks for various applications such as object detection, segmentation and classification in fields ranging from medical imaging to retail, self-driving cars, satellite imagery and fashion. Nowadays, I am building Active Learning frameworks that select training data from labeled or unlabeled datasets to build models while avoiding over-training and handling corner cases. I have three years of postdoctoral experience at Harvard Medical School, where I developed machine learning and deep learning frameworks for computer-aided diagnosis systems, including region-of-interest detection, nuclei and gland detection, segmentation and classification in 2D and 3D medical images. I received my PhD in Computer Science from the University of Grenoble, France.

RS)  Regarding one of your current projects at Figure Eight, “Finding Proper Parking Spot”, at what stage is this application?  When will it be available for the public to use?

HI) Parking Sign Recognition is a research project that I started early this year. I proposed and developed a naive Active Learning framework that selects training data from an unlabeled dataset, using human-in-the-loop labeling (crowdsourcing), to build an object detection model that recognizes parking signs. After 5 iterations of active learning, this framework reaches more than 86% accuracy in predicting parking signs in San Francisco street-view images (downloaded from Google Street View). The paper is under review at an AI-related conference. The training and validation dataset is available for download on the Figure Eight website. We are brainstorming how to make this service available to the public, either as a web API or a mobile app.
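A minimal sketch of the iteration pattern behind such a naive active-learning framework is shown below, assuming a detector that exposes train and predict methods and a hypothetical send_to_crowd call standing in for the crowdsourced labeling step; it illustrates the loop, not Figure Eight's actual implementation.

```python
# Illustrative active-learning loop (hypothetical helpers, not Figure Eight code).
# Each iteration: predict on unlabeled images, pick the least-confident ones,
# send them to the crowd for labeling, and retrain on the grown labeled pool.

def active_learning_loop(model, labeled, unlabeled, iterations=5, batch_size=500):
    for _ in range(iterations):
        model.train(labeled)                   # retrain on current labeled pool
        scored = [(img, model.predict(img).confidence) for img in unlabeled]
        scored.sort(key=lambda pair: pair[1])  # least confident first
        queried = [img for img, _ in scored[:batch_size]]
        labeled += send_to_crowd(queried)      # human-in-the-loop labeling
        unlabeled = [img for img in unlabeled if img not in queried]
    return model
```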

RS)  How has your experience in cancer research augmented your skill set for your current role as Lead Scientist of Machine Learning and Computer Vision at Figure Eight?

HI) I am lucky that I got extensive training and experience working with a wide range of medical images, including 2D and 3D image modalities in H&E-stained, immunohistochemistry and fluorescence images. These images are acquired with a wide range of scanners and microscopes, such as bright-field, confocal, light-sheet and super-resolution microscopy. At Harvard, I developed a number of computational image analysis frameworks for cancer diagnosis and prognosis, more specifically for breast cancer, published them in top journals and presented them at well-known conferences and meetings. I faced different challenges, such as image acquisition issues (including noise inherited from scanners and tissue preparation), problems in collecting ground-truth labels from medical experts, handling imbalanced datasets, and building models that perform well on any type of image, not only across scanners but also across hospitals. These experiences increased my domain knowledge and expertise in machine learning, computer vision, image acquisition, image analysis and statistical methods. Eventually, they gave me the skills and confidence to deal with a wide range of image analysis problems and issues.

RS)  What tools are you using for computer vision?  What are the deep learning approaches that you are incorporating into your current projects?

HI) Nowadays, I am extensively using current state-of-the-art frameworks and models like TensorFlow, Keras and PyTorch with Python. Mostly, I work on image classification, image enhancement, object detection, instance segmentation and semantic segmentation problems. Depending on the dataset and the complexity of the problem, I select different models for different applications. For image classification, I built a pipeline of state-of-the-art deep models including VGG, ResNet, Inception, DenseNet, NASNet and their derivatives. For image enhancement, I use either traditional computer vision techniques or GAN (Generative Adversarial Network) models to remove noise from images or to reconstruct images from occluded and missing parts. I built customized object detection pipelines based on Faster R-CNN, SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once) models for a wide range of detection tasks in domains including, but not limited to, self-driving cars, drones, satellite imagery, medical imaging, fashion and retail. For object segmentation, I developed frameworks based on autoencoders and GAN models for instance and semantic segmentation. All of these deep learning and machine learning frameworks are also part of our crowdsourcing platform, where they assist our crowd and increase its performance on different labeling jobs.

Nowadays, I am working on Active Learning techniques and strategies for the optimal selection of data for labeling and training machine learning models.
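As a concrete example of the detection pipelines mentioned above, the snippet below runs a Faster R-CNN pretrained on COCO using torchvision; the image path is a placeholder, and adapting the model to a custom class such as parking signs would require replacing the prediction head and fine-tuning, which is not shown.

```python
# Minimal Faster R-CNN inference with torchvision's pretrained COCO model.
# (pretrained=True is the older torchvision argument; newer releases use weights=.)
import torch
import torchvision
from torchvision import transforms
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = transforms.ToTensor()(Image.open("street_view.jpg").convert("RGB"))
with torch.no_grad():
    prediction = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

for box, score in zip(prediction["boxes"], prediction["scores"]):
    if score > 0.5:                 # keep confident detections only
        print(box.tolist(), float(score))
```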

RS)  What are some of the recent research papers you have read?

HI) Here is a list of research papers that I read in the past month:

  • Large scale GAN training for high fidelity natural image synthesis (ICLR 2019)
  • Learning transferable architectures for scalable image recognition (Barret Zoph, 2018)
  • 3D MRI brain tumor segmentation using autoencoder regularization (Andriy Myronenko, MICCAI 2018) – won the MICCAI 2018 Brain Tumor Segmentation (BraTS) Challenge
  • Dermatologist-level classification of skin cancer with deep neural networks (Andre Esteva, Nature 2017)

RS)  I noticed that you released a video entitled An Active Learning Approach to Image Recognition.  Can you elaborate on that approach?

HI) Deep learning models have been used extensively to solve real-world problems in recent years. The performance of such models relies heavily on large amounts of labeled training data. While advances in data collection technology have enabled the acquisition of massive volumes of data, labeling that data remains an expensive and time-consuming task. Active learning techniques are being progressively adopted to accelerate the development of machine learning solutions by allowing the model to query the data it learns from. In this work, we introduce a real-world problem, the recognition of parking signs, and present a framework that combines active learning techniques with a transfer learning approach and crowdsourcing tools to create and train a machine learning solution to the problem. We discuss how such a framework helps build an accurate model quickly and cost-effectively despite the unevenness of the data: street-level objects such as parking signs vary in shape, color, orientation and scale, and often appear against different types of background.
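The transfer-learning component mentioned here can be sketched generically in Keras: reuse a backbone pretrained on ImageNet, freeze it, and train only a small head on the limited labeled sign crops. This is a minimal sketch under those assumptions, not the exact architecture used in the project.

```python
# Generic transfer-learning sketch: frozen ImageNet backbone plus a small
# trainable head, useful when labeled data (e.g., sign crops) is scarce.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # sign vs. no sign
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_dataset, validation_data=val_dataset, epochs=10)
```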

RS)  As someone who has transitioned from academia to industry, have you been able to continue your research and publish?

HI) Yes, I am lucky in the sense that my current employer (Figure Eight) encourages me to do research and publish in well-known conferences and journals, alongside developing frameworks and models for customers and the platform. I have submitted two research papers (currently under review) to well-known AI and machine learning conferences. Besides that, I am also finishing a few research projects that I started at Harvard and publishing them in journals.

RS)  Here are a few of the trending topics in data science.  What are your high-level thoughts, in 1-2 sentences, on each topic?

Algorithm reliability

  • Algorithm reliability is one of the top measures for evaluating an algorithm. For a classification algorithm, reliability can be expressed as the estimated probability that a given classification is in fact correct.

Reproducibility

  • Reproducibility is critical when deploying models in industry. It minimizes variation when prior experiments are rerun, makes models fault tolerant, and enables iterative improvement of models.

Diversity

  • Machine learning models should be fit to the specific requirements of different tasks. Many factors can affect the performance of the ML process, and the diversity of the machine learning models used is an important one.

Open source

  • I have always supported open source frameworks, and I encourage others to do the same.

Opportunities for entry level data scientists

  • Entry-level data scientists should focus on one aspect or problem at a time and, after mastering it and gaining significant knowledge, move on to the next one. Try to avoid an ML cocktail at the start of your journey.

 

RS)  Thank you for participating in this interview.

 

Humayun Irshad is currently the Lead Scientist of Machine Learning at Figure Eight, the essential human-in-the-loop AI platform for data science and machine learning teams. He has expertise in developing machine learning, and more specifically deep learning, frameworks for various applications such as object detection, segmentation and classification in fields ranging from medical imaging to retail, self-driving cars, satellite imagery and fashion. Nowadays, he is building Active Learning frameworks that select training data from labeled or unlabeled datasets to build models while avoiding over-training and handling corner cases. He has three years of postdoctoral experience at Harvard Medical School, where he developed machine learning and deep learning frameworks for computer-aided diagnosis systems, including region-of-interest detection, nuclei and gland detection, segmentation and classification in 2D and 3D medical images. He received his PhD in Computer Science from the University of Grenoble, France.

He is a computer scientist with expertise in machine learning, deep learning, computer vision, biomedical image analysis and statistical methods. He likes to use his analytical mind not only when building complex models but also as part of his leadership philosophy. Humayun also enjoys sharing his experiences with technical and non-technical audiences at conferences, seminars and meetings.

Reshama Shaikh is a data scientist/statistician and MBA with skills in Python, R and SAS. She worked for over 10 years as a biostatistician in the pharmaceutical industry. She is also an organizer of the meetup groups NYC Women in Machine Learning & Data Science (http://wimlds.org) and PyLadies.  She received her M.S. in statistics from Rutgers University and her M.B.A. from NYU Stern School of Business.

Twitter: @reshamas