Public Lecturer: Ali Ghodsi, University of Waterloo

What is missing from common practice in machine learning?

AI, and machine learning in particular, is enjoying its golden age. Machine learning has changed the face of the world over the past two decades, but we are still a long way from achieving general artificial intelligence. In this talk, I will discuss a couple of elements that I believe are missing from common practice in machine learning, including incorporating causality and creating a new framework for unsupervised learning.

Biography

Ali Ghodsi is a Professor in the Department of Statistics and Actuarial Science at the University of Waterloo. His research involves statistical machine-learning methods. Ghodsi's research spans a variety of areas in computational statistics. He studies theoretical frameworks and develops new machine learning algorithms for analyzing large-scale data sets, with applications to bioinformatics, data mining, pattern recognition, robotics, computer vision, and sequential decision making.




Keynote Presenter: Benjamin C. M. Fung, McGill University

Data Mining and Machine Learning for Authorship and Malware Analyses

In this talk, we will discuss two research problems in authorship and malware analyses, both characterized by imbalanced data, and a new project in neuroscience. (1) Given an anonymous e-mail or some tweets, can we identify the author or infer the author's characteristics based on their writing style? I will present a representation learning method for authorship analysis and give a live software demonstration. (2) Assembly code analysis is one of the critical processes for mitigating the exponentially increasing threats from malicious software. However, it is a manually intensive and time-consuming process even for experienced reverse engineers. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process. We have implemented an award-winning assembly clone search engine called Kam1n0. It is the first clone search engine that can efficiently identify a given query assembly function's subgraph clones from a large assembly code repository. I will give a live demonstration of Kam1n0. (3) Our team has recently started a new collaborative research project with a neuroscience team at McGill to study stress and memory. We do not have any significant results yet, but the research problem itself is interesting. I would like to share it with the goal of sparking students' interest in machine learning.


Biography

Prof. Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, an Associate Professor in the School of Information Studies (SIS) at McGill University, a Co-curator of Cybersecurity in the World Economic Forum (WEF), and an Associate Editor of IEEE Transactions on Knowledge and Data Engineering (TKDE). He received a Ph.D. degree in computing science from Simon Fraser University in 2007. Collaborating closely with the national defense, law enforcement, transportation, and healthcare sectors, he has published over 120 refereed articles, with over 8000 citations, that span the research forums of data mining, machine learning, privacy protection, cyber forensics, services computing, and building engineering. His data mining work in crime investigation and authorship analysis has been reported by media worldwide, including The New York Times, BBC, and CBC. Fung is an Associate Member of MILA and a licensed professional engineer in software engineering.




Isar Nejadgholi, National Research Council

Privacy-preserving data augmentation in medical text analysis

Having access to a high volume and variety of data is the key to success for many learning-based data processing techniques. However, in the field of medical information processing, data sharing has been a challenge, mainly because of privacy concerns. To address this issue, researchers have been focusing on various privacy-preserving data sharing approaches. Specifically, principles of differential privacy have been used for data de-identification or for generating synthetic data with attributes similar to the original sensitive data. Although differential privacy is a well-established mathematical method, it has some limitations due to the trade-off between protection and utility of the de-identified data. Another way to protect personal information while still benefiting from it is to learn privacy-preserving representations of sensitive text and share the learned latent representations rather than the sensitive text itself.
This talk briefly touches on various methods used for ethical and responsible sharing of sensitive data and explains the concept of privacy-preserving sharing of text representations in more detail. The talk specifically explains how this idea can be used to augment low-resource natural language processing tasks such as medical text processing.


Biography

Isar Nejadgholi is a research officer at the National Research Council Canada (NRC). She received her PhD in biomedical engineering in 2012. In her MSc and PhD research, she designed neural network models and used them to solve speech-to-text and computer vision problems. In her postdoctoral studies, she applied machine learning methods to process biomedical signals in a variety of applications. She has published the results of her research in more than 20 peer-reviewed journal and conference papers.
After her postdoctoral studies, Isar joined industry, where she led a team of machine learning researchers. They worked on the design and implementation of machine learning solutions for processing a variety of data types, with applications in legal, health, retail, and auditing businesses. Isar has recently joined the text analytics team at NRC, where the focus of her work is addressing some of the social aspects of AI applications, specifically fairness and privacy issues.




Robin Grosset, MindBridge Analytics Inc.

Class Imbalance in Fraud Detection

At MindBridge, we help to uncover irregularities in financial data, whether caused by fraud or human error. Fraud is, one hopes, a rare circumstance; thus, when building models to detect or predict fraud, we are typically dealing with a significant class imbalance. In this presentation, we will cover the types of issues that we have to unwrap and understand before we can predict fraud, and dive deeper into strategies that help us identify the sources of weak signals, which can significantly improve our predictive outcomes.

Biography

Robin currently works at MindBridge, where he leads the development of a next-generation AI Auditor that helps professionals detect and prevent financial anomalies, including fraud. Robin has a track record as an entrepreneur, having founded successful software start-ups. He joined Cognos and subsequently IBM through acquisitions. In 2012 he was appointed an IBM Distinguished Engineer. At IBM, he was a part of the Watson Group, where he served as technical lead and chief architect of IBM Watson Analytics. Robin holds many patents in the areas of analytics, data processing and security. MindBridge Ai is a venture-backed FinTech company based in Ottawa, Canada. Through the application of machine learning and artificial intelligence technologies, the MindBridge platform detects anomalous patterns of activities, unintentional errors and intentional misstatements. Using MindBridge Ai Auditor, organizations across multiple industries can minimize financial loss, reduce corporate liability and focus on providing higher-value services to their clients.




Isuru Gunasekara, IMRSV Data Labs

Handling class imbalance in natural language processing

This talk will focus on how to train Natural Language Processing (NLP) models in the presence of class imbalance. Methods specific to NLP datasets will be explored, in addition to general methods that can be applied to mitigate problems arising from imbalanced datasets. A real-life example of a sentence classification task will be explored in this talk.


Biography

Isuru’s work is primarily in Natural Language Understanding (NLU) and real-time video processing, and he is the Lead Machine Learning Engineer at IMRSV Data Labs. He utilizes his strong research background to identify and build new AI-based systems to solve customer problems. His ability to quickly implement new solutions driven by customer needs is extremely valuable for IMRSV and its customers. Isuru has over six years of experience in designing and implementing machine learning systems.




Herna L. Viktor, University of Ottawa

Adaptive learning with class imbalanced streams

In numerous application domains such as weather monitoring, sensor networks, web logs and video surveillance, data are generated as streams. The distribution of instances of the classes in such streams may be significantly skewed, and the number of class labels is often large. The task of learning in such multi-class imbalanced settings, where instances of some classes occur much more frequently than others, is challenging. In an evolving stream, this difficulty is further aggravated by the temporal and often interleaved rates of data arrival. In addition, streams are often susceptible to changes in data distributions, or so-called concept drifts, which must be handled during learning. This talk highlights our current research in this area and outlines future research directions.

Biography

Herna Viktor is a professor at the School of Electrical Engineering and Computer Science (EECS) at the University of Ottawa (uOttawa) and the director of the Applied Artificial Intelligence in Computer Science program at uOttawa. Dr. Viktor’s research focuses on data-driven discovery, with an emphasis on advanced machine learning algorithms to extract deep semantic meaning from data. She is the author of more than 150 journal articles, conference papers and book chapters. The results of her research have been applied across numerous and diverse domains, including: the study of anemia in pediatric patients, in collaboration with the Hospital for Sick Children in Toronto; exploring the media discourse regarding the Alberta oil sands debate; sentiment analysis for opinion mining in elections; and, most recently, a study of the oral and dental health of Canadians.




Hamidreza Sadreazami, McGill University

Radar-based fall monitoring using deep learning

Radar-based sensors for daily activity monitoring provide attractive advantages compared with other technologies, particularly in terms of privacy preservation and non-cooperative monitoring capabilities. More specifically, ambient sensors, especially vision-based sensors, can raise sensitive issues in terms of the confidentiality of the data and privacy of the patients, which may not be an issue for wearable sensors. Wearable sensors, however, require users' cooperation and compliance to be worn or carried, which could be potentially problematic and uncomfortable. Radar sensors can address these issues, as no images of the monitored people are collected. Furthermore, the technology avoids stigmatizing the monitored subjects and their specific needs, since there is no need to alter one's usual behavior because of the introduction of the sensor at home, or to wear unusual devices. All these aspects can help address some of the key user acceptance issues highlighted for wearables, smartphones, and video cameras, making radar an appealing technology in the assisted living context.

Biography

Hamidreza Sadreazami received a Ph.D. degree in Electrical Engineering from Concordia University, Montreal, Canada, in 2016. He is currently a Postdoctoral Fellow at the Bio-engineering department, McGill University, Montreal, Canada. His research interests include biomedical signal processing, machine learning and statistical modeling. He was nominated for the 2017 CAGS/UMI Distinguished Dissertation Award at Concordia University and the 2017 Prix d’excellence de l’Association des doyens des études supérieures au Québec (ADESAQ). He is the Chair of the IEEE Montreal Industry Relations Committee.




Julio J. Valdés, National Research Council

Failure modelling of a propulsion subsystem: unsupervised and semi-supervised approaches to anomaly detection

This work analyses sensor data related to a diesel engine system and specifically its turbocharger subsystem. An incident where the turbocharger seized was recorded by dozens of standard turbocharger-related sensors. By training models to distinguish between normal healthy operating conditions and deteriorated conditions, there is an opportunity to develop prognostic and predictive tools that could help prevent a similar occurrence in the future. Analysis of this event provides an opportunity to identify changes in equipment indicators with a known outcome. A number of data analysis tools were used to characterize the healthy and deteriorated states of the turbocharger system, including various supervised classification as well as semi-supervised and unsupervised anomaly detection techniques. Although this problem posed challenges due to the severely imbalanced class distribution, the supervised classifiers, in particular Support Vector Machine and Random Forest, performed very well on all metrics, while the unsupervised anomaly detection models achieved near-perfect accuracy in identifying healthy turbocharger states.

Biography

Dr. Valdés’s academic training and scientific activity cover two domains: exact sciences (mathematics and computer science), and natural sciences (earth and environmental sciences). He has a PhD in Mathematics and a BS in physics (geophysics).

His topics of interest in the first domain are: data analytics/mining, machine learning, computational intelligence (neural networks, evolutionary computation, fuzzy logic, probabilistic reasoning, rough sets), and hybrid systems, including pattern recognition, data visualization and digital image and signal processing. In the field of natural sciences, his topics of interest are: geomathematics, mathematical modeling of natural processes, computational intelligence, data mining and analysis of earth, environmental and astronomical data, remote sensing, geophysics, geochemistry, paleoclimate and climate change.

He has worked at research institutes and universities in Europe and America. He joined the National Research Council Canada in 2001 and is an Adjunct Professor at the University of Ottawa and at Carleton University. His record includes 232 publications in books, journals, conference papers and technical reports.




Shaun Pilkington, Interset

Cybersecurity: Top 5 class imbalance ML challenges and data sets

Abstract coming soon!

Biography

Shaun Pilkington is a Principal Data Scientist at Interset, a Micro Focus company. Interset is a leading cybersecurity company and In-Q-Tel portfolio company that uses machine learning and behavioral analytics. Shaun joined as the first data scientist while it was an early start-up and has seen the company grow to be the leader in its space and through its recent acquisition. He holds a dual B.S. in physics and computer science from Rowan University and a master's in computer science from the University of Louisiana at Lafayette (ULL), and was in a PhD program for cognitive science, also at ULL, before joining Interset. He enjoys problems that require novel math and algorithmic approaches to solve.




Daniel Shapiro, Lemay.ai

AuditMap.ai: Hierarchical Sentence Classification in Unstructured Audit Reports

Corporate reports contain sentences that describe risks, controls, and more. In this talk, we will look at the imbalance between these sentence classes, how imbalances are addressed in neural network training, and a real-life example of classified sentences.

Biography

Dr. Shapiro is a Canadian entrepreneur with a PhD in Electrical and Computer Engineering from the University of Ottawa. He has held various top management and management consulting roles throughout his career. The focus of Daniel’s work has been deep learning and artificial intelligence. He has worked on trading algorithms (Investifai), enterprise consulting (Lemay.ai and Stallion.ai), audit automation (AuditMap.ai), and biological neural networks (Nuraleve). Daniel has over 30 peer-reviewed publications and patents.






Dušan Sovilj, RANK Software Inc.

Deep Learning techniques for unsupervised anomaly detection

Deep Learning has had success in supervised learning. In this presentation, we will discuss various Deep Learning architectures for situations where labelled data might not be present. It’s also important to understand why an RNN made a certain decision, so we will also present architectures that can provide more “explainable” AI using Deep Learning.

Biography

Dr. Dušan Sovilj obtained his D.Sc. from Aalto University in Finland. He was a postdoctoral fellow at IIHR−Hydroscience & Engineering at the University of Iowa and at the Mechanical and Industrial Engineering Department at the University of Toronto. He is currently working as a Research Scientist at RANK Software Inc. His main topics of interest are time series prediction and variable selection for regression problems, application of deep machine learning algorithms to anomaly detection, adaptive user interfaces, and weather forecasting.


