AI, and machine learning in particular, is enjoying its golden age. Machine learning has changed the face of the world over the past two decades but we are still a long way from achieving a general artificial intelligence. In this talk, I will discuss a couple of elements that I believe are missing from common practice in machine learning, including incorporating causality and creating a new framework for unsupervised learning.
Ali Ghodsi is a Professor in the Department of Statistics and Actuarial Science at the University of Waterloo. His research involves statistical machine-learning methods. Ghodsi's research spans a variety of areas in computational statistics. He studies theoretical frameworks and develops new machine learning algorithms for analyzing large-scale data sets, with applications to bioinformatics, data mining, pattern recognition, robotics, computer vision, and sequential decision making.
In this talk, we will discuss two research problems in authorship and malware analyses with imbalance characteristics in the data and a new project in neuroscience. (1) Given an anonymous e-mail or some tweets, can we identify the author or infer the author's characteristics based on his/her writing style? I will present a representation learning method for authorship analysis and give a live software demonstration. (2) Assembly code analysis is one of the critical processes for mitigating the exponentially increasing threats from malicious software. However, it is a manually intensive and time-consuming process even for experienced reverse engineers. An effective and efficient assembly code clone search engine can greatly reduce the effort of this process. We have implemented an award-winning assembly clone search engine called Kam1n0. It is the first clone search engine that can efficiently identify a given query assembly function's subgraph clones from a large assembly code repository. I will give a live demonstration of Kam1n0. (3) Our team has recently started a new collaborative research project with a neuroscience team at McGill to study stress and memory. We do not have any significant results yet, but the research problem itself is interesting. I would like to share it with the goal of sparking students' interest in machine learning.
Prof. Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, an Associate Professor in the School of Information Studies (SIS) at McGill University, a Co-curator of Cybersecurity in the World Economic Forum (WEF), and an Associate Editor of IEEE Transactions on Knowledge and Data Engineering (TKDE). He received a Ph.D. degree in computing science from Simon Fraser University in 2007. Collaborating closely with the national defense, law enforcement, transportation, and healthcare sectors, he has published over 120 refereed articles that span the research forums of data mining, machine learning, privacy protection, cyber forensics, services computing, and building engineering, with over 8000 citations. His data mining works in crime investigation and authorship analysis have been reported by media worldwide, including the New York Times, BBC, and CBC. Fung is an Associate Member of MILA and a licensed professional engineer in software engineering.
Access to a high volume and variety of data is the key to success for many learning-based data processing techniques. However, in the field of medical information processing, data sharing has been a challenge, mainly because of privacy concerns. To address this issue, researchers have been focusing on various privacy-preserving data sharing approaches. Specifically, principles of differential privacy have been used for data de-identification or for generating synthetic data with attributes similar to the original sensitive data. Although differential privacy is a well-established mathematical method, it has some limitations due to the trade-off between protection and utility of the de-identified data. Another way to protect personal information while still benefiting from it is to learn privacy-preserved representations of sensitive text and share the learned latent representations rather than the sensitive text itself.
This talk briefly touches on various methods used for ethical and responsible sharing of sensitive data and explains the concept of privacy-preserving sharing of text representations in more detail. The talk specifically explains how this idea can be used to augment low-resource natural language processing tasks such as medical text processing.
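The protection/utility trade-off mentioned in the abstract can be illustrated with a small sketch. It is not taken from the talk: it assumes the standard Laplace mechanism applied to a counting query, and the record layout, the `private_count` and `mean_abs_error` helpers, and the epsilon values are all hypothetical choices made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential draws with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a noisy count using the Laplace mechanism (illustrative only)."""
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1: adding or removing one record
    # changes the result by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(epsilon, trials=2000, seed=0):
    """Average absolute error of the noisy count on a toy data set."""
    rng = random.Random(seed)
    # Hypothetical toy records: 30 matching the query, 70 not matching.
    records = [{"diagnosis": "flu"}] * 30 + [{"diagnosis": "other"}] * 70
    is_flu = lambda r: r["diagnosis"] == "flu"
    errors = [abs(private_count(records, is_flu, epsilon, rng) - 30)
              for _ in range(trials)]
    return sum(errors) / trials

if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

A smaller epsilon gives stronger protection but a noisier, less useful answer; the expected absolute error of the released count equals the noise scale, 1/epsilon.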
Isar Nejadgholi is a research officer at National Research Council Canada (NRC). She received her PhD in biomedical engineering in 2012. In her MSc and PhD research, she designed neural network models and used them to solve speech-to-text and computer vision problems. In her postdoctoral studies, she applied machine learning methods to process biomedical signals in a variety of applications. She published the results of her research in more than 20 peer-reviewed journal and conference papers.
After her postdoctoral studies, Isar joined the industry, where she led a team of machine learning researchers. They worked on the design and implementation of machine learning solutions for processing a variety of data types, with applications in legal, health, retail, and auditing businesses. Isar has recently joined the text analytics team at NRC, where the focus of her work is addressing some of the social aspects of AI applications, specifically fairness and privacy issues.
At MindBridge, we help to uncover irregularities in financial data, whether caused by fraud or human error. Fraud is, one hopes, a rare circumstance; thus, when building models to detect or predict fraud, we are typically dealing with a significant class imbalance. In this presentation, we will cover the types of issues that we have to unwrap and understand before we can predict fraud, and dive deeper into strategies that help us identify the sources of weak signals, which can significantly improve our predictive outcomes.
Robin currently works at MindBridge, where he leads the development of a next-generation AI Auditor which helps professionals detect and prevent financial anomalies, including fraud. Robin has a track record as an entrepreneur, having founded successful software start-ups. He joined Cognos and subsequently IBM through acquisitions. In 2012 he was appointed to be an IBM Distinguished Engineer. At IBM, he was a part of the Watson Group, where he served as technical lead and chief architect of IBM Watson Analytics. Robin holds many patents in the areas of analytics, data processing and security. MindBridge Ai is a venture-backed FinTech company based in Ottawa, Canada. Through the application of machine learning and artificial intelligence technologies, the MindBridge platform detects anomalous patterns of activities, unintentional errors and intentional misstatements. Using MindBridge Ai Auditor, organizations across multiple industries can minimize financial loss, reduce corporate liability and focus on providing higher value services to their clients.
This talk will focus on how to train Natural Language Processing (NLP) models in the presence of class imbalance. Methods specific to NLP datasets will be explored, in addition to general methods that can be applied to mitigate problems arising from imbalanced datasets. A real-life example of a sentence classification task will be explored in this talk.
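As one illustration of a general mitigation for class imbalance, the sketch below computes inverse-frequency class weights, which a training loop can use to scale the loss of each example. It is not the speaker's method: the label names, the 90/10 split, and the `balanced_class_weights` helper are hypothetical. The formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes receive weights above 1, frequent classes below 1.
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Hypothetical sentence labels: 90 majority-class and 10 minority-class sentences.
labels = ["other"] * 90 + ["risk"] * 10
weights = balanced_class_weights(labels)

if __name__ == "__main__":
    for label, weight in sorted(weights.items()):
        print(f"{label}: {weight:.3f}")
```

With these weights, each class contributes the same total weight to the loss (90 × 0.556 ≈ 10 × 5.0 = 50), so the minority class is not drowned out during training.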
Isuru’s work is primarily in Natural Language Understanding (NLU) and real-time video processing, and he is the Lead Machine Learning Engineer at IMRSV Data Labs. He utilizes his strong research background to identify and build new AI-based systems to solve customer problems. His ability to quickly implement new solutions driven by customer need is extremely valuable for IMRSV and their customers. Isuru has over 6 years of experience in designing and implementing machine learning systems.
In numerous application domains such as weather monitoring, sensor networks, web logs and video surveillance, data are generated as streams. The distribution of instances of the classes in such streams may be significantly skewed, and the number of class labels is often large. The task of learning in such multi-class imbalanced settings, where instances of some classes occur much more frequently than others, is challenging. In an evolving stream, this difficulty is further aggravated by the temporal and often interleaved rates of data arrival. In addition, streams are often susceptible to changes in data distributions, or so-called concept drifts, which must be handled during learning. This talk highlights our current research in this area and outlines future research directions.
Herna Viktor is a professor at the School of Electrical Engineering and Computer Science (EECS) at the University of Ottawa (uOttawa) and the director of the Applied Artificial Intelligence in Computer Science program at uOttawa. Dr. Viktor’s research focuses on data-driven discovery, with an emphasis on advanced machine learning algorithms to extract deep semantic meaning from data. She is the author of more than 150 journal articles, conference papers and book chapters. The end results of her research have been applied across numerous and diverse domains. Some of these are: the study of pediatric anemia patients, in collaboration with the Hospital for Sick Children in Toronto; exploring the media discourse regarding the Alberta oil sands debate; sentiment analysis for opinion mining in elections; and, most recently, a study of the oral and dental health of Canadians.
Radar-based sensors for monitoring daily activities provide attractive advantages compared with other technologies, particularly in terms of privacy preservation and non-cooperative monitoring capabilities. More specifically, ambient sensors, especially vision-based sensors, can raise sensitive issues in terms of the confidentiality of the data and privacy of the patients, which may not be an issue for wearable sensors. Wearable sensors, however, require users' cooperation and compliance to be worn or carried, which could be potentially problematic and uncomfortable. Radar sensors can address these issues, as no images of the monitored people are collected. Furthermore, the technology avoids stigmatizing the monitored subjects and their specific needs, since there is no need to alter one's usual behavior because of the introduction of the sensor at home, or to wear unusual devices. All these aspects can help address some of the key user acceptance issues highlighted for wearables, smart phones, and video cameras, making radar a more appealing technology in the assisted living context.
Hamidreza Sadreazami received a Ph.D. degree in Electrical Engineering from Concordia University, Montreal, Canada, in 2016. He is currently a Postdoctoral Fellow at the Bio-engineering department, McGill University, Montreal, Canada. His research interests include biomedical signal processing, machine learning and statistical modeling. He was nominated for the 2017 CAGS/UMI Distinguished Dissertation Award at Concordia University and the 2017 Prix d’excellence de l’Association des doyens des études supérieures au Québec (ADESAQ). He is the Chair of IEEE Montreal Industry Relations Committee.
This work analyses sensor data related to a diesel engine system and specifically its turbocharger subsystem. An incident where the turbocharger seized was recorded by the dozens of standard turbocharger-related sensors. By training models to distinguish between normal healthy operating conditions and deteriorated conditions, there is an opportunity to develop prognostic and predictive tools to ideally help prevent a similar occurrence in the future. Analysis of this event provides an opportunity to identify changes in equipment indicators with a known outcome. A number of data analysis tools were used to characterize the healthy and deteriorated states of the turbocharger system, including various supervised classification as well as semi-supervised and unsupervised anomaly detection techniques. Although this problem posed challenges due to the severely imbalanced class distribution, the supervised classifiers, in particular Support Vector Machine and Random Forest, performed very well in all metrics while the unsupervised anomaly detection models achieved near-perfect accuracy for identifying healthy turbocharger states.
Dr. Valdés’s academic training and scientific activity cover two domains: exact sciences (mathematics and computer science), and natural sciences (earth and environmental sciences). He has a PhD in Mathematics and a BS in physics (geophysics).
His topics of interest in the first stream are: data analytics/mining, machine learning, computational intelligence (neural networks, evolutionary computation, fuzzy logic, probabilistic reasoning, rough sets), and hybrid systems including pattern recognition, data visualization and digital image and signal processing. In the field of natural sciences, his topics of interest are: geomathematics, mathematical modeling of natural processes, computational intelligence data mining and analysis of earth, environmental sciences and astronomical data, remote sensing, geophysics, geochemistry, paleoclimate and climate change.
He has worked at research institutes and universities in Europe and America. He joined the National Research Council Canada in 2001 and is an Adjunct Professor at the University of Ottawa and at Carleton University. His record includes 232 publications in books, journals, conference papers and technical reports.
Abstract coming soon!
Shaun Pilkington is a Principal Data Scientist at Interset, a Micro Focus company, a leading cybersecurity and In-Q-Tel portfolio company that uses machine learning and behavioral analytics. Shaun joined as the first data scientist while it was an early start-up and has seen the company grow to be the leader in its space and through its recent acquisition. He holds a dual B.S. in physics and computer science from Rowan University and a master's in computer science from the University of Louisiana at Lafayette (ULL), and was in a PhD program for cognitive science, also at ULL, before joining Interset. He enjoys problems that require novel math and algorithmic approaches to be solved.
Corporate reports contain sentences that describe risks, controls, and more. In this talk, we will look at the imbalance between these sentence classes, how imbalances are addressed in neural network training, and a real-life example of classified sentences.
Dr. Shapiro is a Canadian entrepreneur with a PhD in Electrical and Computer Engineering from the University of Ottawa. He has performed various top management and management consulting roles throughout his career. The focus of Daniel’s work has been deep learning artificial intelligence. He has worked on trading algorithms (Investifai), enterprise consulting (Lemay.ai and Stallion.ai), audit automation (AuditMap.ai), and biological neural networks (Nuraleve). Daniel has over 30 peer-reviewed publications and patents.
Deep Learning has had success in supervised learning. In this presentation, we will discuss various Deep Learning architectures for situations where labelled data might not be present. It’s also important to understand why an RNN made a certain decision, so we will also present architectures that can provide more “explainable” AI using Deep Learning.
Dr. Dušan Sovilj obtained his D.Sc. from Aalto University in Finland. He was a postdoctoral fellow at IIHR−Hydroscience & Engineering at the University of Iowa and at the Mechanical and Industrial Engineering Department at the University of Toronto. He is currently working as a Research Scientist at Rank Software Inc. His main topics of interest are time series prediction and variable selection for regression problems, application of deep machine learning algorithms to anomaly detection, adaptive user-interfaces, and weather forecasting.