PIZZA SESSION – VISUALISING DATA USING T-SNE
Visualization techniques are essential tools for every data scientist. Unfortunately, most visualization techniques can only be used to inspect a limited number of variables of interest simultaneously. As a result, these techniques are not suitable for large data sets that are very high-dimensional.
An effective way to visualize high-dimensional data is to represent each data object by a two-dimensional point in such a way that similar objects are represented by nearby points, and that dissimilar objects are represented by distant points. The resulting two-dimensional points can be visualized in a scatter plot. This leads to a map of the data that reveals the underlying structure of the objects, such as the presence of clusters.
We present a new technique to embed high-dimensional objects in a two-dimensional map, called t-Distributed Stochastic Neighbor Embedding (t-SNE), that produces substantially better results than alternative techniques. We demonstrate the value of t-SNE in domains such as computer vision and bioinformatics. In addition, we show how to scale up t-SNE to data sets with millions of objects, and we present an approach to visualize objects whose similarities are non-metric (such as semantic similarities).
This talk describes joint work with Geoffrey Hinton (Google / University of Toronto).
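For readers who want a feel for the "t-distributed" part of the name before the talk, here is a minimal sketch in plain Python. It is not the speaker's implementation and the abstract does not spell out the method. It assumes only the published t-SNE formulation, in which the similarity between two map points is a normalised Student-t kernel, q_ij proportional to 1 / (1 + squared distance). The function name `map_similarities` and the toy coordinates are hypothetical.

```python
from itertools import combinations


def map_similarities(points):
    """Return q[(i, j)] for i < j: Student-t similarities between 2-D map points.

    Each unnormalised weight is 1 / (1 + squared Euclidean distance).
    Weights are normalised over all ordered pairs (k != l), so the
    values returned for the unordered pairs sum to 0.5.
    """
    weights = {}
    for (i, a), (j, b) in combinations(enumerate(points), 2):
        sq_dist = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        weights[(i, j)] = 1.0 / (1.0 + sq_dist)
    # Every unordered pair counts twice in the sum over ordered pairs.
    total = 2.0 * sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical map: two nearby points and one distant point.
toy_map = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
q = map_similarities(toy_map)
for pair, value in sorted(q.items()):
    print(pair, round(value, 4))
```

In this toy map the nearby pair (0, 1) receives by far the largest similarity, which mirrors the goal stated above: similar objects end up close together and dissimilar objects far apart.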
Laurens van der Maaten is currently an Assistant Professor in the Intelligent Systems department of Delft University of Technology. Previously, he worked as a post-doctoral scholar at the University of California, San Diego, and as a (visiting) PhD student at Maastricht University, Tilburg University, and the University of Toronto. In February 2015, he will join Facebook AI Research in New York as a research scientist. His research interests include dimensionality reduction, embedding, metric learning, generative models, deep learning, time series modeling, structured prediction, regularization, face recognition, and object tracking. For his work on dimensionality reduction (t-SNE), he received the SNN Machine Learning Award 2012.
This pizza session will be held on Wednesday, November 5th, at 18.00h at the TMC Utrecht office.
Interested in joining us? Please enquire about the possibilities by sending an email to Christel Schel before Wednesday, October 29th.
See you then!