Channel: PyData
Category: Science & Technology
Tags: python, learn to code, education, software, pydata, learn, coding, how to program, julia, opensource, scientific programming, numfocus, python 3, tutorial
Description: Abstract: Datasets are mathematical objects (e.g., point clouds, matrices, graphs, images, fields/functions) that have shape. Characterizing the shape of these objects provides powerful insight into datasets but is not always straightforward. Common statistical and signal processing techniques (e.g., moments, Fourier transforms, convolutional filters) can fail to capture important geometric and topological features that quantify the shape of data objects. Topology is a branch of mathematics that provides powerful tools to characterize the shape of data objects directly. One such tool is the Euler characteristic (EC); the EC, originally used to characterize polyhedra, is now broadly used in scientific areas such as random fields, cosmology, materials science, thermodynamics, and neuroscience. In this talk, we will focus on the EC and its application as a descriptor that characterizes topological features of data. This characterization is accomplished by decomposing a data object into a set of independent topological bases, and the result is summarized in the form of an EC curve. We briefly discuss the mathematics of the EC and how it can be used to characterize diverse data objects (e.g., graphs and images/fields). We then shift our focus to applying these concepts to diverse problems arising in science and engineering; in particular, we discuss how the EC can be used in process monitoring (multivariate time series) by analyzing correlation structures. We also apply the EC to the analysis of both 2D spatial and 3D spatial-temporal fields; these data objects are derived from micrographs of liquid crystals and molecular simulations. We will show that the EC effectively reduces complex datasets, and that this reduction facilitates tasks such as visualization, regression, classification, and clustering.
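As a rough illustration of the EC curve idea described above, here is a minimal NumPy sketch. It is not the speaker's code and does not use any package from the talk. It assumes a 2D field is thresholded into sublevel sets (pixels with value at or below a threshold), that each "on" pixel is treated as a closed unit square, and that the EC is computed as vertices minus edges plus faces. The function names `euler_characteristic` and `ec_curve` are hypothetical.

```python
import numpy as np


def euler_characteristic(mask):
    """EC of a binary 2D image, treating each True pixel as a closed square.

    Computed as V - E + F, which equals (connected components) - (holes)
    under 8-connectivity of the foreground.
    """
    # Pad with False so pixels on the border still contribute their outer
    # edges and corners.
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    faces = m.sum()
    # A grid edge is present if either of the two pixels sharing it is on.
    horizontal_edges = (m[:-1, :] | m[1:, :]).sum()
    vertical_edges = (m[:, :-1] | m[:, 1:]).sum()
    # A grid corner is present if any of the four pixels sharing it is on.
    vertices = (m[:-1, :-1] | m[:-1, 1:] | m[1:, :-1] | m[1:, 1:]).sum()
    return int(vertices - (horizontal_edges + vertical_edges) + faces)


def ec_curve(field, thresholds):
    """EC of the sublevel set {field <= t} for each threshold t (assumed filtration)."""
    field = np.asarray(field, dtype=float)
    return np.array([euler_characteristic(field <= t) for t in thresholds])


if __name__ == "__main__":
    # Toy field: a single high-valued pixel surrounded by low values.
    field = np.array([[0, 0, 0],
                      [0, 1, 0],
                      [0, 0, 0]])
    print(ec_curve(field, thresholds=[-1, 0, 1]))
```

In the toy field, the sublevel set is empty at threshold -1, a ring with one hole at threshold 0, and a solid square at threshold 1, so the curve records how components and holes appear and disappear as the threshold rises. The resulting vector can then be used as a low-dimensional feature for tasks such as classification or clustering.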
We will illustrate these examples through Python and will discuss the various packages and methods that have been developed for the topological analysis of data.
Bio: Alexander Smith is a 5th year graduate student working in the lab of Victor Zavala. Prior to pursuing his Ph.D., he worked for 5 years as a senior engineer in both manufacturing and R&D at Eli Lilly and Co. Alex's research focuses on the development of Topological and Geometrical Data Analysis methods for applications in various engineering and scientific domains.
Website: adsmithphd.com
pydata.org
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.