[90] Intro to OpenRefine for Data Cleaning and Reconciliation (Martin Magdinier)

980 views

Data Umbrella

Join our Meetup group:
www.meetup.com/data-umbrella
Resources
- Slides: docs.google.com/presentation/...
- Dataset: open.toronto.ca/dataset/build...
- OpenRefine Discourse forums: forum.openrefine.org/
About the Event
OpenRefine is a powerful open-source tool for working with messy data. It lets you clean data, transform it, and convert it from one format into another.
The talk has three parts. The first introduces OpenRefine: what it is for, who uses it, and how it has evolved. The second is a tour of OpenRefine: downloading and installing it, importing data, filtering and faceting, clustering, other core data cleaning techniques, and using reconciliation services. The session closes with an invitation to join the OpenRefine community and an overview of the ways to contribute: code, design, translation, documentation, and user support.
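Clustering in OpenRefine groups values that are probably different spellings of the same thing. The demo (28:40 and 32:54) uses key-collision methods such as fingerprint and n-gram fingerprint, which reduce each value to a normalized key and group values whose keys match. The Python sketch below approximates those keying steps as described in OpenRefine's clustering documentation; it is not OpenRefine's own code. The ASCII folding is a simplification, the n-gram size of 2 is an assumption for the example, and the helper names (`fingerprint`, `ngram_fingerprint`, `key_collision_clusters`) are hypothetical. The Cologne phonetic method (33:30) is not covered.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Iterable, List


def _ascii_fold(text: str) -> str:
    # Decompose accented letters ("ö" -> "o" + combining mark) and drop anything
    # with no ASCII form (a simplification of OpenRefine's normalization).
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def fingerprint(value: str) -> str:
    """Fingerprint key: trim, lowercase, drop punctuation, fold to ASCII,
    then sort and de-duplicate the whitespace-separated tokens."""
    kept = "".join(ch for ch in value.strip().lower() if ch.isalnum() or ch.isspace())
    tokens = _ascii_fold(kept).split()
    return " ".join(sorted(set(tokens)))


def ngram_fingerprint(value: str, n: int = 2) -> str:
    """n-gram fingerprint key: lowercase, drop punctuation and whitespace,
    fold to ASCII, then join the sorted, de-duplicated character n-grams."""
    kept = _ascii_fold("".join(ch for ch in value.lower() if ch.isalnum()))
    grams = {kept[i:i + n] for i in range(len(kept) - n + 1)}
    return "".join(sorted(grams))


def key_collision_clusters(values: Iterable[str], keyer: Callable[[str], str]) -> List[List[str]]:
    """Group values whose keys collide; keep only groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[keyer(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    streets = ["Bathurst St.", "bathurst st", "St. Bathurst", "Yonge St"]
    print(key_collision_clusters(streets, fingerprint))
    # [['Bathurst St.', 'St. Bathurst', 'bathurst st']]
    print(ngram_fingerprint("Bathurst Street") == ngram_fingerprint("BathurstStreet"))
    # True
```

Because the fingerprint key ignores word order, case, and punctuation, "St. Bathurst" and "Bathurst St." collide; the n-gram key also ignores spacing, so "BathurstStreet" matches "Bathurst Street". As in OpenRefine, the resulting clusters are candidates to review before merging, not automatic corrections.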
Timestamps
00:00 Data Umbrella introduction
03:35 What is OpenRefine?
05:00 History of OpenRefine (Freebase Gridworks, Google Refine, OpenRefine)
08:33 OpenRefine user base
10:42 Project statistics
11:34 Features of OpenRefine
14:00 Contributing to OpenRefine (use, promote, help, translate, fix, create, design)
19:40 begin demo: example dataset of Toronto building permits
20:23 Running OpenRefine locally, installation
20:44 Download OpenRefine (openrefine.org/download)
21:45 Demo: reading in the data
24:15 Demo: export data from OpenRefine
24:38 Demo: working with the data
25:30 Demo: text facet shows a summary of the distinct values
26:45 facet / filter
27:17 combine multiple facets
28:10 text filter
28:40 Clustering algorithms to clean text data (e.g., fingerprint)
32:54 Clustering algorithm: n-gram fingerprint
33:30 Clustering algorithm: Cologne phonetic
34:15 Cleaning: working with numerical data
35:20 find and replace: remove commas from numbers
37:49 working with dates
38:40 doing reconciliation in OpenRefine (merge multiple fields into one field)
41:12 Reconciliation Service: an API
41:32 about the dataset: Bathurst Street from Wiki Foundation
44:00 connect my dataset with Wikipedia data
44:45 Reconciliation service test bench (plus: clean street name data)
47:38 Example: Excel-like expressions for editing data
55:26 Resources list
56:20 Q: In the Reconciliation service API, which API versions are supported by OpenRefine?
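The find-and-replace step at 35:20 removes thousands-separator commas so that a text column can be treated as numbers. A minimal Python sketch of the same idea follows, assuming commas are only ever thousands separators (never decimal marks); `parse_number` is a hypothetical helper name. Inside OpenRefine a comparable transform is typically written in GREL, for example `value.replace(",", "").toNumber()`.

```python
from typing import Optional


def parse_number(value: str) -> Optional[float]:
    """Strip thousands-separator commas and parse the rest as a number.

    Assumes commas are thousands separators only. Returns None when the
    remaining text is not numeric, so bad cells can be reviewed instead of
    being silently turned into 0.
    """
    cleaned = value.replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


if __name__ == "__main__":
    for raw in ["1,250,000", " 3,500.75 ", "n/a"]:
        print(repr(raw), "->", parse_number(raw))
```

Returning None rather than raising keeps the sketch close to the interactive workflow: non-numeric cells are left for a follow-up facet or filter instead of stopping the whole transform.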
About the Speaker
Martin Magdinier has been OpenRefine's project manager and a core contributor since 2013.
- GitHub: github.com/OpenRefine/
- X: / openrefine
- LinkedIn: / openrefine
#python #opensource #datascience #dataanalytics
