The term Open Data is generally understood to be data that are made available to the public free of charge, without registration or restrictive licenses, for any purpose whatsoever (including commercial purposes), in electronic, machine-readable formats that ensure data are easy to find, download and use.
Open data initiatives by public institutions, such as governments and intergovernmental organisations, recognise that such data are produced with public funds and so, with few exceptions, should be treated as public goods.
Data reuse, both by data experts and the public at large, is key to creating new opportunities and benefits from government data. Open data reuse requires two basic criteria:
The purpose of this syllabus in Data Wrangling and Validation for Open Data is to guide learners to confidence in delivering technically open data: well-structured, machine-readable data, validated to a defined and standard metadata schema.
Lessons 1 & 2: Data wrangling messy data
Learning outcomes:
Project:
Each participant will be assigned a spreadsheet from training data and expected to restructure it using Python/Whyqd.
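As a rough illustration of the kind of restructuring this project involves, the sketch below unpivots a small wide-format table into one record per observation. It uses only the Python standard library rather than whyqd, and the sample data, the `wide_to_long` function and its parameters are all hypothetical, invented for this example.

```python
import csv
import io

# Hypothetical messy export: a title row, a blank row, then a wide table
# with one column per year. None of these values come from the course data.
RAW = """Table 1: Example indicator by region,,,
,,,
Region,2019,2020,2021
North,10,12,n/a
South,7,,9
"""


def wide_to_long(raw_text, header_row=2, id_column="Region"):
    """Unpivot a wide table into tidy (region, year, value) records."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header = rows[header_row]
    records = []
    for row in rows[header_row + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip fully blank rows
        entry = dict(zip(header, row))
        for column, cell in entry.items():
            if column == id_column:
                continue
            cell = cell.strip()
            # Treat blanks and the textual placeholder "n/a" as missing values.
            value = float(cell) if cell and cell.lower() != "n/a" else None
            records.append(
                {"region": entry[id_column], "year": int(column), "value": value}
            )
    return records


tidy = wide_to_long(RAW)
for record in tidy:
    print(record)
```

Each output record holds a single value for a single region and year, which is the shape most validation and charting tools expect. Real source spreadsheets will usually need more steps than this, such as merged headers or footnote rows.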
Lesson 3: Validating restructured data against a schema
Learning outcomes:
Project:
Using the machine-readable spreadsheet created in Lesson 1, develop a JSON schema, and validate the data using this schema on CSV Lint. Then import whyqd and perform the same task in Python.
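To make the idea of validating against a schema concrete, here is a minimal sketch that checks CSV rows against a small schema expressed as JSON. The schema layout, field names and `validate_csv` function are assumptions made for this example. They are not the format used by CSV Lint or the whyqd API, and a real schema would normally carry more constraints.

```python
import csv
import io
import json

# Hypothetical, simplified schema: one entry per expected column.
SCHEMA = json.loads("""
{
  "fields": [
    {"name": "region", "type": "string", "required": true},
    {"name": "year", "type": "integer", "required": true},
    {"name": "value", "type": "number", "required": false}
  ]
}
""")

# Map each schema type onto a Python callable that raises ValueError on bad input.
CASTS = {"string": str, "integer": int, "number": float}


def validate_csv(text, schema):
    """Return a list of (line_number, message) tuples; an empty list means valid."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    expected = [field["name"] for field in schema["fields"]]
    if reader.fieldnames != expected:
        errors.append((1, f"header {reader.fieldnames} does not match {expected}"))
        return errors
    for line_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            cell = (row[field["name"]] or "").strip()
            if not cell:
                if field["required"]:
                    errors.append((line_number, f"{field['name']} is required"))
                continue
            try:
                CASTS[field["type"]](cell)
            except ValueError:
                errors.append(
                    (line_number, f"{field['name']}: {cell!r} is not {field['type']}")
                )
    return errors


GOOD = "region,year,value\nNorth,2019,10\nSouth,2020,\n"
BAD = "region,year,value\nNorth,twenty,10\n,2020,abc\n"

print(validate_csv(GOOD, SCHEMA))
print(validate_csv(BAD, SCHEMA))
```

Reporting every error with its line number, rather than stopping at the first failure, makes it easier to go back and correct the source data in one pass.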
Lesson 4: Anonymising personal data prior to publication
Learning outcomes:
Project:
Use a manufactured sample data file containing personal information and redact these data to prevent de-anonymisation.
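One possible shape for this redaction is sketched below, assuming three steps: drop direct identifiers, generalise ages into bands, and suppress any record whose combination of age band and district is shared by fewer than `k` people. This is a k-anonymity-style illustration and not necessarily the method taught in the lesson. The field names, sample records and `anonymise` function are hypothetical, and suppression of this kind reduces the risk of de-anonymisation without guaranteeing against it.

```python
from collections import Counter

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(records, k=2):
    """Drop identifiers, generalise age, and suppress groups smaller than k."""
    generalised = []
    for record in records:
        kept = {
            key: value
            for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS
        }
        kept["age"] = age_band(kept["age"])
        generalised.append(kept)
    # Count how many records share each quasi-identifier combination.
    groups = Counter((r["age"], r["district"]) for r in generalised)
    return [r for r in generalised if groups[(r["age"], r["district"])] >= k]


# Manufactured sample records; no real people are described.
PEOPLE = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "district": "East", "visits": 3},
    {"name": "B. Example", "email": "b@example.org", "age": 37, "district": "East", "visits": 1},
    {"name": "C. Example", "email": "c@example.org", "age": 62, "district": "West", "visits": 5},
]

safe = anonymise(PEOPLE)
for record in safe:
    print(record)
```

The third record is removed entirely because it is the only one in its age band and district, so publishing it would single out one person.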
Lesson 5: Investigate and transform messy data into appropriate charts for presentation
Learning outcomes:
Project:
Use population series data on Crude Birth Rate to transform source data for appropriate presentation.
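The Crude Birth Rate is conventionally expressed as live births per 1,000 mid-year population in a given year. The sketch below assumes the source data supplies raw birth and population counts, computes the rate, and groups the results into one ordered series per area, ready to pass to a charting tool. The figures, area names and `to_series` function are invented for illustration. If the source already publishes the rate itself, only the reshaping step would apply.

```python
def crude_birth_rate(live_births, mid_year_population):
    """Live births per 1,000 people in a given year."""
    return 1000 * live_births / mid_year_population


# Hypothetical figures, deliberately out of order as source data often is.
SOURCE = [
    {"area": "Exampleland", "year": 2001, "births": 24000, "population": 1000000},
    {"area": "Samplestan", "year": 2000, "births": 9000, "population": 500000},
    {"area": "Exampleland", "year": 2000, "births": 25000, "population": 980000},
    {"area": "Samplestan", "year": 2001, "births": 8800, "population": 510000},
]


def to_series(rows):
    """Group rows into {area: [(year, rate), ...]} sorted by year."""
    series = {}
    for row in sorted(rows, key=lambda r: (r["area"], r["year"])):
        rate = round(crude_birth_rate(row["births"], row["population"]), 1)
        series.setdefault(row["area"], []).append((row["year"], rate))
    return series


series = to_series(SOURCE)
for area, points in series.items():
    print(area, points)
```

A rate over time for several areas is a natural fit for a line chart with one line per area, which is why the output is organised as one ordered list of points per area.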
My name is Gavin Chait, and I am an independent data scientist specialising in economic development and data curation. I spent more than a decade in economic and development initiatives in South Africa. I was the commercial lead of open data projects at the Open Knowledge Foundation, leading the open source CKAN development team, and led the implementation of numerous open data technical and research projects around the world. Recently, I have developed openLocal.uk, an initiative to develop a comprehensive business intelligence search engine for entrepreneurs. Data are based on open data and Freedom of Information requests.
I've worked with SBC4D since 2016 on a range of projects in Ghana, Morocco, Tunisia, Ethiopia, Tanzania and Mauritius. This syllabus was originally developed, and translated into French, for the Des Chiffres et des Jeunes project in Côte d'Ivoire.
I have extensive experience in leading research projects, implementing open source software initiatives, and developing and leading seminars and workshops. I have taught for 25 years, including for undergraduates, adult education, and technical and analytical teaching at all levels.
Course content, materials and approach are copyright Gavin Chait, and released under both the Creative Commons Attribution-ShareAlike 4.0 International and the MIT licences.
The objective is to ensure reuse, and I recommend - but do not require - that any modifications or adaptations of the source material should be released under an equivalent licence.