IPython Notebook(s) demonstrating deep learning functionality.
Additional TensorFlow tutorials:
IPython Notebook(s) demonstrating scikit-learn functionality.
| Notebook | Description |
|---|---|
| intro | Intro notebook to scikit-learn. Scikit-learn is a Python machine learning library built on NumPy and SciPy, with tools for classification, regression, clustering, dimensionality reduction, and model selection. |
| knn | Implement k-nearest neighbors in scikit-learn. |
| linear-reg | Implement linear regression in scikit-learn. |
| svm | Implement support vector machine classifiers with and without kernels in scikit-learn. |
| random-forest | Implement random forest classifiers and regressors in scikit-learn. |
| k-means | Implement k-means clustering in scikit-learn. |
| pca | Implement principal component analysis in scikit-learn. |
| gmm | Implement Gaussian mixture models in scikit-learn. |
| validation | Implement validation and model selection in scikit-learn. |

statistical-inference-scipy

IPython Notebook(s) demonstrating statistical inference with SciPy functionality.
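As a rough illustration of what "quantifying effect size" means in the effect-size notebook listed in this section, one common statistic is Cohen's d: the difference between two group means divided by their pooled standard deviation. The sketch below is an assumption about the kind of calculation involved, not code taken from the notebook. It uses only the Python 3 standard library, and the height values are made up for illustration (they are not BRFSS data).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up heights in cm, for illustration only (NOT BRFSS data).
heights_a = [178.0, 172.5, 181.0, 175.5, 169.0]
heights_b = [163.0, 158.5, 167.0, 161.5, 170.0]

d = cohens_d(heights_a, heights_b)
print("Cohen's d: {:.2f}".format(d))
```

A positive value means the first group's mean is larger. Swapping the arguments flips the sign but not the magnitude.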
| Notebook | Description |
|---|---|
| scipy | SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. |
| effect-size | Explore statistics that quantify effect size by analyzing the difference in height between men and women. Uses data from the Behavioral Risk Factor Surveillance System (BRFSS) to estimate the mean and standard deviation of height for adult women and men in the United States. |
| sampling | Explore random sampling by analyzing the average weight of men and women in the United States using BRFSS data. |
| hypothesis | Explore hypothesis testing by analyzing the differences between first-born babies and others. |

IPython Notebook(s) demonstrating pandas functionality.
IPython Notebook(s) demonstrating matplotlib functionality.
IPython Notebook(s) demonstrating NumPy functionality.
IPython Notebook(s) demonstrating Python functionality geared towards data analysis.
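To give a flavor of the standard-library features these notebooks cover (sorting and `bisect`, `enumerate` and `zip`, closures, generators, `itertools`, and `datetime`), here is a minimal sketch. It is not taken from the notebooks. All names and sample values are hypothetical, and it assumes Python 3 (`nonlocal` is not available in Python 2.7).

```python
import bisect
from datetime import datetime, timedelta
from itertools import islice


def make_counter(start=0):
    """Closure: the inner function remembers and updates `count`."""
    count = start

    def increment(step=1):
        nonlocal count
        count += step
        return count

    return increment


def squares():
    """Generator: lazily yields 0, 1, 4, 9, ..."""
    n = 0
    while True:
        yield n * n
        n += 1


# sorted + bisect: keep a list ordered while inserting.
scores = sorted([72, 88, 95, 61])
bisect.insort(scores, 80)

# zip + dict comprehension, then enumerate over names ordered by age.
names = ["ann", "bob", "cy"]
ages = [31, 45, 27]
age_by_name = {name: age for name, age in zip(names, ages)}
ranked = list(enumerate(sorted(names, key=age_by_name.get), start=1))

# datetime: parse a string, shift it by a timedelta, format it back.
start = datetime.strptime("2015-03-01", "%Y-%m-%d")
deadline = (start + timedelta(days=45)).strftime("%Y-%m-%d")

# Closure and generator in use.
counter = make_counter()
counter()
total = counter(5)
first_squares = list(islice(squares(), 5))

print(scores, ranked, deadline, total, first_squares)
```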
| Notebook | Description |
|---|---|
| data structures | Learn Python basics with tuples, lists, dicts, sets. |
| data structure utilities | Learn Python operations such as slice, range, xrange, bisect, sort, sorted, reversed, enumerate, zip, list comprehensions. |
| functions | Learn about more advanced Python features: functions as objects, lambda functions, closures, *args, **kwargs, currying, generators, generator expressions, itertools. |
| datetime | Learn how to work with Python dates and times: datetime, strftime, strptime, timedelta. |
| logging | Learn about Python logging with RotatingFileHandler and TimedRotatingFileHandler. |
| pdb | Learn how to debug in Python with the interactive source code debugger. |
| unit tests | Learn how to test in Python with Nose unit tests. |

kaggle-and-business-analyses

IPython Notebook(s) used in Kaggle competitions and business analyses.
| Notebook | Description |
|---|---|
| titanic | Predict survival on the Titanic. Learn data cleaning, exploratory data analysis, and machine learning. |
| churn-analysis | Predict customer churn. Exercise logistic regression, gradient boosting classifiers, support vector machines, random forests, and k-nearest-neighbors. Includes discussions of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/discrimination. |

IPython Notebook(s) demonstrating Spark and HDFS functionality.
| Notebook | Description |
|---|---|
| spark | In-memory cluster computing framework that can be up to 100 times faster for certain applications and is well suited to machine learning algorithms. |
| hdfs | Reliably stores very large files across machines in a large cluster. |

IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality.
| Notebook | Description |
|---|---|
| mapreduce-python | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with a unit test and an mrjob config file to analyze Amazon S3 bucket logs on Elastic MapReduce. Disco is another Python-based alternative. |

IPython Notebook(s) demonstrating Amazon Web Services (AWS) and AWS tools functionality.
Also check out:
IPython Notebook(s) demonstrating various command lines for Linux, Git, etc.
| Notebook | Description |
|---|---|
| linux | Unix-like and mostly POSIX-compliant computer operating system. Disk usage, splitting files, grep, sed, curl, viewing running processes, terminal syntax highlighting, and Vim. |
| anaconda | Distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. |
| ipython notebook | Web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document. |
| git | Distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. |
| ruby | Used to interact with the AWS command line and for Jekyll, a blog framework that can be hosted on GitHub Pages. |
| jekyll | Simple, blog-aware, static site generator for personal, project, or organization sites. Renders Markdown or Textile and Liquid templates, and produces a complete, static website ready to be served by Apache HTTP Server, Nginx or another web server. |
| pelican | Python-based alternative to Jekyll. |
| django | High-level Python Web framework that encourages rapid development and clean, pragmatic design. It can be useful to share reports/analyses and for blogging. Lighter-weight alternatives include Pyramid, Flask, Tornado, and Bottle. |

IPython Notebook(s) demonstrating miscellaneous functionality.
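As a small illustration of the kind of data wrangling a regular expression cheat sheet supports, the sketch below pulls structured fields out of messy text lines with Python's `re` module. It is not taken from the regex notebook listed in this section. The record layout, sample values, and field names are all hypothetical.

```python
import re

# Hypothetical messy records; the layout is an assumption for illustration.
raw = [
    "2015-03-01  alice@example.com   $1,200.50",
    "2015-03-02  bob@example.org     $85.00",
    "not a record",
]

# Named groups make the extracted fields self-describing.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<email>[\w.+-]+@[\w-]+\.[\w.]+)\s+"
    r"\$(?P<amount>[\d,]+\.\d{2})"
)

records = []
for line in raw:
    match = pattern.search(line)
    if match is None:
        continue  # skip lines that do not look like a record
    records.append({
        "date": match.group("date"),
        "email": match.group("email"),
        # Strip thousands separators before converting to a number.
        "amount": float(match.group("amount").replace(",", "")),
    })

for record in records:
    print(record)
```

The email pattern here is deliberately loose. It is good enough for cleaning a column of mostly well-formed values, but it is not a full address validator.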
| Notebook | Description |
|---|---|
| regex | Regular expression cheat sheet useful in data wrangling. |
| algorithmia | Algorithmia is a marketplace for algorithms. This notebook showcases 4 different algorithms: Face Detection, Content Summarizer, Latent Dirichlet Allocation and Optical Character Recognition. |

Anaconda is a free distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing that aims to simplify package management and deployment.
Follow instructions to install Anaconda or the more lightweight miniconda.
For detailed instructions, scripts, and tools to set up your development environment for data analysis, check out the dev-setup repo.
To view interactive content or to modify elements within the IPython notebooks, you must first clone or download the repository and then run the notebook. More information on IPython Notebooks can be found here.
$ git clone https://github.com/donnemartin/data-science-ipython-notebooks.git
$ cd data-science-ipython-notebooks
$ jupyter notebook
Notebooks tested with Python 2.7.x.
Contributions are welcome! For bug reports or requests please submit an issue.
Feel free to contact me to discuss any issues, questions, or comments.
This repository contains a variety of content; some developed by Donne Martin, and some from third-parties. The third-party content is distributed under the license provided by those parties.
The content developed by Donne Martin is distributed under the following license:
I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).
Copyright 2015 Donne Martin
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.