
datawrangling/spatialanalytics: Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/kevinweil/spatial-analytics-where-20-2010

Spatial Analysis of Twitter Data with Hadoop, Pig, & Mechanical Turk

This repository contains the code examples and supporting material for the Spatial Analytics Workshop held at O’Reilly Where 2.0 on March 30, 2010. Twitter is rolling out enhanced geo features, but it will be a while before geotagged tweets are widely adopted. In the meantime, there is a lot we can do using only the free-text location string in each user's profile.
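As a rough illustration of what working with profile location strings can look like, the sketch below counts normalized location strings from tweets stored one JSON object per line. It is not the workshop's code: the function names `normalize_location` and `count_profile_locations` are hypothetical, and it assumes each tweet carries the profile string under `user.location`, as in Twitter's tweet JSON.

```python
import json
from collections import Counter


def normalize_location(raw):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(raw.lower().split())


def count_profile_locations(lines):
    """Count normalized profile location strings in an iterable of JSON tweets.

    Assumes one JSON object per line with the profile string at user.location.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip malformed records instead of failing the whole run
        if not isinstance(tweet, dict):
            continue
        user = tweet.get("user")
        if not isinstance(user, dict):
            continue
        location = user.get("location")
        if isinstance(location, str) and location.strip():
            counts[normalize_location(location)] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up sample; real input would be a file of Streaming API tweets.
    sample = [
        '{"text": "hi", "user": {"location": "San Francisco, CA"}}',
        '{"text": "yo", "user": {"location": "san francisco,  ca"}}',
        '{"text": "hey", "user": {"location": "NYC"}}',
        '{"text": "no location", "user": {"location": null}}',
    ]
    for location, n in count_profile_locations(sample).most_common():
        print(f"{location}\t{n}")
```

Normalizing only case and whitespace is deliberately conservative. Mapping strings such as "NYC" and "New York, NY" to the same place needs a gazetteer or human review, which is what the later Geonames and Mechanical Turk steps address.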

From the Workshop description:

“This workshop will focus on uncovering patterns and generating actionable insights from large datasets using spatial analytics. We will explore combining open government data with other location based information sources like Twitter. Participants will be guided through examples that use Hadoop and Amazon EC2 for scalable processing of location data. We will also cover some basics on spatial statistics, correlations, and trends along with how to visualize and communicate your results with open source tools.”

Figure: Spatial distribution of Twitter users based on a Streaming API sample.

Part I: Location Preprocessing & Basic Statistics

Setting up our Hadoop cluster

	git clone https://github.com/datawrangling/spatialanalytics.git
	cd spatialanalytics/
	./util/hcon.sh ec2-174-129-153-177.compute-1.amazonaws.com /Users/pskomoroch/id_rsa-gsg-keypair
	skom:spatialanalytics pskomoroch$ ssh hadoop@ec2-174-129-153-177.compute-1.amazonaws.com
	The authenticity of host 'ec2-174-129-153-177.compute-1.amazonaws.com (174.129.153.177)' can't be established.
	RSA key fingerprint is 6d:b1:d6:48:db:37:61:df:b6:04:4a:93:eb:2d:1d:40.
	Are you sure you want to continue connecting (yes/no)? yes
	Warning: Permanently added 'ec2-174-129-153-177.compute-1.amazonaws.com,174.129.153.177' (RSA) to the list of known hosts.
	Linux domU-12-31-39-0F-74-82 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686
	--------------------------------------------------------------------------------

	Welcome to Amazon Elastic MapReduce running Hadoop 0.18.3 and Debian/Lenny.

	Hadoop is installed in /home/hadoop. Log files are in /mnt/var/log/hadoop. Check
	/mnt/var/log/hadoop/steps for diagnosing step failures.

	The Hadoop UI can be accessed via the command: lynx http://localhost:9100/

	--------------------------------------------------------------------------------
	hadoop@domU-12-31-39-0F-74-82:~$ 
	sudo apt-get -y install git-core
	cd /mnt
	git clone https://github.com/datawrangling/spatialanalytics.git

Counting locations with Hadoop

Standardize location strings using exact matches in Geonames data

Standardize remaining location strings with Mechanical Turk

Generate location standardized Tweets

Data sanity check: Where are conservative & liberal Twitter users?

Part II: Spatial Analysis, Leveraging other data

Location Characterization, Classification, & Prediction
