# GSoC 2016

### Introduction

A graphical model, or probabilistic graphical model (PGM), is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. Graphical models are most commonly used in probability theory, statistics (particularly Bayesian statistics), and machine learning.

pgmpy is a Python library for implementing Probabilistic Graphical Models and related inference and learning algorithms. Our main focus is on providing a consistent API and a flexible approach to implementation. This is the second year pgmpy is participating in GSoC.
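
To give a feel for the library, here is a minimal example of defining a two-node discrete network. Treat it as a sketch: exact module paths can differ between pgmpy versions (in some releases `TabularCPD` lives in `pgmpy.factors.discrete`).

```python
# A toy Rain -> WetGrass Bayesian network in pgmpy's discrete API.
from pgmpy.models import BayesianModel
from pgmpy.factors import TabularCPD

model = BayesianModel([('Rain', 'WetGrass')])

# P(Rain): 80% no rain, 20% rain.
cpd_rain = TabularCPD('Rain', 2, [[0.8], [0.2]])

# P(WetGrass | Rain): columns correspond to Rain = 0 and Rain = 1.
cpd_wet = TabularCPD('WetGrass', 2,
                     [[0.9, 0.1],
                      [0.1, 0.9]],
                     evidence=['Rain'], evidence_card=[2])

model.add_cpds(cpd_rain, cpd_wet)
print(model.check_model())  # True if the CPDs are consistent with the graph
```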

### Want to get involved?

If you're interested in participating in GSoC 2016 as a student, mentor, or community member, you should join pgmpy's [mailing list](https://groups.google.com/forum/#!forum/pgmpy) and post any questions, comments, etc. to pgmpy@googlegroups.com.

Additionally, you can find us on Gitter at gitter/pgmpy. If no one is available to answer your question there, please be patient and post it to the mailing list as well.

### Getting Started

1. Clone the repo: `$ git clone https://github.com/pgmpy/pgmpy`
2. Install the dependencies: `$ cd pgmpy/ && sudo pip3 install -r requirements.txt`
3. Install pgmpy: `$ sudo python3 setup.py install`

To understand how to contribute, take a look at our contribution guide.

##### References for PGM

* Notebooks for a basic introduction to PGMs and pgmpy: https://github.com/pgmpy/pgmpy_notebook
* A quick introduction to Bayesian networks: http://people.cs.ubc.ca/~murphyk/Bayes/bnintro.html
* Reference book for PGMs: Probabilistic Graphical Models: Principles and Techniques

Students should start by reading the guidelines for participation. Google also provides guidelines to help with writing a proposal as part of their GSoC Student Guide.

Note that pgmpy participates as a sub-organization of the Python Software Foundation, so all mentors and students working on the project should abide by the Python Code of Conduct.

## Ideas

#### 1. Support for Continuous Variables

Currently pgmpy supports probability distributions over discrete random variables only, but most real-world problems involve continuous random variables. In graphical models we generally work with a continuous variable by either sampling from it or discretizing it. Support for continuous variables will also allow us to work with deterministic and Gaussian nodes.
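
For illustration, a conditional linear Gaussian node could be represented along the following lines. The class below is a hypothetical sketch, not pgmpy's API; the proposal would need to settle the actual interface.

```python
# Hypothetical sketch of a linear Gaussian CPD for a continuous node:
# P(X | u_1, ..., u_k) = N(beta_0 + w . u, sigma^2). Not pgmpy's actual API.
import numpy as np

class LinearGaussianCPD:
    def __init__(self, variable, parents, beta_0, weights, sigma):
        self.variable = variable             # name of the continuous node
        self.parents = list(parents)         # names of the parent nodes
        self.beta_0 = beta_0                 # intercept of the conditional mean
        self.weights = np.asarray(weights)   # one weight per parent
        self.sigma = sigma                   # conditional standard deviation

    def mean(self, parent_values):
        # The conditional mean is linear in the parents' values.
        return self.beta_0 + self.weights.dot(np.asarray(parent_values))

    def sample(self, parent_values, rng=None):
        rng = rng or np.random.default_rng()
        return rng.normal(self.mean(parent_values), self.sigma)

# X | A, B  ~  N(1.0 + 0.5*A - 0.3*B, 2.0^2)
cpd = LinearGaussianCPD('X', ['A', 'B'], 1.0, [0.5, -0.3], 2.0)
print(cpd.sample([2.0, 1.0]))
```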

Expected Outcome: Support for continuous nodes, deterministic nodes, and conditional Gaussian nodes.
Difficulty Level: Difficult
PGM Knowledge Required: Very good understanding of Probability Theory and Graphical Models.
Skills Required: Intermediate Python
Potential Mentor(s): Abinash Panda, Ankur Ankan

#### 2. Cutset Conditioning Method

Currently pgmpy has the clique tree propagation algorithm, which works by aggregating nodes into cliques. The cutset conditioning method instead works by conditioning on a subset of the network's nodes (a cutset) so that the remaining network is singly connected. [1] presents dynamic conditioning, a refinement of cutset conditioning, as well as B-conditioning, an algorithm for approximate inference. We would like to implement both of these algorithms in pgmpy.
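
At its core the method relies on the decomposition P(Q | e) = Σ_c P(Q | c, e) P(c | e): solve one simplified, singly connected sub-problem per cutset instantiation c, then recombine the answers. A small numerical check of that identity, using a raw joint table rather than pgmpy's data structures:

```python
# Numerical check of the identity behind cutset conditioning:
# P(Q | e) = sum_c P(Q | c, e) * P(c | e), with binary Q, C, E.
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))                  # axes: Q, C, E
joint /= joint.sum()

e = 1                                          # observe evidence E = 1
p_qce = joint[:, :, e]                         # unnormalized P(Q, C, e)
p_q_given_e = p_qce.sum(axis=1) / p_qce.sum()  # direct answer

# Conditioning side: one sub-problem per cutset value c, then recombine.
p_c_given_e = p_qce.sum(axis=0) / p_qce.sum()
p_q_given_ce = p_qce / p_qce.sum(axis=0, keepdims=True)
combined = (p_q_given_ce * p_c_given_e).sum(axis=1)

assert np.allclose(p_q_given_e, combined)
```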

References:

[1] Darwiche, A., Conditioning algorithms for exact and approximate inference in causal networks, in Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 1995.

[2] Pearl, J., A constraint-propagation approach to probabilistic reasoning, in Uncertainty in Artificial Intelligence (L. N. Kanal and J. F. Lemmer, Eds.), Elsevier, New York, 357–369, 1986.

[3] Peot, M. A., and Shachter, R. D., Fusion and propagation with multiple observations in belief networks, Artificial Intelligence, 48(3), 299–318, 1991.

Expected Outcome: Implementation of the cutset conditioning method.
Difficulty Level: Medium
PGM Knowledge Required: Very good understanding of Graphical Models.
Skills Required: Intermediate Python
Potential Mentor(s): Rajesh Soni, Pranjal Mittal

#### 3. Hamiltonian Monte Carlo and No-U-Turn Sampler

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. The No-U-Turn Sampler (NUTS) is an extension to HMC that eliminates the need to set the number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps.
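
As a flavor of what an implementation involves, here is a compact sketch of a single-variable HMC step with the leapfrog integrator, targeting a standard Gaussian. A real implementation would add step-size adaptation and, for NUTS, the recursive tree building described in [1].

```python
import numpy as np

def hmc_step(x, log_p, grad_log_p, step_size=0.1, n_steps=20):
    p = np.random.normal()                        # resample auxiliary momentum
    x_new, p_new = x, p
    p_new += 0.5 * step_size * grad_log_p(x_new)  # half step for momentum
    for _ in range(n_steps - 1):
        x_new += step_size * p_new                # full step for position
        p_new += step_size * grad_log_p(x_new)    # full step for momentum
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_p(x_new)  # closing half step
    # Metropolis correction on the Hamiltonian H = -log p(x) + p^2 / 2.
    h_old = -log_p(x) + 0.5 * p ** 2
    h_new = -log_p(x_new) + 0.5 * p_new ** 2
    return x_new if np.random.uniform() < np.exp(h_old - h_new) else x

# Target: standard Gaussian, log p(x) = -x^2 / 2 up to a constant.
log_p = lambda x: -0.5 * x ** 2
grad = lambda x: -x

x, samples = 0.0, []
for _ in range(2000):
    x = hmc_step(x, log_p, grad)
    samples.append(x)
print(np.mean(samples), np.std(samples))  # should be close to 0 and 1
```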

References:

[1] Hoffman, M. D., and Gelman, A., The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo: http://arxiv.org/pdf/1111.4246v1.pdf

[2] Neal, R. M., MCMC using Hamiltonian dynamics: http://arxiv.org/pdf/1206.1901.pdf

Expected Outcome: Implementation of HMC and the NUTS sampler.
Difficulty Level: Medium
PGM Knowledge Required: Very good knowledge of Graphical Models and sampling methods
Skills Required: Intermediate Python
Potential Mentor(s): Ankur Ankan, Pranjal Mittal

#### 4. Representing Factors as Algebraic Decision Diagrams (ADDs)

According to [1], the ADD representation of factors is interesting because the ADD size can be exponentially smaller than, and is never worse than, the size of a tabular representation of the same factor, and ADD operations can be implemented efficiently: for two factors with n and m variables, the product can be computed in O(nm) time, summing out in O(n²), and reducing in O(n).
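
To make the data structure concrete, here is a toy sketch of ADD nodes and the pairwise apply operation that underlies factor product. Names and structure are illustrative only; a production version would memoize apply() and merge isomorphic subgraphs to obtain the reduced, canonical form.

```python
# Toy sketch of an ADD: internal nodes branch on Boolean variables,
# terminals hold real numbers. Illustrative only.
import operator

class ADD:
    def __init__(self, var=None, low=None, high=None, value=None):
        self.var = var        # None for a terminal node
        self.low = low        # child for var = 0
        self.high = high      # child for var = 1
        self.value = value    # real value, for terminals only

def apply_op(f, g, op, order):
    """Combine two ADDs with a binary operation, respecting a variable order."""
    if f.var is None and g.var is None:
        return ADD(value=op(f.value, g.value))
    # Branch on the earliest variable (in the global order) present in f or g.
    top = min((v for v in (f.var, g.var) if v is not None), key=order.index)
    f_lo, f_hi = (f.low, f.high) if f.var == top else (f, f)
    g_lo, g_hi = (g.low, g.high) if g.var == top else (g, g)
    return ADD(var=top,
               low=apply_op(f_lo, g_lo, op, order),
               high=apply_op(f_hi, g_hi, op, order))

# Factor product of f(A) = [0.2, 0.8] and g(B) = [0.5, 0.5]:
f = ADD(var='A', low=ADD(value=0.2), high=ADD(value=0.8))
g = ADD(var='B', low=ADD(value=0.5), high=ADD(value=0.5))
prod = apply_op(f, g, operator.mul, order=['A', 'B'])
print(prod.high.high.value)  # f(A=1) * g(B=1) = 0.4
```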

References:

[1] Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks (1st ed.). Cambridge University Press.

Expected Outcome: ADD-based representation of factors, with efficient product, summing-out, and reduction operations.
Difficulty Level: Easy
PGM Knowledge Required: Basic understanding of Graphical Models
Skills Required: Intermediate Python
Potential Mentor(s): Jhonatan Oliveira, Rajesh Soni

#### 5. Structure Learning for Bayesian Models from Complete Data

Currently pgmpy does not have methods to select models based on data. Support for structure learning would enable pgmpy to perform inference and sampling tasks starting from a data set alone. As a starting point, structure learning could be implemented for Bayesian networks and complete data (no missing fields, no hidden variables). Two general approaches for this setting are constraint-based and score-based structure learning. The constraint-based technique consists of identifying independencies in the data using hypothesis tests and constructing a model from those independencies. For the score-based approach, one defines a scoring function that rates how well a model "fits" the data set and then optimizes that score over the search space of possible models. Modern algorithms, such as MMHC, combine both methods.
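
As a taste of the score-based side, a decomposable score such as BIC rates a structure by summing per-node terms, which lets a search procedure (hill climbing, MMHC, etc.) evaluate edge changes locally. The helper below is illustrative, not an existing pgmpy function; it scores one node's candidate parent set against a discrete data set.

```python
# Illustrative BIC score for a single node given a candidate parent set.
import numpy as np
import pandas as pd

def bic_node_score(data, node, parents):
    n = len(data)
    r = data[node].nunique()   # cardinality of the node
    groups = data.groupby(list(parents))[node] if parents else [(None, data[node])]
    log_lik, q = 0.0, 0
    for _, column in groups:
        counts = column.value_counts().to_numpy()
        log_lik += (counts * np.log(counts / counts.sum())).sum()
        q += 1                 # parent configurations seen in the data
    n_params = (r - 1) * max(q, 1)
    return log_lik - 0.5 * np.log(n) * n_params  # higher is better

data = pd.DataFrame(np.random.randint(0, 2, size=(500, 2)), columns=['A', 'B'])
print(bic_node_score(data, 'B', ['A']))   # score of the edge A -> B
print(bic_node_score(data, 'B', []))      # score of B with no parents
```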

References:

[1] Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques, Chapter 18.
[2] Tsamardinos et al. (2006). The max-min hill-climbing Bayesian network structure learning algorithm.

Expected Outcome: Support for score-based and constraint-based structure learning, and an implementation of the MMHC structure learning algorithm.
Difficulty Level: Difficult
PGM Knowledge Required: Very good understanding of Probability Theory and Graphical Models.
Skills Required: Intermediate Python
Potential Mentor(s): Gregory Wheeler, Ankur Ankan

