# git-pandas

A wrapper around GitPython to produce pandas DataFrames for analysis.

Git-Pandas is a Python library that transforms Git repository data into pandas DataFrames, making it easy to analyze and visualize your codebase's history, contributors, and development patterns. Built on top of GitPython, it provides a simple yet powerful interface for extracting meaningful insights from your Git repositories.

The `Repository` class provides a wrapper around a single Git repository, offering methods to:

- extract commit and file-change history (`commit_history`, `file_change_history`)
- analyze ownership and risk (`blame`, `bus_factor`, `punchcard`)
- inspect repository state (`list_files`, `has_branch`, `coverage`)

The `ProjectDirectory` class enables analysis across multiple repositories:

- run the same analyses aggregated over every repository it discovers
- perform bulk operations such as `bulk_fetch_and_warm`, optionally in parallel
- report project-wide cache statistics via `get_cache_stats`

Git-Pandas requires Python 3.8+ and can be installed from PyPI with `pip install git-pandas`.

For enhanced functionality, install additional packages:

```shell
# For parallel processing
pip install joblib

# For Redis caching
pip install redis

# For visualization
pip install matplotlib seaborn
```
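These extras are optional, so code that depends on them can probe for availability at import time rather than failing later. A minimal sketch of that pattern:

```python
# Probe for an optional dependency before enabling the feature that needs it
try:
    import joblib  # parallel processing across repositories
    HAS_JOBLIB = True
except ImportError:
    HAS_JOBLIB = False

# Downstream code can branch on HAS_JOBLIB instead of raising at runtime
```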

### Basic Repository Analysis

```python
from gitpandas import Repository
from gitpandas.cache import DiskCache

# Create repository with persistent caching
cache = DiskCache('/tmp/git_cache.gz', max_keys=1000)
repo = Repository('/path/to/repo', cache_backend=cache)

# Get commit history with filtering
commits_df = repo.commit_history(
    branch='main',
    limit=1000,
    ignore_globs=['*.pyc', '*.log'],
    include_globs=['*.py', '*.js']
)

# Analyze blame information
blame_df = repo.blame(by='repository')

# Calculate bus factor for entire repository
bus_factor_df = repo.bus_factor(by='repository')

# NEW: Calculate file-wise bus factor
file_bus_factor_df = repo.bus_factor(by='file')
```
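Because `commit_history` returns an ordinary pandas DataFrame, standard pandas operations apply to the result. A small sketch using a stand-in frame (the `committer`, `insertions`, and `deletions` column names are assumptions about the output schema):

```python
import pandas as pd

# Stand-in for the DataFrame returned by repo.commit_history()
commits_df = pd.DataFrame({
    "committer": ["alice", "bob", "alice", "carol"],
    "insertions": [10, 5, 7, 2],
    "deletions": [1, 0, 3, 2],
})

# Net lines contributed, aggregated per committer
net = (commits_df["insertions"] - commits_df["deletions"]).groupby(commits_df["committer"]).sum()
print(net.sort_values(ascending=False))
```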

### Cache Management (New in v2.5.0)

```python
# Get cache statistics
stats = repo.get_cache_stats()
print(f"Cache usage: {stats['global_cache_stats']['cache_usage_percent']:.1f}%")

# Warm cache for better performance
result = repo.warm_cache(
    methods=['commit_history', 'blame', 'file_detail'],
    limit=100
)
print(f"Created {result['cache_entries_created']} cache entries")

# Invalidate specific cache entries
repo.invalidate_cache(keys=['commit_history'])

# Clear all cache for this repository
repo.invalidate_cache()
```

### Remote Operations (New in v2.5.0)

```python
# Safely fetch changes from remote (read-only)
result = repo.safe_fetch_remote(dry_run=True)
if result['remote_exists'] and result['changes_available']:
    # Actually fetch the changes
    fetch_result = repo.safe_fetch_remote()
    print(f"Fetch status: {fetch_result['message']}")
```

### Multi-Repository Analysis

```python
from gitpandas import ProjectDirectory

# Analyze multiple repositories with shared cache
project = ProjectDirectory('/path/to/projects', cache_backend=cache)

# NEW: Bulk operations across all repositories
result = project.bulk_fetch_and_warm(
    fetch_remote=True,
    warm_cache=True,
    parallel=True,
    cache_methods=['commit_history', 'blame']
)

print(f"Processed {result['repositories_processed']} repositories")
print(f"Cache entries created: {result['summary']['total_cache_entries_created']}")

# Get project-wide cache statistics
cache_stats = project.get_cache_stats()
print(f"Total repositories: {cache_stats['total_repositories']}")
print(f"Cache coverage: {cache_stats['cache_coverage_percent']:.1f}%")
```
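Per-repository results can also be combined by hand with pandas when finer control is needed. A sketch with stand-in frames (the `repository` column is an assumption about how multi-repo output is tagged):

```python
import pandas as pd

# Stand-ins for commit_history() results from two repositories
repo_a = pd.DataFrame({"repository": "repo_a", "net": [10, -2]})
repo_b = pd.DataFrame({"repository": "repo_b", "net": [4]})

# Concatenate, then aggregate net line changes per repository
combined = pd.concat([repo_a, repo_b], ignore_index=True)
net_by_repo = combined.groupby("repository")["net"].sum()
```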
Key `Repository` methods:

```python
# Core Analysis
repo.commit_history(branch=None, limit=None, days=None, ignore_globs=None, include_globs=None)
repo.file_change_history(branch=None, limit=None, days=None, ignore_globs=None, include_globs=None)
repo.blame(rev="HEAD", committer=True, by="repository", ignore_globs=None, include_globs=None)
repo.bus_factor(by="repository", ignore_globs=None, include_globs=None)  # by="file" for file-wise
repo.punchcard(branch=None, limit=None, days=None, by=None, normalize=None, ignore_globs=None, include_globs=None)

# Repository Information
repo.list_files(rev="HEAD")
repo.has_branch(branch)
repo.is_bare()
repo.has_coverage()
repo.coverage()
repo.get_commit_content(rev, ignore_globs=None, include_globs=None)

# NEW: Remote Operations (v2.5.0)
repo.safe_fetch_remote(remote_name='origin', prune=False, dry_run=False)
repo.warm_cache(methods=None, **kwargs)

# NEW: Cache Management (v2.5.0)
repo.invalidate_cache(keys=None, pattern=None)
repo.get_cache_stats()
```
`ProjectDirectory` initialization and bulk operations:

```python
# Initialize with multiple repositories
project = ProjectDirectory(
    working_dir='/path/to/project',  # or list of repo paths
    ignore_repos=None,
    verbose=True,
    cache_backend=None,
    default_branch='main'
)

# NEW: Bulk Operations (v2.5.0)
project.bulk_fetch_and_warm(fetch_remote=False, warm_cache=False, parallel=True, **kwargs)
project.invalidate_cache(keys=None, pattern=None, repositories=None)
project.get_cache_stats()
```

### EphemeralCache (In-Memory)

```python
from gitpandas.cache import EphemeralCache

cache = EphemeralCache(max_keys=1000)
repo = Repository('/path/to/repo', cache_backend=cache)
```

### DiskCache (Persistent)

```python
from gitpandas.cache import DiskCache

cache = DiskCache('/path/to/cache.gz', max_keys=500)
repo = Repository('/path/to/repo', cache_backend=cache)
```

### RedisDFCache (Distributed)

```python
from gitpandas.cache import RedisDFCache

cache = RedisDFCache(
    host='localhost',
    port=6379,
    db=12,
    max_keys=1000,
    ttl=3600  # 1 hour expiration
)
repo = Repository('/path/to/repo', cache_backend=cache)
```
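All three backends bound the number of cached entries with `max_keys`. The eviction idea can be sketched with a tiny least-recently-used cache (illustrative only, not the library's implementation):

```python
from collections import OrderedDict

class TinyLRUCache:
    """Illustrative LRU cache with a max_keys bound (not gitpandas code)."""

    def __init__(self, max_keys=3):
        self.max_keys = max_keys
        self._store = OrderedDict()

    def set(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)  # refresh recency
        self._store[key] = value
        if len(self._store) > self.max_keys:
            self._store.popitem(last=False)  # evict least recently used

    def get(self, key):
        value = self._store[key]
        self._store.move_to_end(key)
        return value

cache = TinyLRUCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)  # exceeds the bound, so "a" is evicted
```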

Most analysis methods support these filtering parameters:

- `ignore_globs` — glob patterns for files to exclude (e.g. `['*.pyc', '*.log']`)
- `include_globs` — glob patterns for files to include (e.g. `['*.py']`)
- `limit` — maximum number of commits to consider
- `days` — restrict analysis to the last N days
- `branch` / `rev` — the branch or revision to analyze
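The glob semantics can be sketched with the standard library's `fnmatch` (an illustrative stand-in, not the library's internal filter):

```python
from fnmatch import fnmatch

def keep_file(path, ignore_globs=None, include_globs=None):
    """Illustrative filter mirroring ignore/include glob semantics."""
    if ignore_globs and any(fnmatch(path, g) for g in ignore_globs):
        return False  # ignore patterns always win
    if include_globs:
        return any(fnmatch(path, g) for g in include_globs)
    return True  # no include list means everything not ignored is kept

files = ["app.py", "app.pyc", "notes.log", "ui.js"]
kept = [f for f in files
        if keep_file(f, ignore_globs=["*.pyc", "*.log"],
                     include_globs=["*.py", "*.js"])]
# kept == ["app.py", "ui.js"]
```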

For comprehensive documentation, examples, and the full API reference, see the project's documentation site.

  1. Use caching for any analysis beyond one-off queries
  2. Choose the right cache backend for your use case: `EphemeralCache` for a single session, `DiskCache` for persistence across runs, `RedisDFCache` for sharing across machines
  3. Filter data early using glob patterns and limits
  4. Warm your cache before intensive analysis sessions
  5. Use parallel processing for multiple repositories

We welcome contributions! Please review our Contributing Guidelines for details on development setup, testing, and code style.

```shell
# Clone the repository
git clone https://github.com/wdm0006/git-pandas.git
cd git-pandas

# Install in development mode
make install-dev

# Run tests
make test

# Run linting and formatting
make lint
make format
```

This project is BSD licensed; see `LICENSE.md`.

