A Python-based web scraper that collects GitHub developer information, their followers, and repository details using Selenium and stores the data in a MySQL database.
```
github-toolkit/
├── config/
│   └── settings.py                   # Configuration and environment variables
├── core/
│   ├── entities.py                   # Domain entities
│   └── exceptions.py                 # Custom exceptions
├── infrastructure/
│   ├── database/                     # Database-related code
│   │   ├── connection.py
│   │   └── models.py
│   └── auth/                         # Authentication service
│       └── auth_service.py
├── services/
│   └── scraping/                     # Scraping services
│       ├── github_developer_scraper.py
│       └── github_repo_scraper.py
├── utils/
│   └── helpers.py                    # Utility functions
├── controllers/
│   └── github_scraper_controller.py  # Main controller
├── main.py                           # Entry point
└── README.md
```
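The layering above separates domain entities from persistence. As a hedged sketch of what `infrastructure/database/models.py` could contain (class and column names here are illustrative assumptions, not the repository's actual schema), the SQLAlchemy models might look like:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Developer(Base):
    """A scraped GitHub developer profile (illustrative schema)."""
    __tablename__ = "developers"

    id = Column(Integer, primary_key=True)
    username = Column(String(100), unique=True, nullable=False)
    followers = Column(Integer, default=0)
    repositories = relationship("Repository", back_populates="owner")


class Repository(Base):
    """A repository belonging to a developer (illustrative schema)."""
    __tablename__ = "repositories"

    id = Column(Integer, primary_key=True)
    name = Column(String(200), nullable=False)
    language = Column(String(50))
    owner_id = Column(Integer, ForeignKey("developers.id"))
    owner = relationship("Developer", back_populates="repositories")
```

In production these tables would be created against the MySQL connection configured in `.env`; for a quick local check, `create_engine("sqlite://")` plus `Base.metadata.create_all(engine)` works the same way.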
```bash
git clone git@github.com:trinhminhtriet/github-toolkit.git
cd github-toolkit
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Create a `.env` file in the root directory with the following variables:

```
GITHUB_USERNAME=your_username
GITHUB_PASSWORD=your_password
DB_USERNAME=your_db_username
DB_PASSWORD=your_db_password
DB_HOST=your_db_host
DB_NAME=your_db_name
```
Additional settings live in the `config` directory. Create a `requirements.txt` file with:

```
selenium
sqlalchemy
python-dotenv
```
Run the scraper with `python main.py`. It will log in to GitHub, collect developer profiles, their followers, and repository details for each configured language, and store the results in the MySQL database.
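The scraping itself is driven by Selenium. As a hedged sketch (the function names, search-URL format, and CSS selector below are assumptions for illustration, not the repository's actual code), `services/scraping/github_developer_scraper.py` might find developers for one language like this:

```python
from urllib.parse import quote


def build_search_url(language: str, page: int = 1) -> str:
    """Build a GitHub user-search URL for one language (assumed format)."""
    query = quote(f"language:{language}")
    return f"https://github.com/search?q={query}&type=users&p={page}"


def scrape_usernames(driver, language: str, page: int = 1) -> list:
    """Load one search-results page and collect usernames.

    `driver` is an already-authenticated selenium.webdriver instance;
    the CSS selector is illustrative and may need adjusting to GitHub's
    current markup.
    """
    from selenium.webdriver.common.by import By  # selenium is a runtime dependency

    driver.get(build_search_url(language, page))
    elements = driver.find_elements(By.CSS_SELECTOR, "div.search-title a")
    return [el.text for el in elements]
```

The URL builder is pure and easy to unit-test; only `scrape_usernames` needs a live browser session.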
Edit `config/settings.py` to change:

- `LANGUAGES`: list of programming languages to scrape
- `USE_COOKIE`: toggle between cookie-based and credential-based authentication

To contribute, create a feature branch (`git checkout -b feature/your-feature`), commit your changes (`git commit -m "Add your feature"`), and push the branch (`git push origin feature/your-feature`).

This project is licensed under the MIT License - see the LICENSE file for details (create one if needed).