This project consumes data from an API and loads it into Google BigQuery. It handles data extraction, processing, and incremental loading, using Python and libraries such as `requests`, `pandas`, and `google-cloud-bigquery`.
Requirements:
- Python 3.8+
- Google Cloud Platform (GCP) account
- `google-cloud-bigquery` library
- `requests` library
- `pandas` library
- `python-dotenv` library
- GCP JSON key file for authentication
Installation:
- Clone the repository:
    git clone git@github.com:ricardodev10/BetFullX.git
    cd <repository-name>
- Install the dependencies:
    pip install -r requirements.txt
- Configure the environment variables. Create a `.env` file in the root directory with the following variables:
    API_KEY=<your-api-key>
    API_URL=https://v3.football.api-sports.io
    PROJECT_ID=<gcp-project-id>
    DATASET_NAME=<dataset-name>
    FULL_LOAD_DATE=2023-01-01
    LEAGUE=2
    SEASON=2023
- Set up the GCP service key. Place the JSON key file in the `key/` directory and adjust the path in the code:
    key_path = "./key/your-service-key-file.json"
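The configuration steps above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: it assumes `python-dotenv`'s `load_dotenv()` for reading `.env` and `bigquery.Client.from_service_account_json` for authenticating with the key file. The `Settings` class and the `load_settings` and `make_bigquery_client` helpers are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # lets the sketch run without python-dotenv installed
    load_dotenv = None


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env variables listed above."""
    api_key: str
    api_url: str
    project_id: str
    dataset_name: str
    full_load_date: str
    league: int
    season: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the .env variables; raises KeyError if one is missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies .env entries into os.environ (existing vars win)
        env = os.environ
    return Settings(
        api_key=env["API_KEY"],
        api_url=env["API_URL"].rstrip("/"),
        project_id=env["PROJECT_ID"],
        dataset_name=env["DATASET_NAME"],
        full_load_date=env["FULL_LOAD_DATE"],
        league=int(env["LEAGUE"]),
        season=int(env["SEASON"]),
    )


def make_bigquery_client(key_path: str, project_id: str):
    """Return a BigQuery client authenticated with the service-account key file."""
    if not Path(key_path).is_file():
        raise FileNotFoundError(f"service key not found: {key_path}")
    from google.cloud import bigquery  # imported lazily; needs google-cloud-bigquery

    return bigquery.Client.from_service_account_json(key_path, project=project_id)
```

Typical use would be `settings = load_settings()` followed by `make_bigquery_client("./key/your-service-key-file.json", settings.project_id)`. Checking for the key file up front gives a clearer error than a failed authentication later.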
Project structure:
- `main.py`: main entry point of the project.
- `requirements.txt`: list of project dependencies.
- `key/`: directory for the GCP JSON key file.
- `.env`: environment variables file.
Data extraction:
- Data is extracted from the configured endpoints, including:
  - Past fixtures (`past_fixtures`)
  - Future fixtures (`future_fixtures`)
  - Player data (`players`)
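Extraction from these endpoints could look like the sketch below. It is hedged: it assumes the API-Sports `x-apisports-key` authentication header and a response envelope with a `response` list plus `paging` and `errors` fields. `fetch_endpoint` is a hypothetical name, and the `get` parameter exists only so a fake HTTP function can be injected in place of `requests.get`.

```python
from typing import Any, Callable, Dict, List, Optional


def fetch_endpoint(
    api_url: str,
    endpoint: str,
    api_key: str,
    params: Dict[str, Any],
    get: Optional[Callable[..., Any]] = None,
) -> List[dict]:
    """Return all items in the "response" list, following "paging" when present."""
    if get is None:
        import requests  # imported lazily so tests can inject a fake `get`

        get = requests.get
    headers = {"x-apisports-key": api_key}  # assumed API-Sports auth header
    items: List[dict] = []
    page = 1
    while True:
        query = dict(params)
        if page > 1:
            query["page"] = page  # only follow-up pages send an explicit page number
        resp = get(f"{api_url}/{endpoint}", headers=headers, params=query, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        if body.get("errors"):  # assumed: API errors are reported in the body
            raise RuntimeError(f"{endpoint}: {body['errors']}")
        items.extend(body.get("response", []))
        paging = body.get("paging") or {}
        if page >= int(paging.get("total", 1)):
            return items
        page += 1
```

Past and future fixtures might then be split by fixture status, for example `{"league": 2, "season": 2023, "status": "FT"}` for finished matches and `"status": "NS"` for matches not yet started; these status codes are an assumption about the API, not taken from the project.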
Data processing:
- Uses `pandas` to normalize and process the JSON data.
- Supports nested fields and repeated entries.
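A minimal sketch of the normalization step with `pandas.json_normalize`, assuming fixture items contain nested objects (such as `teams.home`) and player items carry a repeated `statistics` list. The helper names and the sample field names are illustrative assumptions, not the project's code.

```python
from typing import List

import pandas as pd


def normalize_fixtures(items: List[dict]) -> pd.DataFrame:
    """Flatten nested objects (fixture, teams.home, goals, ...) into columns."""
    # sep="_" because BigQuery column names cannot contain dots.
    return pd.json_normalize(items, sep="_")


def normalize_player_stats(items: List[dict]) -> pd.DataFrame:
    """One row per entry of the repeated "statistics" list, tagged with the player."""
    return pd.json_normalize(
        items,
        record_path="statistics",  # the repeated field becomes rows
        meta=[["player", "id"], ["player", "name"]],  # copied onto every row
        sep="_",
    )
```

Nested objects become prefixed columns (for example `teams_home_name`), while `record_path` turns each element of a repeated list into its own row and `meta` repeats the parent player's fields on each of those rows.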
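BigQuery loading:
- Checks whether the target datasets and tables exist before loading.
- Supports two write modes:
  - `WRITE_APPEND`: appends the new rows to the existing table.
  - `WRITE_TRUNCATE`: overwrites the table contents.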
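The existence checks and write modes above could be sketched with `google-cloud-bigquery` as follows. This assumes `get_dataset`/`get_table` raising `google.api_core.exceptions.NotFound` for missing resources and `load_table_from_dataframe` (which requires `pyarrow`) for the load; `choose_write_disposition` and `load_dataframe` are hypothetical helpers.

```python
import logging
from typing import Any

log = logging.getLogger(__name__)


def choose_write_disposition(full_load: bool) -> str:
    """WRITE_TRUNCATE overwrites the table; WRITE_APPEND adds rows to it."""
    return "WRITE_TRUNCATE" if full_load else "WRITE_APPEND"


def load_dataframe(client: Any, df: Any, dataset_id: str, table_id: str, full_load: bool) -> int:
    """Load a DataFrame into table_id ("project.dataset.table"); return rows loaded."""
    # Imported lazily so the pure helper above works without the GCP libraries.
    from google.api_core.exceptions import NotFound
    from google.cloud import bigquery

    try:
        client.get_dataset(dataset_id)  # dataset_id is "project.dataset"
    except NotFound:
        log.info("Dataset %s not found; creating it", dataset_id)
        client.create_dataset(dataset_id)

    try:
        client.get_table(table_id)
    except NotFound:
        # The load job's default CREATE_IF_NEEDED disposition creates the table.
        log.info("Table %s not found; the load job will create it", table_id)

    job_config = bigquery.LoadJobConfig(write_disposition=choose_write_disposition(full_load))
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    job.result()  # blocks until the job finishes; raises if it failed
    return job.output_rows
```

Under these assumptions, a full load from `FULL_LOAD_DATE` would use `WRITE_TRUNCATE` so reruns do not duplicate rows, while incremental runs would use `WRITE_APPEND`.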
Incremental loading:
- Determines which data to update from the last recorded load date.
- Records each load in a log table in BigQuery.
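One way the incremental logic could work is sketched below, under loudly stated assumptions: a hypothetical log table with `target_table`, `load_date`, `rows_loaded`, and `logged_at` columns, where `load_date` holds the last day covered by each load. The helper names are hypothetical; only `client.query`, `QueryJobConfig`, `ScalarQueryParameter`, and `insert_rows_json` are real `google-cloud-bigquery` APIs.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Dict, Optional


def next_start_date(last_load: Optional[date], full_load_date: str) -> date:
    """Day after the last recorded load, or FULL_LOAD_DATE when nothing is logged yet."""
    if last_load is None:
        return date.fromisoformat(full_load_date)
    return last_load + timedelta(days=1)


def read_last_load(client: Any, log_table: str, target: str) -> Optional[date]:
    """Query MAX(load_date) for one target table from the (hypothetical) log table."""
    from google.cloud import bigquery  # imported lazily; needs google-cloud-bigquery

    sql = f"SELECT MAX(load_date) AS last_load FROM `{log_table}` WHERE target_table = @target"
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("target", "STRING", target)]
    )
    rows = list(client.query(sql, job_config=job_config).result())
    return rows[0]["last_load"] if rows else None  # MAX over no rows yields NULL -> None


def build_log_row(target: str, loaded_until: date, rows_loaded: int) -> Dict[str, Any]:
    """One log entry; load_date is the last day covered by this load."""
    return {
        "target_table": target,
        "load_date": loaded_until.isoformat(),
        "rows_loaded": rows_loaded,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


def write_log(client: Any, log_table: str, row: Dict[str, Any]) -> None:
    """Append the entry with a streaming insert; raise if BigQuery rejects it."""
    errors = client.insert_rows_json(log_table, [row])
    if errors:
        raise RuntimeError(f"log insert failed: {errors}")
```

Starting the day after the last logged date avoids re-fetching rows already appended; if late updates for that day matter, starting on the last logged date instead (and deduplicating) is the safer trade-off.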
To run the project, use the following command:
    python main.py

If you have any feedback, please send it to me at ricardodev10@yahoo.com
Made with ♥ by Ricardo Junior 👋
Learning is continuous and there will always be a next level.
