The rb3 package provides tools for downloading, processing, and analyzing market data from B3 (the Brazilian stock exchange). This vignette will guide you through the basics of using the package to download various types of market data and perform common analyses.
The main function for fetching market data is fetch_marketdata(). This function downloads data based on a template and parameter combinations, then processes the data into a structured database format.
Templates are predefined configurations that specify the type of data to download and how to process it. Each template corresponds to a specific dataset or file type available from B3. For example:
- "b3-cotahist-yearly": Downloads and reads the COTAHIST files that are available by year.
- "b3-futures-settlement-prices": Downloads and reads the settlement prices web page.
- "b3-reference-rates": Downloads and reads the web page of reference interest rates.
- "b3-bvbg-086": Downloads and reads the BVBG-086 file with trading instruments information.
# List available templates
list_templates()
#> # A tibble: 9 × 2
#> Template Description
#> <chr> <chr>
#> 1 b3-bvbg-086 Arquivo de Preços de Mercado - BVBG-086
#> 2 b3-cotahist-daily Cotações Históricas do Pregão de Ações - Arq…
#> 3 b3-cotahist-yearly Cotações Históricas do Pregão de Ações - Arq…
#> 4 b3-futures-settlement-prices Preços de Ajustes Diários de Contratos Futur…
#> 5 b3-indexes-composition Composição dos índices da B3
#> 6 b3-indexes-current-portfolio Carteira teórica corrente dos índices da B3 …
#> 7 b3-indexes-historical-data Dados históricos e estatísticas dos índices …
#> 8 b3-indexes-theoretical-portfolio Carteira Teórica dos índices da B3 com pesos…
#> 9 b3-reference-rates Taxas referenciais
Additional information about templates can be obtained by calling the template_retrieve() function:
# Get a specific template
template_retrieve("b3-cotahist-yearly")
#> Template: b3-cotahist-yearly
#> Description: Cotações Históricas do Pregão de Ações - Arquivo Anual
#> Required arguments:
#> • year: Ano de referência
#> Fields:
#> • regtype (numeric): Tipo de registro
#> • refdate (Date): Data do pregão
#> • bdi_code (numeric): Código BDI
#> • symbol (character): Código de negociação do papel
#> • instrument_market (numeric): Tipo de mercado
#> • corporation_name (character): Nome resumido da empresa emissora do papel
#> • specification_code (character): Especificação do papel
#> • days_to_settlement (numeric): Prazo em dias do mercado a termo
#> • trading_currency (character): Moeda de referência
#> • open (numeric): Preço de abertura do papel
#> • high (numeric): Preço máximo do papel
#> • low (numeric): Preço mínimo do papel
#> • average (numeric): Preço médio do papel
#> • close (numeric): Preço último negócio efetuado com o papel
#> • best_bid (numeric): Preço da melhor oferta de compra do papel
#> • best_ask (numeric): Preço da melhor oferta de venda do papel
#> • trade_quantity (numeric): Número de negócios efetuados com o papel
#> • traded_contracts (numeric): Quantidade total de títulos negociados neste
#> papel
#> • volume (numeric): Volume total de títulos negociados neste papel
#> • strike_price (numeric): Preço de exercício para o mercado de opções ou valor
#> do contrato para o mercado de termo secundário
#> • strike_price_adjustment_indicator (character): Indicador de correção de
#> preços de exercícios ou valores de contrato para os mercados de opções, termo
#> secundário ou futuro
#> • maturity_date (Date): Data do vencimento para os mercados de opções, termo
#> secundário ou futuro
#> • allocation_lot_size (numeric): Fator de cotação do papel
#> • strike_price_in_points (numeric): Preço de exercício em pontos para opções
#> referenciadas em dólar ou valor de contrato em pontos para termo secundário
#> • isin (character): Código do papel no sistema ISIN
#> • distribution_id (numeric): Número de distribuição do papel
Once you know the template you want to use, you can download the data by calling fetch_marketdata(). The function takes the template name and additional parameters as arguments.
The fetch_marketdata() function downloads and processes market data based on the specified template and parameters. The data is stored in a local database, which can be queried using specialized functions. The code below shows an example of how to download and process data using fetch_marketdata().
# Download reference rates for the PRE and DIC curves
# for every business day in January 2024
library(bizdays) # provides bizseq()
fetch_marketdata("b3-reference-rates",
  refdate = bizseq("2024-01-01", "2024-01-31", "Brazil/B3"),
  curve_name = c("PRE", "DIC")
)
#> ✔ Downloading data [53s]
#> ℹ 44 files downloaded
#> ✔ Reading data into DB [6s]
The console output shows the time taken to download and process the data, as well as the number of files downloaded. This code downloads 44 files containing reference rates for the PRE and DIC curves for January 2024. The files are read and stored as Parquet files, forming a local database inside the rb3.cachedir folder.
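The file count can be reproduced by enumerating the parameter combinations. The sketch below uses base R only and assumes that one file is downloaded per (refdate, curve_name) pair. It also hardcodes 1 January as the only holiday in the period instead of consulting the "Brazil/B3" calendar that bizseq() uses, so treat it as an illustration rather than as what rb3 does internally.

```r
# All calendar days in January 2024
days <- seq(as.Date("2024-01-01"), as.Date("2024-01-31"), by = "day")

# Keep Monday to Friday; "%u" is the ISO weekday number (1 = Monday),
# which does not depend on the session locale
weekdays_only <- days[as.integer(format(days, "%u")) <= 5]

# Assumption: New Year's Day is the only exchange holiday in this period
holidays <- as.Date("2024-01-01")
business_days <- weekdays_only[!weekdays_only %in% holidays]

# One row per (refdate, curve_name) combination
combos <- expand.grid(
  refdate = business_days,
  curve_name = c("PRE", "DIC"),
  stringsAsFactors = FALSE
)

length(business_days)
nrow(combos)
```

With 22 business days and two curves, the grid has 44 rows, which agrees with the 44 files reported in the console output.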
rb3.cachedir folder
The rb3.cachedir folder is where the downloaded data is stored. It is set as an option in R, and you can check its current value using:
getOption("rb3.cachedir")
#> [1] "/home/wilson/dev/rb3/rb3-cache"
You can change the location of the rb3.cachedir folder by setting the option rb3.cachedir to a different path.
# Set the rb3.cachedir folder to a different path
options(rb3.cachedir = "/path/to/your/custom/folder")
Note: It is strongly recommended to set the rb3.cachedir option in the .Rprofile file.
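A minimal sketch of what such an .Rprofile entry could look like. The folder name rb3-cache under the home directory is a hypothetical choice, and creating the folder up front is a precaution rather than something the package is known to require.

```r
# ~/.Rprofile
local({
  # Hypothetical cache location; replace with any folder you prefer
  cache_dir <- file.path(path.expand("~"), "rb3-cache")

  # Create the folder if it does not exist yet (a precaution, not an rb3 requirement)
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

  # Make every new R session use this folder as the rb3 cache
  options(rb3.cachedir = cache_dir)
})
```

Wrapping the code in local() keeps the helper variable out of the global environment, so only the option itself persists in the session.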
Inside this folder, rb3 keeps separate subfolders for the downloaded raw files and for the processed database.
The folder structure looks like this:
rb3.cachedir
├── raw
└── db
The raw files are initially downloaded and stored in the raw folder. These files are then processed and saved as Parquet files in the db folder, forming structured datasets that can be queried using the rb3 package functions. The data processing occurs in two stages: first, the raw files are transformed and stored in the input layer within the db folder. Next, the data undergoes further refinement and is saved in the staging layer, also within the db folder. The dataset can be accessed using the function rb3::template_dataset().
# Get the dataset for the template "b3-reference-rates"
template_dataset("b3-reference-rates")
#> FileSystemDataset with 47 Parquet files
#> 5 columns
#> refdate: date32[day]
#> curve_name: string
#> cur_days: int64
#> col1: double
#> col2: double
This function defaults to the input layer, but you can specify the layer argument to access the staging layer if needed.
# Get the dataset for the template "b3-reference-rates" in the staging layer
template_dataset("b3-reference-rates", layer = "staging")
#> FileSystemDataset with 47 Parquet files
#> 7 columns
#> curve_name: string
#> refdate: date32[day]
#> forward_date: date32[day]
#> cur_days: int64
#> biz_days: int64
#> col1: double
#> col2: double
We can observe that the dataset in the input layer has 5 columns, while the dataset in the staging layer has 7 columns. The datasets in the staging layer are enriched with formatted columns and additional data.
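The difference between the two layers can be made explicit by comparing their column names. The sketch below copies the names from the two schemas printed above into plain character vectors, so it runs without a local database. It is only an illustration of the comparison, not a call into rb3.

```r
# Column names copied from the printed schemas of each layer
input_cols <- c("refdate", "curve_name", "cur_days", "col1", "col2")
staging_cols <- c(
  "curve_name", "refdate", "forward_date", "cur_days", "biz_days",
  "col1", "col2"
)

# Columns that exist only in the staging layer
setdiff(staging_cols, input_cols)

# Columns that exist only in the input layer (none in this case)
setdiff(input_cols, staging_cols)
```

The staging layer adds forward_date and biz_days, and every input column is carried over.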
In the previous sections we have seen how to download and process data using the fetch_marketdata() function and how to access the downloaded data using the template_dataset() function. Each template has custom functions to access the data. These functions have the suffix _get().
- cotahist_get(): Retrieves historical stock market data.
- futures_get(): Retrieves futures settlement prices.
- yc_brl_get(): Retrieves the Brazilian nominal yield curve (PRE).
- yc_ipca_get(): Retrieves the Brazilian real interest rate curve (DIC).
- yc_usd_get(): Retrieves the FX-linked yield curve (DOC).
There are many others. For example, to access the data downloaded using the b3-reference-rates template, you can use the yc_brl_get() function:
# Get the Brazilian nominal yield curve (PRE)
library(dplyr) # provides filter() and collect()
yc_brl_get() |>
  filter(refdate == "2024-01-31") |>
  collect()
#> # A tibble: 257 × 7
#> curve_name refdate forward_date cur_days biz_days r_252 r_360
#> <chr> <date> <date> <int> <int> <dbl> <dbl>
#> 1 PRE 2024-01-31 2024-02-01 1 1 0.116 0
#> 2 PRE 2024-01-31 2024-02-07 7 5 0.112 0.115
#> 3 PRE 2024-01-31 2024-02-14 14 8 0.112 0.0906
#> 4 PRE 2024-01-31 2024-02-15 15 9 0.112 0.0953
#> 5 PRE 2024-01-31 2024-02-16 16 10 0.112 0.0994
#> 6 PRE 2024-01-31 2024-02-21 21 13 0.112 0.0983
#> 7 PRE 2024-01-31 2024-02-28 28 18 0.112 0.102
#> 8 PRE 2024-01-31 2024-02-29 29 19 0.112 0.104
#> 9 PRE 2024-01-31 2024-03-01 30 20 0.112 0.106
#> 10 PRE 2024-01-31 2024-03-04 33 21 0.112 0.101
#> # ℹ 247 more rows
The yc_brl_get() function renames the generic columns col1 and col2 to r_252 and r_360. The renaming is needed because the b3-reference-rates dataset serves three curves (PRE, DIC, and DOC), and col1 and col2 have different meanings for each curve. For this reason, we strongly recommend using the custom functions to access the data instead of calling template_dataset() directly.
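To illustrate why a per-curve accessor is useful, here is a minimal sketch of the renaming step in base R. The helper rename_pre_columns() is hypothetical and is not how rb3 implements yc_brl_get(). The two rows are copied from the output above, and only the PRE mapping (col1 to r_252, col2 to r_360) is taken from this document.

```r
# Two PRE rows as they would look with generic column names
# (values copied from the yc_brl_get() output shown above)
staging <- data.frame(
  curve_name = c("PRE", "PRE"),
  refdate = as.Date(c("2024-01-31", "2024-01-31")),
  forward_date = as.Date(c("2024-02-07", "2024-02-14")),
  cur_days = c(7L, 14L),
  biz_days = c(5L, 8L),
  col1 = c(0.112, 0.112),
  col2 = c(0.115, 0.0906)
)

# Hypothetical helper: keep only PRE rows and give col1/col2
# the names that are meaningful for that curve
rename_pre_columns <- function(df) {
  pre <- df[df$curve_name == "PRE", , drop = FALSE]
  names(pre)[names(pre) == "col1"] <- "r_252"
  names(pre)[names(pre) == "col2"] <- "r_360"
  pre
}

names(rename_pre_columns(staging))
```

A similar helper for DIC or DOC would need its own mapping, which is why the generic col1 and col2 names are easy to misread when the dataset is queried directly.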
The rb3 package provides a comprehensive and efficient framework for accessing, processing, and analyzing market data from B3 (the Brazilian stock exchange). In this vignette, we explored the key functionalities of the package, including:
- Using the fetch_marketdata() function, we demonstrated how to download and process various types of market data, such as reference rates and futures settlement prices.
- We described the rb3.cachedir folder, which organizes raw files, metadata, and processed datasets for efficient access and management.
- We accessed the stored data with template_dataset() and custom access functions such as yc_brl_get() and futures_get().
By combining the power of templates, efficient data storage, and specialized query functions, the rb3 package simplifies the process of working with B3 market data. Whether you are analyzing yield curves, futures prices, or other financial datasets, rb3 provides the tools needed to streamline your workflow and focus on generating insights.
We encourage you to explore the package further and adapt its functionalities to your specific use cases in financial analysis.