Intro
Most of the functions in this package require a path to a config.yml file as input. This structure allows for an easily reviewed file that contains all relevant parameters that should be, or were, used for a given multi-loanbook analysis. Below is the full documentation of each option.
Preface
The config file is separated into a few top-level “sections” that group contextually similar options. The sections themselves are documented below, but note that a section never has a value directly associated with it.
Also note that the config file must have the top-level section default. This is related to a feature of the config package, which facilitates having and targeting different config sets for different purposes. Technically, one could leverage this for use with pacta.multi.loanbook, but it is not recommended.
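A minimal sketch of the overall layout, assuming that all sections are nested under the required default key. The option values are reused from the examples further below and are illustrative only:

```yaml
default:                  # required top-level section
  directories:            # a section holds options, never a value of its own
    dir_input: "~/Desktop/test/input"
  project_parameters:
    start_year: 2022
```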
Options
directories:
The directories section contains options to define locally accessible paths where input and output data should be found or saved. A full example directories section might look like:
directories:
  dir_input: "~/Desktop/test/input"
  dir_prepared_abcd: "~/Desktop/test/prepared_abcd"
  dir_matched_loanbooks: "~/Desktop/test/matched_loanbooks"
  dir_prioritized_loanbooks_and_diagnostics: "~/Desktop/test/prioritized_loanbooks_and_diagnostics"
  dir_analysis: "~/Desktop/test/analysis"
dir_input
dir_input is a path to a directory that contains all input data to be used. Input data is any data set that must be produced or obtained externally by the user and that is not the output of any of the functions in this package. This includes files that are only optionally needed. It must be a single string/character value, and it must refer to a valid, accessible, local directory. As an example:
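A minimal fragment, reusing the illustrative path from the full directories example above (replace it with your own input directory):

```yaml
directories:
  dir_input: "~/Desktop/test/input"  # illustrative path only
```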
dir_prepared_abcd
dir_prepared_abcd is a path to a directory where the outputs of the function prepare_abcd() should be saved. It must be a single string/character value, and it must refer to a valid, accessible, local directory. As an example:
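A minimal fragment, again using the illustrative path from the full directories example above:

```yaml
directories:
  dir_prepared_abcd: "~/Desktop/test/prepared_abcd"  # illustrative path only
```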
dir_matched_loanbooks
dir_matched_loanbooks is a path to a directory where the outputs of the function match_loanbooks() should be saved. It must be a single string/character value, and it must refer to a valid, accessible, local directory. As an example:
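A minimal fragment, with the path taken from the full directories example above:

```yaml
directories:
  dir_matched_loanbooks: "~/Desktop/test/matched_loanbooks"  # illustrative path only
```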
dir_prioritized_loanbooks_and_diagnostics
dir_prioritized_loanbooks_and_diagnostics is a path to a directory where the outputs of the function prioritise_and_diagnose() should be saved. It must be a single string/character value, and it must refer to a valid, accessible, local directory. As an example:
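A minimal fragment, with the path taken from the full directories example above:

```yaml
directories:
  # illustrative path only
  dir_prioritized_loanbooks_and_diagnostics: "~/Desktop/test/prioritized_loanbooks_and_diagnostics"
```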
dir_analysis
dir_analysis is a path to a directory where the outputs of the function analyse() should be saved. It must be a single string/character value, and it must refer to a valid, accessible, local directory. As an example:
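A minimal fragment, with the path taken from the full directories example above:

```yaml
directories:
  dir_analysis: "~/Desktop/test/analysis"  # illustrative path only
```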
file_names:
The file_names section contains options to define the file names of locally accessible files found in the directories defined in the directories section. The directories and file names are defined separately to allow for flexibility in where and how your input and output files are stored. A full example file_names section might look like:
file_names:
  filename_scenario_tms: "scenarios_2022_p4b.csv"
  filename_scenario_sda: "scenarios_2022_ei_p4b.csv"
  filename_abcd: "2023-02-17_AI_RMI_PACTA for Banks Free dataset_EO_2022Q4.xlsx"
  sheet_abcd: "Company Indicators - PACTA Comp"
filename_scenario_tms
filename_scenario_tms is the filename of the file that contains production-based scenario data. The file specified by filename_scenario_tms must exist in the directory specified by the dir_input parameter. It must be a single string/character value, and it must refer to a valid, accessible, local file. As an example:
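A minimal fragment, reusing the filename from the full file_names example above. The file is assumed to sit in the directory given by dir_input:

```yaml
file_names:
  filename_scenario_tms: "scenarios_2022_p4b.csv"  # illustrative filename
```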
filename_scenario_sda
filename_scenario_sda is the filename of the file that contains emission-intensity-based scenario data. The file specified by filename_scenario_sda must exist in the directory specified by the dir_input parameter. It must be a single string/character value, and it must refer to a valid, accessible, local file. As an example:
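A minimal fragment, reusing the filename from the full file_names example above:

```yaml
file_names:
  filename_scenario_sda: "scenarios_2022_ei_p4b.csv"  # illustrative filename
```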
filename_abcd
filename_abcd is the filename of the file in the directory defined by dir_input that contains asset-based company data, including production values and physical emission intensity values. It must be a single string/character value, and it must refer to a valid, accessible, local file. As an example:
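A minimal fragment, reusing the filename from the full file_names example above. Quoting the value keeps the spaces in the filename intact:

```yaml
file_names:
  filename_abcd: "2023-02-17_AI_RMI_PACTA for Banks Free dataset_EO_2022Q4.xlsx"
```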
sheet_abcd
sheet_abcd is the name of the sheet that contains asset-based company data in the file defined by filename_abcd and stored in the directory defined by dir_input. It must be a single string/character value, and it must refer to a valid sheet name in that file. As an example:
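A minimal fragment, reusing the sheet name from the full file_names example above:

```yaml
file_names:
  sheet_abcd: "Company Indicators - PACTA Comp"  # must match a sheet in filename_abcd
```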
project_parameters:
A full example project_parameters section might look like:
project_parameters:
  scenario_source: "weo_2022"
  scenario_select: "nze_2050"
  region_select: "global"
  start_year: 2022
  time_frame: 5
  by_group: "group_id"
scenario_source
scenario_source is an identifier of the scenario source to be used. It must be a single string/character value, and it must refer to a valid scenario source identifier contained in the scenario data file/s defined by filename_scenario_tms and filename_scenario_sda. Valid values typically look like "weo_2023" or "geco_2022". As an example:
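A minimal fragment, using the value from the full project_parameters example above. Whether that identifier is valid depends on the scenario files you supply:

```yaml
project_parameters:
  scenario_source: "weo_2022"  # must appear in the scenario data files
```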
scenario_select
scenario_select is an identifier of the scenario to be used. It must be a single string/character value, and it must refer to a valid scenario identifier corresponding to the scenario_source and contained in the scenario data file/s defined by filename_scenario_tms and filename_scenario_sda. Valid values typically look like "nze_2050", "aps" or "steps". As an example:
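A minimal fragment, using the value from the full project_parameters example above:

```yaml
project_parameters:
  scenario_select: "nze_2050"  # must belong to the chosen scenario_source
```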
region_select
region_select is an identifier of the region to be used. It must be a single string/character value, and it must refer to a valid region identifier contained in the r2dii.data::region_isos dataset, where it must be listed as a region available for the scenario_source. Valid values typically look like "global" or "advanced economies". As an example:
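A minimal fragment, using the value from the full project_parameters example above:

```yaml
project_parameters:
  region_select: "global"  # must be available for the chosen scenario_source
```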
start_year
start_year is the start year of the analysis. Normally, the start year should correspond to the year of publication of the scenario in use. It must be a single numeric value, and it must refer to a year contained in the scenario data file/s defined by filename_scenario_tms and filename_scenario_sda. Valid values typically look like 2022 or 2023 (note that this value should not be wrapped in quotes). As an example:
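A minimal fragment, using the value from the full project_parameters example above:

```yaml
project_parameters:
  start_year: 2022  # numeric, so no quotes
```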
time_frame
time_frame is the number of years (starting from the start_year) that the analysis covers. It must be a single numeric value, and it must define a time frame that is covered by the scenario data file/s defined by filename_scenario_tms and filename_scenario_sda. Valid values typically look like 5 or 6 (note that this value should not be wrapped in quotes). As an example:
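A minimal fragment, using the value from the full project_parameters example above:

```yaml
project_parameters:
  time_frame: 5  # numeric, so no quotes; counted from start_year
```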
by_group
by_group allows specifying the level of disaggregation to be used in the analysis. It determines the variable along which the loan books are grouped and thus the dimension by which to compare the PACTA calculations. For example, one may want to calculate system-wide results without disaggregation, using NULL, or one may want to analyse alignment along bank-specific traits, such as "group_id" or "bank_type". It can be NULL or a character vector of length 1. If it is not NULL, the indicated name must be a variable that is provided in the input loan books and it must be complete ("group_id" is automatically created when reading in the loan books, so the user does not have to add it to the raw loan books). If the provided character string is "NULL", it will be treated as NULL. As an example:
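A minimal fragment, using the value from the full project_parameters example above. The commented-out line sketches the alternative without disaggregation described in the text:

```yaml
project_parameters:
  by_group: "group_id"  # created automatically when the loan books are read in
  # by_group: NULL      # alternative: system-wide results without disaggregation
```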
sector_split:
A full example sector_split section might look like:
sector_split:
  apply_sector_split: TRUE
  filename_split_company_id: "split_company_ids.csv"
  filename_advanced_company_indicators: "2024-02-14_AI_2023Q4_RMI-Company-Indicators.xlsx"
  sheet_advanced_company_indicators: "Company Activities"
apply_sector_split
apply_sector_split determines whether the sector split should be applied. It must be a single logical value (either TRUE or FALSE). As an example:
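A minimal fragment, using the value from the full sector_split example above:

```yaml
sector_split:
  apply_sector_split: TRUE  # logical, so no quotes
```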
filename_split_company_id
filename_split_company_id is the filename of the CSV file that contains the split company ID data. The file specified by filename_split_company_id must exist in the directory specified by the dir_input parameter. It must be a single string/character value, and it must refer to a valid, accessible, local file. As an example:
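A minimal fragment, reusing the filename from the full sector_split example above:

```yaml
sector_split:
  filename_split_company_id: "split_company_ids.csv"  # illustrative filename
```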
filename_advanced_company_indicators
filename_advanced_company_indicators is the filename of the XLSX file that contains the Advanced Company Indicators. The file specified by filename_advanced_company_indicators must exist in the directory specified by the dir_input parameter. It must be a single string/character value, and it must refer to a valid, accessible, local file. As an example:
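A minimal fragment, reusing the filename from the full sector_split example above:

```yaml
sector_split:
  filename_advanced_company_indicators: "2024-02-14_AI_2023Q4_RMI-Company-Indicators.xlsx"
```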
sheet_advanced_company_indicators
sheet_advanced_company_indicators is the name of the sheet that contains asset-based company production data in the file defined by filename_advanced_company_indicators and stored in the directory defined by dir_input. It must be a single string/character value, and it must refer to a valid sheet name in that file. As an example:
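A minimal fragment, reusing the sheet name from the full sector_split example above:

```yaml
sector_split:
  sheet_advanced_company_indicators: "Company Activities"  # must match a sheet in the XLSX file
```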
matching:
A full example matching section might look like:
matching:
  params_match_name:
    by_sector: TRUE
    min_score: 0.9
    method: "jw"
    p: 0.1
    overwrite: NULL
    join_id: NULL
  manual_sector_classification:
    use_manual_sector_classification: FALSE
    filename_manual_sector_classification: "manual_sector_classification.csv"
params_match_name:
A full example params_match_name
section might look like:
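The lines below reproduce the params_match_name block from the full matching example above, nested under matching as shown there:

```yaml
matching:
  params_match_name:
    by_sector: TRUE
    min_score: 0.9
    method: "jw"
    p: 0.1
    overwrite: NULL
    join_id: NULL
```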
by_sector
by_sector determines whether names should only be compared if the companies belong to the same sector. It must be a single logical value (either TRUE or FALSE). Further explanation of this argument can be found in the documentation for r2dii.match::match_name(). As an example:
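A minimal fragment, using the value from the full matching example above:

```yaml
matching:
  params_match_name:
    by_sector: TRUE  # logical, so no quotes
```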
min_score
min_score is a number between 0 and 1 that sets the minimum score threshold. A score of 1 is a perfect match. It must be a single numeric value. Valid values typically look like 0.7 or 0.9 (note that this value should not be wrapped in quotes). Further explanation of this argument can be found in the documentation for r2dii.match::match_name(). As an example:
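A minimal fragment, using the value from the full matching example above:

```yaml
matching:
  params_match_name:
    min_score: 0.9  # numeric between 0 and 1; 1 is a perfect match
```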
method
method is the method for distance calculation. It must be a single string/character value, and it must refer to a valid method identifier, one of "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex". Further explanation of this argument can be found in the documentation for r2dii.match::match_name() and stringdist::stringdist-metrics. As an example:
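A minimal fragment, using the value from the full matching example above:

```yaml
matching:
  params_match_name:
    method: "jw"  # Jaro-Winkler; one of the identifiers listed above
```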
p
p is the prefix factor for the Jaro-Winkler distance. The valid range for p is 0 <= p <= 0.25. If p = 0 (default), the Jaro distance is returned. It applies only to method = "jw". It must be a single numeric value. Valid values typically look like 0.1 or 0.2 (note that this value should not be wrapped in quotes). Further explanation of this argument can be found in the documentation for r2dii.match::match_name(). As an example:
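A minimal fragment, using the values from the full matching example above. method is included because p only takes effect with "jw":

```yaml
matching:
  params_match_name:
    method: "jw"
    p: 0.1  # numeric, 0 <= p <= 0.25
```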
overwrite
overwrite: further explanation of this argument can be found in the documentation for r2dii.match::match_name(). As an example:
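A minimal fragment, using the value from the full matching example above. Other values are not covered here; see the r2dii.match::match_name() documentation:

```yaml
matching:
  params_match_name:
    overwrite: NULL  # value from the full example
```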
join_id
join_id is an optional parameter that allows defining by which variable to match the loans to the companies in the abcd. Its intended use case is to join based on unambiguous identifiers, such as the lei, where such data is available. It can be NULL to use standard name matching when no common identifiers are given. Otherwise, it must be a join specification, which is internally passed to dplyr::inner_join. If it is an unnamed character/string vector, the values are assumed to refer to identically named join columns. If it is a named character vector, the names are used as the join columns in the loanbook and the values are used as the join columns in the abcd. Further explanation of this argument can be found in the documentation for r2dii.match::match_name(). As an example:
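A minimal fragment, using the value from the full matching example above. The commented-out line is only a sketch of the unnamed case and assumes that both the loan book and the abcd contain an identically named lei column:

```yaml
matching:
  params_match_name:
    join_id: NULL     # standard name matching
    # join_id: "lei"  # assumption: a column named lei exists in both data sets
```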
manual_sector_classification:
A full example manual_sector_classification section might look like:
manual_sector_classification:
  use_manual_sector_classification: FALSE
  filename_manual_sector_classification: "manual_sector_classification.csv"
use_manual_sector_classification
use_manual_sector_classification determines if the matching should use an internally provided sector classification system or if it should use one provided by the user instead. Internal sector classification systems are given in r2dii.data::sector_classifications (see also the additional documentation in r2dii.data). The function will automatically attempt to use one of the sector classification systems, based on the inputs in the raw loan book files. If an externally prepared sector classification system is to be used, for example because the loans are classified using a system that is not provided in r2dii.data out of the box, the data must be prepared following the same structure as found in r2dii.data::sector_classifications. It must be a single logical value (either TRUE or FALSE). As an example:
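A minimal fragment, using the value from the full matching example above, where this section is nested under matching:

```yaml
matching:
  manual_sector_classification:
    use_manual_sector_classification: FALSE  # FALSE: use the systems from r2dii.data
```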
filename_manual_sector_classification
filename_manual_sector_classification is the filename of the CSV file that contains the manual sector classification data. The file specified by filename_manual_sector_classification must exist in the directory specified by the dir_input parameter. It must be a single string/character value, and it must refer to a valid, accessible, local file. As an example:
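A minimal fragment, reusing the filename from the full matching example above:

```yaml
matching:
  manual_sector_classification:
    filename_manual_sector_classification: "manual_sector_classification.csv"
```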
match_prioritize:
A full example match_prioritize
section might look like:
priority
priority indicates the level of matching that should be prioritized when a loan can be matched at multiple levels. It must be a single string/character value or NULL. Further explanation of this argument can be found in the documentation for r2dii.match::prioritize(). As an example:
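A minimal fragment. The value NULL is an assumption chosen because the text names it as permitted; this section shows no example value of its own:

```yaml
match_prioritize:
  priority: NULL  # assumption: no custom priority given
```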
prepare_abcd:
A full example prepare_abcd
section might look like:
remove_inactive_companies
remove_inactive_companies determines if inactive companies should be removed from the abcd dataset or not. “Companies” here refers to company-sector combinations, and “inactive” characterizes those company-sector combinations that are inactive at the end of the time frame analysed. When a forward-looking analysis focuses on exposures in the end year, such inactive companies may not produce meaningful results. It must be a single logical value (either TRUE or FALSE). As an example:
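A minimal fragment. The value TRUE is an assumption picked for illustration, since this section shows no example value of its own:

```yaml
prepare_abcd:
  remove_inactive_companies: TRUE  # assumption: drop inactive company-sector combinations
```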
match_success_rate:
A full example match_success_rate
section might look like:
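A minimal sketch of the section, covering only the two options documented below. The values are assumptions, taken from the "typical" values quoted in the option descriptions. plot_units is referenced below but not documented here, so it is left out:

```yaml
match_success_rate:
  plot_width: 12   # assumption: one of the typical values named below
  plot_height: 8   # assumption: one of the typical values named below
```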
plot_width
plot_width is the desired width of the match success rate output plot in units defined by plot_units. It must be a single numeric value. Valid values typically look like 10 or 12 (note that this value should not be wrapped in quotes). As an example:
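A minimal fragment, using one of the typical values named above:

```yaml
match_success_rate:
  plot_width: 12  # numeric, so no quotes; units come from plot_units
```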
plot_height
plot_height is the desired height of the match success rate output plot in units defined by plot_units. It must be a single numeric value. Valid values typically look like 6 or 8 (note that this value should not be wrapped in quotes). As an example:
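A minimal fragment, using one of the typical values named above:

```yaml
match_success_rate:
  plot_height: 8  # numeric, so no quotes; units come from plot_units
```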