eve takes a list of event ids in binary format as input and generates a partition of event ids as a binary data stream, according to the parameters supplied. Events are "shuffled" by being assigned to processes one by one cyclically, rather than being distributed to processes in blocks in the order they are input. This helps to even out the workload across processes, in case all the "big" events are together in the same range of the event list.
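As an illustration of the cyclic ("round-robin") assignment, here is a minimal Python sketch; it is not the eve implementation, and the event ids and process numbers are made up:
```python
# Minimal sketch of cyclic ("round-robin") partitioning, as eve does conceptually.
# The event ids and the process/partition numbers here are illustrative only.
def partition_events(event_ids, process_no, total_processes):
    """Return the events assigned to one partition, taken one by one cyclically."""
    return [e for i, e in enumerate(event_ids) if i % total_processes == process_no - 1]

events = [101, 102, 103, 104, 105, 106, 107]
print(partition_events(events, 1, 2))  # partition 1 of 2 -> [101, 103, 105, 107]
print(partition_events(events, 2, 2))  # partition 2 of 2 -> [102, 104, 106]
```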
The output stream is a simple list of event_ids (4 byte integers).
Required parameters are;
Optional parameters are;
$ eve [parameters] > [output].bin
$ eve [parameters] | getmodel | gulcalc [parameters] > [stdout].bin
$ eve 1 2 > events1_2_shuffled.bin
$ eve -n 1 2 > events1_2_unshuffled.bin
$ eve -r 1 2 > events1_2_random.bin
$ eve 1 2 | getmodel | gulcalc -r -S100 -i - > gulcalc1_2.bin
In this example, the events from the file events.bin are read into memory and the first half (partition 1 of 2) is streamed out to a binary file, or downstream to a single process calculation workflow.
The program requires an event binary. The file is picked up from the input sub-directory relative to where the program is invoked and has the following filename;
The data structure of events.bin is a simple list of event ids (4 byte integers).
getmodel generates a stream of effective damageability distributions (cdfs) from an input list of events. Specifically, it combines the probability distributions from the model files, footprint.bin and vulnerability.bin, to generate effective damageability cdfs for the subset of exposures contained in the items.bin file and converts them into a binary stream.
This is a reference example of the class of programs which generates the damage distributions for an event set and streams them into memory. It is envisaged that model developers who wish to use the toolkit as a back-end calculator of their existing platforms can write their own version of getmodel, reading in their own source data and converting it into the standard output stream. As long as the standard input and output structures are adhered to, each program can be written in any language and read any input data.
| Byte 1 | Bytes 2-4 | Description |
|---|---|---|
| 0 | 1 | cdf stream |
None
Usage
$ [stdin component] | getmodel | [stdout component]
$ [stdin component] | getmodel > [stdout].bin
$ getmodel < [stdin].bin > [stdout].bin
Example
$ eve 1 1 | getmodel | gulcalc -r -S100 -i gulcalci.bin
$ eve 1 1 | getmodel > getmodel.bin
$ getmodel < events.bin > getmodel.bin
The program requires the footprint binary and index file for the model, the vulnerability binary model file, and the items file representing the user's exposures. The files are picked up from sub-directories relative to where the program is invoked, as follows;
The getmodel output stream is ordered by event and streamed out in blocks for each event.
The program filters the footprint binary file for all areaperil_id's which appear in the items file. This selects the event footprints that impact the exposures on the basis of their location. Similarly, the program filters the vulnerability file for vulnerability_id's that appear in the items file. This selects the conditional damage distributions which are relevant for the exposures.
The intensity distributions from the footprint file and the conditional damage distributions from the vulnerability file are convolved for every combination of areaperil_id and vulnerability_id in the items file. The effective damage probabilities are calculated, for each damage bin, by summing the product of the conditional damage probabilities with the intensity probabilities across the intensity bins, for each event, areaperil and vulnerability combination.
The resulting discrete probability distributions are converted into discrete cumulative distribution functions 'cdfs'. Finally, the damage bin mid-point from the damage bin dictionary ('interpolation' field) is read in as a new field in the cdf stream as 'bin_mean'. This field is the conditional mean damage for the bin and it is used to choose the interpolation method for random sampling and numerical integration calculations in the gulcalc component.
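The convolution can be pictured with a minimal sketch (illustrative only; the bin counts and probabilities are made up rather than taken from real model files):
```python
# Illustrative sketch of the effective damageability calculation for one
# event / areaperil_id / vulnerability_id combination. All numbers are made up.
intensity_probs = {1: 0.7, 2: 0.3}          # footprint: intensity_bin_index -> prob
vuln_probs = {                              # vulnerability: (intensity_bin, damage_bin) -> prob
    (1, 1): 0.9, (1, 2): 0.1,
    (2, 1): 0.4, (2, 2): 0.6,
}

# Effective damage probability per damage bin: sum over intensity bins of
# P(damage bin | intensity bin) * P(intensity bin).
eff = {}
for (i_bin, d_bin), p in vuln_probs.items():
    eff[d_bin] = eff.get(d_bin, 0.0) + p * intensity_probs[i_bin]

# Convert the discrete probabilities into a cumulative distribution (cdf).
cdf, cum = {}, 0.0
for d_bin in sorted(eff):
    cum += eff[d_bin]
    cdf[d_bin] = cum

print({k: round(v, 4) for k, v in eff.items()})  # {1: 0.75, 2: 0.25}
print({k: round(v, 4) for k, v in cdf.items()})  # {1: 0.75, 2: 1.0}
```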
The gulcalc program performs Monte Carlo sampling of ground up loss by randomly sampling the cumulative probability of damage from the uniform distribution and generating damage factors by interpolation of the random numbers against the effective damage cdf. Other loss metrics are computed and assigned to special meaning sample index values as described below.
The sampling methodologies are linear interpolation, quadratic interpolation and point value sampling, depending on the damage bin definitions in the input data.
Gulcalc also performs back-allocation of total coverage losses to the contributing subperil item losses (for multi-subperil models). This occurs when there are two or more items representing losses from different subperils to the same coverage, such as wind loss and storm surge loss, for example. In these cases, because the subperil losses are generated independently of each other, the total damage ratio for the coverage can exceed 1, or the total loss can exceed the Total Insured Value "TIV". Back-allocation ensures that the total loss for a coverage cannot exceed the input TIV.
| Byte 1 | Bytes 2-4 | Description |
|---|---|---|
| 2 | 1 | loss stream |
Required parameters are;
The destination is either a filename or named pipe, or use - for standard output.
Optional parameters are;
Usage
$ [stdin component] | gulcalc [parameters] | [stdout component]
$ [stdin component] | gulcalc [parameters]
$ gulcalc [parameters] < [stdin].bin
Example
$ eve 1 1 | getmodel | gulcalc -R1000000 -S100 -a1 -i - | fmcalc > fmcalc.bin
$ eve 1 1 | getmodel | gulcalc -R1000000 -S100 -a1 -i - | summarycalc -i -1 summarycalc1.bin
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i gulcalci.bin
$ gulcalc -r -S100 -i -a1 gulcalci.bin < getmodel.bin
The program requires the damage bin dictionary binary from the static folder and the item and coverage binaries from the input folder. The files are found in the following locations relative to the working directory with the filenames;
If the user specifies -r as a parameter, then the program also picks up a random number file from the static directory. The filename is;
The stdin stream is a block of cdfs which are ordered by event_id, areaperil_id, vulnerability_id and bin_index ascending, from getmodel. The gulcalc program constructs a cdf for each item, based on matching the areaperil_id and vulnerability_id from the stdin and the item file.
Random samples are indexed using positive integers starting from 1, called the 'sidx', or sample index.
For each item cdf and for the number of samples specified, the program draws a uniformly distributed random number and uses it to sample ground up loss from the cdf using one of three methods, as follows;
For a given damage interval corresponding to a cumulative probability interval that each random number falls within;
An example of the three cases and methods is given below;
| bin_from | bin_to | bin_mean | Method used |
|---|---|---|---|
| 0.1 | 0.2 | 0.15 | Linear interpolation |
| 0.1 | 0.1 | 0.1 | Sample bin value |
| 0.1 | 0.2 | 0.14 | Quadratic interpolation |
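As a rough illustration of how a uniform random draw is turned into a damage factor in the first two cases above (the quadratic case is omitted, and the cumulative probabilities and the draw are invented for the sketch):
```python
# Rough sketch of sampling a damage factor from one damage bin of an effective cdf.
# The bin boundaries, cumulative probabilities and the random draw are invented.
def sample_damage(u, cum_from, cum_to, bin_from, bin_to):
    """u is a uniform(0,1) draw falling in the cumulative interval [cum_from, cum_to)."""
    if bin_from == bin_to:
        # Point value bin: the bin carries a single damage value.
        return bin_from
    # Linear interpolation within the bin (used when bin_mean is the bin mid-point).
    return bin_from + (bin_to - bin_from) * (u - cum_from) / (cum_to - cum_from)

# The draw 0.5 falls in a bin covering cumulative probabilities 0.4-0.6
# and damage factors 0.1-0.2, so it interpolates to a damage factor of 0.15.
print(sample_damage(0.5, 0.4, 0.6, 0.1, 0.2))  # 0.15 (up to floating point rounding)
```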
If the -R parameter is used along with a specified number of random numbers, then the random numbers used for sampling are generated on the fly for each event and group of items which have a common group_id, using the Mersenne twister pseudo random number generator (the default RNG of the C++11 standard library). These random numbers are not repeatable, unless a seed is also specified (-s{number}).
If the -r parameter is used, gulcalc reads a random number from the provided random number file, which produces repeatable results.
The default random number behaviour (no additional parameters) is to generate random numbers from a seed determined by a combination of the event_id and group_id, which produces repeatable results. See Random Numbers for more details.
Each sampled damage is multiplied by the item TIV, looked up from the coverage file.
Samples with negative indexes have special meanings as follows;
| sidx | description |
|---|---|
| -1 | Numerical integration mean |
| -2 | Numerical integration standard deviation |
| -3 | Impacted exposure |
| -4 | Chance of loss |
| -5 | Maximum loss |
The allocation method determines how item losses are adjusted when a coverage is subject to losses from multiple perils, because the total loss to a coverage from multiple perils cannot exceed the input TIV. This situation is identified when multiple item_ids in the item file share the same coverage_id. The TIV is held in the coverages file against the coverage_id, and the item_id TIV is looked up from its relationship to coverage_id in the item file.
The allocation methods are as follows;
| a | description |
|---|---|
| 0 | Pass losses through unadjusted (used for single peril models) |
| 1 | Sum the losses and cap them to the TIV. Back-allocate TIV to the contributing items in proportion to the unadjusted losses |
| 2 | Keep the maximum subperil loss and set the others to zero. Back-allocate equally when there are equal maximum losses |
The mean, impacted exposure and maximum loss special samples are also subject to these allocation rules. The impacted exposure value, sidx -3, is always back-allocated equally to the items, for allocation rules 1 and 2, since by definition it is the same value for all items related to the same coverage.
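A minimal sketch of allocation rule 1 (sum, cap to TIV, back-allocate proportionally), with made-up item losses and TIV:
```python
# Illustrative sketch of allocation rule 1: cap the summed subperil losses at the
# coverage TIV and back-allocate in proportion to the unadjusted item losses.
# The item losses and the TIV below are made up.
def allocate_rule_1(item_losses, tiv):
    total = sum(item_losses.values())
    if total <= tiv or total == 0:
        return dict(item_losses)             # no capping needed
    return {item: loss * tiv / total for item, loss in item_losses.items()}

losses = {"wind_item": 80000.0, "surge_item": 60000.0}   # sum 140,000 exceeds the TIV
print(allocate_rule_1(losses, tiv=100000.0))
# {'wind_item': 57142.86..., 'surge_item': 42857.14...} -- sums back to the 100,000 TIV
```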
fmcalc is the reference implementation of the Oasis Financial Module. It applies policy terms and conditions to the ground up losses and produces loss sample output. It reads in the loss stream from either gulcalc or from another fmcalc, and can be called recursively to apply several consecutive sets of policy terms and conditions.
| Byte 1 | Bytes 2-4 | Description |
|---|---|---|
| 2 | 1 | loss stream |
Optional parameters are;
Usage
$ [stdin component] | fmcalc [parameters] | [stdout component]
$ [stdin component] | fmcalc [parameters] > [stdout].bin
$ fmcalc [parameters] < [stdin].bin > [stdout].bin
Example
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc -p direct -a2 | summarycalc -f -2 - | eltcalc > elt.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc -p direct -a1 > fmcalc.bin
$ fmcalc -p ri1 -a2 -S -n < gulcalci.bin > fmcalc.bin
$ fmcalc -p direct | fmcalc -p ri1 -n | fmcalc -p ri2 -n < gulcalci.bin > fm_ri2_net.bin
For the gulcalc item stream input, the program requires the item, coverage and fm input data files, which are Oasis abstract data objects that describe an insurance or reinsurance programme. This data is picked up from the following files relative to the working directory by default;
For loss stream input from either gulcalc or fmcalc, the program requires only the four fm input data files,
The location of the files can be changed by using the -p parameter followed by the path location relative to the present working directory, e.g. -p ri1
fmcalc passes the loss samples, including the numerical integration mean, sidx -1, and impacted exposure, sidx -3, through a set of financial calculations which are defined by the input files. The special samples -2, -4 and -5 are ignored and dropped in the output. For more information about the calculation see Financial Module.
The purpose of summarycalc is firstly to aggregate the samples of loss to a level of interest for reporting, thereby reducing the volume of data in the stream. This is a generic first step which precedes all of the downstream output calculations. Secondly, it unifies the formats of the gulcalc and fmcalc streams, so that they are transformed into an identical stream type for downstream outputs. Finally, it can generate up to 10 summary level outputs in one go, creating multiple output streams or files.
The output is similar to the gulcalc or fmcalc input, in which losses are by sample index and by event, but the ground up or (re)insurance input losses are grouped to an abstract level represented by a summary_id. The relationship between the input identifier and the summary_id is defined in cross reference files called gulsummaryxref and fmsummaryxref.
| Byte 1 | Bytes 2-4 | Description |
|---|---|---|
| 3 | 1 | summary stream |
The input stream should be identified explicitly as -i input from gulcalc or -f input from fmcalc.
summarycalc supports up to 10 concurrent outputs. This is achieved by explicitly directing each output to a named pipe, file, or to standard output.
For each output stream, the following tuple of parameters must be specified for at least one summary set;
For example, the following parameter choices are valid;
$ summarycalc -i -1 -
'outputs results for summaryset 1 to standard output
$ summarycalc -i -1 summarycalc1.bin
'outputs results for summaryset 1 to a file (or named pipe)
$ summarycalc -i -1 summarycalc1.bin -2 summarycalc2.bin
'outputs results for summaryset 1 and 2 to a file (or named pipe)
Note that the summaryset_id relates to a summaryset_id in the required input data file gulsummaryxref.bin or fmsummaryxref.bin for a gulcalc input stream or a fmcalc input stream, respectively, and represents a user specified summary reporting level. For example, summaryset_id = 1 represents portfolio level, summaryset_id = 2 represents zipcode level and summaryset_id = 3 represents site level.
Usage
$ [stdin component] | summarycalc [parameters] | [stdout component]
$ [stdin component] | summarycalc [parameters]
$ summarycalc [parameters] < [stdin].bin
Example
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | summarycalc -i -1 - | eltcalc > eltcalc.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | summarycalc -i -1 gulsummarycalc.bin
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc | summarycalc -f -1 fmsummarycalc.bin
$ summarycalc -f -1 fmsummarycalc.bin < fmcalc.bin
The program requires the gulsummaryxref file for gulcalc input (-i option), or the fmsummaryxref file for fmcalc input (-f option). This data is picked up from the following files relative to the working directory;
summarycalc takes either ground up loss samples from gulcalc or financial loss samples from fmcalc as input and aggregates them to a user-defined summary reporting level. The output is similar to the input, individual losses by sample index and by event, but the ground up or financial losses are summed to an abstract level represented by a summary_id. The relationship between the input identifier, item_id for gulcalc or output_id for fmcalc, and the summary_id is defined in the input files.
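Conceptually, the aggregation looks like the following sketch; the item-to-summary mapping and the sampled losses are made up, and this is not the summarycalc implementation:
```python
# Conceptual sketch of the summarycalc aggregation for one event and one summary set.
# The item-to-summary mapping and the sampled losses below are made up.
xref = {1: 1, 2: 1, 3: 2}                    # item_id -> summary_id (one summary set)

# (item_id, sidx) -> loss for a single event
item_losses = {(1, 1): 100.0, (2, 1): 50.0, (3, 1): 70.0,
               (1, 2): 20.0,  (2, 2): 10.0, (3, 2): 5.0}

summary_losses = {}                          # (summary_id, sidx) -> summed loss
for (item_id, sidx), loss in item_losses.items():
    key = (xref[item_id], sidx)
    summary_losses[key] = summary_losses.get(key, 0.0) + loss

print(summary_losses)
# {(1, 1): 150.0, (2, 1): 70.0, (1, 2): 30.0, (2, 2): 5.0}
```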
The special samples are computed as follows;
Go to 4.2 Output Components section
The following components convert input data in csv format to the binary format required by the calculation components in the reference model;
Static data
A reference intensity bin dictionary csv should also exist, although there is no conversion component for this file because it is not needed for calculation purposes.
Input data
These components are intended to allow users to generate the required input binaries from csv independently of the original data store and technical environment. All that needs to be done is to first generate the csv files from the data store (SQL Server database, etc).
The following components convert the binary input data required by the calculation components in the reference model into csv format;
Static data
Input data
These components are provided for the convenience of viewing the data and debugging.
The aggregate vulnerability file is required for the gulmc component. It maps each aggregate_vulnerability_id to the vulnerability_ids that make it up. This file must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| aggregate_vulnerability_id | int | 4 | Oasis aggregate_vulnerability_id | 45 |
| vulnerability_id | int | 4 | Oasis vulnerability_id | 45 |
If this file is present, the weights.bin or weights.csv file must also be present. The data should not contain nulls.
$ aggregatevulnerabilitytobin < aggregate_vulnerability.csv > aggregate_vulnerability.bin
$ aggregatevulnerabilitytocsv < aggregate_vulnerability.bin > aggregate_vulnerability.csv
The damage bin dictionary is a reference table in Oasis which defines how the effective damageability cdfs are discretized on a relative damage scale (normally between 0 and 1). It is required by getmodel and gulcalc and must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| bin_index | int | 4 | Identifier of the damage bin | 1 |
| bin_from | float | 4 | Lower damage threshold for the bin | 0.01 |
| bin_to | float | 4 | Upper damage threshold for the bin | 0.02 |
| interpolation | float | 4 | Interpolation damage value for the bin (usually the mid-point) | 0.015 |
The interval_type field has been deprecated and will be filled with zeros in the binary file. It does not need to be included as the final column in the csv file:
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| interval_type | int | 4 | Identifier of the interval type, e.g. closed, open (deprecated) | 0 |
The data should be ordered by bin_index ascending and not contain nulls. The bin_index should be a contiguous sequence of integers starting from 1.
$ damagebintobin < damage_bin_dict.csv > damage_bin_dict.bin
Validation checks on the damage bin dictionary csv file are conducted by default during conversion to binary format. These can be suppressed with the -N argument:
$ damagebintobin -N < damage_bin_dict.csv > damage_bin_dict.bin
$ damagebintocsv < damage_bin_dict.bin > damage_bin_dict.csv
The deprecated interval_type field can be sent to the output using the -i argument:
$ damagebintocsv -i < damage_bin_dict.bin > damage_bin_dict.csv
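To make the record layout concrete, the following sketch packs one row of the damage bin dictionary csv into a binary record, assuming little-endian byte order and the 4-byte field sizes from the field table above, with the deprecated interval_type written as zero; damagebintobin remains the supported conversion tool:
```python
import struct

# Illustrative packing of a single damage bin record, following the field table
# above: bin_index (int), bin_from, bin_to, interpolation (floats), and the
# deprecated interval_type written as zero. Little-endian byte order is assumed.
record = struct.pack("<ifffi", 1, 0.01, 0.02, 0.015, 0)
print(len(record))   # 20 bytes per record under this assumed layout
```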
The intensity bin dictionary defines the meaning of the bins of the hazard intensity measure. The hazard intensity measure could be flood depth, windspeed, peak ground acceleration etc, depending on the type of peril. The range of hazard intensity values in the model is discretized into bins, each with a unique and contiguous bin_index listed in the intensity bin dictionary. The bin_index is used as a reference in the footprint file (field intensity_bin_index) to specify the hazard intensity for each event and areaperil.
This file is for reference only as it is not used in the calculation, so there is no component to convert it to binary format.
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| bin_index | int | 4 | Identifier of the intensity bin | 1 |
| bin_from | float | 4 | Lower intensity threshold for the bin | 56 |
| bin_to | float | 4 | Upper intensity threshold for the bin | 57 |
| interpolation | float | 4 | Mid-point intensity value for the bin | 56.5 |
| interval_type | int | 4 | Identifier of the interval type, e.g. closed, open | 1 |
The data should be ordered by bin_index ascending and not contain nulls. The bin_index should be a contiguous sequence of integers starting from 1.
The event footprint is required for the getmodel component, as well as an index file containing the starting positions of each event block. These must have the following location and filenames;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| event_id | int | 4 | Oasis event_id | 1 |
| areaperil_id | int | 4 | Oasis areaperil_id | 4545 |
| intensity_bin_index | int | 4 | Identifier of the intensity bin | 10 |
| prob | float | 4 | The probability mass for the intensity bin, between 0 and 1 | 0.765 |
The data should be ordered by event_id, areaperil_id and not contain nulls.
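The ordering requirement is what makes the per-event index possible: because all rows for an event are contiguous, the start of each event block only needs to be recorded once. A conceptual sketch with made-up rows (footprinttobin records byte offsets into footprint.bin; here row positions stand in for them):
```python
# Conceptual sketch: because footprint rows are ordered by event_id, the start of
# each event block can be indexed. The rows below are made up, and row positions
# stand in for the byte offsets that footprinttobin actually writes to footprint.idx.
rows = [
    (1, 4545, 1, 0.4), (1, 4545, 2, 0.6),   # event 1
    (2, 4545, 1, 1.0),                      # event 2
    (3, 4546, 3, 0.3), (3, 4546, 4, 0.7),   # event 3
]
index = {}
for pos, (event_id, areaperil_id, intensity_bin, prob) in enumerate(rows):
    index.setdefault(event_id, pos)         # first row of each contiguous event block
print(index)  # {1: 0, 2: 2, 3: 3}
```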
$ footprinttobin -i {number of intensity bins} < footprint.csv
This command will create a binary file footprint.bin and an index file footprint.idx in the working directory. The number of intensity bins is the maximum value of intensity_bin_index.
Validation checks on the footprint csv file are conducted by default during conversion to binary format. These can be suppressed with the -N argument:
$ footprinttobin -i {number of intensity bins} -N < footprint.csv
There is an additional parameter -n, which should be used when there is only one record per event_id and areaperil_id, with a single intensity_bin_index value and prob = 1. This is the special case of 'no hazard intensity uncertainty'. In this case, the usage is as follows.
$ footprinttobin -i {number of intensity bins} -n < footprint.csv
Both parameters -i and -n are held in the header of footprint.bin and used in getmodel.
The output binary and index file names can be explicitly set using the -b and -x flags respectively:
$ footprinttobin -i {number of intensity bins} -b {output footprint binary file name} -x {output footprint index file name} < footprint.csv
Both output binary and index file names must be given to use this option.
In the case of very large footprint files, it may be preferable to compress the data as it is written to the binary file. Compression is performed using zlib by issuing the -z flag. If the -u flag is used in addition, the index file will include the uncompressed data size. It is recommended to use the -u flag to prevent memory issues during decompression with getmodel or footprinttocsv:
$ footprinttobin -i {number of intensity bins} -z < footprint.csv
$ footprinttobin -i {number of intensity bins} -z -u < footprint.csv
The value of the -u parameter is held in the same location as -n in the header of the footprint.bin file, left-shifted by 1.
$ footprinttocsv > footprint.csv
footprinttocsv requires a binary file footprint.bin and an index file footprint.idx to be present in the working directory.
Input binary and index file names can be explicitly set using the -b and -x flags respectively:
$ footprinttocsv -b {input footprint binary file name} -x {input footprint index file name} > footprint.csv
Both input binary and index file names must be given to use this option.
Footprint binary files that contain compressed data require the -z argument to be issued:
$ footprinttocsv -z > footprint.csv
The lossfactors binary maps event_id/amplification_id pairs to post loss amplification factors, and is supplied by the model providers. The first 4 bytes are reserved for future use and the data format is as follows. It is required by the Post Loss Amplification (PLA) workflow and must have the following location and filename;
The csv file should contain the following fields (except count, see below) and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| event_id | int | 4 | Event ID | 1 |
| count | int | 4 | Number of amplification IDs associated with the event ID | 1 |
| amplification_id | int | 4 | Amplification ID | 1 |
| factor | float | 4 | The uplift factor | 1.01 |
All fields must not have null values. The csv file does not contain the count field; the conversion tools add and remove it.
$ lossfactorstobin < lossfactors.csv > lossfactors.bin
$ lossfactorstocsv < lossfactors.bin > lossfactors.csv
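To illustrate how the count field relates the csv and binary layouts, here is a hypothetical sketch of the grouping step; it is not lossfactorstobin itself, and the rows are invented:
```python
# Hypothetical sketch of how a count per event_id could be derived when converting
# lossfactors.csv rows (event_id, amplification_id, factor) to the binary layout,
# which stores event_id, count, then the amplification_id/factor pairs.
from itertools import groupby

csv_rows = [(1, 10, 1.01), (1, 11, 1.05), (2, 10, 0.98)]   # made-up data

for event_id, group in groupby(csv_rows, key=lambda r: r[0]):
    pairs = [(amp_id, factor) for _, amp_id, factor in group]
    print(event_id, len(pairs), pairs)       # event_id, count, then the pairs
# 1 2 [(10, 1.01), (11, 1.05)]
# 2 1 [(10, 0.98)]
```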
A random number file may be provided for the gulcalc component as an option (using the gulcalc -r parameter). The random number binary contains a list of random numbers used for ground up loss sampling in the kernel calculation. It must have the following location and filename;
If the gulcalc -r parameter is not used, the random number binary is not required and random numbers are instead generated dynamically during the calculation, using the -R parameter to specify how many should be generated.
The random numbers can be imported from a csv file using the component randtobin.
The csv file should contain a simple list of random numbers and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| rand | float | 4 | Number between 0 and 1 | 0.75875 |
$ randtobin < random.csv > random.bin
There are a few parameters available which allow the generation of a random number csv file, as follows;
$ randtocsv -r < random.bin > random.csv
$ randtocsv -g 1000000 > random.csv
$ randtocsv -g 1000000 -S 1234 > random.csv
The -S {seed value} option produces repeatable random numbers, whereas usage of -g alone will generate a different set every time.
The vulnerability file is required for the getmodel component. It contains the conditional distributions of damage for each intensity bin and for each vulnerability_id. This file must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| vulnerability_id | int | 4 | Oasis vulnerability_id | 45 |
| intensity_bin_index | int | 4 | Identifier of the hazard intensity bin | 10 |
| damage_bin_index | int | 4 | Identifier of the damage bin | 20 |
| prob | float | 4 | The probability mass for the damage bin | 0.186 |
The data should be ordered by vulnerability_id, intensity_bin_index and not contain nulls.
$ vulnerabilitytobin -d {number of damage bins} < vulnerability.csv > vulnerability.bin
The parameter -d number of damage bins is the maximum value of damage_bin_index. This is held in the header of vulnerability.bin and used by getmodel.
Validation checks on the vulnerability csv file are conducted by default during conversion to binary format. These can be suppressed with the -N argument:
$ vulnerabilitytobin -d {number of damage bins} -N < vulnerability.csv > vulnerability.bin
In the case of very large vulnerability files, it may be preferable to create an index file to improve performance. Issuing the -i flag creates vulnerability.bin and vulnerability.idx in the current working directory:
$ vulnerabilitytobin -d {number of damage bins} -i < vulnerability.csv
Additionally, the data can be compressed as it is written to the binary file. Compression is performed with zlib by issuing the -z flag. This creates vulnerability.bin.z and vulnerability.idx.z in the current working directory:
$ vulnerabilitytobin -d {number of damage bins} -z < vulnerability.csv
The getmodel component will look for the presence of index files in the following order to determine which algorithm to use to extract data from vulnerability.bin:
$ vulnerabilitytocsv < vulnerability.bin > vulnerability.csv
$ vulnerabilitytocsv -i > vulnerability.csv
$ vulnerabilitytocsv -z > vulnerability.csv
The vulnerability weights binary contains the weighting of each vulnerability function within each areaperil ID. The data format is as follows. It is required by gulmc, together with the aggregate_vulnerability file, and must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| areaperil_id | int | 4 | Areaperil ID | 1 |
| vulnerability_id | int | 4 | Vulnerability ID | 1 |
| weight | float | 4 | The weighting factor | 1.0 |
All fields must not have null values.
$ weightstobin < weights.csv > weights.bin
$ weightstocsv < weights.bin > weights.csv
The amplifications binary contains the list of item IDs mapped to amplification IDs. The data format is as follows. It is required by the Post Loss Amplification (PLA) workflow and must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| item_id | int | 4 | Item ID | 1 |
| amplification_id | int | 4 | Amplification ID | 1 |
The item_id must start from 1, must be contiguous and must not have null values. The binary file only contains the amplification IDs and assumes that the item_ids start from 1 and are contiguous.
$ amplificationtobin < amplifications.csv > amplifications.bin
$ amplificationtocsv < amplifications.bin > amplifications.csv
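The contiguity assumption for the amplifications data can be illustrated conceptually as follows (this is not the amplificationtobin/amplificationtocsv file format, just the idea that the item_id is implied by position):
```python
# Illustrative sketch of the contiguity assumption for amplifications data:
# because item_ids run 1, 2, 3, ... the mapping can be stored as a bare list of
# amplification_ids, and the item_id is recovered from the position. This is a
# conceptual sketch, not the actual binary file layout.
csv_rows = [(1, 7), (2, 7), (3, 9)]                 # (item_id, amplification_id), made up

amplification_ids = [amp for _, amp in csv_rows]    # what effectively gets stored
recovered = [(i, amp) for i, amp in enumerate(amplification_ids, start=1)]
print(recovered)  # [(1, 7), (2, 7), (3, 9)] -- identical to the csv rows
```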
The coverages binary contains the list of coverages and the coverage TIVs. The data format is as follows. It is required by gulcalc and fmcalc and must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| coverage_id | int | 4 | Identifier of the coverage | 1 |
| tiv | float | 4 | The total insured value of the coverage | 200000 |
Coverage_id must be an ordered contiguous sequence of numbers starting at 1.
$ coveragetobin < coverages.csv > coverages.bin
$ coveragetocsv < coverages.bin > coverages.csv
The ensemble file is used for ensemble modelling (multiple views); it maps sample IDs to particular ensemble ID groups. It is an optional file for use with AAL and LEC. It must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| sidx | int | 4 | Sample ID | 1 |
| ensemble_id | int | 4 | Ensemble ID | 1 |
$ ensembletobin < ensemble.csv > ensemble.bin
$ ensembletocsv < ensemble.bin > ensemble.csv
One or more event binaries are required by eve. They must have the following location and filename;
The csv file should contain a list of event_ids (integers) and include a header.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| event_id | int | 4 | Oasis event_id | 4545 |
$ evetobin < events.csv > events.bin
$ evetocsv < events.bin > events.csv
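Since events.bin is just the event_id column written as 4-byte integers, the layout can be illustrated with a short sketch (assuming little-endian byte order and no header; evetobin is the supported tool for the real conversion):
```python
import struct

# Illustrative sketch of the events.bin layout: a plain sequence of 4-byte
# integer event_ids, assuming little-endian byte order and no header.
event_ids = [4545, 4546, 4550]                      # made-up event ids
data = b"".join(struct.pack("<i", e) for e in event_ids)
print(len(data))                                    # 12 bytes: 4 bytes per event_id
print(struct.unpack("<3i", data))                   # (4545, 4546, 4550)
```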
The items binary contains the list of exposure items for which ground up loss will be sampled in the kernel calculations. The data format is as follows. It is required by gulcalc and outputcalc and must have the following location and filename;
The csv file should contain the following fields and include a header row.
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| item_id | int | 4 | Identifier of the exposure item | 1 |
| coverage_id | int | 4 | Identifier of the coverage | 3 |
| areaperil_id | int | 4 | Identifier of the locator and peril | 4545 |
| vulnerability_id | int | 4 | Identifier of the vulnerability distribution | 645 |
| group_id | int | 4 | Identifier of the correlation group | 3 |
The data should be ordered by areaperil_id, vulnerability_id ascending and not contain nulls. item_id must be a contiguous sequence of numbers starting from 1.
$ itemtobin < items.csv > items.bin
$ itemtocsv < items.bin > items.csv
The gulsummaryxref binary is a cross reference file which determines how item or coverage losses from gulcalc output are summed together into at various summary levels in summarycalc. It is required by summarycalc and must have the following location and filename;
+The gulsummaryxref binary is a cross reference file which determines how item or coverage losses from gulcalc output are summed together at various summary levels in summarycalc. It is required by summarycalc and must have the following location and filename;
The csv file should contain the following fields and include a header row.
-| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| item_id / coverage_id | +item_id / coverage_id | int | 4 | -Identifier of the item or coverage | -3 | +Identifier of the item or coverage | +3 |
| summary_id | +summary_id | int | 4 | -Identifier of the summary level grouping | -3 | +Identifier of the summary level +grouping | +3 |
| summaryset_id | +summaryset_id | int | 4 | -Identifier of the summary set | -1 | +Identifier of the summary set | +1 |
One summary set consists of a common summaryset_id and each item_id being assigned a summary_id. An example is as follows.
+One summary set consists of a common summaryset_id and each item_id +being assigned a summary_id. An example is as follows.
| item_id | +item_id | summary_id | -summaryset_id | +summaryset_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -1 | +1 |
| 3 | +3 | 1 | -1 | +1 |
| 4 | +4 | 2 | -1 | +1 |
| 5 | +5 | 2 | -1 | +1 |
| 6 | +6 | 2 | -1 | +1 |
This shows, for summaryset_id=1, items 1-3 being grouped into summary_id = 1 and items 4-6 being grouped into summary_id = 2. This could be an example of a 'site' level grouping, for example. The summary_ids should be held in a dictionary which contains the description of the ids to make meaning of the output results. For instance;
+This shows, for summaryset_id=1, items 1-3 being grouped into summary_id = 1 and items 4-6 being grouped into summary_id = 2. This could represent a 'site' level grouping, for example. The summary_ids should be held in a dictionary which contains the description of the ids, to give meaning to the output results. For instance;
| summary_id | +summary_id | summaryset_id | -summary_desc | +summary_desc |
|---|---|---|---|---|
| 1 | +1 | 1 | -site_435 | +site_435 |
| 2 | +2 | 1 | -site_958 | +site_958 |
This cross reference information is not required in ktools.
-Up to 10 summary sets may be provided in gulsummaryxref, depending on the required summary reporting levels for the analysis. Here is an example of the 'site' summary level with summaryset_id=1, plus an 'account' summary level with summaryset_id = 2. In summary set 2, the account summary level includes both sites because all items are assigned a summary_id of 1.
+Up to 10 summary sets may be provided in gulsummaryxref, depending on +the required summary reporting levels for the analysis. Here is an +example of the 'site' summary level with summaryset_id=1, plus an +'account' summary level with summaryset_id = 2. In summary set 2, the +account summary level includes both sites because all items are assigned +a summary_id of 1.
| item_id | +item_id | summary_id | -summaryset_id | +summaryset_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -1 | +1 |
| 3 | +3 | 1 | -1 | +1 |
| 4 | +4 | 2 | -1 | +1 |
| 5 | +5 | 2 | -1 | +1 |
| 6 | +6 | 2 | -1 | +1 |
| 1 | +1 | 1 | -2 | +2 |
| 2 | +2 | 1 | -2 | +2 |
| 3 | +3 | 1 | -2 | +2 |
| 4 | +4 | 1 | -2 | +2 |
| 5 | +5 | 1 | -2 | +2 |
| 6 | +6 | 1 | -2 | +2 |
$ gulsummaryxreftobin < gulsummaryxref.csv > gulsummaryxref.bin
-
+$ gulsummaryxreftobin < gulsummaryxref.csv > gulsummaryxref.bin
$ gulsummaryxreftocsv < gulsummaryxref.bin > gulsummaryxref.csv
-
+$ gulsummaryxreftocsv < gulsummaryxref.bin > gulsummaryxref.csv
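As an illustrative sketch of the 'site' example above (assuming an item-level cross reference, so the first column is item_id), the corresponding gulsummaryxref.csv and its conversion might look like;
item_id,summary_id,summaryset_id
1,1,1
2,1,1
3,1,1
4,2,1
5,2,1
6,2,1
$ gulsummaryxreftobin < gulsummaryxref.csv > gulsummaryxref.bin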
The fm programme binary file contains the level heirarchy and defines aggregations of losses required to perform a loss calculation, and is required for fmcalc only.
+The fm programme binary file contains the level hierarchy and defines the aggregations of losses required to perform a loss calculation, and is required for fmcalc only.
This must have the following location and filename;
The csv file should contain the following fields and include a header row.
+The csv file should contain the following fields and include a header +row.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| from_agg_id | +from_agg_id | int | 4 | -Oasis Financial Module from_agg_id | -1 | +Oasis Financial Module from_agg_id | +1 |
| level_id | +level_id | int | 4 | -Oasis Financial Module level_id | -1 | +Oasis Financial Module level_id | +1 |
| to_agg_id | +to_agg_id | int | 4 | -Oasis Financial Module to_agg_id | -1 | +Oasis Financial Module to_agg_id | +1 |
$ fmprogrammetobin < fm_programme.csv > fm_programme.bin
-
+$ fmprogrammetobin < fm_programme.csv > fm_programme.bin
$ fmprogrammetocsv < fm_programme.bin > fm_programme.csv
-
+$ fmprogrammetocsv < fm_programme.bin > fm_programme.csv
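As an illustrative sketch, the two-level hierarchy used in the worked example of the Financial Module section would be written as the following fm_programme.csv before conversion;
from_agg_id,level_id,to_agg_id
1,1,1
2,1,1
3,1,1
4,1,2
1,2,1
2,2,1
$ fmprogrammetobin < fm_programme.csv > fm_programme.bin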
The fmprofile binary file contains the list of calculation rules with profile values (policytc_ids) that appear in the policytc file. This is required for fmcalc only.
-There are two versions of this file and either one or the other can be used at a time.
+The fmprofile binary file contains the calculation rules and profile values for each policytc_id that appears in the policytc file. This is required for fmcalc only.
+There are two versions of this file; only one can be used at a time.
They must be in the following location with filename formats;
The csv file should contain the following fields and include a header row.
+The csv file should contain the following fields and include a header +row.
fm_profile
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| policytc_id | +policytc_id | int | 4 | -Primary key | -34 | +Primary key | +34 |
| calcrule_id | +calcrule_id | int | 4 | -The calculation rule that applies to the terms | -12 | +The calculation rule that applies to the +terms | +12 |
| deductible_1 | +deductible_1 | int | 4 | -First deductible | -0.03 | +First deductible | +0.03 |
| deductible_2 | +deductible_2 | float | 4 | -Second deductible | -50000 | +Second deductible | +50000 |
| deductible_3 | +deductible_3 | float | 4 | -Third deductible | -100000 | +Third deductible | +100000 |
| attachment_1 | +attachment_1 | float | 4 | -Attachment point, or excess | -1000000 | +Attachment point, or excess | +1000000 |
| limit_1 | +limit_1 | float | 4 | -Limit | -5000000 | +Limit | +5000000 |
| share_1 | +share_1 | float | 4 | -First proportional share | -0.8 | +First proportional share | +0.8 |
| share_2 | +share_2 | float | 4 | -Second proportional share | -0.25 | +Second proportional share | +0.25 |
| share_3 | +share_3 | float | 4 | -Third proportional share | -1 | +Third proportional share | +1 |
fm_profile_step
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| policytc_id | +policytc_id | int | 4 | -Primary key | -34 | +Primary key | +34 |
| calcrule_id | +calcrule_id | int | 4 | -The calculation rule that applies to the terms | -12 | +The calculation rule that applies to the +terms | +12 |
| deductible_1 | +deductible_1 | int | 4 | -First deductible | -0.03 | +First deductible | +0.03 |
| deductible_2 | +deductible_2 | float | 4 | -Second deductible | -50000 | +Second deductible | +50000 |
| deductible_3 | +deductible_3 | float | 4 | -Third deductible | -100000 | +Third deductible | +100000 |
| attachment_1 | +attachment_1 | float | 4 | -Attachment point, or excess | -1000000 | +Attachment point, or excess | +1000000 |
| limit_1 | +limit_1 | float | 4 | -First limit | -5000000 | +First limit | +5000000 |
| share_1 | +share_1 | float | 4 | -First proportional share | -0.8 | +First proportional share | +0.8 |
| share_2 | +share_2 | float | 4 | -Second proportional share | -0.25 | +Second proportional share | +0.25 |
| share_3 | +share_3 | float | 4 | -Third proportional share | -1 | +Third proportional share | +1 |
| step_id | +step_id | int | 4 | -Step number | -1 | +Step number | +1 |
| trigger_start | +trigger_start | float | 4 | -Start trigger for payout | -0.05 | +Start trigger for payout | +0.05 |
| trigger_end | +trigger_end | float | 4 | -End trigger for payout | -0.15 | +End trigger for payout | +0.15 |
| payout_start | +payout_start | float | 4 | -Start payout | -100 | +Start payout | +100 |
| payout_end | +payout_end | float | 4 | -End payout | -200 | +End payout | +200 |
| limit_2 | +limit_2 | float | 4 | -Second limit | -3000000 | +Second limit | +3000000 |
| scale_1 | +scale_1 | float | 4 | -Scaling (inflation) factor 1 | -0.03 | +Scaling (inflation) factor 1 | +0.03 |
| scale_2 | +scale_2 | float | 4 | -Scaling (inflation) factor 2 | -0.2 | +Scaling (inflation) factor 2 | +0.2 |
$ fmprofiletobin < fm_profile.csv > fm_profile.bin
-$ fmprofiletobin -S < fm_profile_step.csv > fm_profile_step.bin
-
+$ fmprofiletobin < fm_profile.csv > fm_profile.bin
+$ fmprofiletobin -S < fm_profile_step.csv > fm_profile_step.bin
$ fmprofiletocsv < fm_profile.bin > fm_profile.csv
-$ fmprofiletocsv -S < fm_profile_step.bin > fm_profile_step.csv
-
+$ fmprofiletocsv < fm_profile.bin > fm_profile.csv
+$ fmprofiletocsv -S < fm_profile_step.bin > fm_profile_step.csv
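As an illustrative sketch mirroring the worked example in the Financial Module section (column names here follow the field list above; check the header written by fmprofiletocsv for the exact spelling expected by the tools), an fm_profile.csv and its conversion might look like;
policytc_id,calcrule_id,deductible_1,deductible_2,deductible_3,attachment_1,limit_1,share_1,share_2,share_3
1,1,1000,0,0,0,1000000,0,0,0
2,1,2000,0,0,0,18000,0,0,0
3,2,0,0,0,1000,1000000,0.1,0,0
$ fmprofiletobin < fm_profile.csv > fm_profile.bin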
The fm policytc binary file contains the cross reference between the aggregations of losses defined in the fm programme file at a particular level and the calculation rule that should be applied as defined in the fm profile file. This file is required for fmcalc only.
-This must have the following location and filename;
+The fm policytc binary file contains the cross reference between the +aggregations of losses defined in the fm programme file at a particular +level and the calculation rule that should be applied as defined in the +fm profile file. This file is required for fmcalc only.
+This must have the following location and filename;
The csv file should contain the following fields and include a header row.
+The csv file should contain the following fields and include a header +row.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| layer_id | +layer_id | int | 4 | -Oasis Financial Module layer_id | -1 | +Oasis Financial Module layer_id | +1 |
| level_id | +level_id | int | 4 | -Oasis Financial Module level_id | -1 | +Oasis Financial Module level_id | +1 |
| agg_id | +agg_id | int | 4 | -Oasis Financial Module agg_id | -1 | +Oasis Financial Module agg_id | +1 |
| policytc_id | +policytc_id | int | 4 | -Oasis Financial Module policytc_id | -1 | +Oasis Financial Module policytc_id | +1 |
$ fmpolicytctobin < fm_policytc.csv > fm_policytc.bin
-
+$ fmpolicytctobin < fm_policytc.csv > fm_policytc.bin
$ fmpolicytctocsv < fm_policytc.bin > fm_policytc.csv
-
+$ fmpolicytctocsv < fm_policytc.bin > fm_policytc.csv
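As an illustrative sketch, the worked example in the Financial Module section (two level 1 groups and one level 2 group, single layer) would be written as the following fm_policytc.csv before conversion;
layer_id,level_id,agg_id,policytc_id
1,1,1,1
1,1,2,2
1,2,1,3
$ fmpolicytctobin < fm_policytc.csv > fm_policytc.bin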
The fm summary xref binary is a cross reference file which determines how losses from fmcalc output are summed together at various summary levels by summarycalc. It is required by summarycalc and must have the following location and filename;
+The fm summary xref binary is a cross reference file which determines +how losses from fmcalc output are summed together at various summary +levels by summarycalc. It is required by summarycalc and must have the +following location and filename;
The csv file should contain the following fields and include a header row.
+The csv file should contain the following fields and include a header +row.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| output_id | +output_id | int | 4 | -Identifier of the coverage | -3 | +Identifier of the coverage | +3 |
| summary_id | +summary_id | int | 4 | -Identifier of the summary level group for one or more output losses | -1 | +Identifier of the summary level group for +one or more output losses | +1 |
| summaryset_id | +summaryset_id | int | 4 | -Identifier of the summary set (0 to 9 inclusive) | -1 | +Identifier of the summary set (0 to 9 +inclusive) | +1 |
One summary set consists of a common summaryset_id and each output_id being assigned a summary_id. An example is as follows.
+One summary set consists of a common summaryset_id and each output_id +being assigned a summary_id. An example is as follows.
| output_id | +output_id | summary_id | -summaryset_id | +summaryset_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 2 | -1 | +1 |
This shows, for summaryset_id=1, output_id=1 being assigned summary_id = 1 and output_id=2 being assigned summary_id = 2.
-If the output_id represents a policy level loss output from fmcalc (the meaning of output_id is defined in the fm xref file) then no further grouping is performed by summarycalc and this is an example of a 'policy' summary level grouping.
-Up to 10 summary sets may be provided in this file, depending on the required summary reporting levels for the analysis. Here is an example of the 'policy' summary level with summaryset_id=1, plus an 'account' summary level with summaryset_id = 2. In summary set 2, the 'account' summary level includes both policy's because both output_id's are assigned a summary_id of 1.
+This shows, for summaryset_id=1, output_id=1 being assigned +summary_id = 1 and output_id=2 being assigned summary_id = 2.
+If the output_id represents a policy level loss output from fmcalc +(the meaning of output_id is defined in the fm xref file) then no +further grouping is performed by summarycalc and this is an example of a +'policy' summary level grouping.
+Up to 10 summary sets may be provided in this file, depending on the required summary reporting levels for the analysis. Here is an example of the 'policy' summary level with summaryset_id=1, plus an 'account' summary level with summaryset_id = 2. In summary set 2, the 'account' summary level includes both policies because both output_ids are assigned a summary_id of 1.
| output_id | +output_id | summary_id | -summaryset_id | +summaryset_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 2 | -1 | +1 |
| 1 | +1 | 1 | -2 | +2 |
| 2 | +2 | 1 | -2 | +2 |
If a more detailed summary level than policy is required for insured losses, then the user should specify in the fm profile file to back-allocate fmcalc losses to items. Then the output_id represents back-allocated policy losses to item, and in the fmsummaryxref file these can be grouped into any summary level, such as site, zipcode, line of business or region, for example. The user needs to define output_id in the fm xref file, and group them together into meaningful summary levels in the fm summary xref file, hence these two files must be consistent with respect to the meaning of output_id.
+If a more detailed summary level than policy is required for insured losses, then the user should use fmcalc's back-allocation option to back-allocate losses to items. Then the output_id represents policy losses back-allocated to item, and in the fmsummaryxref file these can be grouped into any summary level, such as site, zipcode, line of business or region, for example. The user needs to define the output_ids in the fm xref file and group them together into meaningful summary levels in the fm summary xref file, hence these two files must be consistent with respect to the meaning of output_id.
$ fmsummaryxreftobin < fmsummaryxref.csv > fmsummaryxref.bin
-
+$ fmsummaryxreftobin < fmsummaryxref.csv > fmsummaryxref.bin
$ fmsummaryxreftocsv < fmsummaryxref.bin > fmsummaryxref.csv
-
+$ fmsummaryxreftocsv < fmsummaryxref.bin > fmsummaryxref.csv
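As an illustrative sketch of the two summary sets above, the corresponding fmsummaryxref.csv and its conversion might look like;
output_id,summary_id,summaryset_id
1,1,1
2,2,1
1,1,2
2,1,2
$ fmsummaryxreftobin < fmsummaryxref.csv > fmsummaryxref.bin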
The fmxref binary file contains cross reference data specifying the output_id in the fmcalc as a combination of agg_id and layer_id, and is required by fmcalc.
+The fmxref binary file contains cross reference data specifying the output_id in the fmcalc output as a combination of agg_id and layer_id, and is required by fmcalc.
This must be in the following location with filename format;
The csv file should contain the following fields and include a header row.
+The csv file should contain the following fields and include a header +row.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| output_id | +output_id | int | 4 | -Identifier of the output group of losses | -1 | +Identifier of the output group of +losses | +1 |
| agg_id | +agg_id | int | 4 | -Identifier of the agg_id to output | -1 | +Identifier of the agg_id to output | +1 |
| layer_id | +layer_id | int | 4 | -Identifier of the layer_id to output | -1 | +Identifier of the layer_id to output | +1 |
The data should not contain any nulls.
-The output_id represents the summary level at which losses are output from fmcalc, as specified by the user.
+The output_id represents the summary level at which losses are output +from fmcalc, as specified by the user.
There are two cases;
For example, say there are two policy layers (with layer_ids=1 and 2) which applies to the sum of losses from 4 items (the summary level represented by agg_id=1). Without back-allocation, the policy summary level of losses can be represented as two output_id's as follows;
+For example, say there are two policy layers (with layer_ids=1 and 2) which apply to the sum of losses from 4 items (the summary level represented by agg_id=1). Without back-allocation, the policy summary level of losses can be represented as two output_ids as follows;
| output_id | +output_id | agg_id | -layer_id | +layer_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -2 | +2 |
If the user wants to back-allocate policy losses to the items and output the losses by item and policy, then the item-policy summary level of losses would be represented by 8 output_id's, as follows;
+If the user wants to back-allocate policy losses to the items and output the losses by item and policy, then the item-policy summary level of losses would be represented by 8 output_ids, as follows;
| output_id | +output_id | agg_id | -layer_id | +layer_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 2 | -1 | +1 |
| 3 | +3 | 3 | -1 | +1 |
| 4 | +4 | 4 | -1 | +1 |
| 5 | +5 | 1 | -2 | +2 |
| 6 | +6 | 2 | -2 | +2 |
| 7 | +7 | 3 | -2 | +2 |
| 8 | +8 | 4 | -2 | +2 |
The fm summary xref file must be consistent with respect to the meaning of output_id in the fmxref file.
+The fm summary xref file must be consistent with respect to the +meaning of output_id in the fmxref file.
$ fmxreftobin < fm_xref.csv > fm_xref.bin
-
+$ fmxreftobin < fm_xref.csv > fm_xref.bin
$ fmxreftocsv < fm_xref.bin > fm_xref.csv
-
+$ fmxreftocsv < fm_xref.bin > fm_xref.csv
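As an illustrative sketch of the first (no back-allocation) case above, the corresponding fm_xref.csv and its conversion might look like;
output_id,agg_id,layer_id
1,1,1
2,1,2
$ fmxreftobin < fm_xref.csv > fm_xref.bin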
The occurrence file is required for certain output components which, in the reference model, are leccalc, pltcalc and aalcalc. In general, some form of event occurence file is required for any output which involves the calculation of loss metrics over a period of time. The occurrence file assigns occurrences of the event_ids to numbered periods. A period can represent any length of time, such as a year, or 2 years for instance. The output metrics such as mean, standard deviation or loss exceedance probabilities are with respect to the chosen period length. Most commonly in catastrophe modelling, the period of interest is a year.
+The occurrence file is required for certain output components which, in the reference model, are leccalc, pltcalc and aalcalc. In general, some form of event occurrence file is required for any output which involves the calculation of loss metrics over a period of time. The occurrence file assigns occurrences of the event_ids to numbered periods. A period can represent any length of time, such as a year or two years. The output metrics such as mean, standard deviation or loss exceedance probabilities are with respect to the chosen period length. Most commonly in catastrophe modelling, the period of interest is a year.
The occurrence file also includes date fields.
The csv file should contain the following fields and include a header row.
+The csv file should contain the following fields and include a header +row.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -The occurrence event_id | -45567 | +The occurrence event_id | +45567 |
| period_no | +period_no | int | 4 | -A numbered period in which the event occurs | -56876 | +A numbered period in which the event +occurs | +56876 |
| occ_year | +occ_year | int | 4 | -the year number of the event occurrence | -56876 | +the year number of the event +occurrence | +56876 |
| occ_month | +occ_month | int | 4 | -the month of the event occurrence | -5 | +the month of the event occurrence | +5 |
| occ_day | +occ_day | int | 4 | -the day of the event occurrence | -16 | +the day of the event occurrence | +16 |
The occurrence year in this example is a scenario numbered year, which cannot be expressed as a real date in a standard calendar.
-In addition, the following fields are optional and should comprise the sixth and seventh column respectively:
+The occurrence year in this example is a scenario numbered year, +which cannot be expressed as a real date in a standard calendar.
+In addition, the following fields are optional and should comprise the sixth and seventh columns respectively:
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| occ_hour | +occ_hour | int | 4 | -The hour of the event occurrence | -13 | +The hour of the event occurrence | +13 |
| occ_minute | +occ_minute | int | 4 | -The minute of the event occurrence | -52 | +The minute of the event occurrence | +52 |
The date fields are converted to a single number through an algorithm for efficient storage in the binary file. The data type for this field is either an integer when the optional date fields are not included or a long long integer when the these date fields are included. This should not be confused with the deprecated occ_date_id field.
+The date fields are converted to a single number through an algorithm for efficient storage in the binary file. The data type for this field is either an integer when the optional date fields are not included, or a long long integer when these date fields are included. This should not be confused with the deprecated occ_date_id field.
A required parameter is -P, the total number of periods of event occurrences. The total number of periods is held in the header of the binary file and used in output calculations.
-$ occurrencetobin -P10000 < occurrence.csv > occurrence.bin
-
-If it is desirable to include the occ_hour and occ_minute fields in the binary file, the -H argument should be given. A flag to signify the presence of these fields is set in the header of the binary file, which is read by other kiools components. If these fields do not exist in the csv file, both are assigned the value of 0 when written to the binary file.
-$ occurrencetobin -P10000 -H < occurrence.csv > occurrence.bin
-
+A required parameter is -P, the total number of periods of event +occurrences. The total number of periods is held in the header of the +binary file and used in output calculations.
+$ occurrencetobin -P10000 < occurrence.csv > occurrence.bin
+If it is desirable to include the occ_hour and occ_minute fields in the binary file, the -H argument should be given. A flag to signify the presence of these fields is set in the header of the binary file, which is read by other ktools components. If these fields do not exist in the csv file, both are assigned the value of 0 when written to the binary file.
+$ occurrencetobin -P10000 -H < occurrence.csv > occurrence.bin
$ occurrencetocsv < occurrence.bin > occurrence.csv
-
+$ occurrencetocsv < occurrence.bin > occurrence.csv
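As an illustrative sketch (all values are hypothetical; period_no must not exceed the value passed to -P), a minimal occurrence.csv without the optional hour and minute fields, and its conversion, might look like;
event_id,period_no,occ_year,occ_month,occ_day
45567,1,1,5,16
45568,1,1,6,2
45569,2,2,1,1
$ occurrencetobin -P10000 < occurrence.csv > occurrence.bin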
The returnperiods binary file is a list of return periods that the user requires to be included in loss exceedance curve (leccalc) results.
+The returnperiods binary file is a list of return periods that the +user requires to be included in loss exceedance curve (leccalc) +results.
This must be in the following location with filename format;
The csv file should contain the following field and include a header.
+The csv file should contain the following field and include a +header.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| return_period | +return_period | int | 4 | -Return period | -250 | +Return period | +250 |
$ returnperiodtobin < returnperiods.csv > returnperiods.bin
-
+$ returnperiodtobin < returnperiods.csv > returnperiods.bin
$ returnperiodtocsv < returnperiods.bin > returnperiods.csv
-
+$ returnperiodtocsv < returnperiods.bin > returnperiods.csv
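As an illustrative sketch with a single return period (one value per row, matching the example above), a minimal returnperiods.csv and its conversion might look like;
return_period
250
$ returnperiodtobin < returnperiods.csv > returnperiods.bin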
The periods binary file is a list of all the periods that are in the model and is optional for weighting the periods in the calculation. The file is used in the calculation of the loss exceedance curve (leccalc) and aalcalc results.
+The periods binary file is a list of all the periods in the model and is optional; it is used to apply weights to the periods in the calculation. The file is used in the calculation of the loss exceedance curve (leccalc) and aalcalc results.
This must be in the following location with filename format;
The csv file should contain the following field and include a header.
+The csv file should contain the following field and include a +header.
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| period_no | +period_no | int | 4 | -A numbered period in which the event occurs | -4545 | +A numbered period in which the event +occurs | +4545 |
| weight | +weight | int | 4 | -relative weight to P, the maximum period_no | -0.0003 | +relative weight to P, the maximum +period_no | +0.0003 |
All periods must be present in this file (no gaps in period_no from 1 to P).
+All periods must be present in this file (no gaps in period_no from 1 +to P).
$ periodstobin < periods.csv > periods.bin
-
+$ periodstobin < periods.csv > periods.bin
$ periodstocsv < periods.bin > periods.csv
-
+$ periodstocsv < periods.bin > periods.csv
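As an illustrative sketch (a toy model with only 5 periods and equal weights is assumed, so that every period_no from 1 to P appears exactly once), a minimal periods.csv and its conversion might look like;
period_no,weight
1,0.2
2,0.2
3,0.2
4,0.2
5,0.2
$ periodstobin < periods.csv > periods.bin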
+
+
+The quantile binary file contains a list of user-specified quantiles (floats). The data format is as follows. It is optionally used by the Quantile Event/Period Loss tables and must have the following location and filename;
+The csv file should contain the following fields and include a header +row.
+| Name | +Type | +Bytes | +Description | +Example | +
|---|---|---|---|---|
| quantile | +float | +4 | +Quantile float | +0.1 | +
Fields must not contain null values.
+$ quantiletobin < quantile.csv > quantile.bin
+$ quantiletocsv < quantile.bin > quantile.csv
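As an illustrative sketch (quantile values are hypothetical), a minimal quantile.csv and its conversion might look like;
quantile
0.1
0.5
0.9
$ quantiletobin < quantile.csv > quantile.bin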
-Go to 4.5 Stream conversion components section
- - - - +Go to 4.5 Stream conversion +components section
+ + + diff --git a/docs/html/FinancialModule.html b/docs/html/FinancialModule.html index 3784310f..3608fb8f 100644 --- a/docs/html/FinancialModule.html +++ b/docs/html/FinancialModule.html @@ -1,591 +1,519 @@ - - - -
The Oasis Financial Module is a data-driven process design for calculating the losses on (re)insurance contracts. It has an abstract design in order to cater for the many variations in contract structures and terms. The way Oasis works is to be fed data in order to execute calculations, so for the insurance calculations it needs to know the structure, parameters and calculation rules to be used. This data must be provided in the files used by the Oasis Financial Module:
+ + + + + + +
The Oasis Financial Module is a data-driven process design for +calculating the losses on (re)insurance contracts. It has an abstract +design in order to cater for the many variations in contract structures +and terms. The way Oasis works is to be fed data in order to execute +calculations, so for the insurance calculations it needs to know the +structure, parameters and calculation rules to be used. This data must +be provided in the files used by the Oasis Financial Module:
This section explains the design of the Financial Module which has been implemented in the fmcalc component.
+This section explains the design of the Financial Module which has +been implemented in the fmcalc component.
In addition, there is a separate github repository ktest which is an extended test suite for ktools and contains a library of financial module worked examples provided by Oasis Members with a full set of input and output files.
-Note that other reference tables are referred to below that do not appear explicitly in the kernel as they are not directly required for calculation. It is expected that a front end system will hold all of the exposure and policy data and generate the above four input files required for the kernel calculation.
+In addition, there is a separate github repository ktest which is an extended +test suite for ktools and contains a library of financial module worked +examples provided by Oasis Members with a full set of input and output +files.
+Note that other reference tables are referred to below that do not +appear explicitly in the kernel as they are not directly required for +calculation. It is expected that a front end system will hold all of the +exposure and policy data and generate the above four input files +required for the kernel calculation.
The Financial Module outputs sample by sample losses by (re)insurance contract, or by item, which represents the individual coverage subject to economic loss from a particular peril. In the latter case, it is necessary to ‘back-allocate’ losses when they are calculated at a higher policy level. The Financial Module can output retained loss or ultimate net loss (UNL) perspectives as an option, and at any stage in the calculation.
-The output contains anonymous keys representing the (re)insurance policy (agg_id and layer_id) at the chosen output level (output_id) and a loss value. Losses by sample number (idx) and event (event_id) are produced. To make sense of the output, this output must be cross-referenced with Oasis dictionaries which contain the meaningful business information.
-The Financial Module does not support multi-currency calculations.
+The Financial Module outputs sample by sample losses by (re)insurance +contract, or by item, which represents the individual coverage subject +to economic loss from a particular peril. In the latter case, it is +necessary to ‘back-allocate’ losses when they are calculated at a higher +policy level. The Financial Module can output retained loss or ultimate +net loss (UNL) perspectives as an option, and at any stage in the +calculation.
+The output contains anonymous keys representing the (re)insurance +policy (agg_id and layer_id) at the chosen output level (output_id) and +a loss value. Losses by sample number (idx) and event (event_id) are +produced. To make sense of the output, this output must be +cross-referenced with Oasis dictionaries which contain the meaningful +business information.
+The Financial Module does not support multi-currency +calculations.
Profiles are used throughout the Oasis framework and are meta-data definitions with their associated data types and rules. Profiles are used in the Financial Module to perform the elements of financial calculations used to calculate losses to (re)insurance policies. For anything other than the most simple policy which has a blanket deductible and limit, say, a profile do not represent a policy structure on its own, but rather is to be used as a building block which can be combined with other building blocks to model a particular financial contract. In this way it is possible to model an unlimited range of structures with a limited number of profiles.
-The FM Profiles form an extensible library of calculations defined within the fmcalc code that can be invoked by specifying a particular calcrule_id and providing the required data values such as deductible and limit, as described below.
+Profiles are used throughout the Oasis framework and are meta-data definitions with their associated data types and rules. Profiles are used in the Financial Module to perform the elements of the financial calculations used to calculate losses to (re)insurance policies. For anything other than the simplest policy with, say, a blanket deductible and limit, a profile does not represent a policy structure on its own, but rather is a building block which can be combined with other building blocks to model a particular financial contract. In this way it is possible to model an unlimited range of structures with a limited number of profiles.
+The FM Profiles form an extensible library of calculations defined +within the fmcalc code that can be invoked by specifying a particular +calcrule_id and providing the required data values such +as deductible and limit, as described below.
See Appendix B FM Profiles for more details.
+See Appendix B FM Profiles for more +details.
The Oasis Financial Module is a data-driven process design for calculating the losses on insurance policies. It is an abstract design in order to cater for the many variations and has four basic concepts:
+The Oasis Financial Module is a data-driven process design for +calculating the losses on insurance policies. It is an abstract design +in order to cater for the many variations and has four basic +concepts:
The profile not only provides the fields to be used in calculating losses (such as limit and deductible) but also which mathematical calculation (calcrule_id) to apply.
+The profile not only provides the fields to be used in calculating +losses (such as limit and deductible) but also which mathematical +calculation (calcrule_id) to apply.
The Financial Module brings together three elements in order to undertake a calculation:
+The Financial Module brings together three elements in order to +undertake a calculation:
There are many ways an insurance loss can be calculated with many different terms and conditions. For instance, there may be deductibles applied to each element of coverage (e.g. a buildings damage deductible), some site-specific deductibles or limits, and some overall policy deductibles and limits and share. To undertake the calculation in the correct order and using the correct items (and their values) the structure and sequence of calculations must be defined. This is done in the programme file which defines a heirarchy of groups across a number of levels. Levels drive the sequence of calculation. A financial calculation is performed at successive levels, depending on the structure of policy terms and conditions. For example there might be 3 levels representing coverage, site and policy terms and conditions.
-
Groups are defined within levels and they represent aggregations of losses on which to perform the financial calculations. The grouping fields are called from_agg_id and to_agg_id which represent a grouping of losses at the previous level and the present level of the hierarchy, respectively.
-Each level calculation applies to the to_agg_id groupings in the heirarchy. There is no calculation applied to the from_agg_id groupings at level 1 - these ids directly correspond to the ids in the loss input.
-
+There are many ways an insurance loss can be calculated, with many different terms and conditions. For instance, there may be deductibles applied to each element of coverage (e.g. a buildings damage deductible), some site-specific deductibles or limits, and some overall policy deductibles, limits and share. To undertake the calculation in the correct order and using the correct items (and their values), the structure and sequence of calculations must be defined. This is done in the programme file which defines a hierarchy of groups across a number of levels. Levels drive the sequence of calculation. A financial calculation is performed at successive levels, depending on the structure of policy terms and conditions. For example there might be 3 levels representing coverage, site and policy terms and conditions.
+
Groups are defined within levels and they represent aggregations of +losses on which to perform the financial calculations. The grouping +fields are called from_agg_id and to_agg_id which represent a grouping +of losses at the previous level and the present level of the hierarchy, +respectively.
+Each level calculation applies to the to_agg_id groupings in the hierarchy. There is no calculation applied to the from_agg_id groupings at level 1 - these ids directly correspond to the ids in the loss input.
+
The initial input is the ground-up loss (GUL) table, generally coming from the main Oasis calculation of ground-up losses. Here is an example, for a two events and 1 sample (idx=1):
+The initial input is the ground-up loss (GUL) table, generally coming +from the main Oasis calculation of ground-up losses. Here is an example, +for a two events and 1 sample (idx=1):
| event_id | +event_id | item_id | sidx | -loss | +loss |
|---|---|---|---|---|---|
| 1 | +1 | 1 | 1 | -100,000 | +100,000 |
| 1 | +1 | 2 | 1 | -10,000 | +10,000 |
| 1 | +1 | 3 | 1 | -2,500 | +2,500 |
| 1 | +1 | 4 | 1 | -400 | +400 |
| 2 | +2 | 1 | 1 | -90,000 | +90,000 |
| 2 | +2 | 2 | 1 | -15,000 | +15,000 |
| 2 | +2 | 3 | 1 | -3,000 | +3,000 |
| 2 | +2 | 4 | 1 | -500 | +500 |
The values represent a single ground-up loss sample for items belonging to an account. We use “programme” rather than "account" as it is more general characteristic of a client’s exposure protection needs and allows a client to have multiple programmes active for a given period. -The linkage between account and programme can be provided by a user defined prog dictionary, for example;
+The values represent a single ground-up loss sample for items belonging to an account. We use “programme” rather than "account" as it is a more general characteristic of a client’s exposure protection needs and allows a client to have multiple programmes active for a given period. The linkage between account and programme can be provided by a user defined prog dictionary, for example;
| prog_id | +prog_id | account_id | -prog_name | +prog_name |
|---|---|---|---|---|
| 1 | +1 | 1 | -ABC Insurance Co. 2016 renewal | +ABC Insurance Co. 2016 renewal |
Items 1-4 represent Structure, Other Structure, Contents and Time Element coverage ground up losses for a single property, respectively, and this example is a simple residential policy with combined property coverage terms. For this policy type, the Structure, Other Structure and Contents losses are aggregated, and a deductible and limit is applied to the total. A separate set of terms, again a simple deductible and limit, is applied to the “Time Element” coverage which, for residential policies, generally means costs for temporary accommodation. The total insured loss is the sum of the output from the combined property terms and the time element terms.
+Items 1-4 represent Structure, Other Structure, Contents and Time +Element coverage ground up losses for a single property, respectively, +and this example is a simple residential policy with combined property +coverage terms. For this policy type, the Structure, Other Structure and +Contents losses are aggregated, and a deductible and limit is applied to +the total. A separate set of terms, again a simple deductible and limit, +is applied to the “Time Element” coverage which, for residential +policies, generally means costs for temporary accommodation. The total +insured loss is the sum of the output from the combined property terms +and the time element terms.
The actual items falling into the programme are specified in the programme table together with the aggregation groupings that go into a given level calculation:
+The actual items falling into the programme are specified in the +programme table together with the aggregation groupings +that go into a given level calculation:
| from_agg_id | +from_agg_id | level_id | -to_agg_id | +to_agg_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -1 | +1 |
| 3 | +3 | 1 | -1 | +1 |
| 4 | +4 | 1 | -2 | +2 |
| 1 | +1 | 2 | -1 | +1 |
| 2 | +2 | 2 | -1 | +1 |
Note that from_agg_id for level_id=1 is equal to the item_id in the input loss table (but in theory from_agg_id could represent a higher level of grouping, if required).
-In level 1, items 1, 2 and 3 all have to_agg_id =1 so losses will be summed together before applying the combined deductible and limit, but item 4 (time element) will be treated separately (not aggregated) as it has to_agg_id = 2. For level 2 we have all 4 items losses (now represented by two groups from_agg_id =1 and 2 from the previous level) aggregated together as they have the same to_agg_id = 1.
+Note that from_agg_id for level_id=1 is equal to the item_id in the +input loss table (but in theory from_agg_id could represent a higher +level of grouping, if required).
+In level 1, items 1, 2 and 3 all have to_agg_id =1 so losses will be +summed together before applying the combined deductible and limit, but +item 4 (time element) will be treated separately (not aggregated) as it +has to_agg_id = 2. For level 2 we have all 4 items losses (now +represented by two groups from_agg_id =1 and 2 from the previous level) +aggregated together as they have the same to_agg_id = 1.
Next we have the profile description table, which list the profiles representing general policy types. Our example is represented by two general profiles which specify the input fields and mathematical operations to perform. In this example, the profile for the combined coverages and time is the same (albeit with different values) and requires a limit, a deductible, and an associated calculation rule, whereas the profile for the policy requires a limit, attachment, and share, and an associated calculation rule.
+Next we have the profile description table, which lists the profiles representing general policy types. Our example is represented by two general profiles which specify the input fields and mathematical operations to perform. In this example, the profile for the combined coverages and time is the same (albeit with different values) and requires a limit, a deductible, and an associated calculation rule, whereas the profile for the policy requires a limit, attachment, and share, and an associated calculation rule.
| Profile description | -calcrule_id | +Profile description | +calcrule_id |
|---|---|---|---|
| deductible and limit | -1 | +deductible and limit | +1 |
| deductible and/or attachment, limit and share | -2 | +deductible and/or attachment, limit and +share | +2 |
There is a “profile value” table for each profile containing the applicable policy terms, each identified by a policytc_id. The table below shows the list of policy terms for calcrule_id 1.
+There is a “profile value” table for each profile containing the +applicable policy terms, each identified by a policytc_id. The table +below shows the list of policy terms for calcrule_id 1.
| policytc_id | +policytc_id | deductible1 | limit1 |
|---|---|---|---|
| 1 | +1 | 1,000 | 1,000,000 |
| 2 | +2 | 2,000 | 18,000 |
And next, for calcrule_id 2, the values for the overall policy attachment, limit and share
+And next, for calcrule_id 2, the values for the overall policy +attachment, limit and share
| policytc_id | +policytc_id | deductible1 | attachment1 | limit1 | @@ -594,7 +522,7 @@
|---|---|---|---|---|
| 3 | +3 | 0 | 1,000 | 1,000,000 | @@ -602,13 +530,28 @@
In practice, all profile values are stored in a single flattened format which contains all supported profile fields (see fm profile in 4.3 Data Conversion Components), but conceptually they belong in separate profile value tables.
+In practice, all profile values are stored in a single flattened +format which contains all supported profile fields (see fm profile in 4.3 Data Conversion Components), +but conceptually they belong in separate profile value tables.
The flattened file is;
fm_profile
-| policytc_id | +policytc_id | calcrule_id | deductible1 | deductible2 | @@ -617,12 +560,12 @@limit1 | share1 | share2 | -share3 | +share3 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | +1 | 1 | 1,000 | 0 | @@ -631,10 +574,10 @@1,000,000 | 0 | 0 | -0 | +0 |
| 1 | +1 | 1 | 2,000 | 0 | @@ -643,10 +586,10 @@18,000 | 0 | 0 | -0 | +0 |
| 1 | +1 | 2 | 0 | 0 | @@ -655,147 +598,201 @@1,000,000 | 0.1 | 0 | -0 | +0 |
For any given profile we have one standard rule calcrule_id, being the mathematical function used to calculate the losses from the given profile’s fields. More information about the functions can be found in FM Profiles.
+For any given profile we have one standard rule +calcrule_id, being the mathematical function used to +calculate the losses from the given profile’s fields. More information +about the functions can be found in FM +Profiles.
The policytc table specifies the (re)insurance contracts (this is a combination of agg_id and layer_id) and the separate terms and conditions which will be applied to each layer_id/agg_id for a given level. In our example, we have a limit and deductible with the same value applicable to the combination of the first three items, a limit and deductible for the fourth item (time) in level 1, and then a limit, attachment, and share applicable at level 2 covering all items. We’d represent this in terms of the distinct agg_ids as follows:
+The policytc table specifies the (re)insurance +contracts (this is a combination of agg_id and layer_id) and the +separate terms and conditions which will be applied to each +layer_id/agg_id for a given level. In our example, we have a limit and +deductible with the same value applicable to the combination of the +first three items, a limit and deductible for the fourth item (time) in +level 1, and then a limit, attachment, and share applicable at level 2 +covering all items. We’d represent this in terms of the distinct agg_ids +as follows:
| layer_id | +layer_id | level_id | agg_id | -policytc_id | +policytc_id |
|---|---|---|---|---|---|
| 1 | +1 | 1 | 1 | -1 | +1 |
| 1 | +1 | 1 | 2 | -2 | +2 |
| 1 | +1 | 2 | 1 | -3 | +3 |
In words, the data in the table mean;
At Level 1;
-Apply policytc_id (terms and conditions) 1 to (the sum of losses represented by) agg_id 1
+Apply policytc_id (terms and conditions) 1 to (the sum of losses +represented by) agg_id 1
Apply policytc_id 2 to agg_id 2
Then at level 2;
Apply policytc_id 3 to agg_id 1
-Levels are processed in ascending order and the calculated losses from a previous level are summed according to the groupings defined in the programme table which become the input losses to the next level.
+Levels are processed in ascending order and the calculated losses from a previous level are summed according to the groupings defined in the programme table to become the input losses to the next level.
Layers can be used to model multiple sets of terms and conditions applied to the same losses, such as excess policies. For the lower level calculations and in the general case where there is a single contract, layer_id should be set to 1. For a given level_id and agg_id, multiple layers can be defined by setting layer_id =1,2,3 etc, and assigning a different calculation policytc_id to each.
-
For this example at level 3, the policytc data might look as follows;
+Layers can be used to model multiple sets of terms and conditions +applied to the same losses, such as excess policies. For the lower level +calculations and in the general case where there is a single contract, +layer_id should be set to 1. For a given level_id and agg_id, multiple +layers can be defined by setting layer_id =1,2,3 etc, and assigning a +different calculation policytc_id to each.
+
For this example at level 3, the policytc data might look as +follows;
| layer_id | +layer_id | level_id | agg_id | -policytc_id | +policytc_id |
|---|---|---|---|---|---|
| 1 | +1 | 3 | 1 | -22 | +22 |
| 2 | +2 | 3 | 1 | -23 | +23 |
Losses are output by event, output_id and sample. The table looks like this;
+Losses are output by event, output_id and sample. The table looks +like this;
| event_id | +event_id | output_id | sidx | -loss | +loss |
|---|---|---|---|---|---|
| 1 | +1 | 1 | 1 | -455.24 | +455.24 |
| 2 | +2 | 1 | 1 | -345.6 | +345.6 |
The output_id is specified by the user in the xref table, and is a unique combination of agg_id and layer_id. For instance;
+The output_id is specified by the user in the xref +table, and is a unique combination of agg_id and layer_id. For +instance;
| output_id | +output_id | agg_id | -layer_id | +layer_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -2 | +2 |
The output_id must be specified consistently with the back-allocation rule. Losses can either output at the contract level or back-allocated to the lowest level, which is item_id, using one of three command line options. There are three meaningful values here – don’t allocate (0) used typically for all levels where a breakdown of losses is not required in output, allocate back to items (1) in proportion to the input (ground up) losses, or allocate back to items (2) in proportion to the losses from the prior level calculation.
-$ fmcalc -a0 # Losses are output at the contract level and not back-allocated
+The output_id must be specified consistently with the back-allocation
+rule. Losses can either output at the contract level or back-allocated
+to the lowest level, which is item_id, using one of three command line
+options. There are three meaningful values here – don’t allocate (0)
+used typically for all levels where a breakdown of losses is not
+required in output, allocate back to items (1) in proportion to the
+input (ground up) losses, or allocate back to items (2) in proportion to
+the losses from the prior level calculation.
+$ fmcalc -a0 # Losses are output at the contract level and not back-allocated
$ fmcalc -a1 # Losses are back-allocated to items on the basis of the input losses (e.g. ground up loss)
-$ fmcalc -a2 # Losses are back-allocated to items on the basis of the prior level losses
-
-The rules for specifying the output_ids in the xref table are as follows;
+$ fmcalc -a2 # Losses are back-allocated to items on the basis of the prior level losses +The rules for specifying the output_ids in the xref table are as +follows;
To make sense of this, if there is more than one output at the contract level, then each one must be back-allocated to all of the items, with each individual loss represented by a unique output_id.
-To avoid unnecessary computation, it is recommended not to back-allocate unless losses are required to be reported at a more detailed level than the contract level (site or zip code, for example). In this case, losses are re-aggregated up from item level (represented by output_id in fmcalc output) in summarycalc, using the fmsummaryxref table.
+To make sense of this, if there is more than one output at the +contract level, then each one must be back-allocated to all of the +items, with each individual loss represented by a unique output_id.
+To avoid unnecessary computation, it is recommended not to +back-allocate unless losses are required to be reported at a more +detailed level than the contract level (site or zip code, for example). +In this case, losses are re-aggregated up from item level (represented +by output_id in fmcalc output) in summarycalc, using the fmsummaryxref +table.
The first run of fmcalc is designed to calculate the primary or direct insurance losses from the ground up losses of an exposure portfolio. fmcalc has been designed to be recursive, so that the 'gross' losses from the first run can be streamed back in to second and subsequent runs of fmcalc, each time with a different set of input files representing reinsurance contracts, and can output either the reinsurance gross loss, or net loss. There are two modes of output;
+The first run of fmcalc is designed to calculate the primary or +direct insurance losses from the ground up losses of an exposure +portfolio. fmcalc has been designed to be recursive, so that the 'gross' +losses from the first run can be streamed back in to second and +subsequent runs of fmcalc, each time with a different set of input files +representing reinsurance contracts, and can output either the +reinsurance gross loss, or net loss. There are two modes of output;
net loss is output when the command line parameter -n is used, otherwise output loss is gross by default.
+net loss is output when the command line parameter -n is used, +otherwise output loss is gross by default.
The types of reinsurance supported by the Financial Module are;
Second and subsequent runs of fmcalc require the same four fm files fm_programme, fm_policytc, fm_profile, and fm_xref.
-This time, the hierarchy specified in fm_programme must be consistent with the range of output_ids from the incoming stream of losses, as specified in the fm_xref file from the previous iteration. Specifically, this means the range of values in from_agg_id at level 1 must match the range of values in output_id.
+Second and subsequent runs of fmcalc require the same four fm files +fm_programme, fm_policytc, fm_profile, and fm_xref.
+This time, the hierarchy specified in fm_programme must be consistent +with the range of output_ids from the incoming stream of losses, as +specified in the fm_xref file from the previous iteration. Specifically, +this means the range of values in from_agg_id at level 1 must match the +range of values in output_id.
For example;
fm_xref (iteration 1)
| output_id | +output_id | agg_id | -layer_id | +layer_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -2 | +2 |
| from_agg_id | +from_agg_id | level_id | -to_agg_id | +to_agg_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -2 | +2 |
| 1 | +1 | 2 | -1 | +1 |
| 2 | +2 | 2 | -1 | +1 |
The abstraction of from_agg_id at level 1 from item_id means that losses needn't be back-allocated to item_id after every iteration of fmcalc. In fact, performance will be improved when back-allocation is minimised.
-Using the two layer example from above, here's an example of the fm files for a simple quota share treaty with 50% ceded and 90% placed covering both policy layers.
-The command to run the direct insurance followed by reinsurance might look like this;
-$ fmcalc -p direct < guls.bin | fmcalc -p ri1 -n > ri1_net.bin
-
-In this command, ground up losses are being streamed into fmcalc to calculate the insurance losses, which are streamed into fmcalc again to calculate the reinsurance net loss. The direct insurance fm files would be located in the folder 'direct' and the reinsurance fm files in the folder 'ri1'. The -n flag in the second call of fmcalc results in net losses being output to the file 'ri1_net.bin'. These are the losses to the insurer net of recoveries from the quota share treaty.
+The abstraction of from_agg_id at level 1 from item_id means that +losses needn't be back-allocated to item_id after every iteration of +fmcalc. In fact, performance will be improved when back-allocation is +minimised.
+Using the two layer example from above, here's an example of the fm +files for a simple quota share treaty with 50% ceded and 90% placed +covering both policy layers.
+The command to run the direct insurance followed by reinsurance might +look like this;
+$ fmcalc -p direct < guls.bin | fmcalc -p ri1 -n > ri1_net.bin
+In this command, ground up losses are being streamed into fmcalc to +calculate the insurance losses, which are streamed into fmcalc again to +calculate the reinsurance net loss. The direct insurance fm files would +be located in the folder 'direct' and the reinsurance fm files in the +folder 'ri1'. The -n flag in the second call of fmcalc results in net +losses being output to the file 'ri1_net.bin'. These are the losses to +the insurer net of recoveries from the quota share treaty.
The fm_xref file from the direct insurance (first) iteration is
fm_xref
| output_id | +output_id | agg_id | -layer_id | +layer_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -2 | +2 |
The fm files for the reinsurance (second) iteration would be as follows;
+The fm files for the reinsurance (second) iteration would be as +follows;
fm_programme
| from_agg_id | +from_agg_id | level_id | -to_agg_id | +to_agg_id |
|---|---|---|---|---|
| 1 | +1 | 1 | -1 | +1 |
| 2 | +2 | 1 | -1 | +1 |
fm_profile
-| policytc_id | +policytc_id | calcrule_id | deductible1 | deductible2 | @@ -948,12 +975,12 @@share1 | share2 | -share3 | +share3 |
|---|---|---|---|---|---|---|---|---|
| 1 | +1 | 25 | 0 | 0 | @@ -962,7 +989,7 @@0.5 | 0.9 | -1 | +1 |
The Financial Module can support unlimited inuring priority levels for reinsurance. Each set of contracts with equal inuring priority would be calculated in one iteration. The net losses from the first inuring priority are streamed into the second inuring priority calculation, and so on.
Where there are multiple contracts with equal inuring priority, these are implemented as layers within a single iteration.
The net calculation for iterations with multiple layers is;
net loss = max(0, input loss - layer1 loss - layer2 loss - ... - layer n loss)
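Expressed as code, a minimal sketch of this net calculation (illustrative only):

```python
def net_loss(input_loss, layer_losses):
    """net loss = max(0, input loss - layer1 loss - ... - layer n loss)"""
    return max(0.0, input_loss - sum(layer_losses))

# e.g. an input loss of 1000 against two layers paying 300 and 450
print(net_loss(1000.0, [300.0, 450.0]))   # 250.0
```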
diff --git a/docs/html/Introduction.html b/docs/html/Introduction.html
index a7a2d9dd..0682d97f 100644
--- a/docs/html/Introduction.html
+++ b/docs/html/Introduction.html

The in-memory solution for the Oasis Kernel is called the kernel tools or “ktools”. ktools is an independent “specification” of a set of processes which means that it defines the processing architecture and data structures. The framework is implemented as a set of components called the “reference model” which can then be adapted for particular model or business needs.
The code can be compiled in Linux, POSIX-compliant Windows and native Windows. The installation instructions can be found in README.html.
The Kernel performs the core Oasis calculations of computing effective damageability distributions, Monte-Carlo sampling of ground up loss, the financial module calculations, which apply insurance policy terms and conditions to the sampled losses, and finally some common catastrophe model outputs.
The early releases of Oasis used a SQL-compliant database to perform all calculations. Release 1.3 included the first “in-memory” version of the Oasis Kernel written in C++ and C to provide streamed calculation at high computational performance, as an alternative to the database calculation. The scope of the in-memory calculation was for the most intensive Kernel calculations of ground up loss sampling and the financial module. This in-memory variant was first delivered as a stand-alone toolkit "ktools" with R1.4.
With R1.5, a Linux-based in-memory calculation back-end was released, using the reference model components of ktools. The range of functionality of ktools was still limited to ground up loss sampling, the financial module and single output workflows, with effective damage distributions and output calculations still being performed in a SQL-compliant database.
In 2016 the functionality of ktools was extended to include the full range of Kernel calculations, including effective damageability distribution calculations and a wider range of financial module and output calculations. The data stream structures and input data formats were also substantially revised to handle multi-peril models, user-definable summary levels for outputs, and multiple output workflows.
In 2018 the Financial Module was extended to perform net loss calculations for per occurrence forms of reinsurance, including facultative reinsurance, quota share, surplus share, per risk and catastrophe excess of loss treaties.
The Kernel is provided as a toolkit of components (“ktools”) which can be invoked at the command line. Each component is a separately compiled executable with a binary data stream of inputs and/or outputs.
The principle is to stream data through the calculations in memory, starting with generating the damage distributions and ending with calculating the user's required result, before writing the output to disk. This is done on an event-by-event basis, which means at any one time the compute server only has to hold the model data for a single event in its memory, per process. The user can run the calculation across multiple processes in parallel, specifying the analysis workflow and number of processes in a script file appropriate to the operating system.
The components can be written in any language as long as the data structures of the binary streams are adhered to. The current set of components have been written in POSIX-compliant C++. This means that they can be compiled in Linux and Windows using the latest GNU compiler toolchain.
The components in the Reference Model can be summarized as follows;
Standard piping syntax can be used to invoke the components at the command line. It is the same syntax in Windows DOS, Linux terminal or Cygwin (a Linux emulator for Windows). For example the following command invokes eve, getmodel, gulcalc, fmcalc, summarycalc and eltcalc, and exports an event loss table output to a csv file.
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc | summarycalc -f -1 - | eltcalc > elt.csv
Example python scripts are provided along with a binary data package in the /examples folder to demonstrate usage of the toolkit. For more guidance on how to use the toolkit, see Workflows.
Go to 2. Data streaming architecture overview
diff --git a/docs/html/MultiPeril.html b/docs/html/MultiPeril.html
index 1ad7386a..c737f2cb 100644
--- a/docs/html/MultiPeril.html
+++ b/docs/html/MultiPeril.html
ktools now supports multi-peril models through the introduction of the coverage_id in the data structures.
Ground up losses apply at the “Item” level in the Kernel which corresponds to “interest coverage” in business terms, which is the element of financial loss that can be associated with a particular asset. In ktools, item_id represents the element of financial loss and coverage_id represents the asset with its associated total insured value. If there is more than one item per coverage (as defined in the items data) then each item represents an element of financial loss from a particular peril contributing to the total loss for the asset. For each item, the identification of the peril is held in the areaperil_id, which is a unique key representing a combination of the location (area) and peril.
Ground up losses are calculated by multiplying the damage ratio for an item by the total insured value of its associated coverage (defined in the coverages data). The questions are then; how are these losses combined across items, and how are they correlated?
There are a few ways in which losses can be combined and the first example in ktools uses a simple rule, which is to sum the losses for each coverage and cap the overall loss to the total insured value. This is what you get when you use the -c parameter in gulcalc to output losses by 'coverage'.
In v3.1.0 the method of combining losses became function-driven using the gulcalc command line parameter -a, as a few standard approaches have emerged. These are;
| Allocation option | Description |
|---|---|
| 0 | Do nothing (suitable for single sub-peril models with one item per coverage) |
| 1 | Sum damage ratios and cap to 1. Back-allocate in proportion to contributing subperil loss |
| 2 | Total damage = maximum subperil damage. Back-allocate all to the maximum contributing subperil loss |
| 3 | Multiplicative method for combining damage. Back-allocate in proportion to contributing subperil loss |
Allocation options 0, 1 and 2 have been implemented to date.
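As an illustration of allocation option 1, here is a minimal sketch (not the gulcalc implementation itself): subperil damage ratios for a coverage are summed and capped at 1, the capped damage is applied to the coverage TIV, and the resulting coverage loss is back-allocated to items in proportion to their subperil losses. The variable names are illustrative only.

```python
def allocate_option1(tiv, subperil_damage_ratios):
    """Sum subperil damage ratios, cap at 1, and back-allocate the
    coverage loss in proportion to each subperil's own loss."""
    subperil_losses = [d * tiv for d in subperil_damage_ratios]
    capped_damage = min(1.0, sum(subperil_damage_ratios))
    coverage_loss = capped_damage * tiv
    total = sum(subperil_losses)
    if total == 0.0:
        return coverage_loss, [0.0] * len(subperil_losses)
    item_losses = [coverage_loss * s / total for s in subperil_losses]
    return coverage_loss, item_losses

# two subperils damaging the same 100,000 TIV coverage
print(allocate_option1(100000.0, [0.7, 0.6]))   # capped at 1.0 -> 100,000 split 7:6
```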
Correlation of item damage is generic in ktools, as damage can either be 100% correlated or independent (see Appendix A Random Numbers). This is no different in the multi-peril case when items represent different elements of financial loss to the same asset, rather than different assets. More sophisticated methods of multi-peril correlation have been implemented for particular models, but as yet no standard approach has been implemented in ktools.
Note that ground up losses by item can be passed into the financial module unallocated (allocation method 0) using the gulcalc option -i, or allocated using the gulcalc option -a1 -i. If the item ground up losses are passed through unallocated then the limit of total insured value must be applied as part of the financial module calculations, to prevent the ground up loss exceeding the coverage TIV.
diff --git a/docs/html/ORDOutputComponents.html b/docs/html/ORDOutputComponents.html
new file mode 100644
index 00000000..ba380332
--- /dev/null
+++ b/docs/html/ORDOutputComponents.html
As well as the set of legacy outputs described in OutputComponents.html, ktools also supports Open Results Data "ORD" output calculations and reports.
Open Results Data is a data standard for catastrophe loss model results developed as part of Open Data Standards "ODS". ODS is curated by OasisLMF and governed by the Open Data Standards Steering Committee (SC), comprised of industry experts representing (re)insurers, brokers, service providers and catastrophe model vendors. More information about ODS can be found here.
ktools supports a subset of the fields in each of the ORD reports, which are given in more detail below. In most cases, the existing components for legacy outputs are used to generate ORD format outputs when called with extra command line switches, although there is a dedicated component called ordleccalc to generate all of the EPT reports. In overview, here are the mappings from component to ORD report:
- summarycalctocsv: SELT
- eltcalc: MELT, QELT
- pltcalc: SPLT, MPLT, QPLT
- ordleccalc: EPT, PSEPT
- aalcalc: ALT
Summarycalctocsv takes the summarycalc loss stream, which contains the individual loss samples by event and summary_id, and outputs them in ORD format. Summarycalc is a core component that aggregates the individual building or coverage loss samples into groups that are of interest from a reporting perspective. This is covered in Core Components.
$ [stdin component] | summarycalctocsv [parameters] > selt.csv
$ summarycalctocsv [parameters] > selt.csv < [stdin].bin
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | summarycalc -i -1 - | summarycalctocsv -o > selt.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | summarycalc -i -1 - | summarycalctocsv -p selt.parquet
$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | summarycalc -i -1 - | summarycalctocsv -p selt.parquet -o > selt.csv
$ summarycalctocsv -o > selt.csv < summarycalc.bin
$ summarycalctocsv -p selt.parquet < summarycalc.bin
$ summarycalctocsv -p selt.parquet -o > selt.csv < summarycalc.bin
None.
The Sample ELT output is a csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| EventId | int | 4 | Model event_id | 45567 |
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| SampleId | int | 4 | The sample number | 2 |
| Loss | float | 4 | The loss sample | 13645.78 |
| ImpactedExposure | float | 4 | Exposure value impacted by the event for the sample | 70000 |
The program calculates loss by SummaryId and EventId. There are two variants (in addition to the sample variant SELT output by summarycalctocsv, above);
$ [stdin component] | eltcalc -M [filename.csv] -Q [filename.csv] -m [filename.parquet] -q [filename.parquet]
$ eltcalc -M [filename.csv] -Q [filename.csv] -m [filename.parquet] -q [filename.parquet] < [stdin].bin
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | eltcalc -M MELT.csv -Q QELT.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | eltcalc -m MELT.parquet -q QELT.parquet
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | eltcalc -M MELT.csv -Q QELT.csv -m MELT.parquet -q QELT.parquet
$ eltcalc -M MELT.csv -Q QELT.csv < summarycalc.bin
$ eltcalc -m MELT.parquet -q QELT.parquet < summarycalc.bin
$ eltcalc -M MELT.csv -Q QELT.csv -m MELT.parquet -q QELT.parquet < summarycalc.bin
The Quantile report requires the quantile.bin file.
For each SummaryId and EventId, the sample mean and standard deviation is calculated from the sampled losses in the summarycalc stream and output to file. The analytical mean is also output as a separate record, differentiated by a 'SampleType' field. Variations of the exposure value are also output (see below for details).
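To illustrate how the MELT statistics could be derived from the sample loss stream, here is a minimal, illustrative sketch (not the eltcalc implementation): it groups loss samples by (EventId, SummaryId) and computes the sample mean and standard deviation that appear on a SampleType 2 record. The in-memory structures are assumptions for the example.

```python
from collections import defaultdict
from math import sqrt

# samples: list of (event_id, summary_id, sample_id, loss) tuples
def moment_elt(samples):
    grouped = defaultdict(list)
    for event_id, summary_id, _sidx, loss in samples:
        grouped[(event_id, summary_id)].append(loss)

    rows = []
    for (event_id, summary_id), losses in grouped.items():
        n = len(losses)
        mean = sum(losses) / n
        # sample standard deviation, as reported on SampleType 2 records
        var = sum((x - mean) ** 2 for x in losses) / (n - 1) if n > 1 else 0.0
        rows.append((event_id, summary_id, 2, mean, sqrt(var)))
    return rows
```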
For each SummaryId and EventId, this report provides the probability and the corresponding loss quantile computed from the samples. The list of probabilities is provided as input in the quantile.bin file.
Quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample set in the same way. In this case we are computing the quantiles of loss from the sampled losses by event and summary for a user-provided list of probabilities. For each provided probability p, the loss quantile is the sampled loss which is bigger than the proportion p of the observed samples.
In practice this is calculated by sorting the samples in ascending order of loss and using linear interpolation between the ordered observations to compute the precise loss quantile for the required probability.
The algorithm used for the quantile estimate type and interpolation scheme from a finite sample set is R-7, referred to in Wikipedia: https://en.wikipedia.org/wiki/Quantile
If p is the probability, and the sample size is N, then the position of the ordered samples required for the quantile is computed by;
(N-1)p + 1
In general, this value will be a fraction rather than an integer, representing a value in between two ordered samples. Therefore for an integer value of k between 1 and N-1 with k < (N-1)p + 1 < k+1, the loss quantile Q(p) is calculated by a linear interpolation of the kth ordered sample X(k) and the (k+1)th ordered sample X(k+1) as follows;
Q(p) = X(k) * (1-h) + X(k+1) * h
where h = (N-1)p + 1 - k
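Written out as code, a minimal sketch of the R-7 interpolation described above (not the ktools source):

```python
def quantile_r7(samples, p):
    """Loss quantile Q(p) by linear interpolation of the ordered samples (R-7)."""
    xs = sorted(samples)
    n = len(xs)
    pos = (n - 1) * p + 1            # 1-based position, as in the formula above
    k = int(pos)                     # integer part
    h = pos - k                      # fractional part
    if k >= n:                       # p = 1 returns the largest sample
        return xs[-1]
    return xs[k - 1] * (1 - h) + xs[k] * h

print(quantile_r7([100.0, 200.0, 400.0, 800.0, 1600.0], 0.9))   # 1280.0
```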
The Moment ELT output is a csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| EventId | int | 4 | Model event_id | 45567 |
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| SampleType | int | 4 | 1 for analytical mean, 2 for sample mean | 2 |
| EventRate | float | 4 | Annual frequency of event computed by relative frequency of occurrence | 0.01 |
| ChanceOfLoss | float | 4 | Probability of a loss calculated from the effective damage distributions | 0.95 |
| MeanLoss | float | 4 | Mean | 1345.678 |
| SDLoss | float | 4 | Sample standard deviation for SampleType=2 | 945.89 |
| MaxLoss | float | 4 | Maximum possible loss calculated from the effective damage distribution | 75000 |
| FootprintExposure | float | 4 | Exposure value impacted by the model's event footprint | 80000 |
| MeanImpactedExposure | float | 4 | Mean exposure impacted by the event across the samples (where loss > 0) | 65000 |
| MaxImpactedExposure | float | 4 | Maximum exposure impacted by the event across the samples (where loss > 0) | 70000 |
The Quantile ELT output is a csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| EventId | int | 4 | Model event_id | 45567 |
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| Quantile | float | 4 | The probability associated with the loss quantile | 0.9 |
| Loss | float | 4 | The loss quantile | 1345.678 |
The program calculates loss by Period, EventId and SummaryId and outputs the results in ORD format. There are three variants;
$ [stdin component] | pltcalc -S [filename.csv] -M [filename.csv] -Q [filename.csv] -s [filename.parquet] -m [filename.parquet] -q [filename.parquet]
$ pltcalc -S [filename.csv] -M [filename.csv] -Q [filename.csv] -s [filename.parquet] -m [filename.parquet] -q [filename.parquet] < [stdin].bin
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | pltcalc -S SPLT.csv -M MPLT.csv -Q QPLT.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | pltcalc -s SPLT.parquet -m MPLT.parquet -q QPLT.parquet
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | pltcalc -S SPLT.csv -M MPLT.csv -Q QPLT.csv -s SPLT.parquet -m MPLT.parquet -q QPLT.parquet
$ pltcalc -S SPLT.csv -M MPLT.csv -Q QPLT.csv < summarycalc.bin
$ pltcalc -s SPLT.parquet -m MPLT.parquet -q QPLT.parquet < summarycalc.bin
$ pltcalc -S SPLT.csv -M MPLT.csv -Q QPLT.csv -s SPLT.parquet -m MPLT.parquet -q QPLT.parquet < summarycalc.bin
pltcalc requires the occurrence.bin file.
The Quantile report additionally requires the quantile.bin file.
pltcalc will optionally use the following file if present;
For each Period, EventId and SummaryId, the individual loss samples are output by SampleId. The sampled event losses from the summarycalc stream are assigned to a Period for each occurrence of the EventId in the occurrence file.
For each Period, EventId and SummaryId, the sample mean and standard deviation is calculated from the sampled event losses in the summarycalc stream and output to file. The analytical mean is also output as a separate record, differentiated by a 'SampleType' field. Variations of the exposure value are also output (see below for more details).
For each Period, EventId and SummaryId, this report provides the probability and the corresponding loss quantile computed from the samples. The list of probabilities is provided in the quantile.bin file.
See QELT for the method of computing the loss quantiles.
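As a rough illustration of how event loss samples become period loss records, here is a minimal sketch (illustrative only, with assumed in-memory structures rather than the ktools binary formats): each occurrence of an event in the occurrence data yields one SPLT row per loss sample.

```python
# occurrences: list of (event_id, period_no, date) tuples from the occurrence data
# samples: list of (event_id, summary_id, sample_id, loss) tuples from summarycalc
def sample_plt(occurrences, samples):
    by_event = {}
    for event_id, summary_id, sidx, loss in samples:
        by_event.setdefault(event_id, []).append((summary_id, sidx, loss))

    rows = []
    for event_id, period_no, date in occurrences:
        # an event contributes to every period in which it occurs
        for summary_id, sidx, loss in by_event.get(event_id, []):
            rows.append((period_no, event_id, date, summary_id, sidx, loss))
    return rows
```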
The Sample PLT output is a csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| Period | int | 4 | The period in which the event occurs | 500 |
| PeriodWeight | float | 4 | The weight of the period (frequency relative to the total number of periods) | 0.001 |
| EventId | int | 4 | Model event_id | 45567 |
| Year | int | 4 | The year in which the event occurs | 1970 |
| Month | int | 4 | The month number in which the event occurs | 5 |
| Day | int | 4 | The day number in which the event occurs | 22 |
| Hour | int | 4 | The hour in which the event occurs | 11 |
| Minute | int | 4 | The minute in which the event occurs | 45 |
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| SampleId | int | 4 | The sample number | 2 |
| Loss | float | 4 | The loss sample | 13645.78 |
| ImpactedExposure | float | 4 | Exposure impacted by the event for the sample | 70000 |
The Moment PLT output is a csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| Period | int | 4 | The period in which the event occurs | 500 |
| PeriodWeight | float | 4 | The weight of the period (frequency relative to the total number of periods) | 0.001 |
| EventId | int | 4 | Model event_id | 45567 |
| Year | int | 4 | The year in which the event occurs | 1970 |
| Month | int | 4 | The month number in which the event occurs | 5 |
| Day | int | 4 | The day number in which the event occurs | 22 |
| Hour | int | 4 | The hour in which the event occurs | 11 |
| Minute | int | 4 | The minute in which the event occurs | 45 |
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| SampleType | int | 4 | 1 for analytical mean, 2 for sample mean | 2 |
| ChanceOfLoss | float | 4 | Probability of a loss calculated from the effective damage distributions | 0.95 |
| MeanLoss | float | 4 | Mean | 1345.678 |
| SDLoss | float | 4 | Sample standard deviation for SampleType=2 | 945.89 |
| MaxLoss | float | 4 | Maximum possible loss calculated from the effective damage distribution | 75000 |
| FootprintExposure | float | 4 | Exposure value impacted by the model's event footprint | 80000 |
| MeanImpactedExposure | float | 4 | Mean exposure impacted by the event across the samples (where loss > 0) | 65000 |
| MaxImpactedExposure | float | 4 | Maximum exposure impacted by the event across the samples (where loss > 0) | 70000 |
The Quantile PLT output is a csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| Period | int | 4 | The period in which the event occurs | 500 |
| PeriodWeight | float | 4 | The weight of the period (frequency relative to the total number of periods) | 0.001 |
| EventId | int | 4 | Model event_id | 45567 |
| Year | int | 4 | The year in which the event occurs | 1970 |
| Month | int | 4 | The month number in which the event occurs | 5 |
| Day | int | 4 | The day number in which the event occurs | 22 |
| Hour | int | 4 | The hour in which the event occurs | 11 |
| Minute | int | 4 | The minute in which the event occurs | 45 |
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| Quantile | float | 4 | The probability associated with the loss quantile | 0.9 |
| Loss | float | 4 | The loss quantile | 1345.678 |
This component produces several variants of loss exceedance curves, known as Exceedance Probability Tables "EPT" under ORD.
An Exceedance Probability Table is a set of user-specified percentiles of (typically) annual loss on one of two bases – AEP (sum of losses from all events in a year) or OEP (maximum of any one event's losses in a year). In ORD the percentiles are expressed as Return Periods, which is the reciprocal of the percentile.
How EPTs are derived in general depends on the mathematical methodology of calculating the underlying ground up and insured losses.
In the Oasis kernel the methodology is Monte Carlo sampling from damage distributions, which results in several samples (realisations) of an event loss for every event in the model's catalogue. The event losses are assigned to a year timeline and the years are rank ordered by loss. The method of computing the percentiles is by taking the ratio of the frequency of years with a loss exceeding a given threshold over the total number of years.
The OasisLMF approach gives rise to five variations of calculation of these statistics, which are described in the calculation section below.
Exceedance Probability Tables are further generalised in Oasis to represent not only annual loss percentiles but loss percentiles over any period of time. Thus the typical use of 'Year' label in outputs is replaced by the more general term 'Period', which can be any period of time as defined in the model data 'occurrence' file (although the normal period of interest is a year).
An optional parameter is;
$ ordleccalc [parameters]

'First generate summarycalc binaries by running the core workflow, for the required summary set
$ eve 1 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc1.bin
$ eve 2 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc2.bin
'Then run ordleccalc, pointing to the specified sub-directory of work containing summarycalc binaries.

'Write aggregate and occurrence full uncertainty
$ ordleccalc -Ksummary1 -F -f -O ept.csv
$ ordleccalc -Ksummary1 -F -f -P ept.parquet
$ ordleccalc -Ksummary1 -F -f -O ept.csv -P ept.parquet

'Write occurrence per sample (PSEPT)
$ ordleccalc -Ksummary1 -w -o psept.csv
$ ordleccalc -Ksummary1 -w -p psept.parquet
$ ordleccalc -Ksummary1 -w -o psept.csv -p psept.parquet

'Write aggregate and occurrence per sample (written to PSEPT) and per sample mean (written to EPT file)
$ ordleccalc -Ksummary1 -W -w -M -m -O ept.csv -o psept.csv
$ ordleccalc -Ksummary1 -W -w -M -m -P ept.parquet -p psept.parquet
$ ordleccalc -Ksummary1 -W -w -M -m -O ept.csv -o psept.csv -P ept.parquet -p psept.parquet

'Write full output
$ ordleccalc -Ksummary1 -F -f -W -w -S -s -M -m -O ept.csv -o psept.csv
$ ordleccalc -Ksummary1 -F -f -W -w -S -s -M -m -P ept.parquet -p psept.parquet
$ ordleccalc -Ksummary1 -F -f -W -w -S -s -M -m -O ept.csv -o psept.csv -P ept.parquet -p psept.parquet
ordleccalc requires the occurrence.bin file
and will optionally use the following additional files if present;
ordleccalc does not have a standard input that can be streamed in. Instead, it reads in summarycalc binary data from a file in a fixed location. The format of the binaries must match summarycalc standard output. The location is in the 'work' subdirectory of the present working directory. For example;
The user must ensure the work subdirectory exists. The user may also specify a subdirectory of /work to store these files. e.g.
The reason for ordleccalc not having an input stream is that the calculation is not valid on a subset of events, i.e. within a single process when the calculation has been distributed across multiple processes. It must bring together all event losses before assigning event losses to periods and ranking losses by period. The summarycalc losses for all events (all processes) must be written to the /work folder before running ordleccalc.
All files with extension .bin from the specified subdirectory are read into memory, as well as the occurrence.bin. The summarycalc losses are grouped together and sampled losses are assigned to period according to which period the events are assigned to in the occurrence file.
If multiple events occur within a period, the aggregate (AEP) basis sums the event losses in the period, while the occurrence (OEP) basis takes the maximum event loss in the period.
The 'EPType' field in the output identifies the basis of loss exceedance curve.
The 'EPTypes' are OEP, OEP TVAR, AEP and AEP TVAR.
TVAR results are generated automatically if the OEP or AEP report is selected in the analysis options. TVAR, or Tail Conditional Expectation (TCE), is computed by averaging the rank ordered losses exceeding a given return period loss from the respective OEP or AEP result.
Then the calculation differs by EPCalc type, as follows;
The mean damage loss (sidx = -1) is output as a standard exceedance probability table. If the calculation is run with 0 samples, then ordleccalc will still return the mean damage loss exceedance curve.
Full uncertainty - all losses by period are rank ordered to produce a single loss exceedance curve.
Per Sample mean - the return period losses from the Per Sample EPT are averaged, which produces a single loss exceedance curve.
Sample mean - the losses by period are first averaged across the samples, and then a single loss exceedance table is created from the period sample mean losses.
All four of the above variants are output into the same file when selected.
Finally, the fifth variant, the Per Sample EPT, is output to a separate file. In this case, for each sample, losses by period are rank ordered to produce a loss exceedance curve for each sample.
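To make these variants concrete, here is a compact, illustrative sketch of the aggregate/occurrence ranking logic. It is a simplification under stated assumptions (equal period weights, in-memory dictionaries rather than the ktools binary formats, and the rank-to-return-period convention described in the legacy leccalc section), not the ordleccalc source.

```python
from collections import defaultdict

def period_losses(event_losses, occurrence, basis="AEP"):
    """event_losses: {(event_id, sidx): loss}; occurrence: {event_id: [period_no, ...]}.
    Returns {(period_no, sidx): loss}, summing event losses per period for the
    aggregate (AEP) basis or taking the maximum for the occurrence (OEP) basis."""
    out = defaultdict(float)
    for (event_id, sidx), loss in event_losses.items():
        for period_no in occurrence.get(event_id, []):
            key = (period_no, sidx)
            out[key] = out[key] + loss if basis == "AEP" else max(out[key], loss)
    return out

def ep_curve(losses, effective_periods):
    """Rank losses (largest first); return period = effective timeline length / rank."""
    ranked = sorted(losses, reverse=True)
    return [(effective_periods / rank, loss) for rank, loss in enumerate(ranked, start=1)]

def full_uncertainty_ept(losses_by_period, n_periods, n_samples):
    # each sample is treated as a further stretch of the timeline
    return ep_curve(list(losses_by_period.values()), n_periods * n_samples)

def per_sample_ept(losses_by_period, n_periods):
    by_sample = defaultdict(list)
    for (period_no, sidx), loss in losses_by_period.items():
        by_sample[sidx].append(loss)
    return {sidx: ep_curve(losses, n_periods) for sidx, losses in by_sample.items()}

def tvar(curve, return_period):
    """Average of the rank ordered losses at or beyond the given return period."""
    tail = [loss for rp, loss in curve if rp >= return_period]
    return sum(tail) / len(tail) if tail else 0.0
```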
Exceedance Probability Tables (EPT)
csv files with the following fields;
Exceedance Probability Table (EPT)
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| SummaryId | int | 4 | identifier representing a summary level grouping of losses | 10 |
| EPCalc | int | 4 | 1, 2, 3 or 4 with meanings as given above | 2 |
| EPType | int | 4 | 1, 2, 3 or 4 with meanings as given above | 1 |
| ReturnPeriod | float | 4 | return period interval | 250 |
| loss | float | 4 | loss exceedance threshold or TVAR for return period | 546577.8 |
Per Sample Exceedance Probability Tables (PSEPT)
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| SummaryId | int | 4 | identifier representing a summary level grouping of losses | 10 |
| SampleID | int | 4 | Sample number | 20 |
| EPType | int | 4 | 1, 2, 3 or 4 | 3 |
| ReturnPeriod | float | 4 | return period interval | 250 |
| loss | float | 4 | loss exceedance threshold or TVAR for return period | 546577.8 |
An additional feature of ordleccalc is available to vary the relative importance of the period losses by providing a period weightings file to the calculation. In this file, a weight can be assigned to each period to make it more or less important than neutral weighting (1 divided by the total number of periods). For example, if the neutral weight for period 1 is 1 in 10000 years, or 0.0001, then doubling the weighting to 0.0002 will mean that period's loss reoccurrence rate would double. Assuming no other period losses, the return period of the loss of period 1 in this example would be halved.
All period_nos must appear in the file from 1 to P (no gaps). There is no constraint on the sum of weights. Periods with zero weight will not contribute any losses to the loss exceedance curve.
This feature will be invoked automatically if the periods.bin file is present in the input directory.
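A small sketch of how period weights change the exceedance calculation (an illustration of the description above, not the ordleccalc code): with weights, the return period comes from the cumulative weight of periods whose loss meets or exceeds the threshold, rather than from the rank count.

```python
def weighted_ep_curve(period_losses, period_weights):
    """period_losses: {period_no: loss}; period_weights: {period_no: weight}.
    Neutral weighting is 1 / number of periods for every period."""
    ranked = sorted(period_losses.items(), key=lambda kv: kv[1], reverse=True)
    curve, cum_weight = [], 0.0
    for period_no, loss in ranked:
        cum_weight += period_weights[period_no]     # exceedance probability so far
        if cum_weight > 0.0:
            curve.append((1.0 / cum_weight, loss))  # return period for this loss
    return curve
```

With neutral weights of 0.0001 the largest loss sits at a 10,000 year return period; doubling that period's weight to 0.0002 halves its return period, matching the example above.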
aalcalc outputs the Average Loss Table (ALT) which contains the average annual loss and standard deviation of annual loss by SummaryId.
Two types of average and standard deviation of loss are calculated; analytical (SampleType 1) and sample (SampleType 2). If the analysis is run with zero samples, then only SampleType 1 statistics are returned.
aalcalc requires the occurrence.bin file
aalcalc does not have a standard input that can be streamed in. Instead, it reads in summarycalc binary data from a file in a fixed location. The format of the binaries must match summarycalc standard output. The location is in the 'work' subdirectory of the present working directory. For example;
The user must ensure the work subdirectory exists. The user may also specify a subdirectory of /work to store these files. e.g.
The reason for aalcalc not having an input stream is that the calculation is not valid on a subset of events, i.e. within a single process when the calculation has been distributed across multiple processes. It must bring together all event losses before assigning event losses to periods and finally computing the final statistics.
$ aalcalc [parameters] > alt.csv
'First generate summarycalc binaries by running the core workflow, for the required summary set
$ eve 1 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc1.bin
$ eve 2 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc2.bin
'Then run aalcalc, pointing to the specified sub-directory of work containing summarycalc binaries.
$ aalcalc -o -Ksummary1 > alt.csv
$ aalcalc -p alt.parquet -Ksummary1
$ aalcalc -o -p alt.parquet -Ksummary1 > alt.csv
csv file containing the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| SummaryId | int | 4 | SummaryId representing a grouping of losses | 10 |
| SampleType | int | 4 | 1 for analytical statistics, 2 for sample statistics | 1 |
| MeanLoss | float | 8 | average annual loss | 6785.9 |
| SDLoss | float | 8 | standard deviation of loss | 54657.8 |
The occurrence file and summarycalc files from the specified subdirectory are read into memory. Event losses are assigned to period according to which period the events occur in and summed by period and by sample.
For type 1, the mean and standard deviation of numerically integrated mean period losses are calculated across the periods. For type 2 the mean and standard deviation of the sampled period losses are calculated across all samples (sidx > 1) and periods.
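A minimal sketch of the SampleType 2 statistics as described above (illustrative only; the analytical SampleType 1 path, which uses the numerically integrated mean losses, is analogous):

```python
from math import sqrt

def sample_aal(period_sample_losses, n_periods, n_samples):
    """period_sample_losses: {(period_no, sidx): total loss in that period for that sample}.
    Period/sample combinations with no loss implicitly count as zero."""
    n = n_periods * n_samples
    total = sum(period_sample_losses.values())
    mean = total / n                                   # average annual (period) loss
    sum_sq = sum(x * x for x in period_sample_losses.values())
    var = (sum_sq - n * mean * mean) / (n - 1)         # across all periods and samples
    return mean, sqrt(max(var, 0.0))
```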
An additional feature of aalcalc is available to vary the relative importance of the period losses by providing a period weightings file to the calculation. In this file, a weight can be assigned to each period to make it more or less important than neutral weighting (1 divided by the total number of periods). For example, if the neutral weight for period 1 is 1 in 10000 years, or 0.0001, then doubling the weighting to 0.0002 will mean that period's loss reoccurrence rate would double and the loss contribution to the average annual loss would double.
All period_nos must appear in the file from 1 to P (no gaps). There is no constraint on the sum of weights. Periods with zero weight will not contribute any losses to the AAL.
This feature will be invoked automatically if the periods.bin file is present in the input directory.
Go to 4.4 Data conversion components section
diff --git a/docs/html/OutputComponents.html b/docs/html/OutputComponents.html
index 8dcf4dc8..b03e91b2 100644
--- a/docs/html/OutputComponents.html
+++ b/docs/html/OutputComponents.html
The program calculates mean and standard deviation of loss by summary_id and by event_id.
None
$ [stdin component] | eltcalc > elt.csv
$ eltcalc < [stdin].bin > elt.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - | eltcalc > elt.csv
$ eltcalc < summarycalc.bin > elt.csv
No additional data is required, all the information is contained within the input stream.
For each summary_id and event_id, the sample mean and standard deviation is calculated from the sampled losses in the summarycalc stream and output to file. The analytical mean is also output as a separate record, differentiated by a 'type' field. The exposure_value, which is carried in the event_id, summary_id header of the stream, is also output.
csv file with the following fields;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| summary_id | int | 4 | summary_id representing a grouping of losses | 10 |
| type | int | 4 | 1 for analytical mean, 2 for sample mean | 2 |
| event_id | int | 4 | Oasis event_id | 45567 |
| mean | float | 4 | mean | 1345.678 |
| standard_deviation | float | 4 | sample standard deviation | 945.89 |
| exposure_value | float | 4 | exposure value for summary_id affected by the event | 70000 |
Loss exceedance curves, also known as exceedance probability curves, are computed by a rank ordering a set of losses by period and computing the probability of exceedance for each level of loss based on relative frequency. Losses are first assigned to periods of time (typically years) by reference to the occurrence file which contains the event occurrences in each period over a timeline of, say, 10,000 periods. Event losses are summed within each period for an aggregate loss exceedance curve, or the maximum of the event losses in each period is taken for an occurrence loss exceedance curve. From this point, there are a few variants available as follows;
Wheatsheaf/multiple EP - losses by period are rank ordered for each sample, which produces many loss exceedance curves - one for each sample across the same timeline. The wheatsheaf shows the variation in return period loss due to sampled damage uncertainty, for a given timeline of occurrences.
Full uncertainty/single EP - all sampled losses by period are rank ordered to produce a single loss exceedance curve. This treats each sample as if it were another period of losses in an extrapolated timeline. Stacking the curves end-to-end rather than viewing side-by-side as in the wheatsheaf is a form of averaging with respect to a particular return period loss and provides stability in the point estimate, for a given timeline of occurrences.
Sample mean - the losses by period are first averaged across the samples, and then a single loss exceedance curve is created from the period sample mean losses.
Wheatsheaf mean - the loss exceedance curves from the Wheatsheaf are averaged across each return period, which produces a single loss exceedance curve.
The ranked losses represent the first, second, third, etc. largest loss periods within the total number of periods of say 10,000 years. The relative frequency of these periods of loss is interpreted as the probability of loss exceedance, that is to say that the top ranked loss has an exceedance probability of 1 in 10000, or 0.01%, the second largest loss has an exceedance probability of 0.02%, and so on. In the output file, the exceedance probability is expressed as a return period, which is the reciprocal of the exceedance probability. Only non-zero loss periods are returned.
An optional parameter is;
$ leccalc [parameters] > lec.csv
'First generate summarycalc binaries by running the core workflow, for the required summary set
$ eve 1 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc1.bin
$ eve 2 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc2.bin
'Then run leccalc, pointing to the specified sub-directory of work containing summarycalc binaries.
$ leccalc -Ksummary1 -F lec_full_uncertainty_agg.csv -f lec_full_uncertainty_occ.csv
' With return period file
$ leccalc -r -Ksummary1 -F lec_full_uncertainty_agg.csv -f lec_full_uncertainty_occ.csv
leccalc requires the occurrence.bin file
leccalc does not have a standard input that can be streamed in. Instead, it reads in summarycalc binary data from a file in a fixed location. The format of the binaries must match summarycalc standard output. The location is in the 'work' subdirectory of the present working directory. For example;
The user must ensure the work subdirectory exists. The user may also specify a subdirectory of /work to store these files. e.g.
The reason for leccalc not having an input stream is that the calculation is not valid on a subset of events, i.e. within a single process when the calculation has been distributed across multiple processes. It must bring together all event losses before assigning event losses to periods and ranking losses by period. The summarycalc losses for all events (all processes) must be written to the /work folder before running leccalc.
All files with extension .bin from the specified subdirectory are read into memory, as well as the occurrence.bin. The summarycalc losses are grouped together and sampled losses are assigned to period according to which period the events occur in.
If multiple events occur within a period, the event losses are summed for the aggregate loss exceedance curve, and the maximum event loss is taken for the occurrence loss exceedance curve.
Then the calculation differs by lec type, as follows;
For all curves, the analytical mean loss (sidx = -1) is output as a separate exceedance probability curve. If the calculation is run with 0 samples, then leccalc will still return the analytical mean loss exceedance curve. The 'type' field in the output identifies the type of loss exceedance curve, which is 1 for analytical mean, and 2 for curves calculated from the samples.
csv file with the following fields;
Full uncertainty, Sample mean and Wheatsheaf mean loss exceedance curve
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| summary_id | int | 4 | summary_id representing a grouping of losses | 10 |
| type | int | 4 | 1 for analytical mean, 2 for sample mean | 2 |
| return_period | float | 4 | return period interval | 250 |
| loss | float | 4 | loss exceedance threshold for return period | 546577.8 |
Wheatsheaf loss exceedance curve
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| summary_id | int | 4 | summary_id representing a grouping of losses | 10 |
| sidx | int | 4 | Oasis sample index | 50 |
| return_period | float | 4 | return period interval | 250 |
| loss | float | 4 | loss exceedance threshold for return period | 546577.8 |
An additional feature of leccalc is available to vary the relative importance of the period losses by providing a period weightings file to the calculation. In this file, a weight can be assigned to each period to make it more or less important than neutral weighting (1 divided by the total number of periods). For example, if the neutral weight for period 1 is 1 in 10000 years, or 0.0001, then doubling the weighting to 0.0002 will mean that period's loss reoccurrence rate would double. Assuming no other period losses, the return period of the loss of period 1 in this example would be halved.
All period_nos must appear in the file from 1 to P (no gaps). There is no constraint on the sum of weights. Periods with zero weight will not contribute any losses to the loss exceedance curve.
This feature will be invoked automatically if the periods.bin file is present in the input directory.
The program outputs sample mean and standard deviation by summary_id, event_id and period_no. The analytical mean is also output as a separate record, differentiated by a 'type' field. It also outputs an event occurrence date.
None
$ [stdin component] | pltcalc > plt.csv
$ pltcalc < [stdin].bin > plt.csv
$ eve 1 1 | getmodel | gulcalc -r -S100 -C1 | summarycalc -1 - | pltcalc > plt.csv
$ pltcalc < summarycalc.bin > plt.csv
pltcalc requires the occurrence.bin file
The occurrence.bin file is read into memory. For each summary_id, event_id and period_no, the sample mean and standard deviation is calculated from the sampled losses in the summarycalc stream and output to file. The exposure_value, which is carried in the event_id, summary_id header of the stream is also output, as well as the date field(s) from the occurrence file.
There are two output formats, depending on whether an event occurrence date is an integer offset to some base date that most external programs can interpret as a real date, or a calendar day in a numbered scenario year. The output format will depend on the format of the date fields in the occurrence.bin file.
In the former case, the output format is;
| Name | Type | Bytes | Description | Example |
|---|---|---|---|---|
| type | int | 4 | 1 for analytical mean, 2 for sample mean | 1 |
| summary_id | int | 4 | summary_id representing a grouping of losses | 10 |
| event_id | int | 4 | Oasis event_id | 45567 |
| period_no | int | 4 | identifying an abstract period of time, such as a year | 56876 |
| mean | float | 4 | mean | 1345.678 |
| standard_deviation | float | 4 | sample standard deviation | 945.89 |
| exposure_value | float | 4 | exposure value for summary_id affected by the event | 70000 |
| date_id | int | 4 | the date_id of the event occurrence | 28616 |
Using a base date of 1/1/1900 the integer 28616 is interpreted as 16/5/1978.
In the latter case, the output format is;
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| type | +type | int | 4 | -1 for analytical mean, 2 for sample mean | -1 | +1 for analytical mean, 2 for sample +mean | +1 |
| summary_id | +summary_id | int | 4 | -summary_id representing a grouping of losses | -10 | +summary_id representing a grouping of +losses | +10 |
| event_id | +event_id | int | 4 | -Oasis event_id | -45567 | +Oasis event_id | +45567 |
| period_no | +period_no | int | 4 | -identifying an abstract period of time, such as a year | -56876 | +identifying an abstract period of time, +such as a year | +56876 |
| mean | +mean | float | 4 | -mean | -1345.678 | +mean | +1345.678 |
| standard_deviation | +standard_deviation | float | 4 | -sample standard deviation | -945.89 | +sample standard deviation | +945.89 |
| exposure_value | +exposure_value | float | 4 | -exposure value for summary_id affected by the event | -70000 | +exposure value for summary_id affected by +the event | +70000 |
| occ_year | +occ_year | int | 4 | -the year number of the event occurrence | -56876 | +the year number of the event +occurrence | +56876 |
| occ_month | +occ_month | int | 4 | -the month of the event occurrence | -5 | +the month of the event occurrence | +5 |
| occ_year | +occ_year | int | 4 | -the day of the event occurrence | -16 | +the day of the event occurrence | +16 |
aalcalc computes the overall average annual loss and standard deviation of annual loss.
Two types of aal and standard deviation of loss are calculated; analytical (type 1) and sample (type 2). If the analysis is run with zero samples, then only type 1 statistics are returned by aalcalc.
The Average Loss Convergence Table 'ALCT' is a second optional output which can be generated from aalcalc. This provides extra statistical output which can be used to estimate the amount of simulation error in the average annual loss estimate from samples (type 2).
aalcalc requires the occurrence.bin file
aalcalc does not have a standard input that can be streamed in. Instead, it reads in summarycalc binary data from a file in a fixed location. The format of the binaries must match summarycalc standard output. The location is in the 'work' subdirectory of the present working directory. For example;
The user must ensure the work subdirectory exists. The user may also specify a subdirectory of /work to store these files. e.g.
The reason for aalcalc not having an input stream is that the calculation is not valid on a subset of events, i.e. within a single process when the calculation has been distributed across multiple processes. It must bring together all event losses before assigning event losses to periods and finally computing the final statistics.
Usage
$ aalcalc [parameters] > aal.csv
Examples
First generate summarycalc binaries by running the core workflow, for the required summary set
$ eve 1 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc1.bin
$ eve 2 2 | getmodel | gulcalc -r -S100 -c - | summarycalc -g -1 - > work/summary1/summarycalc2.bin
Then run aalcalc, pointing to the specified sub-directory of work containing summarycalc binaries.
$ aalcalc -Ksummary1 > aal.csv
Add alct output at 95% confidence level
$ aalcalc -Ksummary1 -o -l 0.95 -c alct.csv > aal.csv
AAL:
csv file containing the following fields;
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| summary_id | +summary_id | int | 4 | -summary_id representing a grouping of losses | -1 | +summary_id representing a grouping of +losses | +1 |
| type | +type | int | 4 | -1 for analytical statistics, 2 for sample statistics | -2 | +1 for analytical statistics, 2 for sample +statistics | +2 |
| mean | +mean | float | 8 | -average annual loss | -1014.23 | +average annual loss | +1014.23 |
| standard_deviation | +standard_deviation | float | 8 | -standard deviation of annual loss | -11039.78 | +standard deviation of annual loss | +11039.78 |
ALCT:
csv file containing the following fields;
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| SummaryId | +SummaryId | int | 4 | -summary_id representing a grouping of losses | -1 | +summary_id representing a grouping of +losses | +1 |
| MeanLoss | +MeanLoss | float | 4 | -the average annual loss estimate from samples | -1014.23 | +the average annual loss estimate from +samples | +1014.23 |
| SDLoss | +SDLoss | float | 8 | -the standard deviation of annual loss from samples | -11039.78 | +the standard deviation of annual loss from +samples | +11039.78 |
| SampleSize | +SampleSize | int | 8 | -the number of samples used to produce the statistics | -100 | +the number of samples used to produce the +statistics | +100 |
| LowerCI | +LowerCI | float | 8 | -the lower threshold of the confidence interval for the mean estimate | -1004.52 | +the lower threshold of the confidence +interval for the mean estimate | +1004.52 |
| UpperCI | +UpperCI | float | 8 | -the upper threshold of the confidence interval for the mean estimate | -1023.94 | +the upper threshold of the confidence +interval for the mean estimate | +1023.94 |
| StandardError | +StandardError | float | 8 | -the total standard error of the mean estimate | -5.90 | +the total standard error of the mean +estimate | +5.90 |
| RelativeError | +RelativeError | float | 8 | -the StandardError divided by the mean estimate | -0.005 | +the StandardError divided by the mean +estimate | +0.005 |
| VarElementHaz | +VarElementHaz | float | 8 | -the contribution to variance of the estimate from the hazard | -8707.40 | +the contribution to variance of the +estimate from the hazard | +8707.40 |
| StandardErrorHaz | +StandardErrorHaz | float | 8 | -the square root of VarElementHaz | -93.31 | +the square root of VarElementHaz | +93.31 |
| RelativeErrorHaz | +RelativeErrorHaz | float | 8 | -the StandardErrorHaz divided by the mean estimate | -0.092 | +the StandardErrorHaz divided by the mean +estimate | +0.092 |
| VarElementVuln | +VarElementVuln | float | 8 | -the contribution to variance of the estimate from the vulnerability | -34.81 | +the contribution to variance of the +estimate from the vulnerability | +34.81 |
| StandardErrorVuln | +StandardErrorVuln | float | 8 | -the square root of VarElementVuln | -5.90 | +the square root of VarElementVuln | +5.90 |
| RelativeErrorVuln | +RelativeErrorVuln | float | 8 | -the StandardErrorVuln divided by the mean estimate | -0.005 | +the StandardErrorVuln divided by the mean +estimate | +0.005 |
The occurrence file and summarycalc files from the specified subdirectory are read into memory. Event losses are assigned to periods based on when the events occur and summed by period and by sample. These are referred to as 'annual loss samples'.
AAL calculation:
For type 1, calculations are performed on the type 1 (numerically integrated) mean annual losses by period. The AAL is the mean annual losses summed across the periods and divided by the number of periods. The standard deviation is the square root of the sum of squared errors between each annual mean loss and the AAL mean divided by the degrees of freedom (periods - 1).
For type 2 the mean and standard deviation of the annual loss samples are calculated across all samples and periods. The mean estimates the average annual loss, calculated as the sum of all annual loss samples divided by the total number of periods times the number of samples. The standard deviation is the square root of the sum of squared errors between each annual loss sample and the type 2 mean, divided by the degrees of freedom (periods × samples - 1).
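As a minimal illustration of the type 2 calculation described above, using made-up annual loss samples:

```python
from math import sqrt

# Illustrative annual loss samples: annual_loss[period][sample], with zero
# loss years included. Values are made up.
annual_loss = [
    [0.0, 150.0, 90.0],
    [1200.0, 800.0, 950.0],
    [0.0, 0.0, 20.0],
]
periods = len(annual_loss)
samples = len(annual_loss[0])

flat = [x for row in annual_loss for x in row]
mean = sum(flat) / (periods * samples)            # type 2 average annual loss
sse = sum((x - mean) ** 2 for x in flat)
sd = sqrt(sse / (periods * samples - 1))          # type 2 standard deviation
print(round(mean, 2), round(sd, 2))
```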
ALCT calculation:
In ALCT, MeanLoss and SDLoss are the same as the type 2 mean and standard deviation from the AAL report. StandardError indicates how much the average annual loss estimate might vary if the simulation were rerun with different random numbers, reflecting the simulation error in the estimate. RelativeError, the StandardError as a percentage of the mean, is convenient for assessing simulation error and acceptable levels are typically expressed in percentage terms. StandardError is derived from the ANOVA metrics described below.
LowerCI and UpperCI represent the absolute lower and upper thresholds for the confidence interval for the AAL estimate, indicating the range of losses within a specified confidence level. A higher confidence level results in a wider confidence interval.
Variance components:
VarElementHaz and VarElementVuln arise from attributing variance in the annual loss to hazard effects (variation due to event intensity across years) and vulnerability effects (variation due to sampling from exposure's damage uncertainty distributions). This is done using a one-factor effects model and standard analysis of variance 'ANOVA' on the annual loss samples.
In the one-factor model, annual loss in year i and sample m, denoted L(i,m), is expressed as:
L(i,m) = AAL + h(i) + v(i,m)
where AAL is the overall average annual loss, h(i) is the hazard effect for period i and v(i,m) is the vulnerability (sampling) effect for period i and sample m.
Total variance in annual loss is partitioned into independent hazard and vulnerability effects:
Var(L) = Var(h) + Var(v)
ANOVA is used to estimate the variance components Var(h) and Var(v). For standard Oasis models, since the events are fixed across years, the simulation error in the AAL estimate arises only from the vulnerability component.
The StandardError of the AAL estimate in ALCT follows from the calculation of the Variance of the AAL estimate as follows;
Var(AAL estimate) = VarElementVuln = Var(v) / (I * M)
StandardErrorVuln = sqrt(VarElementVuln)
StandardError = StandardErrorVuln
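The sketch below illustrates a one-factor ANOVA split of made-up annual loss samples into hazard and vulnerability components along the lines described above; the exact estimators used by aalcalc may differ in detail.

```python
from math import sqrt

# Illustrative annual loss samples L[i][m] (I periods x M samples); made-up values.
L = [
    [100.0, 120.0, 80.0],
    [900.0, 1100.0, 1000.0],
    [0.0, 10.0, 5.0],
]
I, M = len(L), len(L[0])
grand = sum(sum(row) for row in L) / (I * M)      # AAL estimate
row_means = [sum(row) / M for row in L]

# One-factor ANOVA sums of squares: within years (vulnerability) and
# between years (hazard).
ssw = sum((x - rm) ** 2 for row, rm in zip(L, row_means) for x in row)
ssb = M * sum((rm - grand) ** 2 for rm in row_means)

msw = ssw / (I * (M - 1))           # estimate of Var(v)
msb = ssb / (I - 1)
var_h = max((msb - msw) / M, 0.0)   # estimate of Var(h)

var_element_vuln = msw / (I * M)    # Var(AAL estimate) from the vulnerability part
standard_error = sqrt(var_element_vuln)
print(round(grand, 2), round(var_h, 2), round(msw, 2), round(standard_error, 4))
```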
Finally, ALCT provides statistics for multiple increasing sample subsets, showing the convergence of the AAL estimate with increasing sample sizes. These subsets are non-overlapping and fixed, starting with SampleSize=1 (m=1), SampleSize=2 (m=2 to 3), SampleSize=4 (m=4 to 7), up to the maximum subset size. The final row gives statistics for the total samples M, using all available samples.
An additional feature of aalcalc is available to vary the relative importance of the period losses by providing a period weightings file to the calculation. In this file, a weight can be assigned to each period to make it more or less important than neutral weighting (1 divided by the total number of periods). For example, if the neutral weight for period 1 is 1 in 10000 years, or 0.0001, then doubling the weighting to 0.0002 will mean that period's loss reoccurrence rate would double and the loss contribution to the average annual loss would double.
All period_nos must appear in the file from 1 to P (no gaps). There is no constraint on the sum of weights. Periods with zero weight will not contribute any losses to the AAL.
This feature will be invoked automatically if the periods.bin file is present in the input directory.
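As a rough sketch of the effect, assuming the weights simply replace the neutral 1/P factor when the period losses are averaged (made-up values):

```python
# Hypothetical period mean losses and user weights (neutral weight would be 1/P).
period_loss = {1: 500000.0, 2: 0.0, 3: 120000.0}
weight = {1: 0.0002, 2: 0.0001, 3: 0.0}   # period 3 contributes nothing

# Weighted average annual loss: each period contributes weight * loss, so
# doubling a period's weight doubles its contribution.
aal = sum(weight[p] * period_loss[p] for p in period_loss)
print(aal)
```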
In cases where events have been distributed to multiple processes, the output files can be concatenated to standard output.
Optional parameters are:
The sort by event ID option assumes that events have not been distributed to processes randomly and the list of event IDs in events.bin is sequential and contiguous. Should either of these conditions be false, the output will still contain all events but sorting cannot be guaranteed.
Usage
$ kat [parameters] [file]... > [stdout component]
Examples
$ kat -d pltcalc_output/ > pltcalc.csv
$ kat eltcalc_P1 eltcalc_P2 eltcalc_P3 > eltcalc.csv
$ kat -s eltcalc_P1 eltcalc_P2 eltcalc_P3 > eltcalc.csv
$ kat -s -d eltcalc_output/ > eltcalc.csv
Files are concatenated in the order in which they are presented on the command line. Should a file path be specified, files are concatenated in alphabetical order. When asked to sort by event ID, the order of input files is irrelevant.
The output parquet files from multiple processes can be concatenated to a single parquet file. The results are automatically sorted by event ID. Unlike kat, the ORD table name for the input files must be specified on the command line.
$ katparquet [parameters] -o [filename.parquet] [file]...
$ katparquet -d mplt_files/ -M -o MPLT.parquet
$ katparquet -q -o QPLT.parquet qplt_P1.parquet qplt_P2.parquet qplt_P3.parquet
Go to 4.3 Data conversion components section

diff --git a/docs/html/Overview.html b/docs/html/Overview.html
This is the general data streaming framework showing the core components of the toolkit.

The architecture consists of;
The conversion of input data to binary format is shown in the diagram as occurring outside of the compute server, but this could be performed within the compute server. ktools provides a full set of binary conversion tools from csv input files which can be deployed elsewhere.
The in-memory data streams are initiated by the process 'eve' (meaning 'event emitter') and shown by solid arrows. The read/write data flows are shown as dashed arrows.
The calculation components are getmodel, gulcalc, fmcalc, summarycalc and outputcalc. The streamed data passes through the components in memory one event at a time and is written out to a results file on the compute server. The user can then retrieve the results (csvs) and consume them in their BI system.
The reference model demonstrates an implementation of the core calculation components, along with the data conversion components which convert binary files to csv files.
The analysis workflows are controlled by the user, not the toolkit, and they can be as simple or as complex as required.
The simplest workflow is single or parallel processing to produce a single result. This minimises the amount of disk I/O at each stage in the calculation, which performs better than saving intermediate results to disk. This workflow is shown in Figure 2.

However it is possible to stream data from one process into several processes, allowing the calculation of multiple outputs simultaneously, as shown in Figure 3.

For multi-output, multi-process workflows, Linux operating systems provide 'named pipes', which in-memory data streams can be diverted to and manipulated as if they were files, and 'tee', which sends a stream from one process into multiple processes. This means the core calculation is not repeated for each output, as it would be if several single-output workflows were run.
diff --git a/docs/html/README.html b/docs/html/README.html
-
+
+
+
+
+
+
+ 
This is the POSIX-compliant Oasis LMF In-Memory Kernel toolkit.
Please click here to download the latest release.
The source code will change on a regular basis but only the releases are supported. Support enquiries should be sent to support@oasislmf.org.
There are build instructions for Windows 64-bit executables.
Note that the dynamic random number option in the Windows build uses a deterministic seed due to a bug in the mingw compiler. We recommend that the random number file option (gulcalc -r) be used in Windows.
This issue will be handled in future releases by implementing the rdrand random number generator in all environments.
On Linux, the g++ compiler, build-essential, libtool, zlib1g-dev, autoconf and pkg-config packages (Debian distros) or 'Development Tools' and zlib-devel (Red Hat) need to be installed.
To enable Parquet format outputs (optional), version 7.0.0 of the Apache Arrow library is required. The recommended method is to build the library from source as follows;
$ mkdir build
$ cd build
$ git clone https://github.com/apache/arrow.git -b release-7.0.0
$ mkdir -p arrow/cpp/build-release
$ cd build/arrow/cpp/build-release
$ cmake -DARROW_PARQUET=ON -DARROW_BUILD_STATIC=ON -DARROW_OPTIONAL_INSTALL=ON ..
$ make -j$(nproc)
$ make install
More information on Apache Arrow.
Copy ktools-[version].tar.gz onto your machine and untar.
$ tar -xvf ktools-[version].tar.gz
Go into the ktools folder and autogen using the following command;
$ cd ktools-[version]
$ ./autogen.sh
Configure using the following command;
$ ./configure
The configure script will attempt to find and link the appropriate Apache Arrow libraries to enable Parquet format output. The search for these libraries can be disabled manually with an extra flag:
$ ./configure --disable-parquet
On OS X add an extra flag:
$ ./configure --enable-osx
Make using the following command;
$ make
Next run the automated test to check the build and numerical results;
$ make check
Finally, install the executables using the following command;
$ [sudo] make install
The installation is complete. The executables are located in /usr/local/bin.
If installing the latest code from the git repository, clone the ktools repository onto your machine.
Go into the ktools folder and autogen using the following command;
$ cd ktools
$ ./autogen.sh
Follow the rest of the process as described above.
Install Cmake from either system packages or cmake.org.
# create the build directory within ktools directory
$ mkdir build && cd build
$ ktools_source_dir=~/ktools
# Generate files and specify destination (here in ~/.local/bin)
$ cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=~/.local $ktools_source_dir

# Build
$ make all test

# If all is OK, install to bin subdir of the specified install prefix
$ make install
+$ make installMSYS2 64-bit is required for the Windows native build. MSYS2 is a Unix/Linux like development environment for building and distributing windows applications. -https://www.msys2.org/
MSYS2 64-bit is required for the Windows native build. MSYS2 is a Unix/Linux like development environment for building and distributing Windows applications. https://www.msys2.org/
Download and run the set-up program for MSYS2 64-bit.
Open a MSYS2 terminal and perform the updates before continuing.
The following add-in packages are required;
@@ -454,44 +342,52 @@These packages can be installed at the MSYS2 terminal command line.
-$ pacman -S autoconf automake git libtool make mingw-w64-x86_64-toolchain python
-
-
These packages can be installed at the MSYS2 terminal command +line.
+$ pacman -S autoconf automake git libtool make mingw-w64-x86_64-toolchain python
Clone the github repository at the MSYS2 terminal command line
-$ git clone https://github.com/OasisLMF/ktools.git
-
-Go into the ktools folder and run autogen using the following command;
-$ cd ktools
-$ ./autogen.sh
-
+$ git clone https://github.com/OasisLMF/ktools.gitGo into the ktools folder and run autogen using the following +command;
+$ cd ktools
+$ ./autogen.shConfigure using the following command;
-$ ./configure
-
+$ ./configureMake using the following command;
-$ make
-
-Next run the automated test to check the build and numerical results;
-$ make check
-
+$ makeNext run the automated test to check the build and numerical +results;
+$ make checkFinally, install the executables using the following command;
-$ make install
-
-The installation is complete. The executables are located in /usr/local/bin.
+$ make installThe installation is complete. The executables are located in +/usr/local/bin.
There is sample data and six example scripts which demonstrate how to invoke ktools in the /examples folder. These are written in python v2.
-For example, to run the eltcalc_example script, go into the examples folder and run the following command (you must have python installed):
-$ cd examples
-$ python eltcalc_example.py
-
+There is sample data and six example scripts which demonstrate how to +invoke ktools in the /examples folder. These are written in python +v2.
+For example, to run the eltcalc_example script, go into the examples +folder and run the following command (you must have python +installed):
+$ cd examples
+$ python eltcalc_example.pyTo build linux docker image do following command
-docker build --file Dockerfile.ktools.alpine -t alpine-ktools .
-
docker build --file Dockerfile.ktools.alpine -t alpine-ktools .
The code in this project is licensed under BSD 3-clause license.
- - - + + diff --git a/docs/html/RandomNumbers.html b/docs/html/RandomNumbers.html index 1cff1686..829fee60 100644 --- a/docs/html/RandomNumbers.html +++ b/docs/html/RandomNumbers.html @@ -1,431 +1,276 @@ - - - -
Simple uniform random numbers are assigned to each event, group and sample number to sample ground up loss in the gulcalc process. A group is a collection of items which share the same group_id, and is the method of supporting spatial correlation in ground up loss sampling in Oasis and ktools.
+ + + + + + +
Simple uniform random numbers are assigned to each event, group and +sample number to sample ground up loss in the gulcalc process. A group +is a collection of items which share the same group_id, and is the +method of supporting spatial correlation in ground up loss sampling in +Oasis and ktools.
Items (typically representing, in insurance terms, the underlying risk coverages) that are assigned the same group_id will use the same random number to sample damage for a given event and sample number. Items with different group_ids will be assigned independent random numbers. Therefore sampled damage is fully correlated within groups and fully independent between groups, where group is an abstract collection of items defined by the user.
-The item_id, group_id data is provided by the user in the items input file (items.bin).
+Items (typically representing, in insurance terms, the underlying +risk coverages) that are assigned the same group_id will use the same +random number to sample damage for a given event and sample number. +Items with different group_ids will be assigned independent random +numbers. Therefore sampled damage is fully correlated within groups and +fully independent between groups, where group is an abstract collection +of items defined by the user.
+The item_id, group_id data is provided by the user in the items input +file (items.bin).
The method of assigning random numbers in gulcalc uses an random number index (ridx), an integer which is used as a position reference into a list of random numbers. S random numbers corresponding to the runtime number of samples are drawn from the list starting at the ridx position.
-There are three options in ktools for choosing random numbers to apply in the sampling process.
-The method of assigning random numbers in gulcalc uses an random +number index (ridx), an integer which is used as a position reference +into a list of random numbers. S random numbers corresponding to the +runtime number of samples are drawn from the list starting at the ridx +position.
+There are three options in ktools for choosing random numbers to +apply in the sampling process.
+Use -R{number of random numbers} as a parameter. Optionally you may use -s{seed} to make the random numbers repeatable.
Use -R{number of random numbers} as a parameter. Optionally you may use -s{seed} to make the random numbers repeatable.
$ gulcalc -S100 -R1000000 -i -
This will run 100 samples drawing from 1 million dynamically generated random numbers. They are simple uniform random numbers.
$ gulcalc -S100 -s123 -R1000000 -i -
This will run 100 samples drawing from 1 million seeded random numbers (repeatable).
Random numbers are sampled dynamically using the Mersenne twister pseudo random number generator (the default RNG of the C++11 compiler). A sparse array capable of holding R random numbers is allocated to each event. The ridx is generated from the group_id and number of samples S using the following modulus function;
This formula pseudo-randomly assigns ridx indexes to each group_id between 0 and 999,999.
-As a ridx is sampled, the section in the array starting at the ridx position of length S is populated with random numbers unless they have already been populated, in which case the existing random numbers are re-used.
-The array is cleared for the next event and a new set of random numbers is generated.
-This formula pseudo-randomly assigns ridx indexes to each group_id +between 0 and 999,999.
+As a ridx is sampled, the section in the array starting at the ridx +position of length S is populated with random numbers unless they have +already been populated, in which case the existing random numbers are +re-used.
+The array is cleared for the next event and a new set of random +numbers is generated.
+Use -r as a parameter
-$ gulcalc -S100 -r -i -
-
-This will run 100 samples using random numbers from file random.bin in the static sub-directory.
-The random number file(s) is read into memory at the start of the gulcalc process.
-The ridx is generated from the sample index (sidx), event_id and group_id using the following modulus function;
+$ gulcalc -S100 -r -i -
+This will run 100 samples using random numbers from file random.bin +in the static sub-directory.
+The random number file(s) is read into memory at the start of the +gulcalc process.
+The ridx is generated from the sample index (sidx), event_id and +group_id using the following modulus function;
ridx= sidx + mod(group_id x P1 x P3 + event_id x P2, R)
This formula pseudo-randomly assigns a starting position index to each event_id and group_id combo between 0 and R-1, and then S random numbers are drawn by incrementing the starting position by the sidx.
-This formula pseudo-randomly assigns a starting position index to +each event_id and group_id combo between 0 and R-1, and then S random +numbers are drawn by incrementing the starting position by the sidx.
+Default option
-$ gulcalc -S100 -i -
-
-This option will produce repeatable random numbers seeded from a combination of the event_id and group_id. The difference between this option and method 1 with the fixed seed is that there is no limit on the number of random numbers generated, and you do not need to make a decision on the buffer size. This will impact performance for large analyses.
-For each event_id and group_id, the seed is calculated as follows;
-s1 = mod(group_id * 1543270363, 2147483648);
-s2 = mod(event_id * 1943272559, 2147483648);
-seed = mod(s1 + s2 , 2147483648)
$ gulcalc -S100 -i -
+This option will produce repeatable random numbers seeded from a +combination of the event_id and group_id. The difference between this +option and method 1 with the fixed seed is that there is no limit on the +number of random numbers generated, and you do not need to make a +decision on the buffer size. This will impact performance for large +analyses.
+For each event_id and group_id, the seed is calculated as +follows;
+s1 = mod(group_id * 1543270363, 2147483648);
+s2 = mod(event_id * 1943272559, 2147483648); seed = mod(s1 + s2 ,
+2147483648)

This section provides an overview of the reference model, which is an implementation of each of the components in the framework.
-There are five sub-sections which cover the usage and internal processes of each of the reference components;
+ + + + + + +
This section provides an overview of the reference model, which is an +implementation of each of the components in the framework.
+There are five sub-sections which cover the usage and internal +processes of each of the reference components;
The set of core components provided in this release is as follows;
+The set of core +components provided in this release is as follows;
The standard input and standard output data streams for the core components are covered in the Specification.
-Figure 1 shows the core components workflow and the required data input files.
-
The model static data for the core workflow, shown as red source files, are the event footprint, vulnerability, damage bin dictionary and random number file. These are stored in the 'static' sub-directory of the working folder.
-The user / analysis input data for the core workflow, shown as blue source files, are the events, items, coverages, fm programme, fm policytc, fm profile, fm xref, fm summary xref and gul summary xref files. These are stored in the 'input' sub-directory of the working folder.
-These are all Oasis kernel format data objects with prescribed formats. Note that the events are a user input rather than a static input because the user could choose to run a subset of the full list of events, or even just one event. Usually, though, the whole event set will be run.
-The output components are various implementations of outputcalc, as described in general terms in the Specification. The results are written directly into csv file as there is no downstream processing.
+The standard input and standard output data streams for the core +components are covered in the Specification.
+Figure 1 shows the core components workflow and the required data +input files.
+
The model static data for the core workflow, shown +as red source files, are the event footprint, vulnerability, damage bin +dictionary and random number file. These are stored in the +'static' sub-directory of the working folder.
+The user / analysis input data for the core +workflow, shown as blue source files, are the events, items, coverages, +fm programme, fm policytc, fm profile, fm xref, fm summary xref and gul +summary xref files. These are stored in the 'input' +sub-directory of the working folder.
+These are all Oasis kernel format data objects with prescribed +formats. Note that the events are a user input rather than a static +input because the user could choose to run a subset of the full list of +events, or even just one event. Usually, though, the whole event set +will be run.
+The output +components are various implementations of outputcalc, as +described in general terms in the Specification. The results are written +directly into csv file as there is no downstream processing.
The files required for the output components are shown in Figure 2.
-
The data conversion components section covers the formats of all of the required data files and explains how to convert data in csv format into binary format, and vice versa.
-The stream conversion components section explains how to convert the binary data stream output to csv, plus how to convert gulcalc data in csv format into binary format. These components are useful when working with individual components at a more detailed level.
-The validation components section explains how to use the validation components to check the validity of the static and input files in csv format, before they are converted to binary format. There are both validation checks on individual files and cross checks for consistency across files.
-The version of the installed components can be found by using the command line parameter -v. For example;
-$ gulcalc -v
-gulcalc : version: 3.0.7
-
+The files required for the output components are shown in Figure +2.
+
The data conversion +components section covers the formats of all of the +required data files and explains how to convert data in csv format into +binary format, and vice versa.
+The stream conversion +components section explains how to convert the binary data +stream output to csv, plus how to convert gulcalc data in csv format +into binary format. These components are useful when working with +individual components at a more detailed level.
+The validation +components section explains how to use the validation +components to check the validity of the static and input files in csv +format, before they are converted to binary format. There are both +validation checks on individual files and cross checks for consistency +across files.
+The version of the installed components can be found by using the +command line parameter -v. For example;
+$ gulcalc -v
+gulcalc : version: 3.0.7
Component usage guidance is available using the parameter -h
-$ fmcalc -h
$ fmcalc -h
-M max level (optional)
-p inputpath (relative or full path)
@@ -417,12 +300,12 @@ Figure 2. Output workflows
-O Alloc rule2 optimization off
-d debug
-v version
--h help
-
-The components have additional command line parameters depending on their particular function. These are described in detail in the following pages.
+-h help +The components have additional command line parameters depending on +their particular function. These are described in detail in the +following pages.
Go to 4.1 Core Components section

diff --git a/docs/html/Specification.html b/docs/html/Specification.html

This section specifies the data stream structures and core components in the in-memory kernel.
The data stream structures are;
The stream data structures have been designed to minimise the volume flowing through the pipeline, using data packet 'headers' to remove redundant data. For example, indexes which are common to a block of data are defined as a header record and then only the variable data records that are relevant to the header key are part of the data stream. The names of the data fields given below are unimportant, only their position in the data stream is important in order to perform the calculations defined in the program.
The components are;
The components have a standard input (stdin) and/or output (stdout) data stream structure. eve is the stream-initiating component which only has a standard output stream, whereas "outputcalc" (a generic name representing an extendible family of output calculation components) is a stream-terminating component with only a standard input stream.
An implementation of each of the above components is provided in the Reference Model, where usage instructions and command line parameters are provided. A functional overview is given below.
The architecture supports multiple stream types. Therefore a developer can define a new type of data stream within the framework by specifying a unique stream_id of the stdout of one or more of the components, or even write a new component which performs an intermediate calculation between the existing components.
The stream_id is the first 4 byte header of the stdout streams. The higher byte is reserved to identify the type of stream, and the 2nd to 4th bytes hold the identifier of the stream. This is used for validation of pipeline commands to report errors if the components are not being used in the correct order.
The current reserved values are as follows;
Higher byte;
| Byte 1 | Stream name |
|---|---|
| 0 | cdf |
| 1 | gulcalc (deprecated) |
| 2 | loss |
| 3 | summary |

Reserved stream_ids;

| Byte 1 | Bytes 2-4 | Description |
|---|---|---|
| 0 | 1 | cdf - Oasis format effective damageability CDF output |
| 1 | 1 | gulcalc - Oasis format item level ground up loss sample output (deprecated) |
| 1 | 2 | gulcalc - Oasis format coverage level ground up loss sample output (deprecated) |
| 2 | 1 | loss - Oasis format loss sample output (any loss perspective) |
| 3 | 1 | summary - Oasis format summary level loss sample output |
The supported standard input and output streams of the reference model components are summarized here;

| Component | Standard input | Standard output | Stream option parameters |
|---|---|---|---|
| getmodel | none | 0/1 cdf | none |
| gulcalc | 0/1 cdf | 2/1 loss | -i -a{} |
| fmcalc | 2/1 loss | 2/1 loss | none |
| summarycalc | 2/1 loss | 3/1 summary | -i input from gulcalc, -f input from fmcalc |
| outputcalc | 3/1 summary | none | none |
Stream header packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| stream_id | +stream_id | int | 1/3 | -Identifier of the data stream type. | -0/1 | +Identifier of the data stream type. | +0/1 |
Data header packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| areaperil_id | +areaperil_id | int | 4 | -Oasis areaperil_id | -345456 | +Oasis areaperil_id | +345456 |
| vulnerability_id | +vulnerability_id | int | 4 | -Oasis vulnerability_id | -345 | +Oasis vulnerability_id | +345 |
| no_of_bins | +no_of_bins | int | 4 | -Number of records (bins) in the data package | -20 | +Number of records (bins) in the data +package | +20 |
Data packet structure (record repeated no_of_bin times)
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| prob_to | +prob_to | float | 4 | -The cumulative probability at the upper damage bin threshold | -0.765 | +The cumulative probability at the upper +damage bin threshold | +0.765 |
| bin_mean | +bin_mean | float | 4 | -The conditional mean of the damage bin | -0.45 | +The conditional mean of the damage +bin | +0.45 |
Stream header packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| stream_id | +stream_id | int | 1/3 | -Identifier of the data stream type. | -2/1 | +Identifier of the data stream type. | +2/1 |
| no_of_samples | +no_of_samples | int | 4 | -Number of samples | -100 | +Number of samples | +100 |
Data header packet structure
-| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| item_id /output_id | +item_id /output_id | int | 4 | -Oasis item_id (gulcalc) or output_id (fmcalc) | -300 | +Oasis item_id (gulcalc) or output_id +(fmcalc) | +300 |
Data packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The loss for the sample | -5625.675 | +The loss for the sample | +5625.675 |
The data packet may be a variable length and so a sidx of 0 identifies the end of the data packet.
+The data packet may be a variable length and so a sidx of 0 +identifies the end of the data packet.
There are five values of sidx with special meaning as follows;
| sidx | -Meaning | +sidx | +Meaning | Required / optional |
|---|---|---|---|---|
| -5 | -maximum loss | +-5 | +maximum loss | optional |
| -4 | -chance of loss | +-4 | +chance of loss | optional |
| -3 | -impacted exposure | +-3 | +impacted exposure | required |
| -2 | -numerical integration standard deviation loss | +-2 | +numerical integration standard deviation +loss | optional |
| -1 | -numerical integration mean loss | +-1 | +numerical integration mean loss | required |
sidx -5 to -1 must come at the beginning of the data packet before the other samples in ascending order (-5 to -1).
+sidx -5 to -1 must come at the beginning of the data packet before +the other samples in ascending order (-5 to -1).
-Stream header packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| stream_id | +stream_id | int | 1/3 | -Identifier of the data stream type. | -3/1 | +Identifier of the data stream type. | +3/1 |
| no_of_samples | +no_of_samples | int | 4 | -Number of samples | -100 | +Number of samples | +100 |
| summary_set | +summary_set | int | 4 | -Identifier of the summary set | -2 | +Identifier of the summary set | +2 |
Data header packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| summary_id | +summary_id | int | 4 | -Oasis summary_id | -300 | +Oasis summary_id | +300 |
| exposure_value | +exposure_value | float | 4 | -Impacted exposure (sum of sidx -3 losses for summary_id) | -987878 | +Impacted exposure (sum of sidx -3 losses +for summary_id) | +987878 |
Data packet structure
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The loss for the sample | -5625.675 | +The loss for the sample | +5625.675 |
The data packet may be a variable length and so a sidx of 0 identifies the end of the data packet.
+The data packet may be a variable length and so a sidx of 0 +identifies the end of the data packet.
The sidx -1 mean loss may be present (if non-zero)
| sidx | -Meaning | +sidx | +Meaning | Required / optional | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -1 | -numerical integration mean loss | +-1 | +numerical integration mean loss | optional |
| Byte 1 | +Byte 1 | Bytes 2-4 | -Description | +Description |
|---|---|---|---|---|
| 0 | +0 | 1 | -cdf | +cdf |
A binary file of the same format can be piped into cdftocsv.
$ [stdin component] | cdftocsv > [output].csv
-$ cdftocsv < [stdin].bin > [output].csv
-
+$ [stdin component] | cdftocsv > [output].csv
+$ cdftocsv < [stdin].bin > [output].csv
$ eve 1 1 | getmodel | cdftocsv > cdf.csv
-$ cdftocsv < getmodel.bin > cdf.csv
-
+$ eve 1 1 | getmodel | cdftocsv > cdf.csv
+$ cdftocsv < getmodel.bin > cdf.csv
Csv file with the following fields;
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| areaperil_id | +areaperil_id | int | 4 | -Oasis areaperil_id | -345456 | +Oasis areaperil_id | +345456 |
| vulnerability_id | +vulnerability_id | int | 4 | -Oasis vulnerability_id | -345 | +Oasis vulnerability_id | +345 |
| bin_index | +bin_index | int | 4 | -Damage bin index | -20 | +Damage bin index | +20 |
| prob_to | +prob_to | float | 4 | -The cumulative probability at the upper damage bin threshold | -0.765 | +The cumulative probability at the upper +damage bin threshold | +0.765 |
| bin_mean | +bin_mean | float | 4 | -The conditional mean of the damage bin | -0.45 | +The conditional mean of the damage +bin | +0.45 |
A component which converts the gulcalc item or coverage stream, or binary file with the same structure, to a csv file.
-A component which converts the gulcalc item or coverage stream, or +binary file with the same structure, to a csv file.
+| Byte 1 | +Byte 1 | Bytes 2-4 | -Description | +Description |
|---|---|---|---|---|
| 1 | +1 | 1 | -gulcalc item | +gulcalc item |
| 1 | +1 | 2 | -gulcalc coverage | +gulcalc coverage |
A binary file of the same format can be piped into gultocsv.
-$ [stdin component] | gultocsv > [output].csv
-$ gultocsv < [stdin].bin > [output].csv
-
-$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | gultocsv > gulcalcc.csv
-$ gultocsv < gulcalci.bin > gulcalci.csv
-
-$ [stdin component] | gultocsv > [output].csv
+$ gultocsv < [stdin].bin > [output].csv
+$ eve 1 1 | getmodel | gulcalc -r -S100 -c - | gultocsv > gulcalcc.csv
+$ gultocsv < gulcalci.bin > gulcalci.csv
+Csv file with the following fields;
gulcalc item stream 1/1
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| item_id | +item_id | int | 4 | -Oasis item_id | -300 | +Oasis item_id | +300 |
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The ground up loss value | -5675.675 | +The ground up loss value | +5675.675 |
gulcalc coverage stream 1/2
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| coverage_id | +coverage_id | int | 4 | -Oasis coverage_id | -150 | +Oasis coverage_id | +150 |
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The ground up loss value | -5675.675 | +The ground up loss value | +5675.675 |
A component which converts the fmcalc output stream, or binary file with the same structure, to a csv file.
-A component which converts the fmcalc output stream, or binary file +with the same structure, to a csv file.
+| Byte 1 | +Byte 1 | Bytes 2-4 | -Description | +Description |
|---|---|---|---|---|
| 2 | +2 | 1 | -loss | +loss |
A binary file of the same format can be piped into fmtocsv.
-$ [stdin component] | fmtocsv > [output].csv
-$ fmtocsv < [stdin].bin > [output].csv
-
-$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc | fmtocsv > fmcalc.csv
-$ fmtocsv < fmcalc.bin > fmcalc.csv
-
-$ [stdin component] | fmtocsv > [output].csv
+$ fmtocsv < [stdin].bin > [output].csv
+$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc | fmtocsv > fmcalc.csv
+$ fmtocsv < fmcalc.bin > fmcalc.csv
+Csv file with the following fields;
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| output_id | +output_id | int | 4 | -Oasis output_id | -5 | +Oasis output_id | +5 |
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The insured loss value | -5375.675 | +The insured loss value | +5375.675 |
A component which converts the summarycalc output stream, or binary file with the same structure, to a csv file.
-A component which converts the summarycalc output stream, or binary +file with the same structure, to a csv file.
+| Byte 1 | +Byte 1 | Bytes 2-4 | -Description | +Description |
|---|---|---|---|---|
| 3 | +3 | 1 | -summary | +summary |
A binary file of the same format can be piped into summarycalctocsv.
-$ [stdin component] | summarycalctocsv > [output].csv
-$ summarycalctocsv < [stdin].bin > [output].csv
-
-$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc | summarycalc -f -1 - | summarycalctocsv > summarycalc.csv
-$ summarycalctocsv < summarycalc.bin > summarycalc.csv
-
-A binary file of the same format can be piped into +summarycalctocsv.
+$ [stdin component] | summarycalctocsv > [output].csv
+$ summarycalctocsv < [stdin].bin > [output].csv
+$ eve 1 1 | getmodel | gulcalc -r -S100 -a1 -i - | fmcalc | summarycalc -f -1 - | summarycalctocsv > summarycalc.csv
+$ summarycalctocsv < summarycalc.bin > summarycalc.csv
+Csv file with the following fields;
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| summary_id | +summary_id | int | 4 | -Oasis summary_id | -3 | +Oasis summary_id | +3 |
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The insured loss value | -5375.675 | +The insured loss value | +5375.675 |
A component which converts gulcalc data in csv format into gulcalc binary item stream (1/1).
+A component which converts gulcalc data in csv format into a gulcalc binary stream: either the deprecated item stream (1/1) or, by default, the loss stream (2/1).
| Name | +Name | Type | Bytes | -Description | -Example | +Description | +Example |
|---|---|---|---|---|---|---|---|
| event_id | +event_id | int | 4 | -Oasis event_id | -4545 | +Oasis event_id | +4545 |
| item_id | +item_id | int | 4 | -Oasis item_id | -300 | +Oasis item_id | +300 |
| sidx | +sidx | int | 4 | -Sample index | -10 | +Sample index | +10 |
| loss | +loss | float | 4 | -The ground up loss value | -5675.675 | +The ground up loss value | +5675.675 |
-S, the number of samples must be provided. This can be equal to or greater than maximum sample index value that appears in the csv data. --t, the stream type of either 1 for the deprecated item stream or 2 for the loss stream. This is an optional parameter with default value 2.
-$ gultobin [parameters] < [input].csv | [stdin component]
-$ gultobin [parameters] < [input].csv > [output].bin
-
-$ gultobin -S100 < gulcalci.csv | fmcalc > fmcalc.bin
+-S, the number of samples must be provided. This can be equal to or
+greater than the maximum sample index value that appears in the csv data.
+-t, the stream type of either 1 for the deprecated item stream or 2 for
+the loss stream. This is an optional parameter with default value 2.
+Usage
+$ gultobin [parameters] < [input].csv | [stdin component]
+$ gultobin [parameters] < [input].csv > [output].bin
+Example
+$ gultobin -S100 < gulcalci.csv | fmcalc > fmcalc.bin
$ gultobin -S100 < gulcalci.csv > gulcalci.bin
$ gultobin -S100 -t1 < gulcalci.csv > gulcalci.bin
-$ gultobin -S100 -t2 < gulcalci.csv > gulcalci.bin
-
+$ gultobin -S100 -t2 < gulcalci.csv > gulcalci.bin
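For reference, a small illustrative gulcalci.csv with the four fields listed above might look as follows; the values are hypothetical, a header row as written by gultocsv is assumed, the sidx -1 rows carry the mean loss, and -S must be equal to or greater than the highest sample index in the file.
```
event_id,item_id,sidx,loss
1,1,-1,1250.50
1,1,1,1000.00
1,1,2,1500.75
1,2,-1,800.00
1,2,1,750.00
1,2,2,900.00
```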
| Byte 1 | +Byte 1 | Bytes 2-4 | -Description | +Description |
|---|---|---|---|---|
| 1 | +1 | 1 | -gulcalc item | +gulcalc item |
| 2 | +2 | 1 | -gulcalc loss | +gulcalc loss |
Go to 4.5. Validation Components
- - - - +Go to 4.6. Validation +Components
+ + + diff --git a/docs/html/ValidationComponents.html b/docs/html/ValidationComponents.html index 92a7f91b..86e80c07 100644 --- a/docs/html/ValidationComponents.html +++ b/docs/html/ValidationComponents.html @@ -1,383 +1,197 @@ - - - -

The following components run validity checks on csv format files:
Model data files
Oasis input files
interval_type column included.The checks can be performed on damage_bin_dict.csv from the command line:
$ validatedamagebin < damage_bin_dict.csv
-
-The checks are also performed by default when converting damage bin dictionary files from csv to binary format:
-$ damagebintobin < damage_bin_dict.csv > damage_bin_dict.bin
+The checks can be performed on damage_bin_dict.csv from
+the command line:
+$ validatedamagebin < damage_bin_dict.csv
+The checks are also performed by default when converting damage bin
+dictionary files from csv to binary format:
+$ damagebintobin < damage_bin_dict.csv > damage_bin_dict.bin
-# Suppress validation checks with -N argument
-$ damagebintobin -N < damage_bin_dict.csv > damage_bin_dict.bin
-
+# Suppress validation checks with -N argument
+$ damagebintobin -N < damage_bin_dict.csv > damage_bin_dict.bin
The following checks are performed on the event footprint:
@@ -413,68 +227,91 @@Should all checks pass, the maximum value of intensity_bin_index is given, which is a required input for footprinttobin.
The checks can be performed on footprint.csv from the command line:
$ validatefootprint < footprint.csv
-
-The checks are also performed by default when converting footprint files from csv to binary format:
-$ footprinttobin -i {number of intensity bins} < footprint.csv
+Should all checks pass, the maximum value of
+intensity_bin_index is given, which is a required input for
+footprinttobin.
+The checks can be performed on footprint.csv from the
+command line:
+$ validatefootprint < footprint.csv
+The checks are also performed by default when converting footprint
+files from csv to binary format:
+$ footprinttobin -i {number of intensity bins} < footprint.csv
# Suppress validation checks with -N argument
-$ footprinttobin -i {number of intensity bins} -N < footprint.csv
-
+$ footprinttobin -i {number of intensity bins} -N < footprint.csv
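Because footprinttobin needs the number of intensity bins via -i, a small helper along the following lines can pull the maximum intensity_bin_index straight from footprint.csv. This is only a convenience sketch, not part of the toolkit, and it assumes footprint.csv has a header row containing an intensity_bin_index column.
```python
# Sketch: report the maximum intensity_bin_index in footprint.csv,
# the value validatefootprint prints and footprinttobin expects via -i.
import csv

with open('footprint.csv', newline='') as f:
    max_bin = max(int(row['intensity_bin_index']) for row in csv.DictReader(f))
print(max_bin)
```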
The following checks are performed on the vulnerability data:
Should all checks pass, the maximum value of damage_bin_id is given, which is a required input for vulnerabilitytobin.
The checks can be performed on vulnerability.csv from the command line:
$ validatevulnerability < vulnerability.csv
-
-The checks are also performed by default when converting vulnerability files from csv to binary format:
-$ vulnerabilitytobin -d {number of damage bins} < vulnerability.csv > vulnerability.bin
+Should all checks pass, the maximum value of
+damage_bin_id is given, which is a required input for
+vulnerabilitytobin.
+The checks can be performed on vulnerability.csv from
+the command line:
+$ validatevulnerability < vulnerability.csv
+The checks are also performed by default when converting
+vulnerability files from csv to binary format:
+$ vulnerabilitytobin -d {number of damage bins} < vulnerability.csv > vulnerability.bin
# Suppress validation checks with -N argument
-$ vulnerabilitytobin -d {number of damage bins} -N < vulnerability.csv > vulnerability.bin
-
+$ vulnerabilitytobin -d {number of damage bins} -N < vulnerability.csv > vulnerability.bin
The following checks are performed across the damage bin dictionary, event footprint and vulnerability data:
+The following checks are performed across the damage bin dictionary, +event footprint and vulnerability data:
The checks can be performed on damage_bin_dict.csv, footprint.csv and vulnerability.csv from the command line:
$ crossvalidation -d damage_bin_dict.csv -f footprint.csv -s vulnerability.csv
-
+The checks can be performed on damage_bin_dict.csv,
+footprint.csv and vulnerability.csv from the
+command line:
$ crossvalidation -d damage_bin_dict.csv -f footprint.csv -s vulnerability.csv
The following checks are performed across the coverages, items, fm policytc, fm programme and fm profile data:
+The following checks are performed across the coverages, items, fm +policytc, fm programme and fm profile data:
agg_id in fm_programme.csv and item_id in items.csv when level_id = 1.coverage_id in items.csv matches those in coverages.csv.policytc_id in fm_policytc.csv matches those in fm_profile.csv.level_id, agg_id) pairs in fm_policytc.csv are present as (level_id, to_agg_id) pairs in fm_programme.csv.level_id = n > 1, from_agg_id corresponds to a to_agg_id from level_id = n - 1.agg_id in
+fm_programme.csv and item_id in
+items.csv when level_id = 1.coverage_id in items.csv matches those in
+coverages.csv.policytc_id in fm_policytc.csv matches
+those in fm_profile.csv.level_id, agg_id) pairs in
+fm_policytc.csv are present as (level_id,
+to_agg_id) pairs in fm_programme.csv.level_id = n > 1, from_agg_id
+corresponds to a to_agg_id from
+level_id = n - 1.The checks can be performed on coverages.csv, items.csv, fm_policytc.csv, fm_programme.csv and fm_profile.csv from the command line, specifying the directory these files are located in:
$ validateoasisfiles -d path/to/output/directory
-
-The Ground Up Losses (GUL) flag g can be specified to only perform checks on items.csv and coverages.csv:
$ validateoasisfiles -g -d /path/to/output/directory
-
+The checks can be performed on coverages.csv,
+items.csv, fm_policytc.csv,
+fm_programme.csv and fm_profile.csv from the
+command line, specifying the directory these files are located in:
$ validateoasisfiles -d path/to/output/directory
+The Ground Up Losses (GUL) flag g can be specified to
+only perform checks on items.csv and
+coverages.csv:
$ validateoasisfiles -g -d /path/to/output/directory
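As an illustration of the kind of check listed above, the following sketch reproduces just one of them: every coverage_id in items.csv must also appear in coverages.csv. It is an outline under assumptions (header rows with a coverage_id column in both files), not the validateoasisfiles implementation.
```python
# Sketch of a single Oasis files cross-check: coverage_id values in
# items.csv must exist in coverages.csv.
import csv

def coverage_ids(path):
    with open(path, newline='') as f:
        return {int(row['coverage_id']) for row in csv.DictReader(f)}

missing = coverage_ids('items.csv') - coverage_ids('coverages.csv')
if missing:
    print('coverage_id values in items.csv missing from coverages.csv:', sorted(missing))
```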
-
-
-
-
-
+
+
+
+
diff --git a/docs/html/Workflows.html b/docs/html/Workflows.html
index 4520683f..bfd055cc 100644
--- a/docs/html/Workflows.html
+++ b/docs/html/Workflows.html
@@ -1,457 +1,319 @@
-
-
-
-
ktools is capable of multiple output workflows. This brings much greater flexibility, but also more complexity for users of the toolkit.
-This section presents some example workflows, starting with single output workflows and then moving onto more complex multi-output workflows. There are some python scripts provided which execute some of the illustrative workflows using the example data in the repository. It is assumed that workflows will generally be run across multiple processes, with the number of processes being specified by the user.
-In this example, the core workflow is run through fmcalc into summarycalc and then the losses are summarized by summary set 2, which is "portfolio" summary level. -This produces multiple output files when run with multiple processes, each containing a subset of the event set. The output files can be concatinated together at the end.
-eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | eltcalc > elt_p1.csv
-eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | eltcalc > elt_p2.csv
-
+
+
+
+
+
+
+ 
ktools is capable of multiple output workflows. This brings much +greater flexibility, but also more complexity for users of the +toolkit.
+This section presents some example workflows, starting with single +output workflows and then moving onto more complex multi-output +workflows. There are some python scripts provided which execute some of +the illustrative workflows using the example data in the repository. It +is assumed that workflows will generally be run across multiple +processes, with the number of processes being specified by the user.
+In this example, the core workflow is run through fmcalc into summarycalc and then the losses are summarized by summary set 2, which is the "portfolio" summary level. This produces multiple output files when run with multiple processes, each containing a subset of the event set. The output files can be concatenated together at the end.
+eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | eltcalc > elt_p1.csv
+eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | eltcalc > elt_p2.csv

See example script eltcalc_example.py
-This is very similar to the first example, except the summary samples are run through pltcalc instead. The output files can be concatinated together at the end.
-eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | pltcalc > plt_p1.csv
-eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | pltcalc > plt_p2.csv
-
+
See example script eltcalc_example.py ***
+This is very similar to the first example, except the summary samples are run through pltcalc instead. The output files can be concatenated together at the end.
+eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | pltcalc > plt_p1.csv
+eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - | pltcalc > plt_p2.csv

See example script pltcalc_example.py
-In this example, the summary samples are calculated as in the first two examples, but the results are output to the work folder. Until this stage the calculation is run over multiple processes. Then, in a single process, leccalc reads the summarycalc binaries from the work folder and computes two loss exceedance curves in a single process. Note that you can output all eight loss exceedance curve variants in a single leccalc command.
-eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - > work/summary2/p1.bin
+
+See example script pltcalc_example.py ***
+3.
+Portfolio summary level full uncertainty aggregate and occurrence loss
+exceedance curves
+In this example, the summary samples are calculated as in the first
+two examples, but the results are output to the work folder. Until this
+stage the calculation is run over multiple processes. Then, in a single
+process, leccalc reads the summarycalc binaries from the work folder and
+computes two loss exceedance curves in a single process. Note that you
+can output all eight loss exceedance curve variants in a single leccalc
+command.
+eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - > work/summary2/p1.bin
eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 - > work/summary2/p2.bin
-leccalc -Ksummary2 -F lec_full_uncertainty_agg.csv -f lec_full_uncertainty_occ.csv
-
+leccalc -Ksummary2 -F lec_full_uncertainty_agg.csv -f lec_full_uncertainty_occ.csv

See example script leccalc_example.py
-Similarly to lec curves, the samples are run through to summarycalc, and the summarycalc binaries are output to the work folder. Until this stage the calculation is run over multiple processes. Then, in a single process, aalcalc reads the summarycalc binaries from the work folder and computes the aal output.
-eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 work/summary2/p1.bin
+
+See example script leccalc_example.py ***
+4. Portfolio
+summary level average annual loss
+Similarly to lec curves, the samples are run through to summarycalc,
+and the summarycalc binaries are output to the work folder. Until this
+stage the calculation is run over multiple processes. Then, in a single
+process, aalcalc reads the summarycalc binaries from the work folder and
+computes the aal output.
+eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 work/summary2/p1.bin
eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -2 work/summary2/p2.bin
-aalcalc -Ksummary2 > aal.csv
-
+aalcalc -Ksummary2 > aal.csv

See example script aalcalc_example.py
-
See example script aalcalc_example.py ***
gulcalc can generate two output streams at once: item level samples to pipe into fmcalc, and coverage level samples to pipe into summarycalc. This means that outputs for both ground up loss and insured loss can be generated in one workflow.
-This is done by writing one stream to a file or named pipe, while streaming the other to standard output down the pipeline.
eve 1 2 | getmodel | gulcalc -r -S100 -i gulcalci1.bin -c - | summarycalc -g -2 - | eltcalc > gul_elt_p1.csv
+5. Ground up and insured
+loss workflows
+gulcalc can generate two output streams at once: item level samples
+to pipe into fmcalc, and coverage level samples to pipe into
+summarycalc. This means that outputs for both ground up loss and insured
+loss can be generated in one workflow.
+This is done by writing one stream to a file or named pipe, while
+streaming the other to standard output down the pipeline.
+eve 1 2 | getmodel | gulcalc -r -S100 -i gulcalci1.bin -c - | summarycalc -g -2 - | eltcalc > gul_elt_p1.csv
eve 2 2 | getmodel | gulcalc -r -S100 -i gulcalci2.bin -c - | summarycalc -g -2 - | eltcalc > gul_elt_p2.csv
fmcalc < gulcalci1.bin | summarycalc -f -2 - | eltcalc > fm_elt_p1.csv
-fmcalc < gulcalci2.bin | summarycalc -f -2 - | eltcalc > fm_elt_p2.csv
-
-Note that the gulcalc item stream does not need to be written off to disk, as it can be sent to a 'named pipe', which keeps the data in-memory and kicks off a new process. This is easy to do in Linux (but harder in Windows).
fmcalc < gulcalci2.bin | summarycalc -f -2 - | eltcalc > fm_elt_p2.csv
Note that the gulcalc item stream does not need to be written off to disk, as it can be sent to a 'named pipe', which keeps the data in-memory and kicks off a new process. This is easy to do in Linux (but harder in Windows).
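A sketch of the named pipe approach on Linux is shown below, using Python's os.mkfifo and subprocess. The fifo path and the script structure are illustrative assumptions; the two pipelines are the ones given above.
```python
# Sketch: run the ground up and insured loss pipelines concurrently,
# passing the gulcalc item stream through a named pipe instead of a file.
import os
import subprocess

pipe = "gulcalci1.fifo"  # illustrative named pipe path
if not os.path.exists(pipe):
    os.mkfifo(pipe)

# gulcalc writes the item stream to the named pipe and sends the coverage
# stream down the ground up loss pipeline
gul = subprocess.Popen(
    f"eve 1 2 | getmodel | gulcalc -r -S100 -i {pipe} -c - "
    "| summarycalc -g -2 - | eltcalc > gul_elt_p1.csv", shell=True)

# fmcalc reads the item stream from the same pipe concurrently for the
# insured loss pipeline
fm = subprocess.Popen(
    f"fmcalc < {pipe} | summarycalc -f -2 - | eltcalc > fm_elt_p1.csv",
    shell=True)

gul.wait()
fm.wait()
os.remove(pipe)
```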
Figure 5 illustrates an example workflow.
-
See example script gulandfm_example.py
-Summarycalc is capable of summarizing samples to up to 10 different user-defined levels for ground up loss and insured loss. This means that different outputs can be run on different summary levels. In this example, event loss tables for two different summary levels are generated.
-eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -1 s1/p1.bin -2 s2/p1.bin
+Figure 5.
+Ground up and insured loss example workflow
+
+See example script gulandfm_example.py
+***
+6. Multiple summary level
+workflows
+Summarycalc is capable of summarizing samples to up to 10 different
+user-defined levels for ground up loss and insured loss. This means that
+different outputs can be run on different summary levels. In this
+example, event loss tables for two different summary levels are
+generated.
+eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -1 s1/p1.bin -2 s2/p1.bin
eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc | summarycalc -f -1 s1/p2.bin -2 s2/p2.bin
eltcalc < s1/p1.bin > elt_s1_p1.csv
eltcalc < s1/p2.bin > elt_s1_p2.csv
eltcalc < s2/p1.bin > elt_s2_p1.csv
-eltcalc < s2/p2.bin > elt_s2_p2.csv
-
-Again, the summarycalc streams can be sent to named pipes rather than written off to disk.
-Figure 6 illustrates multiple summary level streams, each of which can go to different output calculations.
-
Again, the summarycalc streams can be sent to named pipes rather than +written off to disk.
+Figure 6 illustrates multiple summary level streams, each of which +can go to different output calculations.
+
The fmcalc component can be used recursively in order to apply multiple sets of policy terms and conditions, in order to support reinsurance. Figure 7 shows a simple example workflow of a direct insurance calculation followed by a reinsurance calculation.
-eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc -p direct | fmcalc -p ri1 -n > fmcalc_1.bin
-eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc -p direct | fmcalc -p ri1 -n > fmcalc_2.bin
-
-
Each call of fmcalc requires the same input files, so it is necessary to specify the location of the files for each call using the command line parameter -p and the relative folder path. Figure 8 demonstrates the required files for three consecutive calls of fmcalc.
-
It is possible to generate all of the outputs for each call of fmcalc in the same workflow, enabling multiple financial perspective reports, as shown in Figure 9.
-
The fmcalc component can be used recursively in order to apply +multiple sets of policy terms and conditions, in order to support +reinsurance. Figure 7 shows a simple example workflow of a direct +insurance calculation followed by a reinsurance calculation.
+eve 1 2 | getmodel | gulcalc -r -S100 -i - | fmcalc -p direct | fmcalc -p ri1 -n > fmcalc_1.bin
+eve 2 2 | getmodel | gulcalc -r -S100 -i - | fmcalc -p direct | fmcalc -p ri1 -n > fmcalc_2.bin
+
Each call of fmcalc requires the same input files, so it is necessary +to specify the location of the files for each call using the command +line parameter -p and the relative folder path. Figure 8 demonstrates +the required files for three consecutive calls of fmcalc.
+
It is possible to generate all of the outputs for each call of fmcalc +in the same workflow, enabling multiple financial perspective reports, +as shown in Figure 9.
+
Go to Appendix A Random numbers
- - - - +Go to Appendix A Random numbers
+ + + diff --git a/docs/html/fmprofiles.html b/docs/html/fmprofiles.html index 75051f42..1d57b537 100644 --- a/docs/html/fmprofiles.html +++ b/docs/html/fmprofiles.html @@ -1,535 +1,435 @@ - - - -
This section specifies the attributes and rules for the following list of Financial module profiles.
+ + + + + + +
This section specifies the attributes and rules for the following +list of Financial module profiles.
| Profile description | -calcrule_id | +Profile description | +calcrule_id |
|---|---|---|---|
| Do nothing (pass losses through) | -100 | +Do nothing (pass losses through) | +100 |
| deductible and limit | -1 | +deductible and limit | +1 |
| deductible with attachment, limit and share | -2 | +deductible with attachment, limit and +share | +2 |
| franchise deductible and limit | -3 | +franchise deductible and limit | +3 |
| deductible % TIV and limit | -4 | +deductible % TIV and limit | +4 |
| deductible and limit % loss | -5 | +deductible and limit % loss | +5 |
| deductible % TIV | -6 | +deductible % TIV | +6 |
| deductible, minimum and maximum deductible, with limit | -7 | +deductible, minimum and maximum +deductible, with limit | +7 |
| deductible and minimum deductible, with limit | -8 | +deductible and minimum deductible, with +limit | +8 |
| limit with deductible % limit | -9 | +limit with deductible % limit | +9 |
| deductible and maximum deductible | -10 | +deductible and maximum deductible | +10 |
| deductible and minimum deductible | -11 | +deductible and minimum deductible | +11 |
| deductible | -12 | +deductible | +12 |
| deductible, minimum and maximum deductible | -13 | +deductible, minimum and maximum +deductible | +13 |
| limit only | -14 | +limit only | +14 |
| deductible and limit % loss | -15 | +deductible and limit % loss | +15 |
| deductible % loss | -16 | +deductible % loss | +16 |
| deductible % loss with attachment, limit and share | -17 | +deductible % loss with attachment, limit +and share | +17 |
| deductible % tiv with attachment, limit and share | -18 | +deductible % tiv with attachment, limit +and share | +18 |
| deductible % loss with min and/or max deductible | -19 | +deductible % loss with min and/or max +deductible | +19 |
| reverse franchise deductible | -20 | +reverse franchise deductible | +20 |
| deductible % tiv with min and max deductible | -21 | +deductible % tiv with min and max +deductible | +21 |
| reinsurance % ceded, limit and % placed | -22 | +reinsurance % ceded, limit and % +placed | +22 |
| reinsurance limit and % placed | -23 | +reinsurance limit and % placed | +23 |
| reinsurance excess terms | -24 | +reinsurance excess terms | +24 |
| reinsurance proportional terms | -25 | +reinsurance proportional terms | +25 |
| deductible % loss with min and/or max deductible and limit | -26 | +deductible % loss with min and/or max +deductible and limit | +26 |
| % tiv trigger and % tiv step payout with limit | -27 | +% tiv trigger and % tiv step payout with +limit | +27 |
| % tiv trigger and % loss step payout | -28 | +% tiv trigger and % loss step payout | +28 |
| % tiv trigger and % tiv step payout | -29 | +% tiv trigger and % tiv step payout | +29 |
| % tiv trigger and % limit step payout | -30 | +% tiv trigger and % limit step payout | +30 |
| % tiv trigger and monetary amount step payout | -31 | +% tiv trigger and monetary amount step +payout | +31 |
| monetary amount trigger and % loss step payout with limit | -32 | +monetary amount trigger and % loss step +payout with limit | +32 |
| deductible % loss with limit | -33 | +deductible % loss with limit | +33 |
| deductible with attachment and share | -34 | +deductible with attachment and share | +34 |
| deductible % loss with min and/or max deductible and limit % loss | -35 | +deductible % loss with min and/or max +deductible and limit % loss | +35 |
| deductible with min and/or max deductible and limit % loss | -36 | +deductible with min and/or max deductible +and limit % loss | +36 | +
| % tiv trigger and % loss step payout with +limit | +37 | +||
| conditional coverage payouts based on +prior step payouts | +38 |
| calcrule_id | +calcrule_id | d1 | d2 | d3 | @@ -545,12 +445,12 @@l2 | sc1 | -sc2 | +sc2 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | +100 | @@ -566,10 +466,10 @@ | - | + | |||||||||||||
| 1 | +1 | x | @@ -585,10 +485,10 @@ | - | + | ||||||||||||
| 2 | +2 | x | @@ -604,10 +504,10 @@ | - | + | ||||||||||||
| 3 | +3 | x | @@ -623,10 +523,10 @@ | - | + | ||||||||||||
| 4 | +4 | x | @@ -642,10 +542,10 @@ | - | + | ||||||||||||
| 5 | +5 | x | @@ -661,10 +561,10 @@ | - | + | ||||||||||||
| 6 | +6 | x | @@ -680,10 +580,10 @@ | - | + | ||||||||||||
| 7 | +7 | x | x | x | @@ -699,10 +599,10 @@- | + | |||||||||||
| 8 | +8 | x | x | @@ -718,10 +618,10 @@ | - | + | |||||||||||
| 9 | +9 | x | @@ -737,10 +637,10 @@ | - | + | ||||||||||||
| 10 | +10 | x | x | @@ -756,10 +656,10 @@- | + | ||||||||||||
| 11 | +11 | x | x | @@ -775,10 +675,10 @@ | - | + | |||||||||||
| 12 | +12 | x | @@ -794,10 +694,10 @@ | - | + | ||||||||||||
| 13 | +13 | x | x | x | @@ -813,10 +713,10 @@- | + | |||||||||||
| 14 | +14 | @@ -832,10 +732,10 @@ | - | + | |||||||||||||
| 15 | +15 | x | @@ -851,10 +751,10 @@ | - | + | ||||||||||||
| 16 | +16 | x | @@ -870,10 +770,10 @@ | - | + | ||||||||||||
| 17 | +17 | x | @@ -889,10 +789,10 @@ | - | + | ||||||||||||
| 18 | +18 | x | @@ -908,10 +808,10 @@ | - | + | ||||||||||||
| 19 | +19 | x | x | x | @@ -927,10 +827,10 @@- | + | |||||||||||
| 20 | +20 | x | @@ -946,10 +846,10 @@ | - | + | ||||||||||||
| 21 | +21 | x | x | x | @@ -965,10 +865,10 @@- | + | |||||||||||
| 22 | +22 | @@ -984,10 +884,10 @@ | - | + | |||||||||||||
| 23 | +23 | @@ -1003,10 +903,10 @@ | - | + | |||||||||||||
| 24 | +24 | @@ -1022,10 +922,10 @@ | - | + | |||||||||||||
| 25 | +25 | @@ -1041,10 +941,10 @@ | - | + | |||||||||||||
| 26 | +26 | x | x | x | @@ -1060,10 +960,10 @@- | + | |||||||||||
| 27 | +27 | x | @@ -1079,10 +979,10 @@ | x | x | -x | +x | ||||||||||
| 28 | +28 | x | @@ -1098,10 +998,10 @@ | x | x | -x | +x | ||||||||||
| 29 | +29 | x | @@ -1117,10 +1017,10 @@ | x | x | -x | +x | ||||||||||
| 30 | +30 | x | @@ -1136,10 +1036,10 @@ | x | x | -x | +x | ||||||||||
| 31 | +31 | x | @@ -1155,10 +1055,10 @@ | x | x | -x | +x | ||||||||||
| 32 | +32 | @@ -1174,10 +1074,10 @@ | x | x | -x | +x | |||||||||||
| 33 | +33 | x | @@ -1193,13 +1093,31 @@ | - | + | ||||||||||||
| 34 | +34 | +x | ++ | + | x | +x | + | + | + | + | + | + | + | + | + | ||
| 35 | +x | +x | x | x | @@ -1212,10 +1130,11 @@- | + | + | ||||||||||
| 35 | +36 | x | x | x | @@ -1231,15 +1150,29 @@- | + | |||||||||||
| 36 | +37 | +x | ++ | + | + | x | ++ | + | + | x | x | x | x | x | +x | +x | +|
| 38 | @@ -1248,531 +1181,608 @@ | + | x | +x | +x | - | + | x | +x | +x |
The fields with an x are those which are required by the profile. The full names of the fields are as follows;
+The fields with an x are those which are required by the profile. The +full names of the fields are as follows;
| Short name | -Profile field name | +Short name | +Profile field name |
|---|---|---|---|
| d1 | -deductible_1 | +d1 | +deductible_1 |
| d2 | -deductible_2 | +d2 | +deductible_2 |
| d3 | -deductible_3 | +d3 | +deductible_3 |
| a1 | -attachment_1 | +a1 | +attachment_1 |
| l1 | -limit_1 | +l1 | +limit_1 |
| sh1 | -share_1 | +sh1 | +share_1 |
| sh2 | -share_2 | +sh2 | +share_2 |
| sh3 | -share_3 | +sh3 | +share_3 |
| st | -step_id | +st | +step_id |
| ts | -trigger_start | +ts | +trigger_start |
| te | -trigger_end | +te | +trigger_end |
| ps | -payout_start | +ps | +payout_start |
| pe | -payout_end | +pe | +payout_end |
| l2 | -limit_2 | +l2 | +limit_2 |
| sc1 | -scale_1 | +sc1 | +scale_1 |
| sc2 | -scale_2 | +sc2 | +scale_2 |
An allocation rule can be assigned to each call of fmcalc, which determines whether calculated losses should be back-allocated to the contributing items, and if so how. This is specified via the command line parameter -a.
+An allocation rule can be assigned to each call of fmcalc, which +determines whether calculated losses should be back-allocated to the +contributing items, and if so how. This is specified via the command +line parameter -a.
The choices are as follows;
| Allocrule description | -allocrule_id | +Allocrule description | +allocrule_id |
|---|---|---|---|
| Don't back-allocate losses (default if no parameter supplied) | -0 | +Don't back-allocate losses (default if no +parameter supplied) | +0 |
| Back allocate losses to items in proportion to input loss | -1 | +Back allocate losses to items in +proportion to input loss | +1 |
| Back-allocate losses to items in proportion to prior level loss | -2 | +Back-allocate losses to items in +proportion to prior level loss | +2 |
| Back-allocate losses to items in proportion to prior level loss (reinsurance) | -3 | +Back-allocate losses to items in +proportion to prior level loss (reinsurance) | +3 |
Often there are more than one hierarchal levels with deductibles, and there is a choice of methods of accumulation of deductibles through the hierarchy. Whenever a rule with a deductible is used in the loss calculation then it is accumulated through the calculation in an effective_deductible variable. The effective deductible is the smaller of the deductible amount and the loss.
-All deductibles amounts calculated from the deductible_1 field are simply additive through the hierarchy.
-Ay any level, the user can specify a calcrule using a minimum and/or maximum deductible which changes the way that effective deductibles are accumulated.
-For a minimum deductible specified in calcrules using the deductible_2 field, the calculation increases the effective_deductible carried forward from the previous levels up to the minimum deductible if it is smaller.
-For a maximum deductible specified in calcrules using the deductible_3 field, the calculation decreases the effective_deductible carried forward from the previous levels down to the maximum deductible if it is larger.
-Loss adjustments due to minimum and maximum deductibles may lead to breaching or falling short of prior level limits. For instance, an increase in loss due a policy maximum deductible being applied can lead to a breach of site limit that applied at the prior calculation level. Conversely, a decrease in loss due to policy minimum deductible can leave the loss falling short of a site limit applied at the prior calculation level. In these situations the prior level limits are carried through and reapplied in all calcrules that have minimum and/or maximum deductibles.
+Often there is more than one hierarchical level with deductibles, and there is a choice of methods of accumulation of deductibles through the hierarchy. Whenever a rule with a deductible is used in the loss calculation, the deductible is accumulated through the calculation in an effective_deductible variable. The effective deductible is the smaller of the deductible amount and the loss.
+All deductible amounts calculated from the deductible_1 field are simply additive through the hierarchy.
+At any level, the user can specify a calcrule using a minimum and/or maximum deductible, which changes the way that effective deductibles are accumulated.
+For a minimum deductible specified in calcrules using the +deductible_2 field, the calculation increases the effective_deductible +carried forward from the previous levels up to the minimum deductible if +it is smaller.
+For a maximum deductible specified in calcrules using the +deductible_3 field, the calculation decreases the effective_deductible +carried forward from the previous levels down to the maximum deductible +if it is larger.
+Loss adjustments due to minimum and maximum deductibles may lead to breaching or falling short of prior level limits. For instance, an increase in loss due to a policy maximum deductible being applied can lead to a breach of a site limit that applied at the prior calculation level. Conversely, a decrease in loss due to a policy minimum deductible can leave the loss falling short of a site limit applied at the prior calculation level. In these situations the prior level limits are carried through and reapplied in all calcrules that have minimum and/or maximum deductibles.
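A simplified numeric sketch of the minimum deductible adjustment described above (ignoring the limit interactions) is shown below. It applies a level 1 deductible (calcrule 12) followed by a level 2 minimum deductible (calcrule 11), using the pseudo-code given later in this section; the figures are illustrative assumptions.
```python
# Sketch: accumulation of the effective deductible across two levels.
ground_up = 90000.0
deductible_1 = 50000.0                               # level 1 deductible
loss = max(ground_up - deductible_1, 0.0)            # 40,000 after level 1
effective_deductible = min(deductible_1, ground_up)  # 50,000 carried forward

deductible_2 = 70000.0                               # level 2 minimum deductible
if effective_deductible < deductible_2:
    # the effective deductible is topped up to the minimum, reducing the loss
    loss = max(loss + effective_deductible - deductible_2, 0.0)
print(loss)  # 20000.0
```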
We introduce the following variables;
The over and under limit variables are initialised when there exist prior level limits. The possible cases are;
+The over and under limit variables are initialised when there exist +prior level limits. The possible cases are;
| Case | +Case | Under limit | Over limit | -Meaning | +Meaning |
|---|---|---|---|---|---|
| 1 | +1 | 0 | 0 | -All prior level losses are exactly at their limits | +All prior level losses are exactly at +their limits |
| 2 | +2 | 0 | >0 | -Some prior level losses are over limit and none are under limit | +Some prior level losses are over limit and +none are under limit |
| 3 | +3 | >0 | 0 | -Some prior level losses are under limit and none are over limit | +Some prior level losses are under limit +and none are over limit |
| 4 | +4 | >0 | >0 | -Some prior level losses are over limit and some are under limit | +Some prior level losses are over limit and +some are under limit |
When the loss delta is positive;
When the loss delta is negative;
Current calculation level limits may also apply and these are used to update the over limit and under limit measures to carry through to the next level.
+Current calculation level limits may also apply and these are used to +update the over limit and under limit measures to carry through to the +next level.
In the following notation;
| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -1 | +calcrule_id | +1 |
| deductible_1 | -50000 | +deductible_1 | +50000 |
| limit_1 | -900000 | +limit_1 | +900000 |
loss = x.loss - deductible_1;
-if (loss < 0) loss = 0;
-if (loss > limit_1) loss = limit_1;
-
-loss = x.loss - deductible_1;
+if (loss < 0) loss = 0;
+if (loss > limit_1) loss = limit_1;| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -2 | +calcrule_id | +2 |
| deductible_1 | -70000 | +deductible_1 | +70000 |
| attachment_1 | -0 | +attachment_1 | +0 |
| limit_1 | -1000000 | +limit_1 | +1000000 |
| share_1 | -0.1 | +share_1 | +0.1 |
loss = x.loss - deductible_1
-if (loss < 0) loss = 0;
-if (loss > (attachment_1 + limit_1)) loss = limit_1;
- else loss = loss - attachment_1;
-if (loss < 0) loss = 0;
-loss = loss * share_1;
-
-loss = x.loss - deductible_1
+if (loss < 0) loss = 0;
+if (loss > (attachment_1 + limit_1)) loss = limit_1;
+ else loss = loss - attachment_1;
+if (loss < 0) loss = 0;
+loss = loss * share_1;| Attributes | -Example | +Attributes | +Example |
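As a worked illustration of this rule with the example attributes above (deductible_1 = 70000, attachment_1 = 0, limit_1 = 1000000, share_1 = 0.1), the following sketch applies the same steps in Python to a ground up loss of 500,000, giving 43,000. It is a paraphrase of the pseudo-code, not ktools source.
```python
# Sketch of calcrule 2: deductible with attachment, limit and share.
def calcrule_2(loss, deductible_1, attachment_1, limit_1, share_1):
    loss = max(loss - deductible_1, 0.0)
    if loss > attachment_1 + limit_1:
        loss = limit_1
    else:
        loss = max(loss - attachment_1, 0.0)
    return loss * share_1

print(calcrule_2(500000.0, 70000.0, 0.0, 1000000.0, 0.1))  # 43000.0
```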
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -3 | +calcrule_id | +3 |
| deductible_1 | -100000 | +deductible_1 | +100000 |
| limit_1 | -1000000 | +limit_1 | +1000000 |
if (x.loss < deductible_1) loss = 0;
- else loss = x.loss;
-if (loss > limit_1) loss = limit_1;
-
-if (x.loss < deductible_1) loss = 0;
+ else loss = x.loss;
+if (loss > limit_1) loss = limit_1;| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -5 | +calcrule_id | +5 |
| deductible_1 | -0.05 | +deductible_1 | +0.05 |
| limit_1 | -0.3 | +limit_1 | +0.3 |
loss = x.loss - (x.loss * deductible_1);
-if (loss > (x.loss * limit_1)) loss = x.loss * lim;
-
-loss = x.loss - (x.loss * deductible_1);
+if (loss > (x.loss * limit_1)) loss = x.loss * limit_1;
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -9 | +calcrule_id | +9 |
| deductible_1 | -0.05 | +deductible_1 | +0.05 |
| limit_1 | -100000 | +limit_1 | +100000 |
loss = x.loss - (deductible_1 * limit_1);
-if (loss < 0) loss = 0;
-if (loss > limit_1) loss = limit_1;
-
+loss = x.loss - (deductible_1 * limit_1);
+if (loss < 0) loss = 0;
+if (loss > limit_1) loss = limit_1;If the effective deductible carried forward from the previous level exceeds the maximum deductible, the effective deductible is decreased to the maximum deductible value
+If the effective deductible carried forward from the previous level +exceeds the maximum deductible, the effective deductible is decreased to +the maximum deductible value
| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -10 | +calcrule_id | +10 |
| deductible_3 | -40000 | +deductible_3 | +40000 |
if (x.effective_deductible > deductible_3) {
- loss = x.loss + x.effective_deductible - deductible_3;
- if (loss < 0) loss = 0;
- }
-else {
- loss = x.loss;
- }
-
+if (x.effective_deductible > deductible_3) {
+ loss = x.loss + x.effective_deductible - deductible_3;
+ if (loss < 0) loss = 0;
+ }
+else {
+ loss = x.loss;
+ }If the effective deductible carried forward from the previous level is less than the minimum deductible, the deductible is increased to the total loss or the minimum deductible value, whichever is greater.
+If the effective deductible carried forward from the previous level +is less than the minimum deductible, the deductible is increased to the +total loss or the minimum deductible value, whichever is greater.
| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -11 | +calcrule_id | +11 |
| deductible_2 | -70000 | +deductible_2 | +70000 |
if (x.effective_deductible < deductible_2) {
- loss = x.loss + x.effective_deductible - deductible_2;
- if (loss < 0) loss = 0;
- }
-else {
- loss = x.loss;
- }
-
+if (x.effective_deductible < deductible_2) {
+ loss = x.loss + x.effective_deductible - deductible_2;
+ if (loss < 0) loss = 0;
+ }
+else {
+ loss = x.loss;
+ }| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -12 | +calcrule_id | +12 |
| deductible_1 | -100000 | +deductible_1 | +100000 |
loss = x.loss - deductible_1;
-if (loss < 0) loss = 0;
-
+loss = x.loss - deductible_1;
+if (loss < 0) loss = 0;| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -14 | +calcrule_id | +14 |
| limit | -100000 | +limit | +100000 |
loss = x.loss;
-if (loss > limit_1) loss = limit_1;
-
-loss = x.loss;
+if (loss > limit_1) loss = limit_1;| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -15 | +calcrule_id | +15 |
| limit_1 | -0.3 | +limit_1 | +0.3 |
loss = x.loss * limit_1;
-
-loss = x.loss * limit_1;| Attributes | -Example | +Attributes | +Example |
|---|---|---|---|
| policytc_id | -1 | +policytc_id | +1 |
| calcrule_id | -16 | +calcrule_id | +16 |
| deductible_1 | -0.05 | +deductible_1 | +0.05 |
loss = x.loss - (x.loss * deductible_1);
-if (loss < 0) loss = 0;
-
+loss = x.loss - (x.loss * deductible_1);
+if (loss < 0) loss = 0;Go to Appendix C Multi-peril model support
- - - - +Go to Appendix C Multi-peril model +support
+ + + diff --git a/docs/md/DataConversionComponents.md b/docs/md/DataConversionComponents.md index 28e179e6..eeb29d6b 100644 --- a/docs/md/DataConversionComponents.md +++ b/docs/md/DataConversionComponents.md @@ -4,15 +4,20 @@ The following components convert input data in csv format to the binary format required by the calculation components in the reference model; **Static data** +* **[aggregatevulnerabilitytobin](#aggregatevulnerability)** converts the aggregate vulnerability data. * **[damagebintobin](#damagebins)** converts the damage bin dictionary. * **[footprinttobin](#footprint)** converts the event footprint. +* **[lossfactorstobin](#lossfactors)** converts the lossfactors data. * **[randtobin](#rand)** converts a list of random numbers. * **[vulnerabilitytobin](#vulnerability)** converts the vulnerability data. +* **[weightstobin](#weights)** converts the weights data. A reference [intensity bin dictionary](#intensitybins) csv should also exist, although there is no conversion component for this file because it is not needed for calculation purposes. **Input data** +* **[amplificationtobin](#amplifications)** converts the amplifications data. * **[coveragetobin](#coverages)** converts the coverages data. +* **[ensembletobin](#ensemble)** converts the ensemble data. * **[evetobin](#events)** converts a list of event_ids. * **[itemtobin](#items)** converts the items data. * **[gulsummaryxreftobin](#gulsummaryxref)** converts the gul summary xref data. @@ -24,19 +29,25 @@ A reference [intensity bin dictionary](#intensitybins) csv should also exist, al * **[occurrencetobin](#occurrence)** converts the event occurrence data. * **[returnperiodtobin](#returnperiod)** converts a list of return periods. * **[periodstobin](#periods)** converts a list of weighted periods (optional). +* **[quantiletobin](#quantile)** converts a list of quantiles (optional). These components are intended to allow users to generate the required input binaries from csv independently of the original data store and technical environment. All that needs to be done is first generate the csv files from the data store (SQL Server database, etc). The following components convert the binary input data required by the calculation components in the reference model into csv format; **Static data** +* **[aggregatevulnerabilitytocsv](#aggregatevulnerability)** converts the aggregate vulnerability data. * **[damagebintocsv](#damagebins)** converts the damage bin dictionary. * **[footprinttocsv](#footprint)** converts the event footprint. +* **[lossfactorstocsv](#lossfactors)** converts the lossfactors data. * **[randtocsv](#rand)** converts a list of random numbers. * **[vulnerabilitytocsv](#vulnerability)** converts the vulnerability data. +* **[weightstocsv](#weights)** converts the weights data. **Input data** +* **[amplificationtocsv](#amplifications)** converts the amplifications data. * **[coveragetocsv](#coverages)** converts the coverages data. +* **[ensembletocsv](#ensemble)** converts the ensemble data. * **[evetocsv](#events)** converts a list of event_ids. * **[itemtocsv](#items)** converts the items data. * **[gulsummaryxreftocsv](#gulsummaryxref)** converts the gul summary xref data. @@ -48,11 +59,43 @@ The following components convert the binary input data required by the calculati * **[occurrencetocsv](#occurrence)** converts the event occurrence data. * **[returnperiodtocsv](#returnperiod)** converts a list of return periods. 
* **[periodstocsv](#returnperiod)** converts a list of weighted periods (optional). +* **[quantiletocsv](#quantile)** converts a list of quantiles (optional). These components are provided for the convenience of viewing the data and debugging. ## Static data + +### aggregate vulnerability +*** +The aggregate vulnerability file is required for the gulmc component. It contains the conditional distributions of damage for each intensity bin and for each vulnerability_id. This file must have the following location and filename; + +* static/aggregate_vulnerability.bin + +##### File format + +The csv file should contain the following fields and include a header row. + + +| Name | Type | Bytes | Description | Example | +|:-------------------------------|--------|--------| :---------------------------------------------|------------:| +| aggregate_vulnerability_id | int | 4 | Oasis vulnerability_id | 45 | +| vulnerability_id | int | 4 | Oasis vulnerability_id | 45 | + +If this file is present, the weights.bin or weights.csv file must also be present. The data should not contain nulls. + +##### aggregatevulnerabilitytobin +``` +$ aggregatevulnerabilitytobin < aggregate_vulnerability.csv > aggregate_vulnerability.bin +``` + +##### aggregatevulnerabilitytocsv +``` +$ aggregatevulnerabilitytocsv < aggregate_vulnerability.bin > aggregate_vulnerability.csv +``` + +[Return to top](#dataconversioncomponents) + ### damage bin dictionary *** @@ -197,6 +240,37 @@ $ footprinttocsv -z > footprint.csv [Return to top](#dataconversioncomponents) + +### Loss Factors +*** +The lossfactors binary maps the event_id/amplification_id pairs with post loss amplification factors, and is supplied by the model providers. The first 4 bytes are preserved for future use and the data format is as follows. It is required by Post Loss Amplification (PLA) workflow must have the following location and filename; + +* static/lossfactors.bin + +#### File format +The csv file should contain the following fields and include a header row. + +| Name | Type | Bytes | Description | Example | +|:------------------|--------|--------| :---------------------------------------------------------|------------:| +| event_id | int | 4 | Event ID | 1 | +| count | int | 4 | Number of amplification IDs associated with the event ID | 1 | +| amplification_id | int | 4 | Amplification ID | 1 | +| factor | float | 4 | The uplift factor | 1.01 | + +All fields must not have null values. The csv file will not contain the count, and the conversion tools will add/remove this count. + +##### lossfactorstobin +``` +$ lossfactorstobin < lossfactors.csv > lossfactors.bin +``` + +##### lossfactorstocsv +``` +$ lossfactorstocsv < lossfactors.bin > lossfactors.csv +``` + +[Return to top](#dataconversioncomponents) + ### Random numbers *** @@ -294,8 +368,67 @@ $ vulnerabilitytocsv -z > vulnerability.csv ``` [Return to top](#dataconversioncomponents) + +### Weights +*** +The vulnerability weights binary contains the the weighting of each vulnerability function in all areaperil IDs. The data format is as follows. It is required by gulmc with the aggregate_vulnerability file and must have the following location and filename; + +* static/weights.bin + +#### File format +The csv file should contain the following fields and include a header row. 
+ +| Name | Type | Bytes | Description | Example | +|:------------------|--------|--------| :---------------------------------------------------------|------------:| +| areaperil_id | int | 4 | Areaperil ID | 1 | +| vulnerability_id | int | 4 | Vulnerability ID | 1 | +| weight | float | 4 | The weighting factor | 1.0 | + +All fields must not have null values. + +##### weightstobin +``` +$ weightstobin < weights.csv > weights.bin +``` + +##### weightstocsv +``` +$ weightstocsv < weights.bin > weights.csv +``` + +[Return to top](#dataconversioncomponents) + ## Input data + +### Amplifications +*** +The amplifications binary contains the list of item IDs mapped to amplification IDs. The data format is as follows. It is required by Post Loss Amplification (PLA) workflow must have the following location and filename; + +* input/amplifications.bin + +#### File format +The csv file should contain the following fields and include a header row. + +| Name | Type | Bytes | Description | Example | +|:------------------|--------|--------| :---------------------------------------------|------------:| +| item_id | int | 4 | Item ID | 1 | +| amplification_id | int | 4 | Amplification ID | 1 | + +The item_id must start from 1 and must be contiguous and not have null values. The binary file only contains the amplification IDs and assumes the item_ids would start from 1 and are contiguous. + +##### amplificationtobin +``` +$ amplificationtobin < amplifications.csv > amplifications.bin +``` + +##### amplificationtocsv +``` +$ amplificationtocsv < amplifications.bin > amplifications.csv +``` + +[Return to top](#dataconversioncomponents) + ### Coverages *** @@ -325,6 +458,31 @@ $ coveragetocsv < coverages.bin > coverages.csv [Return to top](#dataconversioncomponents) + +### ensemble +*** +The ensemble file is used for ensemble modelling (multiple views) which maps sample IDs to particular ensemble ID groups. It is an optional file for use with AAL and LEC. It must have the following location and filename; +* input/ensemble.bin + +##### File format +The csv file should contain a list of event_ids (integers) and include a header. + +| Name | Type | Bytes | Description | Example | +|:------------------|--------|--------| :-------------------|------------:| +| sidx | int | 4 | Sample ID | 1 | +| ensemble_id | int | 4 | Ensemble ID | 1 | + +##### ensembletobin +``` +$ ensembletobin < ensemble.csv > ensemble.bin +``` + +##### ensembletocsv +``` +$ ensembletocsv < ensemble.bin > ensemble.csv +``` +[Return to top](#dataconversioncomponents) + ### events *** @@ -818,6 +976,34 @@ $ periodstocsv < periods.bin > periods.csv [Return to top](#dataconversioncomponents) + +### Quantile +*** +The quantile binary file contains a list of user specified quantile floats. The data format is as follows. It is optionally used by the Quantile Event/Period Loss tables and must have the following location and filename; + +* input/quantile.bin + +#### File format +The csv file should contain the following fields and include a header row. + +| Name | Type | Bytes | Description | Example | +|:------------------|--------|--------| :---------------------------------------------------------|------------:| +| quantile | float | 4 | Quantile float | 0.1 | + +All fields must not have null values. 
+ +##### quantiletobin +``` +$ quantiletobin < quantile.csv > quantile.bin +``` + +##### quantiletocsv +``` +$ quantiletocsv < quantile.bin > quantile.csv +``` + +[Return to top](#dataconversioncomponents) + [Go to 4.5 Stream conversion components section](StreamConversionComponents.md) [Back to Contents](Contents.md) diff --git a/docs/pdf/ktools.pdf b/docs/pdf/ktools.pdf new file mode 100644 index 00000000..5593d010 Binary files /dev/null and b/docs/pdf/ktools.pdf differ