Lecture notes
What it covers
- Core Objective: Understand and model stochastic mechanisms generating insurance data (e.g., number/cost of accidents) to facilitate ex-ante risk assessment and premium computation.
- Data Analysis Process: Utilizes past information from databases, computes descriptive statistics, visualizes data through histograms, and makes hypotheses about underlying distributions.
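The first steps of that process (descriptive statistics and a histogram) can be sketched in a few lines. The accident counts below are made-up illustrative data, and `describe` and `text_histogram` are hypothetical helper names, not part of any library.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical yearly accident counts for ten policyholders (illustrative data).
accident_counts = [0, 1, 0, 2, 0, 1, 0, 0, 3, 1]

def describe(counts):
    """Return the mean, the population variance, and a sorted frequency table."""
    frequencies = Counter(counts)
    return mean(counts), pvariance(counts), dict(sorted(frequencies.items()))

def text_histogram(frequencies):
    """Render one line per observed value, with one '#' per observation."""
    return [f"{value}: {'#' * count}" for value, count in frequencies.items()]

avg, var, freq = describe(accident_counts)
for line in text_histogram(freq):
    print(line)
```

A mean and variance of similar size, together with a histogram concentrated on small counts, would be one informal hint towards a Poisson hypothesis.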
- Key Distributions:
  - Poisson Distribution: Ideal for modeling the number of accidents. It has a single parameter (lambda) equal to both the mean and the variance, and it assumes that accidents occur at a constant rate, independently of one another, and rarely.
  - Lognormal Distribution: Suited for describing the costs of accidents. A cost X is lognormal when ln(X) is normally distributed, so X is strictly positive and right-skewed.
  - Gamma Distribution: Another flexible distribution used in insurance, with shape and scale parameters.
  - Gaussian Distribution: Generally avoided, because its domain is unbounded (it assigns probability to negative values) and it is symmetric, whereas insurance losses are non-negative and typically right-skewed.
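To illustrate why the lognormal suits accident costs, the sketch below builds lognormal draws by exponentiating normal draws. The parameter values (`mu=7.0`, `sigma=1.0`) and the function name `sample_lognormal` are assumptions chosen only for illustration.

```python
import math
import random
from statistics import mean, median

def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n lognormal values as exp(Z), with Z ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

# Hypothetical parameters, not estimated from any real portfolio.
costs = sample_lognormal(mu=7.0, sigma=1.0, n=10_000)

# Every simulated cost is positive, and the mean sits above the median,
# which is the right skew typical of loss data.
print(min(costs) > 0, mean(costs) > median(costs))
```

The standard library also offers `random.lognormvariate(mu, sigma)`, which draws from the same distribution directly.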
- Parameter Estimation: Maximum Likelihood Estimation (MLE), based on the likelihood principle, is the primary method for identifying unknown distribution parameters: it selects the parameter values that maximize the likelihood, i.e., the probability (or density) of the collected data.
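For the two main distributions in these notes, the maximum likelihood estimates have closed forms, so a minimal sketch needs no numerical optimizer. It assumes independent, identically distributed observations; the function names are hypothetical.

```python
import math
from statistics import mean

def poisson_mle(counts):
    """MLE of lambda for i.i.d. Poisson counts: the sample mean."""
    return mean(counts)

def lognormal_mle(costs):
    """MLE of (mu, sigma) for i.i.d. lognormal costs, computed on the logs.

    The variance uses the 1/n divisor, which is what maximum likelihood gives.
    """
    logs = [math.log(c) for c in costs]
    mu = mean(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma

def poisson_log_likelihood(lam, counts):
    """Log-likelihood of lambda given the observed counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

# Illustrative data only.
counts = [0, 1, 0, 2, 0, 1, 0, 0, 3, 1]
lam_hat = poisson_mle(counts)
```

Evaluating `poisson_log_likelihood` at `lam_hat` and at nearby values is a quick way to see that the estimate sits at the maximum.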
- Model Validation and Simulation:
  - After hypothesizing a distribution, diagnostic checks (e.g., comparing mean and variance for Poisson, Q-Q plots) validate its fit to empirical data.
  - Once validated, the stochastic process can be used to simulate future events, providing insights for scenario planning.
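Two simple Poisson diagnostics can be sketched as follows: the ratio of variance to mean (close to 1 under a Poisson hypothesis) and a table of observed versus expected frequencies. The data are illustrative and the helper names are hypothetical; this is an informal check, not a formal goodness-of-fit test.

```python
import math
from collections import Counter
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return variance(counts) / mean(counts)

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def observed_vs_expected(counts):
    """Map each count k to (observed frequency, expected Poisson frequency)."""
    lam = mean(counts)
    n = len(counts)
    observed = Counter(counts)
    return {
        k: (observed.get(k, 0), n * poisson_pmf(k, lam))
        for k in range(max(counts) + 1)
    }

# Illustrative data only.
counts = [0, 1, 0, 2, 0, 1, 0, 0, 3, 1]
index = dispersion_index(counts)
table = observed_vs_expected(counts)
```

An index well above 1 would suggest overdispersion, in which case the Poisson hypothesis deserves a second look.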
- Premium Calculation:
  - Crucial for an insurance company's financial stability.
  - The Fair Premium is typically the expected value of losses, E(X).
  - Modern approaches involve Personalization of Premium, using customer profiles and covariates to precisely assess individual risk.
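As a sketch of the fair premium E(X), the snippet below computes it in two ways: empirically, as the average of past annual losses, and parametrically, under the added assumption (not stated in the notes) that the number of accidents is Poisson(lambda), independent of lognormal costs, so that E(X) = lambda * exp(mu + sigma^2 / 2). All figures and function names are hypothetical.

```python
import math
from statistics import mean

def fair_premium_empirical(annual_losses):
    """Estimate E(X) as the average annual loss per policy."""
    return mean(annual_losses)

def fair_premium_parametric(lam, mu, sigma):
    """E(X) = E(N) * E(C), assuming N ~ Poisson(lam) is independent of
    the lognormal(mu, sigma) cost C of each accident."""
    return lam * math.exp(mu + sigma ** 2 / 2)

# Hypothetical annual losses for five policies.
premium = fair_premium_empirical([0, 0, 1200, 0, 300])

# A crude form of personalization: a different accident rate per profile.
profile_rates = {"low_risk": 0.05, "high_risk": 0.20}  # assumed values
personalized = {
    name: fair_premium_parametric(rate, mu=7.0, sigma=1.0)
    for name, rate in profile_rates.items()
}
```

In practice the rate for each profile would come from a model fitted on covariates rather than from a hand-written table.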
- Random Number Generation: Utilizes the inverse transform sampling method, where random numbers from a Uniform(0,1) distribution are transformed using the inverse of a target distribution's CDF to simulate outcomes (e.g., future accidents).
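A minimal sketch of inverse transform sampling follows. The exponential case has a closed-form inverse CDF; for the Poisson, the inverse is found by accumulating the pmf until it reaches the uniform draw. The function names and the rate 0.8 are assumptions for illustration.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), for u in [0, 1)."""
    return -math.log(1.0 - u) / rate

def poisson_inverse_cdf(u, lam):
    """Smallest k with P(N <= k) >= u, for N ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # The pmf > 0 guard stops the loop if rounding keeps cdf just below u.
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def simulate_accidents(lam, n, seed=0):
    """Simulate n accident counts from Uniform(0,1) draws."""
    rng = random.Random(seed)
    return [poisson_inverse_cdf(rng.random(), lam) for _ in range(n)]

simulated = simulate_accidents(lam=0.8, n=20_000)
print(sum(simulated) / len(simulated))  # should be near lambda
```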
- Linear Models:
  - Simple Linear Model (Y = a + bX + ε): Explores relationships between a dependent variable (Y) and a single independent variable (X), accounting for randomness (ε). Parameters (a, b) are estimated using Ordinary Least Squares (OLS).
  - Multiple Linear Model (Y = Xβ + ε): Extends to multiple explanatory variables using matrix notation, where X includes a column of ones for the intercept.
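The matrix form can be sketched without external libraries by solving the normal equations (X'X)β = X'y, which is the standard closed-form route to the OLS estimate. The helper names are hypothetical, and the sketch assumes X'X is invertible (no perfectly collinear columns); a numerical library would normally be used instead.

```python
def design_matrix(rows):
    """Prepend a 1 to every row so the first coefficient is the intercept."""
    return [[1.0] + [float(v) for v in row] for row in rows]

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def ols(rows, y):
    """Estimate beta from the normal equations (X'X) beta = X'y."""
    x = design_matrix(rows)
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Noise-free illustrative data generated from y = 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
beta = ols(rows, y)  # intercept first, then one coefficient per column
```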
- Fit vs. Predict: "Fitting" involves using known data to model existing risks, while "Predicting" uses a trained model on new data to forecast behavior or risk for new customers.
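The distinction can be made concrete with a small class that separates the two steps. `SimpleLinearModel` is a hypothetical name, the data are invented, and the closed-form OLS formulas for a single explanatory variable are used.

```python
from statistics import mean

class SimpleLinearModel:
    """Y = a + b*X, estimated by ordinary least squares."""

    def fit(self, x, y):
        """Estimate a and b from known (x, y) pairs."""
        x_bar, y_bar = mean(x), mean(y)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        self.b = sxy / sxx
        self.a = y_bar - self.b * x_bar
        return self

    def predict(self, x_new):
        """Apply the fitted a and b to values not seen during fitting."""
        return [self.a + self.b * xi for xi in x_new]

# Fit: existing customers with a known covariate and a known outcome.
model = SimpleLinearModel().fit(x=[1, 2, 3, 4], y=[3, 5, 7, 9])

# Predict: new customers, for whom only the covariate is known.
forecast = model.predict([5, 6])
```

Keeping `fit` and `predict` as separate methods mirrors the workflow: parameters are estimated once on historical data and then reused unchanged on new customers.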
- Central Limit Theorem (CLT): Essential for understanding portfolio-level risk. It states that the mean of a large number of independent, identically distributed risks with finite variance is approximately normally distributed, regardless of the distribution of the individual risks.
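A small simulation can illustrate the theorem. Each individual loss below is drawn from an exponential distribution with mean 1 (strongly skewed, chosen here only as an assumption), yet the average loss over a portfolio of 100 risks behaves roughly like a normal variable with standard deviation 1/sqrt(100).

```python
import math
import random
from statistics import mean, pstdev

def portfolio_means(n_risks, n_portfolios, rate=1.0, seed=0):
    """Average loss of each simulated portfolio of n_risks independent risks."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_portfolios):
        losses = [rng.expovariate(rate) for _ in range(n_risks)]
        means.append(sum(losses) / n_risks)
    return means

n_risks = 100
means = portfolio_means(n_risks, n_portfolios=2000)

center = mean(means)                         # compare with the true mean, 1.0
spread = pstdev(means)                       # compare with predicted_spread
predicted_spread = 1.0 / math.sqrt(n_risks)  # sigma / sqrt(n), with sigma = 1

# Share of portfolio means within 1.96 predicted standard deviations of 1.0;
# under a normal approximation this should be roughly 95%.
share_within = sum(
    abs(m - 1.0) <= 1.96 * predicted_spread for m in means
) / len(means)
```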
- Dummy Variables: Transform categorical data (e.g., smoker/non-smoker) into binary (0/1) numerical variables so that they can be included in regression models. The categorical variable is called a "factor" and its categories are its "levels."
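A minimal sketch of dummy encoding follows. It drops one baseline level, so a factor with k levels yields k - 1 dummy columns; this avoids perfect collinearity with the column of ones when the model has an intercept. `dummy_encode` is a hypothetical helper name.

```python
def dummy_encode(values, baseline):
    """Return one 0/1 column per level of the factor, except the baseline."""
    levels = sorted(set(values) - {baseline})
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
    }

# Two levels: a single dummy column is enough.
smoking = dummy_encode(["smoker", "non-smoker", "smoker"], baseline="non-smoker")

# Three levels: two dummy columns, with "A" absorbed into the intercept.
region = dummy_encode(["A", "B", "C", "A"], baseline="A")
```

The coefficient on each dummy is then read as the difference from the baseline level.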