Methodology & Data

The raw data are 1-minute prices (not adjusted for splits or dividends) for the constituents of the Dow Jones Industrial Average Index. We update the set of available stocks to match changes in the index composition, and we also provide the excluded stocks for comparability. Ideally, the set of stocks available in the CaPiRe dataset is updated at least twice per year.

Prices are placed on an equally spaced 1-minute grid. For minutes without trading, the previous price is used; if no trading occurs at the opening (9:30), the first open price of the day is used for the missing intervals. Prices are converted into 1-minute log-returns using close-to-close returns, except for the first 1-minute interval of the day, where we use open-to-close returns. We consider data from 9:30 to 15:59 (starting times of the 1-minute intervals), for a daily total of 390 intervals.
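The gap-filling and return construction described above can be sketched as follows. This is a minimal illustration and not the code used to build the dataset: the function names, the use of `None` for minutes without trading, and the `first_open` argument (the first open price of the day) are assumptions made for the example.

```python
import math


def fill_minute_closes(closes, first_open):
    """Fill minutes without trading on the 1-minute grid.

    closes     : per-minute close prices, None where no trade occurred
                 (hypothetical input layout).
    first_open : first open price of the day, used for gaps at the opening.
    """
    filled = []
    last = first_open  # leading gaps take the day's first open price
    for close in closes:
        if close is not None:
            last = close  # a traded minute updates the carried price
        filled.append(last)  # otherwise the previous price is carried forward
    return filled


def one_minute_log_returns(closes, first_open):
    """Open-to-close log-return for the first interval, close-to-close after."""
    filled = fill_minute_closes(closes, first_open)
    returns = [math.log(filled[0] / first_open)]
    for prev, cur in zip(filled, filled[1:]):
        returns.append(math.log(cur / prev))
    return returns


# Toy day with four intervals; the first and third minutes have no trades.
example = one_minute_log_returns([None, 100.5, None, 101.0], first_open=100.0)
```

With this convention, intervals without trading get a zero return, and the returns of a day sum to the log-return from the first open to the last close.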

The following variables are available for each asset:
- Realized Variance
- Realized Quarticity
- Bipower Variation
- BNS test statistic
- Good Variance
- Bad Variance


We use the baseline estimator for the Realized Variance and the Realized Quarticity, while for the Bipower Variation we follow Barndorff-Nielsen and Shephard (2004). Good and Bad Variance follow Patton and Sheppard (2015), while the BNS test statistic for jumps is computed according to Huang and Tauchen (2005). All quantities are computed with both 1-minute and 5-minute data; the 5-minute returns are obtained by aggregating 1-minute returns.
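For orientation, the textbook forms of these estimators can be sketched as below. This is a hedged illustration under stated assumptions, not the dataset's own code: the function names are hypothetical, the Realized Quarticity uses the common n/3 scaling, finite-sample correction factors are omitted, and the jump statistic is the ratio form with tripower quarticity that is usually associated with Huang and Tauchen (2005). The exact variant used in the dataset may differ.

```python
import math

MU1 = math.sqrt(2.0 / math.pi)                             # E|Z|, Z standard normal
MU43 = 2 ** (2 / 3) * math.gamma(7 / 6) / math.gamma(0.5)  # E|Z|^(4/3)


def aggregate(returns, k=5):
    """Sum non-overlapping blocks of k log-returns (390 one-minute -> 78 five-minute)."""
    return [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]


def realized_variance(r):
    return sum(x * x for x in r)


def realized_quarticity(r):
    # Assumed scaling: (n / 3) * sum of fourth powers.
    return len(r) / 3.0 * sum(x ** 4 for x in r)


def bipower_variation(r):
    # Scaled sum of products of adjacent absolute returns.
    return MU1 ** -2 * sum(abs(a) * abs(b) for a, b in zip(r[1:], r[:-1]))


def good_bad_variance(r):
    """Realized semivariances: squared positive and squared negative returns."""
    good = sum(x * x for x in r if x > 0)
    bad = sum(x * x for x in r if x < 0)
    return good, bad


def tripower_quarticity(r):
    # Products of three adjacent absolute returns, each raised to the power 4/3.
    s = sum((abs(a) * abs(b) * abs(c)) ** (4 / 3)
            for a, b, c in zip(r[2:], r[1:-1], r[:-2]))
    return len(r) * MU43 ** -3 * s


def jump_ratio_statistic(r):
    """Ratio-type jump statistic; undefined when RV or BV is zero."""
    n = len(r)
    rv = realized_variance(r)
    bv = bipower_variation(r)
    tp = tripower_quarticity(r)
    theta = MU1 ** -4 + 2 * MU1 ** -2 - 5  # (pi/2)^2 + pi - 5
    return ((rv - bv) / rv) / math.sqrt(theta / n * max(1.0, tp / bv ** 2))
```

To obtain the 5-minute versions, one would pass `aggregate(r1, 5)` instead of the 1-minute returns `r1` to the same functions. Good and Bad Variance add up to the Realized Variance by construction.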

Starting with the December 2024 data release, RV estimates based on sub-sampling are also available (Aït-Sahalia and Jacod, 2014). In this case, 1-minute data are used with sampling every 5 minutes.
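One common way to implement sub-sampling is to compute the 5-minute RV on each of the five grids obtained by shifting the starting minute, and then average the results. The sketch below follows that reading. The function name is hypothetical, and the handling of the edges (incomplete blocks are dropped, with no rescaling) is an assumption that may not match the dataset.

```python
def subsampled_rv(r1, k=5):
    """Average of k-minute realized variances over the k shifted sampling grids.

    r1 : list of 1-minute log-returns for one day.
    k  : sampling interval in minutes.
    """
    estimates = []
    for offset in range(k):
        # k-minute returns on the grid starting `offset` minutes into the day;
        # incomplete blocks at the end of the day are dropped (assumption).
        blocks = [sum(r1[i:i + k]) for i in range(offset, len(r1) - k + 1, k)]
        estimates.append(sum(x * x for x in blocks))
    return sum(estimates) / k
```

With 390 one-minute returns and `k=5`, the unshifted grid has 78 five-minute returns and each shifted grid has 77.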

Also starting with the December 2024 release, daily Realized Covariance (RCOV) matrices are included in the dataset. They are computed from 1-minute data with both the baseline estimator and sub-sampling (as in the RV case). The RCOV matrices cover the subset of assets available for the entire sample period.
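As an illustration, a baseline realized covariance matrix is usually the sum of outer products of the intraday return vectors. The sketch below assumes synchronized 1-minute log-returns stored as one list per asset; the function name and the input layout are hypothetical and are not taken from the dataset.

```python
def realized_covariance(returns_by_asset):
    """Baseline realized covariance for one day.

    returns_by_asset : list of per-asset lists of intraday log-returns,
                       all on the same time grid (hypothetical layout).
    Entry (i, j) of the result is the sum over intervals of r_i * r_j.
    """
    n_assets = len(returns_by_asset)
    n_obs = len(returns_by_asset[0])
    if any(len(r) != n_obs for r in returns_by_asset):
        raise ValueError("all assets must share the same time grid")

    cov = [[0.0] * n_assets for _ in range(n_assets)]
    for i in range(n_assets):
        for j in range(i, n_assets):
            s = sum(a * b for a, b in zip(returns_by_asset[i], returns_by_asset[j]))
            cov[i][j] = s
            cov[j][i] = s  # the matrix is symmetric by construction
    return cov
```

The diagonal of the matrix holds each asset's Realized Variance. A sub-sampled version could be obtained by averaging this matrix over shifted 5-minute grids, in the same way as for RV.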