Big Data. Bureau of Labor Statistics. Survey data. Employment Big Data. Those are all things that calculating worklife expectancy for U.S. workers requires. Worklife expectancy is similar to life expectancy and indicates how long a person can be expected to be active in the workforce over their working life. The worklife expectancy figure takes into account the anticipated time out of the market due to unemployment, voluntary leaves, attrition, etc.
The goal of our recent work is to update the Millimet et al. (2002) worklife expectancy paper to account for more recent Current Population Survey (CPS) data. Their paper uses data from the 1992 to 2000 time period; we use data from 2000 to 2013. The main question is whether estimating the Millimet et al. (2002) econometric worklife models with more recent data changes the results of the 2002 paper in any substantive way.
We also wanted to expand on a few additional topics: different definitions of educational attainment, reported disability, and occupational effects on worklife expectancy.
Our approach is twofold. First, we matched the BLS data cohorts following the Millimet et al. (2002) and Peracchi and Welch (1995) papers. In a nutshell, the CPS matching routine involves matching incoming and outgoing cohorts across a given year. Once the data are matched, we look at the work status of the individuals to determine whether they were active or inactive across the year in which they were interviewed by the BLS.
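The matching step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the routine from Peracchi and Welch (1995): the field names (household_id, person_line, sex, age, active) and the age tolerance of 0 to 2 years are hypothetical choices made for the example.

```python
def match_cohorts(first_year, second_year):
    """Pair each first-interview record with its record one year later.

    Records are dicts. All field names and the validation rules below
    are assumptions for illustration, not the published matching criteria.
    """
    # Index the later interviews by a (hypothetical) person identifier.
    second_by_id = {
        (rec["household_id"], rec["person_line"]): rec for rec in second_year
    }
    matched = []
    for rec0 in first_year:
        rec1 = second_by_id.get((rec0["household_id"], rec0["person_line"]))
        if rec1 is None:
            continue  # person was not found in the later interview
        if rec0["sex"] != rec1["sex"]:
            continue  # likely a different person at the same address
        if not 0 <= rec1["age"] - rec0["age"] <= 2:
            continue  # implausible age change over one year
        matched.append({
            "household_id": rec0["household_id"],
            "person_line": rec0["person_line"],
            "age": rec0["age"],
            "active_t0": rec0["active"],  # work status at first interview
            "active_t1": rec1["active"],  # work status one year later
        })
    return matched


first_year = [
    {"household_id": 1, "person_line": 1, "sex": "M", "age": 30, "active": True},
    {"household_id": 2, "person_line": 1, "sex": "F", "age": 45, "active": False},
    {"household_id": 3, "person_line": 1, "sex": "M", "age": 50, "active": True},
]
second_year = [
    {"household_id": 1, "person_line": 1, "sex": "M", "age": 31, "active": False},
    {"household_id": 2, "person_line": 1, "sex": "F", "age": 60, "active": True},
]
matched = match_cohorts(first_year, second_year)
```

In this toy data only the first household survives: the second fails the age check and the third has no later interview. Each matched record carries the two work statuses needed to classify a person as initially active or inactive and to observe the status one year later.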
Second, using this matched data, we replicated the work of Millimet et al. (2002) using the 1992 to 2000 CPS data, as they did in their paper. In general, the Millimet et al. (2002) econometric model uses a standard logistic regression framework to estimate transitional probabilities based on a two-state labor market framework in which a person is either active or inactive in the workforce.
The methodology begins by estimating a logistic regression using individuals who were active when first interviewed. Independent variables such as occupation, gender, marital status, and number of children are included in the logistic regression. A separate regression is estimated for individuals who were inactive at the start of the BLS interview. Separate active and inactive regressions are also estimated for certain factors of interest, such as educational attainment level and reported disability status.
The logistic regression equations provide probabilities that are conditional on the labor force attachment of the individual at the time of the interview. These conditional probabilities yield the transitional probabilities for initially active or inactive individuals. For example, a person who is active at the start of a period could be either active or inactive in the next period; the regression estimated on initially active individuals gives the probability of each of those two outcomes.
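To make the link between the two regressions and the four transitional probabilities concrete, here is a minimal sketch. The coefficient values and variable names are made up for illustration and are not estimates from our work or from Millimet et al. (2002).

```python
import math


def logistic(z):
    """Standard logistic function, mapping a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def transition_probs(coefs_active, coefs_inactive, person):
    """Return the four transition probabilities for one person.

    coefs_active:   coefficients from the regression on initially active people
    coefs_inactive: coefficients from the regression on initially inactive people
    person:         covariate values keyed by the same (hypothetical) names
    """
    def linear_index(coefs):
        return coefs["intercept"] + sum(coefs[k] * v for k, v in person.items())

    p_aa = logistic(linear_index(coefs_active))    # active   -> active
    p_ia = logistic(linear_index(coefs_inactive))  # inactive -> active
    return {"aa": p_aa, "ai": 1.0 - p_aa, "ia": p_ia, "ii": 1.0 - p_ia}


# Hypothetical coefficients, for illustration only.
coefs_active = {"intercept": 2.0, "married": 0.3, "children": -0.1}
coefs_inactive = {"intercept": -1.0, "married": 0.2, "children": -0.2}
person = {"married": 1, "children": 2}

probs = transition_probs(coefs_active, coefs_inactive, person)
```

Because there are only two states, each regression delivers one probability directly and the other as its complement, so "aa" and "ai" sum to one, as do "ia" and "ii".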
As described in the Millimet et al. (2002) paper, the expected work life for each age is obtained recursively by working backwards from an assumed terminal year (T + 1). The terminal year is the year after which no one is assumed to be active. In the analysis, a terminal age of 80 or 85 is used.
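The backward recursion can be sketched as follows. This is a simplified illustration, not the exact formula from the paper: it assumes a person who changes state during a year is credited with half a year of activity, and it ignores mortality. The probabilities in the example are invented.

```python
def worklife_expectancy(p_aa, p_ia, start_age, terminal_age):
    """Backward recursion for expected remaining years of activity.

    p_aa[age]: probability an active person is still active a year later
    p_ia[age]: probability an inactive person is active a year later
    Assumptions (not taken from the paper): transitions happen mid-year,
    so a year with a change of state counts as half a year of activity,
    and mortality is ignored.
    """
    # At the terminal age nobody is active, so both expectancies are zero.
    wle_active = {terminal_age: 0.0}
    wle_inactive = {terminal_age: 0.0}
    for age in range(terminal_age - 1, start_age - 1, -1):
        stay = p_aa[age]
        enter = p_ia[age]
        wle_active[age] = (
            stay * (1.0 + wle_active[age + 1])
            + (1.0 - stay) * (0.5 + wle_inactive[age + 1])
        )
        wle_inactive[age] = (
            enter * (0.5 + wle_active[age + 1])
            + (1.0 - enter) * wle_inactive[age + 1]
        )
    return wle_active, wle_inactive


# Invented probabilities: a 50% chance of staying active, no re-entry.
ages = range(78, 80)
p_aa = {age: 0.5 for age in ages}
p_ia = {age: 0.0 for age in ages}
wle_a, wle_i = worklife_expectancy(p_aa, p_ia, start_age=78, terminal_age=80)
```

With these invented inputs, an active 79-year-old expects 0.5 × 1 + 0.5 × 0.5 = 0.75 years of activity, and an active 78-year-old expects 0.5 × (1 + 0.75) + 0.5 × 0.5 = 1.125 years. In practice the age-specific probabilities would come from the fitted logistic regressions.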
Using the model, we began by replicating the Millimet et al. (2002) econometric model. After replicating the model, we performed some additional work and expanded the logistic regression worklife equations. The results of our estimation are shown in the attached tables.
As for the results, there are several findings. First, we were able to create a matched CPS data set of 201,797 individuals, whereas Millimet et al. (2002) found 200,916 matched individuals.
Overall, we match their results very closely as well. For example, Millimet et al. (2002) found that a male who was 26 years old with less than a high school education had 27.27 years of WLE remaining, while we found that person had 26.319 years remaining in our replication. They found that a person of the same age with a high school education had 32.89 years remaining, while we found 32.728 years remaining. The replication was particularly good for both the less than high school and high school levels of educational attainment.
The WLE numbers are close, but not quite as close, for college and some college. This is primarily because we use different definitions of some college and college than Millimet et al. (2002) did in their paper.
Overall, the worklife expectancy estimated using the more recent data from 2000-2013 is shorter than in the earlier (1992-2000) data set. This is true for younger workers (18 to early 40s): younger workers from the more recent cohorts have a shorter expected work life than younger workers in the earlier cohorts. Conversely, older workers in their 40s and 50s have a slightly longer worklife expectancy in the later data set. We are in the process of determining the statistical significance of these differences.
We also looked at the worklife expectancy for individuals with and without a reported disability. Disability was not covered in the Millimet et al. (2002) paper. As has been well reported, the disability measure in the BLS data is very general in nature. Accordingly, the applicability of the BLS disability measure to litigation is somewhat limited. However, it is interesting to note that there is a substantial reduction in worklife expectancy among individuals who reported having a disability. On average, the difference is about 10 years of work life. This is consistent with other studies on disability that relied on the BLS data. Other factors, such as occupation and geographical region, do not appear to have much impact on WLE estimates.