The Inescapable Chores

I spend roughly a third to a half of my statistical analysis time on data cleaning and normalization. My experience of this work ranges from dreary repetitiveness to zen-like absorption.

By data cleaning, I mean the tasks of formatting or reformatting the data to allow statistical analysis, identifying and fixing incorrect data, and merging or subsetting data as needed.
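As a rough illustration of those cleaning tasks, here is a minimal sketch in plain Python. It assumes hypothetical records with `id` and `age` fields and a hypothetical second source of visit counts. It reformats text ages into integers, treats implausible ages as missing rather than keeping them, merges in the second source by `id`, and then subsets to adults. Real projects would more likely use a data-frame library, but the steps are the same.

```python
# Hypothetical raw records: ages arrive as text, and one value is implausible.
raw = [
    {"id": "001", "age": " 34 "},
    {"id": "002", "age": "n/a"},
    {"id": "003", "age": "340"},  # probably a data-entry error
]
# Hypothetical second source, keyed by the same id.
visits = {"001": 3, "003": 1}


def parse_age(text):
    """Reformat: strip whitespace and convert to int; unparseable text becomes None."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def clean(records, max_age=120):
    """Parse ages, mark implausible ones as missing, and left-merge visit counts."""
    out = []
    for rec in records:
        age = parse_age(rec["age"])
        if age is not None and not 0 <= age <= max_age:
            age = None  # flag as missing instead of silently keeping a bad value
        # Left merge: ids absent from `visits` get None rather than a guessed value.
        out.append({"id": rec["id"], "age": age, "visits": visits.get(rec["id"])})
    return out


cleaned = clean(raw)
# Subset: keep only records with a known adult age.
adults = [r for r in cleaned if r["age"] is not None and r["age"] >= 18]
print(adults)  # [{'id': '001', 'age': 34, 'visits': 3}]
```

The choice to turn bad values into `None` instead of deleting rows keeps the record available for a later decision about how to handle missing data.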

By normalization, I mean the tasks of converting data to standard units, identifying useful or necessary categories, and recoding variable names and values to standard forms. It also includes setting up the correct data structure and file formats.
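A similarly minimal sketch of normalization, again in plain Python and under stated assumptions: the column names `Wt (lb)` and `SEX`, the target names `weight_kg` and `sex`, and the category codes are all hypothetical conventions chosen for illustration. The function renames variables to standard names, converts pounds to kilograms using the exact definition of the international pound (0.45359237 kg), and recodes free-form sex entries to a fixed set of values.

```python
# Hypothetical conventions; a real project would define its own.
RENAME = {"Wt (lb)": "weight_kg", "SEX": "sex"}
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}
LB_TO_KG = 0.45359237  # exact: 1 international avoirdupois pound in kilograms


def normalize(row):
    """Rename variables, convert pounds to kilograms, and recode sex to standard values."""
    out = {RENAME.get(k, k): v for k, v in row.items()}
    out["weight_kg"] = round(out["weight_kg"] * LB_TO_KG, 1)
    # Unrecognized codes become None so they surface for review instead of passing through.
    out["sex"] = SEX_CODES.get(str(out["sex"]).strip().lower())
    return out


print(normalize({"Wt (lb)": 150, "SEX": " M "}))  # {'weight_kg': 68.0, 'sex': 'male'}
```

Keeping the mappings in module-level dictionaries means the standard is written down in one place and can be reused across files.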
