Read More about Transformations

For a nice discussion from the statistical point of view of the logarithmic transformation and how to interpret it, check out these topics on Cross Validated:

For some additional discussion about the case where you have zeroes:

For some more information about the use of the Box-Cox transformation and the aftermath, check out these topics on Cross Validated:

And, for some general discussion about transformations, see:

The online Engineering Statistics Handbook discusses some of the operational details of finding data transformations in Section 4.6.3.3 Transformations to Improve Fit.

Responding to Reviewer Comments and Criticism

I recommend the following course of action:

First, collect all the reviewer comments into a single Word document.  You may organize them in any way that seems reasonable, but I usually keep them organized by reviewer.  So, create a separate heading for each reviewer in the document.  I strongly suggest that you do this by cutting and pasting the exact text from the reviewer comments.

Next, edit the reviewer comments to keep only the exact text that needs to be addressed.

Reformat everything to get rid of bold, italics, underlining, and the like.

Title this document “Reviewer Response YYYY-MM-DD”.

Next:

1. Identify the changes in the statistical analysis that need to be made first.  This is essentially an addendum to the Statistical Analysis Plan.

2. Identify the changes in the Discussion that need to be made.  Review all of these changes in light of the changes in the Results that may have occurred.

3. Identify the other types of changes (editing, wordsmithing, general carping or cavilling).

Now you are ready to re-run the statistical analysis having made the necessary changes in methodology.  Of course, you will keep this completely separate from your previous work, to ensure that you can always trace back your work.

Finally, you are ready to make changes in the manuscript.  Before starting this process, get your Reviewer Response document ready.

1. Make changes in the Results section as necessary.

2. Then, make changes in the Discussion section accordingly.  These can be done in parallel if that is easier, such as when you have many sub-sections in the Results.

3. Make changes in the Statistical Methods section as needed.

4. Make editorial changes that have been indicated.

As you make these changes, indicate in the Reviewer Response document under each item what you have done to fix things.  I usually format the reviewer comment in italics and my response as normal text.  So an example would look like this:

Reviewer A:

p4 l5 You misspelled the word “thorough” as “through”.

This was corrected.

p6 l20 It is standard practice to include the p-value for this test.

The p-value for the test was included.  This was also changed in the other sections of the manuscript.

If the reviewers identified some global or larger issues, you may want to break those out separately in an initial section.

Finally, write an introduction to the Reviewer Response.

Outcome of Statistical Consultation

What outcome do you want from your consultation?  This depends a bit on where you are in the process.

In the design stage, you want an efficient design in terms of time and money that takes subject knowledge into account and that will answer your research question with a high degree of probability.

If you are planning the design, here is what you want to get as an outcome from the statistical consultation or perhaps over several consultations:

  • A design that addresses your research hypotheses.
  • Alternative designs that also address your research hypotheses.
  • An outline of how to carry out the design.
  • The basic outline of the statistical analysis that will be performed on the data.

If you are at the end of the design stage, you essentially want a statistical analysis plan.  That means coming away with:

  • Statistical analysis plan.
  • Implementation details such as the packages or the programming language to be used.

You might also want help performing some specific statistical analysis or several statistical analyses.  This could take the form of coaching to perform the analysis yourself, giving you some code snippets so you can do the coding yourself, or even carrying out a limited bit of analysis that you can then plan to extend or repeat cookie-cutter fashion.

If you are being coached, you probably want the statistical analysis exactly specified so you can do the work correctly.

Or, you may want the statistician to actually perform the statistical analysis for you.  Then, you should plan to have:

  • An estimate of the billable time as well as the calendar time that will be needed.
  • An agreement on what communication needs to take place and when, such as handing off finalized data sets.
  • The actual statistical analysis output result.
  • Ideally, the code that was used to produce the analysis.
  • A copy of the data that were actually analyzed.
  • An outline or synopsis of the results at a minimum.
  • Some sample verbiage for the Statistical Methods section, the Results section, and potentially the Discussion section.

Consulting with a Statistician After Your Study is Done

All right.  Well, to be honest, this is when most people come to the statistician, because they are now having trouble due to their lack of planning.

And, this is the hardest time to help you.  After all, nothing can be changed.  You already designed the study and collected the data.  In the worst case, the data you collected will be worthless for actually answering your research question!

Now is not the time to be coy.  You need to lay all your cards on the table so that you can get effective help to move forward as efficiently as possible.

So, you need to provide the following to the statistician:

  • At least minimal background on the subject of the study.
  • A clear explanation of your research hypotheses.
  • A concise summary of the design of the study.
  • A description of how the study actually went.
  • A way of looking at the data structure.

In some ways, the last point is the most important.  This can often be accomplished by simply bringing the data along for display on a laptop or having a printout of the data.  In cases where there is too much data to print, it could still be useful to print out at least a page or two of data.

For a statistician, looking at data is pleasurable, perhaps even more so than talking with you, so do not worry about boring anyone.  More seriously, though, since the data are where the rubber meets the road for statisticians, having the ability to review the data structure is going to enable a much more focused and useful conversation to take place.

Information Flow

It will help you to understand the overall flow of information for your study.  Most people concentrate only on one small part, the statistical test, and forget the rest.

Starting with the physical phenomena that are being studied, we have to use some sort of measurement tool to quantify the aspects we want to study.  Then, some sort of data collection tool is used to capture the information.

Then, we create a master data set or sets from the data collection tools that will be frozen.  “Frozen” means that you will not ever edit those data again.  Ever.

If any data changes need to be made, they will be made in an analysis data set that is derived from the master data set.  Ideally, this will be done using some sort of automated system or programming steps, so that all changes from the master data set are documented.
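Here is a minimal sketch of what "programming steps" could look like, assuming the data are small enough to hold as a list of records.  The function name `derive_analysis_data`, the record layout, and the example values are all hypothetical, made up purely for illustration.  The idea is simply that the master data are never edited, and every change comes with a log entry.

```python
import copy


def derive_analysis_data(master_rows, corrections):
    """Apply documented corrections to a copy of the frozen master data.

    master_rows: list of dicts, one per record, each with an "id" key
                 (a hypothetical layout chosen for this sketch).
    corrections: list of (record_id, field, new_value, reason) tuples.
    Returns (analysis_rows, change_log); master_rows is never modified.
    """
    analysis_rows = copy.deepcopy(master_rows)  # work on a copy only
    by_id = {row["id"]: row for row in analysis_rows}
    change_log = []
    for record_id, field, new_value, reason in corrections:
        row = by_id[record_id]
        # Record the old value before overwriting it.
        change_log.append({
            "id": record_id,
            "field": field,
            "old": row[field],
            "new": new_value,
            "reason": reason,
        })
        row[field] = new_value
    return analysis_rows, change_log


# Made-up example data: record 2 looks like a data-entry error.
master = [
    {"id": 1, "weight_kg": 71.2},
    {"id": 2, "weight_kg": 680.0},
]
corrections = [(2, "weight_kg", 68.0, "decimal point misplaced at entry")]

analysis, log = derive_analysis_data(master, corrections)
for entry in log:
    print(entry)
```

Because the corrections live in code rather than in hand edits, re-running the script always reproduces the same analysis data set from the same master data set, and the change log can be saved alongside it.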

If the analysis data set is created by hand, then you need to make notes of what changes were made.  I suggest including a text document with the analysis data set.  At some point, you are going to want to freeze the analysis data set so that you can follow the principle of analyzing only one data set.

Next, the data are manipulated using either a statistical package, a statistical language, or some other software in order to produce statistical analyses, listings, tables, and figures.  Note that you may need to create further derived data sets in these tools in order to accomplish these tasks.

Again, if any manual steps are performed here, they need to be documented.

Finally, the statistical analyses, listings, tables, and figures are used to write the thesis or dissertation, to create presentations, or to write for publication.

TGSGTS Progress Report

At this point, The Graduate Student’s Guide to Statistics is at about the 31% completion mark.  The primary structure of the book is laid out, and perhaps a third of the chapters have useful content.  For the next little while, I’ll be posting excerpts from the book in development.

At some point, it will probably make sense to start offering the book for sale, even in an unfinished state.  That should produce several good results:

  • The material that is of use will be available to actually be used by you.
  • You can give me feedback to help improve my writing and content.
  • You can help me determine which topics need more work.

Right now, I am thinking that the 50% mark would probably be a good place to start that process.  In the meantime, using this blog as a platform should also help accomplish these goals!

The Principle of Analyzing Only One Data Source

Do not violate this principle.  Violations of this principle are a leading cause of a wide variety of problems, and at a minimum are almost guaranteed to waste your time, effort, and money, if not your credibility.

Usually, violations of this principle occur due to either poor planning or a perceived need for haste (which is probably a symptom of poor planning).  It can also happen when more than one person is working on analyzing the same data.  It can also be “Just the way things are done around here.”

Here is what we especially do not want:  Different analysis data sets floating around with different statistical analyses attached to them.  We have no way of knowing whether the numbers differ because the analyses are different or because the data sets underneath the analyses are different.  We may not even know which data are correct!

A typical way this happens is as follows:  You start by creating some simple summary statistical tables by hand using Excel.  Since you need a variety of different summary statistics, you copy the data into a worksheet and have at it.  A while later, you have manually selected a lot of data, created summary statistics, and manually copied the numbers over to a nice little table.

Later, you start working on some statistical analysis, and notice some outlier points.  You decide after reviewing the data that a couple of the points were entered wrong.  Good catch!  You fix those data in your statistical analysis program, and keep on working.  But, maybe you remember to change all of the numbers in the Excel summary table and maybe you do not.

Now you have two different data sets floating around.  Multiply this by the other software that you might be using, where you make changes like aggregating categories, or creating filters on the data that will be analyzed.

Here are some symptoms of not analyzing only one data source:

  • You have multiple different copies of your analysis data in different formats.
  • Your figures do not match your tables.  Or, neither your figures nor your tables match your statistical results.
  • Your numbers are slightly off when you decide to cross-check some simple stuff like sample size or means between programs.
  • You start to wonder where you got a particular number.
  • You cannot figure out where you got a particular number.
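If you suspect you have this problem, one quick check is to fingerprint each copy of the data and compare.  The sketch below is only an illustration: the function names `fingerprint` and `summarize`, the field names, and the example records are assumptions of mine, not part of any particular package.  It hashes the selected fields of each record so that two copies match only if their contents match, regardless of row order.

```python
import hashlib


def fingerprint(rows, fields):
    """Return a SHA-256 digest of the selected fields, independent of row order."""
    lines = sorted("|".join(repr(row[f]) for f in fields) for row in rows)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def summarize(rows, field):
    """Return the sample size and mean of one numeric field."""
    values = [row[field] for row in rows]
    return {"n": len(values), "mean": sum(values) / len(values)}


# Made-up example: the copy behind the Excel table versus the copy in the
# statistical program, where one suspected entry error was fixed.
excel_copy = [
    {"id": 1, "score": 4.0},
    {"id": 2, "score": 9.5},
    {"id": 3, "score": 5.0},
]
stats_copy = [
    {"id": 3, "score": 5.0},
    {"id": 1, "score": 4.0},
    {"id": 2, "score": 5.9},  # corrected in one program only
]

fields = ["id", "score"]
same = fingerprint(excel_copy, fields) == fingerprint(stats_copy, fields)
print("Copies identical:", same)
print("Excel copy:", summarize(excel_copy, "score"))
print("Stats copy:", summarize(stats_copy, "score"))
```

Note that the sample sizes agree here while the means do not, which is exactly the "numbers are slightly off" symptom.  A matching fingerprint does not tell you the data are correct, only that the two copies contain the same values.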

The Absolute Goal

The absolute goal is to graduate and get the degree.

Practically, this means that your job is to get the necessary signatures from your thesis or dissertation committee so you can get your degree, graduate, and move on with life!  Also, you want to graduate in the shortest possible time frame.

Trust me, you can probably do things differently and save a lot of time and pain on the way to getting your thesis or dissertation approved.  It is too easy to become comfortable with a lack of progress and end up taking a year longer than necessary.

Some Problem Personality Types

With enough experience, you start to see the same problems cropping up.  Although this list and the descriptions may change after further reflection, here are some problem personality types that seem to have difficulty getting to the end of graduate school:

The perfectionists, who think that everything has to be black and white and perfect before they graduate.  Nope.  These are the absolute worst to deal with, because they always have a reason not to graduate.

The idealists, who think that their work has to be ground-breaking before they graduate.  Nope.  Not everyone gets to be that lucky.  If you are, awesome; if not, save it for post-graduate work.

The procrastinators, who seem to think that the work is going to be too hard and that graduation will happen simply by “hanging in there” and keeping warm and breathing. Nope.  You have to get off your duff and start performing the tasks.

The starry-eyed students, who are now thoroughly disillusioned with their mentors and advisors and/or academia in general, and are ready to give up after four or more years of work.  Nope.  It is your life, of course, but you will probably have more options if you finish the degree out.