Code review
Reviewing code is at least as important as reviewing writing. Just as with writing, having someone else review your work and (potentially) provide extensive feedback can induce anxiety (for more on this topic, see the Code Review Anxiety Workbook). The first draft of code is almost never free of logical errors, unintended behavior, or extraneous code. More code review and check points throughout the code writing process will slow you down but will save you time in the long run. Since code builds on itself, small errors early on code can build on themselves to create inaccuracies or errors that are much harder to debug later. Thoroughly reviewing your own code and having someone review your code is important for both the reliability of the code and as a reproducibility test.
Self code review
Step away from the code for a bit. You should not complete your review immediately after writing it because what you were trying to do will still be fresh in your head. With a little distance, you will have to rely more on the comments and code logic in the script (just as another person or your future self would have to).
Revisit the overall folder and code structure. - Make sure that any unnecessary files are moved to untracked folders. - Ensure all raw data files are included in the data folder. - Make sure scripts are clearly numbered and named so users know the order to run files in. - Add a README file to provide an overview of the code, including a brief description of the files and the ordering/logic to them.
Clear your environment and source or knit the file(s) in the intended order. This is a test of whether you have a clean run that the code reviewer should be able to replicate.
Review any errors or warnings. For errors, these must be resolved for the code to fully execute. For warnings, step through the code to review them one-by-one to confirm that they are not indicative of a problem. For example, if it warns there is a many-to-many match in a join, think carefully about whether this should be the case. If you are 100% sure you can suppress the warning and add a comment explaining your logic.
Check that you have appropriate tests throughout your code. What tests are appropriate depend on your specific analysis, but some common checks to build in include:
- Break apart a long pipe sequence to check for NAs following joins. When joining data, consider whether the join is one-to-one, one-to-many, many-to-on, or many-to-many. Based on the join type, determine whether rows should be added. If they should not, then check the number of rows before and after a join to confirm no rows were added.
- If there should only be one row per combination of variables, group by those variables and tally them to see the count for each group.
- Compare data frames that should be equivalent or should sum to the same values by creating a test data frame that is a full join, then calculating the absolute value of the difference and filtering to any observations with a difference greater than your chosen threshold. Inspect any differences carefully and document if it is reasonable for them to be different.
- If using proportions, confirm they still sum back to one with the appropriate groupings after other operations.
- If a variable should not be negative or exceed a certain value, create a test data frame that filters observations based on that threshold.
- If there are any known or reasonable values for a particular data point or summary statistic, generate that to ensure the data still looks realistic. It is always helpful to step back and use basic number sense to consider whether the numbers you are generating are realistic.
- Read through the code carefully to:
- Check for any outstanding notes/reminders to yourself (sometimes marked as FIXIT or similar).
- Add detailed comments throughout to ensure the logic of each line code is obvious.
- Remove any chunks of code that are not actually used. Extra code just creates more risk of causing errors or confusing the reviewer (or yourself) down the road.
After you have finished preparing your code for review, be sure to do a final full run of the code with a clean environment to ensure that it is still a complete package before handing it off to the code reviewer. Then, make sure your reviewer has access to the full set of code (either providing them GitHub access or sending a zip folder of the entire project) and necessary data.
Note that it is increasingly common to provide the data and code with a paper submission for peer review. Following the above steps throughout your development process will help you prepare your code for submission.
Reviewing other people’s code
Reviewing others’ code is both an important service and a learning opportunity as you may discover new useful functions or learn to see a problem in a different way. When reviewing other people’s code, here are some potentially helpful steps:
Check for a README and be sure to review it first.
Execute or knit the entire script(s)/file(s) in their intended order. This will allow you to immediately see whether all the pieces are there for you to conduct a full review. If the code does not run, contact the author with the line number and error so they can resolve it. If the code does run, then move onto the next steps.
Conduct a high-level reading of the code. You might not even run the code at this point, but read through the comments and skim the code to get a general picture of the logic and steps. If you are confused about any of the general objectives or logic of the code, clarify with the author before moving on. Once you feel you have a clear idea about what the code is trying to do, move onto the next step.
Clear your environment and start from the top, running the code line-by-line. At each line, consider whether the step is both clear and logical.
As you move through the code, consider whether there are any checks that should be added that are not already present. Refer to the section above for examples of common tests.
If you discover points that need addressed or steps that are unclear, either flag them in notes to the author (with line numbers) or include as a comment or edit in the code itself if it is a collaborative coding project. If added as comments, it is generally helpful to include a standard string like FIXIT so the author can search for all instances. If you are using GitHub, the edits will also be clear in the commit.
If you made edits to the script, make sure the code still runs (i.e., make sure you didn’t break anything) before passing it back to the original author. Also include a summary of the changes and any feedback the author might find valuable in the future.