Coding Style Guide
R
DIRECTORIES
- If data and output folders are located within the project folder, set their directories at the top of the code and do this dynamically using
getwd()
+file.path()
- If these folders are located outside the project folder (e.g., they are too big to live on GitHub), then just be sure to provide the file path at the top
DOCUMENTATION
- Top of code should describe what the code does
- Use a space after
#
(Mac:Command + Shift + C
; Windows:Ctrl + Shift + C
) - For “internal” or “personal” functions (e.g., not part of a package that will be released by the lab), include name and email at the top – just good practice and assigns “authorship” to you (when you share code with people they know who to contact with questions)
- While code is in development, use
# FIX IT
(followed by additional notes) to flag areas of the code that need more attention - It’s OK to break up long sections of piped (
%>%
) code with documentation (e.g., creation ofoutput_df
incommod_commod_matching.R
)
CODE
- Assignment arrows should point left, not right
- Pay attention to indentation (convention is two spaces), use package
styler
(can be accessed as an Add-In in RStudio) and use default “tidyverse style” to clean indentation - Use spaces around
=
,<-
, and%>%
Example
Avoid:
%>%dplyr::summarize(max_petal=max(Petal.Width))->results iris
Prefer:
<- iris %>%
Results ::summarize(max_petal = max(Petal.Width)) dplyr
NAMING OBJECTS
- All lower case
- Prefer underscores (
seafood_species
) over camel case (seafoodSpecies
) for all object names - For data frame columns, can use
.
or_
as a sep in column names, just be consistent - Function names should begin with a verb (e.g.,
clean_cols
,tidy_data
,calc_distance
), be concise, and use underscores to separate words - Object/Variable names should be nouns and descriptive (e.g.,
df_clean
orclean_df
)
Example:
Avoid: buildFishMatrix
Prefer: build_fish_matrix
Avoid: naming objects with different capitalization (e.g., HS_codes
, hs_Codes
, hsCodes
)
Prefer: hs_codes_raw
, hs_codes_filtered
, hs_codes_cleaned
, etc.
NAMING OUTPUT FILES
- If outputs are going to be different with each run (e.g., running different analyses), start filename of outputs with a date
- Follow international date convention (
YYYY-MM-DD
) (Can useSys.Date()
for this) - Use delimiters with intention (e.g.,
_
vs-
) so they are machine readable - Use
-
for spaces and_
for separating metadata
Example:
Prefer: YYYY-MM-DD_name-of-file_i_alpha-4-cutoff.csv
Avoid: NameOfFileResults-DD-MM-YYYY-i_alpha_4_cutoff.csv
PACKAGES
Each .R file in the R folder should be a separate function