"An Australian Research Data Commons report recently identified that over 47% of research publications involve software development. At least this percentage and maybe more PhD students will likely be involved in developing computer code for their thesis."
Ensuring PhD students submit their thesis on time is a never-ending challenge. Investing in tools and training to help PhD students stay on track lifts research income and important organisational research outputs, such as publications.
Writing computer code is a growing requirement for completing a PhD. While research higher degree students can take courses in statistics, data analysis, or how to overcome writer's block, there are very few courses focussed on research higher degree students writing and managing good code.
Good coding practices are integral to software development companies. They know the importance of productivity, efficiency, teamwork, testing and reliability, quality assurance and delivering high-quality outputs. I would argue that all of these attributes are essential for a PhD student to successfully complete a thesis on time and to build habits they take forward in their research career.
Imagine if new research higher degree students were given training and a framework at the start of their PhD, before they began collecting data and writing code to generate models and analyse data. What if this framework was built around good software development practices, targeting the critical elements of efficiency, productivity, teamwork and quality assurance? This might help students with their coding and also create a broader structure that helps ensure they submit their thesis on time.
What can we do to support research students as they start to develop research code? Here are a few tips:
Many supervisors and research higher degree students don't appreciate how valuable computer code is. Often, students will download open-source software and use a free integrated development environment (IDE) to write and run their code locally on their desktop or laptop. The output is not the code but usually some analysis or a graph that gets exported to a word processing package to go in a paper or the thesis. As supervisors and collaborators provide feedback and ask for fresh analysis, the code gets copied, and new files are generated.
Unlike the thesis, which is valued as it evolves towards a final submission, the code is seen as a temporary asset that delivers outputs and has no long-term value.
Journals increasingly ask authors to declare what code, including generative AI tools, was used in a submission and how it was used. This request recognises that the code itself is an essential component of research and is a valuable asset. Encouraging students to think of code as an asset will help them adopt good coding habits.
There are plenty of courses on statistics and using programming languages such as R and Python for research (we think Joachim and his team at Statistics Globe do an excellent job!), but writing the code is only one part of the equation!
Students need to develop broader software development skills in areas such as version control, testing, quality assurance and teamwork.
These skills will not only help them in their academic careers, but they are also highly sought after by companies.
Research higher degree students need to see the benefits of better computer code management. Encouraging a student to submit a journal paper that includes the code behind the research article will demonstrate the importance and value of developing an open code framework.
Being open to scrutiny will build confidence and help students prepare their computer code as part of their thesis. Ultimately, students should give examiners access to their code through an API, a link to a repository, or an appendix (and preferably all three!).
Developing computer code to solve complex problems is not linear and often requires refactoring and change. Without a predefined framework and clear work practices, code becomes disorganised despite the best initial intentions.
When students present a research plan, they should include details on managing their computer code. What coding language do they plan to use? How will they manage version control? How will they ensure modules of code can be run independently but still be integrated as part of an overall thesis? Encourage students to use notebooks and markdown syntax to help with explanations and code reviews.
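One way to keep a module runnable on its own while still letting the wider thesis code import it is Python's `if __name__ == "__main__":` guard. The sketch below is illustrative only: the file name `analysis_step.py`, the `summarise` function and the sample data are hypothetical and do not come from any particular project.

```python
"""analysis_step.py: a hypothetical, self-contained analysis module."""
from statistics import mean, stdev


def summarise(values):
    """Return basic summary statistics for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute a standard deviation")
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def main():
    # Illustrative data only; a real module would load the student's own dataset.
    sample = [2.0, 4.0, 4.0, 5.0, 7.0]
    print(summarise(sample))


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

Running `python analysis_step.py` exercises the module by itself, while another chapter's script or notebook can use `from analysis_step import summarise` without triggering `main()`. Keeping each such file under version control makes it easier to trace which version of a module produced a given result.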
Code that forms part of a PhD is intrinsically inward-looking: it is focused on satisfying the requirements of the student's thesis. This is fine, but the student misses the opportunity to use their code to get feedback, both on its performance and on its broader value to the research community and to non-research users such as businesses.
This doesn't mean the student has to share their code, although sometimes this can be helpful. Developing packages or API connections allows users to access the insights without direct access to the code.
The quality of the computer code is reflected in the quality of the outputs. Feedback from users can help refine the code and enhance the insights. Most importantly, this feedback allows a research higher degree student to deliver robust insights more efficiently in their final thesis.
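As a rough illustration of this idea, the sketch below puts a small public function in front of an internal calculation, so that users receive results without needing to read or run the underlying code. Everything here is an assumption made for the example: the coefficients, the `_predict` helper and the `get_prediction` function are hypothetical, and a real project might wrap the same function in a web service or an installable package.

```python
import json

# Hypothetical internal model: coefficients a student might have estimated.
_COEFFICIENTS = {"intercept": 1.5, "slope": 0.8}


def _predict(x):
    """Internal calculation; users of the interface never call this directly."""
    return _COEFFICIENTS["intercept"] + _COEFFICIENTS["slope"] * x


def get_prediction(x):
    """Public entry point: validate the input and return the result as a JSON string."""
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise TypeError("x must be a number")
    return json.dumps({"input": x, "prediction": round(_predict(x), 4)})
```

A caller would use `get_prediction(10)` and parse the returned JSON. Because users depend only on the public function, the student can refactor the internals in response to feedback without breaking anything for them.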