Sven Kreiss

Python at a University Lab

I am a post-doc at the VITA lab at EPFL.

Good software engineering practices are important to me; they are foundational for open and reproducible science. You, a current or future member of a Machine Learning lab, are not necessarily expected to be a computer scientist or software engineer, but you must be familiar with the tools. Expert-level familiar. You will use these tools for most of the day, and you should be neither clumsy with them nor afraid of them. Below is an ambitious list, so don’t worry if you are not yet familiar with most of it.

Python Core: The best way to get started, and a great resource for advanced users, is the tutorial on the official Python website. Here are my notes that are more specific to ML.

Packaging: As soon as you need to reuse code from a second file, you will want to import one of your own files. You should use relative imports, and to use them properly you need to package your code. Packaging in Python used to be a mess, but much progress has been made in recent years. When you search for help related to packaging, filter for results from the last year.
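To see why relative imports and packaging go together, here is a minimal sketch, assuming a hypothetical package named mypkg with a helper module utils.py and an entry point main.py. The script writes the package to a temporary directory and runs main.py twice: once as a plain file, where the relative import fails because Python does not know the file belongs to a package, and once as python -m mypkg.main from the directory that contains mypkg, where the import resolves. In a real project you would instead declare the package in its packaging configuration and install it (for example with pip install -e .), which makes it importable from anywhere without touching PATH or PYTHONPATH.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical package "mypkg": main.py reuses code from utils.py via a relative import.
FILES = {
    "mypkg/__init__.py": "",
    "mypkg/utils.py": "def double(value):\n    return 2 * value\n",
    "mypkg/main.py": "from .utils import double\n\nprint(double(21))\n",
}


def run(args, cwd):
    """Run the current Python interpreter with args; return (returncode, stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True
    )
    return result.returncode, result.stdout.strip(), result.stderr


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Write the package files into the temporary directory.
        for relative_path, source in FILES.items():
            path = pathlib.Path(root, relative_path)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source)

        # Run as a plain script: there is no parent package, so the relative import fails.
        as_script = run(["mypkg/main.py"], cwd=root)
        # Run as a module of the package: the relative import resolves.
        as_module = run(["-m", "mypkg.main"], cwd=root)
    return as_script, as_module


if __name__ == "__main__":
    as_script, as_module = demo()
    print("script:", as_script[0], as_script[2].strip().splitlines()[-1])
    print("module:", as_module[0], as_module[1])
```

On recent Python 3 versions the script case ends with an ImportError about a relative import with no known parent package, while the module case prints 42.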

Style / Unit Tests / Continuous Testing: Standard practices for software engineers are useful for ML projects, too. Beyond testing code, unit tests in combination with continuous integration give anyone picking up your project (including yourself in one year) a robust and reproducible starting point.

  • PEP8 is the Style Guide for Python, and it explains the reasoning behind its rules. Don’t skip the section “A Foolish Consistency is the Hobgoblin of Little Minds”. Still, I follow almost all of PEP8 to the letter. Additionally:
    • no abbreviations for variables, classes, functions, etc.
    • do not iterate over indices in Python: don’t do for i in range(len(mylist))
  • pylint generally provides good advice beyond checking for PEP8
  • pytest to run all your tests
  • CircleCI and Travis CI can automatically run your tests on every commit you push to your git repository
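The rule against iterating over indices can be illustrated with a short sketch; the function names and data are made up for the example. Instead of range(len(...)), loop over the items directly, use enumerate() when you need a counter, and use zip() when you walk several sequences in parallel.

```python
def capitalized(names):
    """Loop over the items directly; there is no index to get wrong."""
    result = []
    for name in names:
        result.append(name.capitalize())
    return result


def numbered(names):
    """enumerate() supplies a counter when one is needed (here starting at 1)."""
    return [f"{position}. {name}" for position, name in enumerate(names, start=1)]


def paired(names, scores):
    """zip() walks two sequences in lockstep instead of indexing both with i."""
    return [f"{name}: {score}" for name, score in zip(names, scores)]


if __name__ == "__main__":
    names = ["alice", "bob"]
    scores = [0.9, 0.7]
    print(capitalized(names))  # ['Alice', 'Bob']
    print(numbered(names))  # ['1. alice', '2. bob']
    print(paired(names, scores))  # ['alice: 0.9', 'bob: 0.7']
```

Note that zip() silently stops at the shortest sequence; on Python 3.10 and later, zip(names, scores, strict=True) raises a ValueError when the lengths differ.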
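As an illustration of the kind of file pytest picks up, here is a minimal sketch of a test module, assuming it is saved as test_normalize.py and tests a hypothetical helper normalize(), defined inline to keep the example self-contained. pytest collects functions whose names start with test_ from files named test_*.py, and a plain assert statement is all a test needs. In a real project the helper would live in your package and be imported, and the CI service would run pytest on every push.

```python
import math


def normalize(values):
    """Hypothetical helper: scale a list of numbers so that they sum to one."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [value / total for value in values]


# pytest collects functions whose names start with "test_" from files named test_*.py.
def test_normalize_sums_to_one():
    result = normalize([1.0, 2.0, 5.0])
    assert math.isclose(sum(result), 1.0)


def test_normalize_keeps_ratios():
    result = normalize([1.0, 3.0])
    assert math.isclose(result[1] / result[0], 3.0)


def test_normalize_rejects_zero_sum():
    try:
        normalize([0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("expected a ValueError for an all-zero input")
```

Running pytest from the project root discovers and runs all three tests. pytest also provides pytest.raises as a more concise way to write the last one.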

Red Flags: When I browse open-source code, these are the things that worry me.

  • no evaluation metrics or scripts for the implementation
  • having to change the PATH or PYTHONPATH environment variables
  • having to copy or symlink folders of code

Closing: The infrastructure at large companies is different from that of a university lab or your laptop: containers, distributed storage, build systems, custom DNS, monorepos. Often only a small piece from the middle of such a stack is open source, and it has been duct-taped to work without the rest of the infrastructure. Don’t blindly follow what seems like a standard practice of the best software engineers in the world; it might be an artifact of porting the code to the open-source world.

Feedback is welcome on Twitter @svenkreiss.
