Introduction


There is a common misconception that writing code for data science involves sitting in a dark room and effortlessly typing Matrix-esque syntax onto a vast array of monitors. In reality, coding is generally a slow, non-linear, iterative and often frustrating process involving backtracking, second-guessing, Google searching, posting questions on Stack Overflow and (most commonly in my experience) copying, pasting and modifying old code that you or someone else has previously written.

This post is a running log of the Python syntax I constantly find myself referring back to, which has also proven to be a useful reference for my team and other colleagues.


Implementing in Python


Link to my GitHub repo: Useful-Python-Syntax

Some of the syntax examples covered are:

  • Scheduling Python functions
  • Return vs. Print
  • Loops, % and .format syntax
  • The 4 inbuilt data structures of Python
  • if __name__ == '__main__'
  • Random seeds
  • Subsetting a DataFrame
  • Importing multiple files with glob
  • Working with large data 1: Chunking
  • Working with large data 2: Random Sampling
  • Scaling data
  • Using an API in Python
  • Lambda functions
  • Creating DataFrames in a loop
  • Plotting with Matplotlib
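
As a small taste of a few items on this list (return vs. print, % and .format syntax, random seeds and if __name__ == '__main__'), here is a minimal sketch using only the standard library. The function names are made up for illustration and are not taken from the repo.

```python
import random


def add_return(a, b):
    # return hands the value back to the caller so it can be reused
    return a + b


def add_print(a, b):
    # print only displays the value; the function itself returns None
    print(a + b)


def seeded_sample(seed, n=3):
    # Seeding the generator makes the "random" draws repeatable
    random.seed(seed)
    return [random.randint(1, 100) for _ in range(n)]


def describe(name, value):
    # The same message built with % formatting and with str.format
    old_style = "%s = %d" % (name, value)
    new_style = "{} = {}".format(name, value)
    return old_style, new_style


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    total = add_return(2, 3)
    print(describe("total", total))
    print(seeded_sample(42) == seeded_sample(42))
```

Saving this as a script and running it directly executes the final block, while importing it from another file only makes the functions available.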