
Jupyter notebook

Jupyter Notebook is the tool you will use to deliver your final project. It lets you write and execute Python code and insert rich text and images in the same document. When we grade your final projects, we will (hopefully) be able to run the code you wrote as we read the report.

Getting started

  1. Jupyter Notebook came with your Anaconda Python distribution. To start it, open the terminal (command line) and type jupyter notebook.
  2. Your Internet browser should now open with a list of the files in the folder from which you ran the command. Use it to navigate to your DS2001 folder.
  3. Create a new Python3 notebook using the menu in the top right. You should now see something like this:
  4. Change the name of the notebook from Untitled to Practicum 10.
  5. Verify that everything works by typing print('hello') into the cell. To run a cell, press Shift+Enter.
  6. You can see that the code was executed and that you were moved to a new cell. You can still go back to previous cells, add more code, or rerun them whenever necessary.
  7. Make sure you save your changes by pressing Ctrl+S (Cmd+S on a Mac)!
  8. To quit the notebook (don't do it now!) go to the terminal where you started it, press Ctrl+C (both Windows and Mac), then type y and press Enter.

Crash course on markdown

Text in the notebook can be formatted using Markdown. Here are a few basic things you'll need to know:

  1. To start using markdown, change the type of the cell from Code to Markdown
  2. Use # hash symbols to create headings; the more hashes, the lower the level of the heading. Remember to put a space between the hashes and the heading text: for example, # Title is a level-1 heading and ## Section is a level-2 heading.
  3. After you're done editing a markdown cell, run it by pressing Shift+Enter.
  4. You can write equations in the TeX format by putting the math between $ dollar signs, like this: $P(E) = {n \choose k} p^k (1-p)^{n-k}$, even in the middle of a sentence.
  5. You can link to an external webpage like this: [What the user will see in square brackets](https://example.com/)
  6. You can create bullet point lists by starting the line with the * asterisk character.
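
To see the pieces above together, here is a sketch of what the text of one Markdown cell might contain. It is wrapped in a Python string only so that it can be printed from a code cell and copied into a Markdown cell; the heading text, link label, and bullet items are placeholders made up for illustration.

```python
# Placeholder Markdown text. The r prefix keeps the TeX backslashes as-is.
markdown_text = r"""# A level-1 heading
## A level-2 heading

The binomial probability is $P(E) = {n \choose k} p^k (1-p)^{n-k}$, written inline.

[Example website](https://example.com/)

* first bullet point
* second bullet point
"""

# Print the text so it can be copied into a cell of type Markdown.
print(markdown_text)
```

After pasting the printed text into a Markdown cell, run the cell with Shift+Enter to see it rendered.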

Exercise 1: Markdown

Create a markdown cell in your notebook. In that cell:

  1. Make a level-1 heading that says "Exercise 1"
  2. As bullet points write out:
    • the formula for the Pythagorean theorem;
    • the formula for the gravitational force.
  3. Create a link to this instruction.
  4. Save the notebook by pressing Ctrl+S (Cmd+S on a Mac).

Crash course on data visualization

In this part we will learn three basic types of plots: the line plot, the histogram, and the scatter plot. First we need to import the plotting library, matplotlib.
  1. Create a new cell of type Code (not Markdown).
  2. In that cell paste the following code:
    import matplotlib.pyplot as plt
    %matplotlib inline
  3. The first line imports the plotting library and gives it an easy-to-use alias, plt. The second line instructs Jupyter Notebook to place the figures directly in the notebook rather than in a separate window.
  4. Run this cell and go to a new one.
  5. Now we will need some data to plot. Let's use these three lists of min, average, and max temperatures over one month in Boston (copy and paste them into your new cell):
    lo_temp = [10,14,27,35,40,33,35,38,25,24,26,19,33,31,31,31,24,21,18,21,26,36,30,34,30,20,13]
    hi_temp = [23,35,39,63,65,44,42,57,36,37,37,38,43,40,52,49,34,32,34,31,48,46,38,47,42,32,26]
    avg_temp = [17,25,33,49,53,39,39,48,31,31,32,29,38,36,42,40,29,27,26,26,37,41,34,41,36,26,20]
    days = range(1, len(avg_temp)+1)
  6. Run that cell and go to a new one.
  7. Let's plot the average temperature for each day. Run this code:
    plt.plot(days, avg_temp)
  8. As you can see, the first argument to this function is what goes on the x-axis and the second is what goes on the y-axis. We know this because we wrote the code, but remember: ALWAYS label your axes!
  9. Let's modify the code above to label the axes:
    plt.plot(days, avg_temp)
    plt.xlabel('Day of the month')
    plt.ylabel('Temperature')
  10. We can have more lines in one plot by calling plt.plot(x, y) more than one time in the same cell. Let's add the lows and the highs:
    plt.plot(days, avg_temp)
    plt.plot(days, lo_temp)
    plt.plot(days, hi_temp)
    plt.xlabel('Day of the month')
    plt.ylabel('Temperature')
  11. Our plot looks more interesting but also more confusing - a reader might not understand what the three lines stand for. We need to label them and show the legend, like this:
    plt.plot(days, avg_temp, label='avg')
    plt.plot(days, lo_temp, label='low')
    plt.plot(days, hi_temp, label='hi')
    plt.xlabel('Day of the month')
    plt.ylabel('Temperature')
    plt.legend()
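
plt.plot(x, y) expects x and y to contain the same number of points, so it can help to check the data before plotting. The following is a minimal sketch of such a check using the lists from step 5; the variable name consistent is made up for this example, and the check itself is a suggestion rather than a required step of the practicum.

```python
# Same data as in step 5 above.
lo_temp = [10,14,27,35,40,33,35,38,25,24,26,19,33,31,31,31,24,21,18,21,26,36,30,34,30,20,13]
hi_temp = [23,35,39,63,65,44,42,57,36,37,37,38,43,40,52,49,34,32,34,31,48,46,38,47,42,32,26]
avg_temp = [17,25,33,49,53,39,39,48,31,31,32,29,38,36,42,40,29,27,26,26,37,41,34,41,36,26,20]

# range(1, n + 1) counts 1, 2, ..., n, so every temperature gets a day number.
days = range(1, len(avg_temp) + 1)

# All four sequences must have the same length to be plotted against each other.
print(len(days), len(lo_temp), len(avg_temp), len(hi_temp))

# Each day's average should lie between that day's low and high.
consistent = all(
    lo <= avg <= hi
    for lo, avg, hi in zip(lo_temp, avg_temp, hi_temp)
)
print(consistent)
```

If the lengths differ, plt.plot raises an error, so a quick look at these numbers can save some debugging.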

Exercise 2: Changing the line colors

  1. Use this webpage to learn how to change the color of the line plots.
  2. Configure your lines so that average temperature is in black, lows are blue, and the highs are red.

Exercise 3: Histograms

We use histograms to learn about the distribution of the data. Create a new code cell in your notebook. Then:

  1. Use plt.hist(data) (histogram does not take x, just the data) to create a histogram of daily average temperatures.
  2. Make sure you label the axes correctly!
  3. plt.hist() by default groups the data into 10 bins. You can change that by specifying the optional parameter like this: plt.hist(data, bins=11). Experiment with the number of bins and find a number that you think works best. It should be large enough to show the variety of temperatures, but not too large: when every day is in its own bin, we don't really learn anything from the histogram!
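
To get a feeling for what a bin is, the sketch below counts by hand how many values fall into each of a number of equal-width intervals between the smallest and the largest value. The function bin_counts is a made-up helper for illustration, not part of matplotlib, and it only approximates the idea behind plt.hist (it assumes the data contains at least two different values).

```python
avg_temp = [17,25,33,49,53,39,39,48,31,31,32,29,38,36,42,40,29,27,26,26,37,41,34,41,36,26,20]

def bin_counts(data, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lowest, highest = min(data), max(data)
    width = (highest - lowest) / bins  # assumes highest > lowest
    counts = [0] * bins
    for value in data:
        # The largest value goes into the last bin, not one past the end.
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1
    return counts

# Compare how the same data spreads over different numbers of bins.
print(bin_counts(avg_temp, bins=10))
print(bin_counts(avg_temp, bins=5))
```

With more bins each count gets smaller, which is why too many bins make the histogram less informative.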

Exercise 4: Scatter plots

We can use a scatter plot to see whether two variables are correlated. Have a look at the example scatter plots of correlated and uncorrelated variables here.

Part #1

Let's find out if daily lows and highs are correlated. Create a new code cell in your notebook. Then:

  1. Use plt.scatter(variable1, variable2) to find out whether daily lows and highs are correlated.
  2. Make sure you label the axes correctly!

Part #2

OK. That was pretty obvious. What's more interesting is finding out whether we can predict tomorrow's temperature from today's temperature. To do that, create a new cell and:

  1. Use plt.scatter(variable1, variable2) in a way where variable1 and variable2 both represent the daily average temperatures, but variable2 is offset by a day. That means variable1 will be the data for all the days apart from the last, and variable2 is the data for all the days apart from the first.
  2. Make sure you label the axes correctly!
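
The "offset by a day" idea can be sketched with list slicing. The example below uses a short made-up list called readings rather than the Boston data, and the names today, tomorrow, and pairs are chosen only for illustration.

```python
readings = [10, 12, 15, 11]  # made-up values, not the Boston data

today = readings[:-1]     # every value except the last
tomorrow = readings[1:]   # every value except the first

# Both slices have the same length, and position i pairs a day with the next day.
pairs = list(zip(today, tomorrow))
print(pairs)
```

Because the two slices line up this way, each point in the scatter plot would stand for one pair of consecutive days.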