import altair as alt
from vega_datasets import data
= alt.UrlData(
movies
data.movies.url,format=alt.DataFormat(parse={"Release_Date":"date"})
)= ['G', 'NC-17', 'PG', 'PG-13', 'R']
ratings = ['Action', 'Adventure', 'Black Comedy', 'Comedy',
genres 'Concert/Performance', 'Documentary', 'Drama', 'Horror', 'Musical',
'Romantic Comedy', 'Thriller/Suspense', 'Western']
= alt.Chart(movies, width=200, height=200).mark_point(filled=True).transform_calculate(
base = "floor(datum.IMDB_Rating)",
Rounded_IMDB_Rating = "datum.Production_Budget > 100000000.0 ? 100 : 10",
Hundred_Million_Production = "year(datum.Release_Date)"
Release_Year
).transform_filter(> 0
alt.datum.IMDB_Rating
).transform_filter(='MPAA_Rating', oneOf=ratings)
alt.FieldOneOfPredicate(field
).encode(=alt.X('Worldwide_Gross:Q', scale=alt.Scale(domain=(100000,10**9), clamp=True)),
x='IMDB_Rating:Q',
y="Title:N"
tooltip
)
# A slider filter
= alt.binding_range(min=1969, max=2018, step=1)
year_slider = alt.selection_single(bind=year_slider, fields=['Release_Year'], name="Release Year_")
slider_selection
= base.add_selection(
filter_year
slider_selection
).transform_filter(
slider_selection="Slider Filtering")
).properties(title
# A dropdown filter
= alt.binding_select(options=genres)
genre_dropdown = alt.selection_single(fields=['Major_Genre'], bind=genre_dropdown, name="Genre")
genre_select
= base.add_selection(
filter_genres
genre_select
).transform_filter(
genre_select="Dropdown Filtering")
).properties(title
#color changing marks
= alt.binding_radio(options=ratings)
rating_radio
= alt.selection_single(fields=['MPAA_Rating'], bind=rating_radio, name="Rating")
rating_select = alt.condition(rating_select,
rating_color_condition 'MPAA_Rating:N', legend=None),
alt.Color('lightgray'))
alt.value(
= base.add_selection(
highlight_ratings
rating_select
).encode(=rating_color_condition
color="Radio Button Highlighting")
).properties(title
# Boolean selection for format changes
= alt.binding_checkbox()
input_checkbox = alt.selection_single(bind=input_checkbox, name="Big Budget Films")
checkbox_selection
= alt.condition(checkbox_selection,
size_checkbox_condition 25),
alt.SizeValue('Hundred_Million_Production:Q')
alt.Size(
)
= base.add_selection(
budget_sizing
checkbox_selection
).encode(=size_checkbox_condition
size="Checkbox Formatting")
).properties(title
| filter_genres) & (highlight_ratings | budget_sizing ) ( filter_year
Communicating code: Notebooks
Using the notebook format to communicate code and data
Notebooks General
Jupyter Notebooks
From TalkPython: Awesome Jupyter Libraries and Extensions
Jupyter is an amazing environment for exploring data and generating executable reports with Python. But there are many external tools, extensions, and libraries to make it so much better and make you more productive.
- A notebook consists of two parts
- markdown part where we can:
- write text, add images, links, html, LaTeX etc
- code part which runs and displays output of code
- markdown part where we can:
Some links:
Example of a notebook
An example notebook
Notebooks Markdown and Code
Markdown in a notebook
Some useful commands:
# Notebooks Markdown and Code
and## Markdown in a notebook
![](ghtop_images/pest.png)
looks like this
- And the same with a mp4 file
![](ghtop_images/revealjs.mp4)
> If we want text like this
If we want text like this
- Or if we want code use `a = b + c`
or:
```
a = b
a = a + c
```
a = b + c
- HTML works too
<img src="ghtop_images/pest.png"></img>
Code in a notebook
Example interactive format using altair:
Simpler code output
x = np.arange(0,np.pi,.01)
y = np.sin(x)
plt.plot(x,y)
import matplotlib.pyplot as plt
import numpy as np
= np.arange(0,np.pi,.01)
x = np.sin(x)
y plt.plot(x,y)
Notebooks Opinion
Notebooks: my view
Although notebooks have their validish detractors I don’t like notebooks.- Joel Grus Youtube I think if you approach them in the right way they are a super powerful tool.
The negatives seem to be:
- encourage bad practice in code (a genuine problem)
- issues around order of what cell is run (easily got around with good practice)
- issues around lack of auto complete (I don’t see the issue, use in visual studio autocomplete is there)
- no grammar/spelling correction
- issues with using git and version control
- there are ways around this though
Notebook Benefits
- Notebooks are intuitive
- You have the code then the result of the code
- Plus can add details of how code works
- And it’s linear
- Can get things up and working quickly
- Aid with communicating code
- Encourages Writing
- and writing things down aids thinking in the now and understanding what you did and why in the future
- FastAI guide for better blogs
- and writing things down aids thinking in the now and understanding what you did and why in the future
- Can use shell commands e.g.
!pip install pandas
- Can use magic commands e.g.
%%time
to time a cell - Easy to convert code to a pipeline
With many companies moving towards Python/R from Excel and a varied level of skills. The first of these is particularly important to aid communicating code
Communicating when code is a large element of what is being presented
- Microsoft Word/ppt- type methods aren’t set-up well to include code
- Programming files (e.g.
.py
) aren’t set-up well to share - Videoing code with outputs is an option, but don’t translate to other formats (i.e. we may also need to do a written format of this)
- Apps (e.g. streamlit) can be good.
- But the code is hidden
- Programming notebooks (e.g.
.ipynb
) offer a good and easy to share code but with some limitations
An easier way is to convert the notebooks to html
- e.g. maybe someone doesn’t have python installed
Example Useage
Example: Documenting Code
Here is my website for my research project on pesticides in UK food.
This is not the same as documentation for a package but there are parallels
This does a few things:
- Documents the analysis steps I have taken including the code and outputs
- Useful for data transparency, useability of the code if needs modifiying/adapting, and why I did XYZ
- Provides a way to present the data
- There is a streamlit app, but sometimes I like to be able to see the code
Example: Tool to aid learning
A big area I have been using Jupyter Notebooks for is to aid learning
- If you want to understand something it helps to write it down
- Having the code next to it is a big advantage
- And if stored on github you can access it anywhere
Example: Debugging Code
- Since starting at ONS I have been working with understanding an existing project and latterly adding code to it
- The project consists of multiple python files across several folders
- My Python was good but lots of the functions and their useage weren’t immediately obvious to me
- break-points in VS Studio is really good to step through the code and work out what happens in the code.
- I had not used before with Python (but had lots with MATLAB), and it’s really useful
- But it can be limited what you can do
- difficult to probe code if want to write more than 1 line of code
- the experience/knowledge exists as you go through it but no documentation to refer to later, e.g. function X does this when I give it Y etc
- By copying and pasting code into Jupyter cells I could see and document how they worked (e.g. changing inputs)
- This (copying and pasting) would get around code changes too (which would be an issue if modules were just imported)
- because this was all done in Jupyter notebook I can have a ipynb code file and a html file showing how the code works
- I could even save a pickle file of the variables at a particularly point to understand how the code would work from this point