# The emergence of consensus in Python
<!-- .slide: data-background="static/background.jpg" -->
<br/>
<b>Julien Palard</b>
<tt>PyCon Fr 2018</tt>
----
There should be one
-- and preferably only one --
obvious way to do it. (Tim Peters)
## The emergence of consensus in Python
This is a study about undocumented consensus in the Python community,
so you don't have to do it.
# Julien Palard
- Python documentation translator
- Teaching Python at
  - Sup'Internet
  - CRI-Paris
  - Makina Corpus
-
- julien@python.org, @sizeof, https://mdk.fr
- Yes, I write Python sometimes too…
## Julien Palard
![](static/emergence-language-switcher.png)
# Digression
In one year, we went from 25.7% to 30% translated!
Meanwhile, the Japanese translation is at 79.2% and the Korean one at 14.6%.
Notes: Want news about the translation?
Thanks to Christophe, Antoine, Glyg, HS-157, and 27 other translators!
PLZ HELP
# What did I do?
I crawled [pypi.org](https://pypi.org) to get a list of Python projects,
then cloned their GitHub repositories (around 4k repositories at the time of
writing).
Then... I played with the data ^^.
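A minimal sketch of the "find the GitHub repository" step, assuming the project metadata comes from PyPI's JSON API (`https://pypi.org/pypi/<project>/json`), whose `info` object carries `home_page` and `project_urls`. The function name and the URL regex are hypothetical; this is not the crawler actually used for this talk.

```python
import re
from typing import Optional

# Matches https://github.com/<owner>/<repo>; anything after the repo name is ignored.
GITHUB_RE = re.compile(r"https?://(?:www\.)?github\.com/([\w.-]+)/([\w.-]+)")


def github_repo(metadata: dict) -> Optional[str]:
    """Return 'owner/repo' for the first GitHub URL found in PyPI JSON metadata."""
    info = metadata.get("info", {})
    candidates = [info.get("home_page")]
    candidates += list((info.get("project_urls") or {}).values())
    for url in candidates:
        match = GITHUB_RE.match(url or "")
        if match:
            owner, repo = match.groups()
            # Some projects link to the clone URL, ending in ".git".
            if repo.endswith(".git"):
                repo = repo[: -len(".git")]
            return f"{owner}/{repo}"
    return None
```

Projects whose metadata points nowhere near GitHub simply return `None`, which is exactly the bias discussed a few slides later.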
## But why?
To answer all those questions that neither a human nor a search engine can answer.
Notes:
- For my README, should I use `rst` or `md`?
- unittest, nose, or pytest?
- ``setup.py``, ``requirements.txt``, ``Pipfile``, ...?
- ...
## Is it data science?
Hell no! It's biased: I only crawled projects published on *pypi.org* AND
hosted on *github.com*, so I'm looking at a very specific subset of the population.
Note:
### I mean
- me: Hey consensus is to use the MIT license!
- you: You crawled only open source projects...
- me: Oh wait...
## Digression
I used Jupyter; if you haven't tried it yet, please take
a look at it.
![](static/emergence-jupyterpreview.png)
## Meta-Digression
If you're using Jupyter Notebook and have never tried JupyterLab, please try it.
JupyterLab is meant to replace the classic Jupyter Notebook, so you may as well start using it now.
```bash
pip install jupyterlab
jupyter-lab
```
Notes:
I know you like digressions, so I'm putting digressions in my
digression so I can digress while I digress.
## Meta-Digression
![](https://jupyterlab.readthedocs.io/en/stable/_images/jupyterlab.png)
# 10 years of data
I don't have enough data before that, so the graphs tend to get messy.
```python
stats = raw_data.loc['2008-01-01':,:].resample('6M')
```
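Why `.mean()` on the resampled data reads as a share: assuming each row of `raw_data` is one repository, indexed by a date, with 0/1 columns such as `file:README.md` (my inference from the column names), the mean over a time bucket is the fraction of repositories in that bucket having the file. The same computation in plain Python, on made-up rows and calendar half-years (close to, but not exactly, how pandas places its `'6M'` bins):

```python
from collections import defaultdict
from datetime import date


def half_year(day: date) -> str:
    """Label a date with its half-year bucket, e.g. '2011-H2'."""
    return f"{day.year}-H{1 if day.month <= 6 else 2}"


def share_with(rows, column):
    """Fraction of rows per half-year whose `column` flag is set.

    `rows` is an iterable of (date, {column: 0 or 1}) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, flags in rows:
        bucket = half_year(day)
        totals[bucket] += 1
        hits[bucket] += flags.get(column, 0)
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}
```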
Notes: Meanwhile, Python is 28 years old (older than git, GitHub, and even Java).
## Digression (again)
I used pandas; if you've never tried it...
![](static/emergence-pandas.png)
Notes:
It's a matrix of scatter plots.
# README files
```python
readmes = (stats[['file:README.rst',
                  'file:README.md',
                  'file:README',
                  'file:README.txt']].mean().plot())
```
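The `file:` and `dir:` column names suggest one flag per top-level entry of each cloned repository. A hypothetical reconstruction of that feature extraction; the function name and the "top level only" choice are my assumptions, not the talk's actual code:

```python
from pathlib import Path


def repo_features(repo: Path) -> dict:
    """Flag the top-level files and directories of a cloned repository.

    Keys are shaped like the columns used in these slides,
    e.g. 'file:README.md' or 'dir:docs/'.
    """
    features = {}
    for entry in repo.iterdir():
        if entry.name == ".git":
            continue  # Git's own metadata is not part of the project layout.
        if entry.is_dir():
            features[f"dir:{entry.name}/"] = 1
        else:
            features[f"file:{entry.name}"] = 1
    return features
```

One dict per repository, plus a date, is enough to build a DataFrame like `raw_data`; missing keys become `NaN`, hence the `fillna(0)` calls later on.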
## README files
![](static/emergence-readme.png)
Notes:
## Consensus
10 years ago, people used ``README`` and ``README.txt``.
It changed around 2011: now we use ``README.md`` and ``README.rst`` files.
``Markdown`` won. I bet it's thanks to its simplicity and readability, and also
because people may already know it from elsewhere.
## Consensus
But pypi.python.org doesn't support Markdown!
Yes, but...
## Consensus
pypi.org does!
```python
long_description_content_type='text/markdown'
```
See:
https://pypi.org/project/markdown-description-example/
So use ``README.md`` files!
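PyPI accepts `text/plain`, `text/x-rst` and `text/markdown` for `long_description_content_type`. A small hypothetical helper picking the value from the README's extension, so a `setup.py` can pass it along with the README's contents:

```python
from pathlib import Path

# Content types PyPI understands for long_description_content_type.
CONTENT_TYPES = {".md": "text/markdown", ".rst": "text/x-rst"}


def readme_content_type(readme: str) -> str:
    """Guess long_description_content_type from a README file name."""
    return CONTENT_TYPES.get(Path(readme).suffix.lower(), "text/plain")


# In setup.py, something like:
# setup(...,
#       long_description=Path("README.md").read_text(),
#       long_description_content_type=readme_content_type("README.md"))
```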
# Requirements
```python
setups = stats[['file:setup.cfg',
                'file:setup.py',
                'file:requirements.txt',
                'file:Pipfile',
                'file:Pipfile.lock']].mean().plot()
```
## Requirements
![](static/emergence-requirements.png)
Notes:
Nothing really interesting here :( We can see the rise of Pipfile, but
it's still too early to say much about it...
## Requirements
For dependency management I've seen a lot of philosophies, and it
really depends: are you packaging? Is it an app or a library? ...
## Digression
### The future
PEP 517 and PEP 518
```toml
[build-system]
requires = ["flit"]
build-backend = "flit.api:main"
```
Notes:
PEP 517 and PEP 518 introduce a way to completely remove
setuptools and distutils from being a requirement: they become a
choice.
# Tests
```python
tests = (raw_data.groupby('test_engine')
.resample('Y')['test_engine']
.size()
.unstack()
.T
.fillna(0)
.apply(lambda line: 100 * line / float(line.sum()), axis=1)
.plot())
```
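The chain above assumes each repository already has a `test_engine` label. How such a label could be guessed is not shown in the talk; here is one hypothetical heuristic, working on a `{path: text}` mapping of some of the repository's files: explicit pytest or nose mentions in packaging and config files win, otherwise any `import unittest` means unittest.

```python
import re

CONFIG_NAMES = {"setup.py", "setup.cfg", "tox.ini", "pytest.ini",
                "requirements.txt", "Pipfile"}


def guess_test_engine(files: dict) -> str:
    """Guess a project's test engine from {path: text} of some of its files."""
    names = {path.rsplit("/", 1)[-1] for path in files}
    config_text = "\n".join(text for path, text in files.items()
                            if path.rsplit("/", 1)[-1] in CONFIG_NAMES)
    if "pytest.ini" in names or "pytest" in config_text:
        return "pytest"
    # Matches "nose" and "nosetests", but not "nose2".
    if re.search(r"\bnose(tests)?\b", config_text):
        return "nose"
    if any(re.search(r"^\s*(import|from) unittest\b", text, re.MULTILINE)
           for text in files.values()):
        return "unittest"
    return "unknown"
```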
## Tests
![](static/emergence-tests.png)
Notes:
## Sorry nose.
# Documentation directory
```python
docs = stats[['dir:doc/', 'dir:docs/']].mean().plot()
```
## Documentation directory
![](static/emergence-docs.png)
Note: Some of you are not documenting at all!
Consensus emerged around 2011 towards **docs/** instead of **doc/**; let's stick to it (please, please, no **Docs/**; I see you, CPython).
# **src/** or not **src/**
```python
src = pd.DataFrame(stats['dir:src/'].mean()).plot()
```
## **src/** or not **src/**
![](static/emergence-src.png)
Notes:
This one was slow to emerge, but the consensus is to drop the `src/` directory.
I used it a lot, convinced it would help me spot an import bug earlier ("." being in `sys.path` but not "src/"). But that's way overkill for such a small win.
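To make that import bug concrete: with a `src/` layout, the repository root (what `"."` on `sys.path` gives you) does not expose the package, so tests can't silently import the local, uninstalled copy. A throwaway sketch using `importlib.util.find_spec`; the package name `emergence_demo_pkg` is made up:

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def findable(name: str, path_entry: Path) -> bool:
    """Can `name` be found with `path_entry` prepended to sys.path?"""
    sys.path.insert(0, str(path_entry))
    try:
        importlib.invalidate_caches()
        return importlib.util.find_spec(name) is not None
    finally:
        sys.path.remove(str(path_entry))


def src_layout_demo(name: str = "emergence_demo_pkg") -> tuple:
    """Build a temporary src/ layout and report where the package is findable."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        package = repo / "src" / name
        package.mkdir(parents=True)
        (package / "__init__.py").write_text("")
        # From the repository root, the package is hidden...
        from_root = findable(name, repo)
        # ...it only shows up once src/ (or an installed copy) is on the path.
        from_src = findable(name, repo / "src")
    return from_root, from_src


print(src_layout_demo())  # (False, True)
```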
# **tests/** or **test/**?
<br/>
```python
has_tests = stats[['dir:tests/', 'dir:test/']].mean().plot()
```
## **tests/** or **test/**?
![](static/emergence-testdir.png)
Note: First thing I see... Not everyone is writing tests.
I'm glad the consensus is the same as for **docs/** vs **doc/**: plural clearly wins. I bet it's because it's semantically better, as the folder does not contain a test, but multiple tests.
Aside on packaging: `pyproject.toml` is also where you declare the build dependencies of your `setup.py`.
# Shebangs
```python
shebangs = (raw_data.loc['2008-01-01':, raw_data.columns.str.startswith('shebang:')]
            .sum())
```
```python
top_shebangs = shebangs.sort_values().tail(4).index
```
```python
shebangs_plot = (raw_data.loc['2008-01-01':, top_shebangs]
.fillna(value=0).resample('6M').mean().plot())
```
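The `shebang:` columns presumably count the first lines of scripts. A hypothetical extractor producing such keys, normalising whitespace so that `#! /usr/bin/env  python3` and `#!/usr/bin/env python3` end up in the same column (that normalisation is my choice, not necessarily the talk's):

```python
from pathlib import Path
from typing import Optional


def shebang_feature(script: Path) -> Optional[str]:
    """Return a 'shebang:<interpreter line>' key for a script, or None."""
    with script.open("rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Collapse runs of whitespace and drop the one that may follow "#!".
    interpreter = " ".join(first_line[2:].decode("utf-8", "replace").split())
    return f"shebang:#!{interpreter}"
```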
## Shebangs
![](static/emergence-shebang.png)
Notes:
I'm glad there isn't much `#!/usr/bin/env python2.7` here.
I'm not sure it's a good idea to specify a version in the shebang, but...
# Licenses
```python
top_licenses = raw_data.groupby('license').size().sort_values().tail(10)
licenses = (raw_data.groupby('license')
.resample('Y')['license']
.size()
.unstack()
.T
.fillna(0)
.loc[:, list(top_licenses.index)]
.apply(lambda line: 100 * line / float(line.sum()), axis=1)
.plot())
```
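The `groupby` / `resample` / `unstack` / `apply` chain above boils down to: count licenses per year, keep the overall top N, and turn each year's counts into percentages. A plain-Python sketch of the same idea on hypothetical `(year, license)` pairs; like the pandas version, the percentages are relative to the top licenses only, so each year sums to 100:

```python
from collections import Counter, defaultdict


def license_shares(records, top_n=10):
    """Per-year percentage of each of the overall `top_n` licenses.

    `records` is an iterable of (year, license) pairs.
    """
    records = list(records)
    overall = Counter(lic for _, lic in records)
    top = [name for name, _ in overall.most_common(top_n)]
    per_year = defaultdict(Counter)
    for year, lic in records:
        if lic in top:
            per_year[year][lic] += 1
    shares = {}
    for year in sorted(per_year):
        # Every year kept here has at least one top license, so total > 0.
        total = sum(per_year[year].values())
        shares[year] = {lic: 100 * per_year[year][lic] / total for lic in top}
    return shares
```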
## Licenses
![](static/emergence-licenses.png)
## Digression
https://choosealicense.com/
# Questions?
<br/><br/>
- julien@python.org
- Twitter @sizeof
- https://mdk.fr